Fig. 5
From: A large TV dataset for speech and music activity detection
The mean error rate (lower is better) across all datasets, computed as described in the sed_eval toolbox. The models CRNN-P-Cue and CRNN-P-Pseu are selected for comparison. TVSM-test (music) and Muspeak (music) represent the music evaluation, while TVSM-test (speech) represents the speech evaluation. The other test datasets contain only speech labels or only music labels, as described in Section 3.
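
For readers unfamiliar with the metric, the following is a minimal sketch of a segment-based error rate in the style defined by sed_eval (substitutions, deletions and insertions summed over segments, divided by the number of active reference labels). It does not call the sed_eval library. The function names, the toy activity matrices, and the unweighted averaging across datasets are illustrative assumptions, not the article's evaluation code.

```python
from typing import Sequence


def segment_error_rate(reference: Sequence[Sequence[int]],
                       estimate: Sequence[Sequence[int]]) -> float:
    """Segment-based error rate: (S + D + I) / N.

    Each argument is a list of segments; each segment is a binary
    activity vector with one entry per class (1 = active).
    """
    subs = dels = ins = n_ref = 0
    for ref_seg, est_seg in zip(reference, estimate):
        # False negatives: active in the reference, missed by the estimate.
        fn = sum(1 for r, e in zip(ref_seg, est_seg) if r and not e)
        # False positives: active in the estimate, absent from the reference.
        fp = sum(1 for r, e in zip(ref_seg, est_seg) if e and not r)
        subs += min(fn, fp)       # a miss paired with a false alarm
        dels += max(0, fn - fp)   # remaining misses
        ins += max(0, fp - fn)    # remaining false alarms
        n_ref += sum(1 for r in ref_seg if r)
    if n_ref == 0:
        raise ValueError("reference contains no active labels")
    return (subs + dels + ins) / n_ref


def mean_error_rate(rates: Sequence[float]) -> float:
    """Unweighted mean over per-dataset error rates (an assumption here)."""
    return sum(rates) / len(rates)


# Hypothetical toy data: four segments, classes ordered [music, speech].
reference = [[1, 0], [1, 1], [0, 1], [0, 0]]
estimate = [[1, 0], [1, 0], [1, 0], [0, 1]]
print(segment_error_rate(reference, estimate))  # one D, one S, one I over N = 4
```

Because insertions are counted in the numerator but not in the denominator, the error rate can exceed 1.0 when a model produces many false alarms.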