Table 1 Dataset statistics for the three subsets of the proposed TVSM dataset and for open-source datasets with frame-level annotations. Note that % of music (or speech) is estimated as the ratio of the duration labeled as music (or speech) to the total duration of the audio content.

From: A large TV dataset for speech and music activity detection

Dataset         % of music  % of speech  % of overlap  Label quality  # of instances  Duration (h)  Usage
TVSM-cuesheet   63%         64%          0.39%         Noisy          656             54.6          Training
TVSM-pseudo     61%         57%          0.33%         Noisy          2563            1538.5        Training
TVSM-test       43%         43%          0.32%         Clean          20              15            Test
OpenBMAT        50%         N/A          N/A           Clean          1647            27.5          Test
AVAspeech       N/A         52%          N/A           Clean          160             45            Test
ORF TV          42%         N/A          N/A           Clean          13              9             Test
Muspeak         76%         24%          N/A           Clean          214             5             Test
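The percentage estimate described in the caption can be illustrated with a minimal Python sketch. It assumes that each recording's annotations are available as (start, end, label) segments in seconds; this segment format and the function names `labeled_duration` and `label_percentage` are hypothetical and not part of the dataset's released tooling. Overlapping segments of the same class are merged first so that no stretch of time is counted twice.

```python
from typing import List, Tuple

# Hypothetical annotation format: (start_seconds, end_seconds, label)
Segment = Tuple[float, float, str]


def labeled_duration(segments: List[Segment], label: str) -> float:
    """Total seconds covered by `label`, merging overlapping segments of that label."""
    intervals = sorted((s, e) for s, e, lab in segments if lab == label and e > s)
    total = 0.0
    cur_start = cur_end = None
    for s, e in intervals:
        if cur_end is None or s > cur_end:
            # Disjoint from the running interval: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or touching: extend the running interval.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def label_percentage(segments: List[Segment], label: str, total_duration: float) -> float:
    """Percentage of the audio's total duration labeled as `label`."""
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    return 100.0 * labeled_duration(segments, label) / total_duration


if __name__ == "__main__":
    segs = [(0.0, 30.0, "music"), (20.0, 50.0, "music"),
            (40.0, 70.0, "speech"), (80.0, 100.0, "speech")]
    print(label_percentage(segs, "music", 100.0))   # 50.0
    print(label_percentage(segs, "speech", 100.0))  # 50.0
```

Each class is measured independently against the same total duration, so a stretch where music and speech co-occur contributes to both percentages. For a whole subset, one would sum the labeled durations and the total durations over all recordings before dividing.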