EURASIP Journal on Audio, Speech, and Music Processing

Table 1 Dataset statistics for the three subsets of the proposed TVSM dataset and for open-source datasets with frame-level annotations. The % of music and % of speech are estimated as the duration labeled as music or speech divided by the total duration of the audio content.

From: A large TV dataset for speech and music activity detection

Dataset	% of music	% of speech	% of overlap	Label quality	# of instances	Duration (h)	Usage
TVSM-cuesheet	63%	64%	0.39%	Noisy	656	54.6	Training
TVSM-pseudo	61%	57%	0.33%	Noisy	2563	1538.5	Training
TVSM-test	43%	43%	0.32%	Clean	20	15	Test
OpenBMAT	50%	N/A	N/A	Clean	1647	27.5	Test
AVAspeech	N/A	52%	N/A	Clean	160	45	Test
ORF TV	42%	N/A	N/A	Clean	13	9	Test
Muspeak	76%	24%	N/A	Clean	214	5	Test
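
The caption describes the music and speech percentages as labeled duration divided by total audio duration. A minimal sketch of that arithmetic follows, assuming annotations are available as (start, end) pairs in seconds. The function name `labeled_percentage` and the merging of overlapping segments of the same class are illustrative assumptions, not details taken from the article.

```python
def labeled_percentage(segments, total_duration):
    """Return the percentage of total_duration covered by the segments.

    segments: iterable of (start, end) times in seconds for one class
    (for example, all segments labeled as music). Hypothetical input format.
    Overlapping segments are counted once (an assumption of this sketch).
    """
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    covered = 0.0
    current_end = 0.0  # end of the region already counted
    for start, end in sorted(segments):
        start = max(start, current_end)  # skip the part already counted
        end = min(end, total_duration)   # clip to the audio length
        if end > start:
            covered += end - start
            current_end = end
    return 100.0 * covered / total_duration


# Hypothetical example: 30 s of music in a 100 s clip.
music_segments = [(0, 10), (5, 20), (30, 40)]
print(labeled_percentage(music_segments, 100))
```

Applied per class over a whole dataset, a ratio of this kind would yield figures comparable to the "% of music" and "% of speech" columns, provided the same definition of total duration is used.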
