Table 1. Dataset statistics for the three subsets of the proposed TVSM dataset and for open-source datasets with frame-level annotations. The percentages of music and speech are estimated as the total duration labeled as music (or speech) divided by the total duration of the audio content.

From: A large TV dataset for speech and music activity detection

| Dataset | % of music | % of speech | % of overlap | Label quality | # of instances | Duration (h) | Usage |
|---|---|---|---|---|---|---|---|
| TVSM-cuesheet | 63% | 64% | 0.39% | Noisy | 656 | 54.6 | Training |
| TVSM-pseudo | 61% | 57% | 0.33% | Noisy | 2563 | 1538.5 | Training |
| TVSM-test | 43% | 43% | 0.32% | Clean | 20 | 15 | Test |
| OpenBMAT | 50% | N/A | N/A | Clean | 1647 | 27.5 | Test |
| AVAspeech | N/A | 52% | N/A | Clean | 160 | 45 | Test |
| ORF TV | 42% | N/A | N/A | Clean | 13 | 9 | Test |
| Muspeak | 76% | 24% | N/A | Clean | 214 | 5 | Test |
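The caption defines the music and speech percentages as labeled duration over total audio duration. The sketch below shows one way to compute that ratio from frame-level annotations expressed as time segments. It assumes annotations are lists of `(start_sec, end_sec)` tuples per class and that overlapping segments of the same class are counted only once; the function names are hypothetical and not taken from the TVSM release. The "% of overlap" column is not reproduced here because its exact definition is not stated in the caption.

```python
from typing import List, Tuple

# Hypothetical annotation format: one (start_sec, end_sec) tuple per labeled segment.
Segment = Tuple[float, float]


def merged_duration(segments: List[Segment]) -> float:
    """Total time covered by the segments, counting overlapping regions only once."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(segments):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close the previous run and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps or touches the current run: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def activity_percentage(segments: List[Segment], total_duration: float) -> float:
    """Percentage of one recording labeled with a class (e.g. music or speech)."""
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    return 100.0 * merged_duration(segments) / total_duration


def dataset_percentage(files: List[Tuple[List[Segment], float]]) -> float:
    """Dataset-level percentage: summed labeled time over summed audio time."""
    labeled = sum(merged_duration(segs) for segs, _ in files)
    total = sum(duration for _, duration in files)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return 100.0 * labeled / total


if __name__ == "__main__":
    music = [(0.0, 30.0), (20.0, 60.0)]  # overlapping segments merge to 60 s
    print(activity_percentage(music, 100.0))
```

Pooling durations across all instances, as in `dataset_percentage`, weights long recordings more heavily than averaging per-file percentages would; which of the two the table uses is an assumption here.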