Table 2 Characteristics of the RTVE database: number of word occurrences (#word occ.), duration in minutes (dur. (min)), number of speakers (#spk.), and average MOS (Ave. MOS)

From: Search on speech from spoken queries: the Multi-domain International ALBAYZIN 2018 Query-by-Example Spoken Term Detection Evaluation

File ID              Data   #word occ.  dur. (min)  #spk.  Ave. MOS
LN24H-20151125       dev2   21049       123.50      22     3.37
LN24H-20151201       dev2   19727       112.43      16     3.27
LN24H-20160112       dev2   18617       110.40      19     3.24
LN24H-20160121       dev2   18215       120.33      18     2.93
millennium-20170522  dev2   8330        56.50       9      3.61
millennium-20170529  dev2   8812        57.95       10     3.24
millennium-20170626  dev2   7976        55.68       14     3.55
millennium-20171009  dev2   9863        58.78       12     3.60
millennium-20171106  dev2   8498        59.57       16     3.40
millennium-20171204  dev2   9280        60.25       10     3.29
millennium-20171211  dev2   9502        59.70       12     2.95
millennium-20171218  dev2   9386        55.55       15     2.70
EC-20170513          test   3565        22.13       N/A    3.12
EC-20170520          test   3266        21.25       N/A    3.38
EC-20170527          test   2602        17.87       N/A    3.42
EC-20170603          test   3527        23.87       N/A    3.90
EC-20170610          test   3846        24.22       N/A    3.31
EC-20170617          test   3368        21.55       N/A    3.36
EC-20170624          test   3286        22.60       N/A    3.65
EC-20170701          test   2893        22.52       N/A    3.47
EC-20170708          test   3425        23.15       N/A    3.58
EC-20170715          test   3316        22.55       N/A    3.82
EC-20170722          test   3929        27.40       N/A    3.88
EC-20170729          test   4126        27.45       N/A    3.61
EC-20170909          test   3063        21.05       N/A    3.64
EC-20170916          test   3422        24.60       N/A    3.40
EC-20170923          test   3331        22.02       N/A    3.24
EC-20180113          test   2742        19.02       N/A    3.80
EC-20180120          test   3466        21.97       N/A    3.28
EC-20180127          test   3488        22.52       N/A    3.56
EC-20180203          test   3016        21.60       N/A    3.90
EC-20180210          test   3214        23.20       N/A    3.71
EC-20180217          test   3094        20.33       N/A    3.57
EC-20180224          test   3140        20.78       N/A    3.56
millennium-20170703  test   8714        55.78       N/A    1.10
millennium-20171030  test   8182        57.05       N/A    3.44
ALL                  train  3729924     27729       N/A    3.04
ALL                  dev1   545952      3742.88     N/A    2.90
ALL                  dev2   149255      930.64      N/A    3.25
ALL                  test   90021       605.48      N/A    3.32
  1. These characteristics are shown for the training (train), development (dev1 and dev2), and testing (test) datasets. Results for train and dev1 are not reported per file because of the large number of files (about 400 for train and about 60 for dev1).
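
The "ALL" rows aggregate the per-file rows of each dataset. As a minimal sketch of that arithmetic, the snippet below sums the word occurrences and durations of the dev2 files listed in the table. The names `ROWS` and `totals_by_partition` are hypothetical and chosen only for illustration; they are not part of the evaluation tooling. The snippet does not try to reproduce the average MOS, since the table does not state how that average is computed.

```python
from collections import defaultdict

# Per-file dev2 rows copied from the table:
# (file ID, dataset, word occurrences, duration in minutes)
ROWS = [
    ("LN24H-20151125", "dev2", 21049, 123.50),
    ("LN24H-20151201", "dev2", 19727, 112.43),
    ("LN24H-20160112", "dev2", 18617, 110.40),
    ("LN24H-20160121", "dev2", 18215, 120.33),
    ("millennium-20170522", "dev2", 8330, 56.50),
    ("millennium-20170529", "dev2", 8812, 57.95),
    ("millennium-20170626", "dev2", 7976, 55.68),
    ("millennium-20171009", "dev2", 9863, 58.78),
    ("millennium-20171106", "dev2", 8498, 59.57),
    ("millennium-20171204", "dev2", 9280, 60.25),
    ("millennium-20171211", "dev2", 9502, 59.70),
    ("millennium-20171218", "dev2", 9386, 55.55),
]


def totals_by_partition(rows):
    """Sum word occurrences and duration (minutes) per dataset label."""
    totals = defaultdict(lambda: [0, 0.0])
    for _file_id, partition, words, minutes in rows:
        totals[partition][0] += words
        totals[partition][1] += minutes
    # Round durations to two decimals, the precision used in the table.
    return {p: (w, round(m, 2)) for p, (w, m) in totals.items()}


if __name__ == "__main__":
    for partition, (words, minutes) in totals_by_partition(ROWS).items():
        print(f"{partition}: {words} word occurrences, {minutes:.2f} min")
```

For the dev2 rows this gives 149255 word occurrences and 930.64 minutes, matching the "ALL dev2" row.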