Table 4 Development and test term list characteristics for MAVIR, RTVE, and COREMAH databases

From: ALBAYZIN 2018 spoken term detection evaluation: a multi-domain international evaluation in Spanish

Term list
#IN-LANG terms (occ.)     354 (959)    307 (1151)   208 (2071)   301 (1082)   153 (1022)
#OUT-LANG terms (occ.)    20 (55)      91 (351)     15 (50)      103 (162)    8 (16)
#SINGLE terms (occ.)      340 (984)    380 (1280)   198 (2093)   383 (1186)   145 (1004)
#MULTI terms (occ.)       34 (30)      18 (222)     25 (28)      21 (58)      16 (34)
#INV terms (occ.)         292 (668)    312 (1263)   192 (1749)   316 (1035)   128 (948)
#OOV terms (occ.)         82 (346)     86 (239)     31 (372)     88 (209)     33 (90)

  1. “dev” stands for development, “IN-LANG” refers to in-language terms, “OUT-LANG” to foreign terms, “SINGLE” to single-word terms, “MULTI” to multi-word terms, “INV” to in-vocabulary terms, “OOV” to out-of-vocabulary terms, and “occ.” stands for occurrences. Term length in the development term lists varies between 4 and 27 graphemes. Term length in the MAVIR and RTVE test term lists varies between 4 and 28 graphemes. Term length in the COREMAH test term list varies between 3 and 17 graphemes.
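
The three pairs of rows (IN-LANG/OUT-LANG, SINGLE/MULTI, INV/OOV) read as three different ways of splitting the same term list, so each pair should add up to the same number of terms and occurrences per column. The table does not state this explicitly; the sketch below treats it as an assumption and checks it against the figures above. The names `TABLE`, `SPLITS`, `column_totals`, and `splits_agree` are illustrative only, and the columns are referred to by position because the column headings are not reproduced here.

```python
# Rows copied from Table 4. Each tuple is (terms, occurrences) for one column,
# listed in the order the columns appear in the table.
TABLE = {
    "IN-LANG": [(354, 959), (307, 1151), (208, 2071), (301, 1082), (153, 1022)],
    "OUT-LANG": [(20, 55), (91, 351), (15, 50), (103, 162), (8, 16)],
    "SINGLE": [(340, 984), (380, 1280), (198, 2093), (383, 1186), (145, 1004)],
    "MULTI": [(34, 30), (18, 222), (25, 28), (21, 58), (16, 34)],
    "INV": [(292, 668), (312, 1263), (192, 1749), (316, 1035), (128, 948)],
    "OOV": [(82, 346), (86, 239), (31, 372), (88, 209), (33, 90)],
}

# Assumption: each pair partitions the full term list of a column.
SPLITS = [("IN-LANG", "OUT-LANG"), ("SINGLE", "MULTI"), ("INV", "OOV")]


def column_totals(table, first, second):
    """Sum two complementary rows column by column: (terms, occurrences)."""
    return [
        (a[0] + b[0], a[1] + b[1])
        for a, b in zip(table[first], table[second])
    ]


def splits_agree(table, splits=SPLITS):
    """Return True if every split yields the same per-column totals."""
    totals = [column_totals(table, a, b) for a, b in splits]
    return all(t == totals[0] for t in totals)


if __name__ == "__main__":
    print(column_totals(TABLE, "IN-LANG", "OUT-LANG"))
    # [(374, 1014), (398, 1502), (223, 2121), (404, 1244), (161, 1038)]
    print(splits_agree(TABLE))
    # True
```

Under that assumption, the five term lists contain 374, 398, 223, 404, and 161 terms, in column order, and all three splits give the same totals.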