Classification-based spoken text selection for LVCSR language modeling

EURASIP Journal on Audio, Speech, and Music Processing

Table 3 Details of each speech corpus data set used in this study

Corpus	Data set	Usage	Number of utterances
LOTUS	LOTUS-TRN	Training the classification model	4330
	LOTUS-DEV	Evaluating classifier performance	557
LOTUS-CELL	CELL-TRN	Training the classification model	40,000
	CELL-DEV	Evaluating classifier performance	15,475
VoiceTra4U-M	VT-DEV	Optimizing the selection of confidence groups	7982
	VT-TST	Evaluating recognition performance	1917
LOTUS-SOC	SOC	Evaluating recognition performance	4000