EURASIP Journal on Audio, Speech, and Music Processing

Table 2 Amount of text in Thai large-vocabulary speech corpora

From: Classification-based spoken text selection for LVCSR language modeling

Corpus	Text style	Number of utterances	Number of word tokens	Vocabulary size
LOTUS	Written	4887	90,336	5112
LOTUS-CELL	Spoken	55,457	284,498	9595
LOTUS-SOC	Spoken/Written	78,264	1,601,230	13,739
VoiceTra4U-M	Spoken	9899	30,876	2141
