
Table 2 The amount of texts in Thai large vocabulary speech corpora

From: Classification-based spoken text selection for LVCSR language modeling

| Corpus | Text style | Number of utterances | Number of word tokens | Vocabulary size |
|---|---|---|---|---|
| LOTUS | Written | 4887 | 90,336 | 5112 |
| LOTUS-CELL | Spoken | 55,457 | 284,498 | 9595 |
| LOTUS-SOC | Spoken/Written | 78,264 | 1,601,230 | 13,739 |
| VoiceTra4U-M | Spoken | 9899 | 30,876 | 2141 |