Table 3 Details of each speech corpus data set used in this study

From: Classification-based spoken text selection for LVCSR language modeling

| Corpus | Data set | Usage | Number of utterances |
|---|---|---|---|
| LOTUS | LOTUS-TRN | Training the classification model | 4,330 |
| LOTUS | LOTUS-DEV | Evaluating the classifier performance | 557 |
| LOTUS-CELL | CELL-TRN | Training the classification model | 40,000 |
| LOTUS-CELL | CELL-DEV | Evaluating the classifier performance | 15,475 |
| VoiceTra4U-M | VT-DEV | Optimizing the selection of confidence groups | 7,982 |
| VoiceTra4U-M | VT-TST | Evaluating the recognition performance | 1,917 |
| LOTUS-SOC | SOC | Evaluating the recognition performance | 4,000 |