From: Classification-based spoken text selection for LVCSR language modeling
Corpus | Data set | Usage | Number of utterances |
---|---|---|---|
LOTUS | LOTUS-TRN | Training the classification model | 4,330 |
LOTUS | LOTUS-DEV | Evaluating classifier performance | 557 |
LOTUS-CELL | CELL-TRN | Training the classification model | 40,000 |
LOTUS-CELL | CELL-DEV | Evaluating classifier performance | 15,475 |
VoiceTra4U-M | VT-DEV | Optimizing the selection of confidence groups | 7,982 |
VoiceTra4U-M | VT-TST | Evaluating recognition performance | 1,917 |
LOTUS-SOC | SOC | Evaluating recognition performance | 4,000 |