Table 3 Comparison of the most popular speech datasets used for ASR evaluation

From: Performance vs. hardware requirements in state-of-the-art automatic speech recognition

ASR task          Speech type                      Size [h]  # of speakers  Framework
                                                                            K  P  W  R  N
LibriSpeech [72]  read speech                      960       2400
WSJ [73]          read speech                      80        284
TED-LIUM2 [74]    TED talks                        207       1242
Switchboard [75]  conversational telephone speech  300       543
Fisher [76]       conversational telephone speech  2742      12400
  1. We compare the type of speech and the dataset size, expressed in hours of speech and number of speakers. The Framework columns mark the ASR frameworks in which recipes are available: K - Kaldi; P - PaddlePaddle DeepSpeech; W - Wav2Letter; R - RWTH Returnn; N - Nvidia (OpenSeq2Seq & NeMo).
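
Because the table reports size both in hours and in number of speakers, one rough way to relate the two is the average amount of audio per speaker. The sketch below is illustrative only: the figures are copied from the table, while the names `DATASETS`, `hours_per_speaker`, and `summarize` are hypothetical helpers introduced here, and the ratio is a derived quantity that the source does not report.

```python
# Size [h] and number of speakers, copied from Table 3.
# The dictionary layout and helper names are assumptions made for this sketch.
DATASETS = {
    "LibriSpeech": (960, 2400),
    "WSJ": (80, 284),
    "TED-LIUM2": (207, 1242),
    "Switchboard": (300, 543),
    "Fisher": (2742, 12400),
}


def hours_per_speaker(hours, speakers):
    """Average hours of speech per speaker (a derived figure, not in the table)."""
    return hours / speakers


def summarize(datasets):
    """Map each dataset name to its hours-per-speaker ratio, rounded to 3 decimals."""
    return {
        name: round(hours_per_speaker(hours, speakers), 3)
        for name, (hours, speakers) in datasets.items()
    }


if __name__ == "__main__":
    # Print datasets from the highest to the lowest ratio.
    ranked = sorted(summarize(DATASETS).items(), key=lambda kv: kv[1], reverse=True)
    for name, ratio in ranked:
        print(f"{name:<12} {ratio:.3f} h/speaker")
```

The ratio is only an average: it says nothing about how evenly the audio is distributed across speakers within a dataset.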