Table 3 Comparison of the most popular speech datasets used for ASR evaluation

From: Performance vs. hardware requirements in state-of-the-art automatic speech recognition

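| ASR task | Speech type | Size [h] | # of speakers | K | P | W | R | N |
|---|---|---|---|---|---|---|---|---|
| LibriSpeech [72] | read speech | 960 | ∼2400 | ✓ | ✓ | ✓ | ✓ | ✓ |
| WSJ [73] | read speech | 80 | 284 | ✓ |  | ✓ | ✓ |  |
| TED-LIUM2 [74] | TED talks | 207 | 1242 | ✓ |  |  | ✓ |  |
| Switchboard [75] | conversational telephone speech | 300 | 543 | ✓ |  |  | ✓ |  |
| Fisher [76] | conversational telephone speech | 2742 | ∼12400 | ✓ |  |  |  |  |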
The table compares the type of speech and the dataset size, the latter expressed as hours of speech and number of speakers. The framework columns mark which ASR frameworks provide recipes for each dataset: K = Kaldi; P = PaddlePaddle DeepSpeech; W = Wav2Letter; R = RWTH Returnn; N = Nvidia (OpenSeq2Seq & NeMo). A ✓ indicates that a recipe is available; an empty cell means none is listed.
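To make the comparison easy to query, the table can be encoded as plain data. The following Python sketch is illustrative only: the `Dataset` class and the helpers `datasets_with_recipe` and `minutes_per_speaker` are hypothetical names, the approximate speaker counts (∼2400, ∼12400) are stored as plain integers, and recipe availability is copied from the table rather than checked against the frameworks' current repositories.

```python
from dataclasses import dataclass

# Framework codes as defined in the table footnote.
FRAMEWORKS = {
    "K": "Kaldi",
    "P": "PaddlePaddle DeepSpeech",
    "W": "Wav2Letter",
    "R": "RWTH Returnn",
    "N": "Nvidia (OpenSeq2Seq & NeMo)",
}


@dataclass(frozen=True)
class Dataset:
    name: str
    speech_type: str
    hours: int
    speakers: int  # approximate where the table shows "∼" (assumption: stored as int)
    recipes: frozenset  # framework codes with an available recipe


# Rows transcribed from the table above.
DATASETS = [
    Dataset("LibriSpeech", "read speech", 960, 2400, frozenset("KPWRN")),
    Dataset("WSJ", "read speech", 80, 284, frozenset("KWR")),
    Dataset("TED-LIUM2", "TED talks", 207, 1242, frozenset("KR")),
    Dataset("Switchboard", "conversational telephone speech", 300, 543, frozenset("KR")),
    Dataset("Fisher", "conversational telephone speech", 2742, 12400, frozenset("K")),
]


def datasets_with_recipe(code: str) -> list:
    """Return names of datasets for which framework `code` has a recipe."""
    if code not in FRAMEWORKS:
        raise KeyError(f"unknown framework code: {code!r}")
    return [d.name for d in DATASETS if code in d.recipes]


def minutes_per_speaker(d: Dataset) -> float:
    """Average amount of speech per speaker, in minutes (hypothetical metric)."""
    return d.hours * 60 / d.speakers


if __name__ == "__main__":
    for code, name in FRAMEWORKS.items():
        print(f"{name}: {', '.join(datasets_with_recipe(code))}")
```

For example, `datasets_with_recipe("W")` returns `["LibriSpeech", "WSJ"]`, and `minutes_per_speaker(DATASETS[0])` gives 24.0 minutes for LibriSpeech (960 h over ∼2400 speakers).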