
Table 6 Comparison of ASR systems in terms of performance

From: Performance vs. hardware requirements in state-of-the-art automatic speech recognition

| ASR system | WER [%] without LM, test-clean | WER [%] without LM, test-other | WER [%] n-gram LM, test-clean | WER [%] n-gram LM, test-other |
| --- | --- | --- | --- | --- |
| Kaldi TDNN [64] | - | - | 3.85 | 9.57 |
| Kaldi CNN-TDNN [65] | - | - | 3.87 | 9.42 |
| PaddlePaddle DeepSpeech2 [66] | 10.70 | 30.00 | 6.03 | 20.29 |
| RWTH Returnn [67] | 4.71 | 15.17 | 4.67 | 15.16 |
| Facebook CNN-ASG [68] | - | - | 4.82 | 14.54 |
| Facebook TDS-S2S [69] | 5.36 | 15.64 | 4.21 | 11.87 |
| Nvidia Jasper [70] | 3.86 | 11.93 | 3.19 | 9.03 |
| Nvidia QuartzNet [71] | 3.90 | 11.28 | 2.98 | 8.38 |

  1. Performance is expressed in terms of the word error rate (WER; lower is better). Evaluation is performed on two LibriSpeech subsets: test-clean and test-other. For frameworks that support it, evaluation is performed in two scenarios: with and without an external language model
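As a minimal sketch of the metric reported in the table: WER is conventionally computed as the word-level Levenshtein (edit) distance between the reference transcript and the hypothesis, divided by the number of reference words. This is an illustrative implementation, not the evaluation script used for the table above.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: (substitutions + insertions + deletions)
    needed to turn the hypothesis into the reference, over reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

# One substitution + one deletion against a 4-word reference -> 50.0
print(wer("the cat sat down", "the dog sat"))
```

A "with LM" score in the table comes from rescoring or decoding with an external n-gram language model before this comparison is made; the metric itself is unchanged.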