From: Performance vs. hardware requirements in state-of-the-art automatic speech recognition
ASR system | WER [%], without LM, test-clean | WER [%], without LM, test-other | WER [%], n-gram LM, test-clean | WER [%], n-gram LM, test-other |
---|---|---|---|---|
Kaldi TDNN [64] | - | - | 3.85 | 9.57 |
Kaldi CNN-TDNN [65] | - | - | 3.87 | 9.42 |
PaddlePaddle DeepSpeech2 [66] | 10.70 | 30.00 | 6.03 | 20.29 |
RWTH Returnn [67] | 4.71 | 15.17 | 4.67 | 15.16 |
Facebook CNN-ASG [68] | - | - | 4.82 | 14.54 |
Facebook TDS-S2S [69] | 5.36 | 15.64 | 4.21 | 11.87 |
Nvidia Jasper [70] | 3.86 | 11.93 | 3.19 | 9.03 |
Nvidia QuartzNet [71] | 3.90 | 11.28 | 2.98 | 8.38 |
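
One way to read the table is to ask how much the n-gram LM lowers the WER of each system that reports both settings. The sketch below computes that relative reduction from the values listed above. The function name `relative_wer_reduction` and the dictionary layout are illustrative assumptions, not part of the source, and systems with no result without an LM (marked `-`) are left out.

```python
# WER [%] values copied from the table above, for systems reporting both settings.
# Tuple layout (an assumption of this sketch):
# (no-LM test-clean, no-LM test-other, n-gram test-clean, n-gram test-other)
WER = {
    "PaddlePaddle DeepSpeech2": (10.70, 30.00, 6.03, 20.29),
    "RWTH Returnn": (4.71, 15.17, 4.67, 15.16),
    "Facebook TDS-S2S": (5.36, 15.64, 4.21, 11.87),
    "Nvidia Jasper": (3.86, 11.93, 3.19, 9.03),
    "Nvidia QuartzNet": (3.90, 11.28, 2.98, 8.38),
}


def relative_wer_reduction(wer_without_lm: float, wer_with_lm: float) -> float:
    """Return the relative WER reduction in percent (hypothetical helper)."""
    return 100.0 * (wer_without_lm - wer_with_lm) / wer_without_lm


if __name__ == "__main__":
    for system, (clean, other, clean_lm, other_lm) in WER.items():
        gain_clean = relative_wer_reduction(clean, clean_lm)
        gain_other = relative_wer_reduction(other, other_lm)
        print(f"{system}: test-clean {gain_clean:.1f}%, test-other {gain_other:.1f}%")
```

The reduction is expressed relative to the WER without an LM, so a system that gains little from the LM shows a value close to zero.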