From: Performance vs. hardware requirements in state-of-the-art automatic speech recognition
Section | Content |
---|---|
1. Introduction | Paper context; paper goals; paper structure |
2. Introduction to ASR systems | Main concepts of the automatic speech recognition field |
2.1 The road from pipeline ASR to end-to-end ASR | Differences between the two categories of systems |
2.2 Feature extraction | Most popular speech features |
2.3 Traditional, HMM-based acoustic modeling | Acoustic modeling using Hidden Markov Models: concepts and systems |
2.4 End-to-end ASR systems | Most common end-to-end approaches |
2.5 Language Modeling | Most common language modeling approaches |
3. State-of-the-art ASR implementations | Detailed description of 8 speech recognition systems |
3.1 Kaldi chain model TDNN | Simple time-delay neural network system |
3.2 Kaldi chain model CNN-TDNN | Convolutional + time-delay neural network system |
3.3 PaddlePaddle implementation of DeepSpeech2 | Simple recurrent neural network system |
3.4 RWTH RETURNN | Attention-based encoder-decoder neural network system |
3.5 Facebook CNN-ASG | Fully convolutional neural network system with gated linear units |
3.6 Facebook TDS-S2S | Convolutional neural network system with time-depth separable blocks |
3.7 Nvidia Jasper | Convolutional neural network system with residual connections |
3.8 Nvidia QuartzNet | Lightweight convolutional neural network system with time-channel separable residual blocks |
4. ASR comparison and evaluation. Case study on LibriSpeech | Accuracy and hardware requirements of the 8 implementations, evaluated on the LibriSpeech task |
4.1 Evaluation of model complexity | Definition of the metrics used for model complexity |
4.2 Comparison of ASR systems in terms of model complexity | Complexity of the models computed as the number of parameters, operations and activations |
4.3 Comparison of ASR systems in terms of performance | Transcription accuracy on the LibriSpeech dataset |
4.4 Trade-offs between ASR performance and hardware | Accuracy vs. hardware requirements: trade-off analysis |
5. Conclusion | Paper summary; achieved goals; main conclusions drawn from the analysis of the 8 systems |