Table 1 Paper summary

From: Performance vs. hardware requirements in state-of-the-art automatic speech recognition

Section

Content

1. Introduction

Paper context; paper goals; paper structure

2. Introduction to ASR systems

Main concepts of the automatic speech recognition field

2.1 The road from pipeline ASR to end-to-end ASR

Differences between these two categories of systems

2.2 Feature extraction

Most popular speech features

2.3 Traditional, HMM-based acoustic modeling

Acoustic modeling using Hidden Markov Models: concepts and systems

2.4 End-to-end ASR systems

Most common end-to-end approaches

2.5 Language Modeling

Most common language modeling approaches

3. State-of-the-art ASR implementations

Detailed description of 8 speech recognition systems

3.1 Kaldi chain model TDNN

Simple time-delay neural network system

3.2 Kaldi chain model CNN-TDNN

Convolutional + time-delay neural network system

3.3 PaddlePaddle implementation of DeepSpeech2

Simple recurrent neural network system

3.4 RWTH RETURNN

Attention-based encoder-decoder neural network system

3.5 Facebook CNN-ASG

Fully convolutional neural network system with gated linear units

3.6 Facebook TDS-S2S

Convolutional neural network system with time-depth separable blocks

3.7 Nvidia Jasper

Convolutional neural network system with residual connections

3.8 Nvidia QuartzNet

Lightweight convolutional neural network system with time-channel separable residual blocks

4. ASR comparison and evaluation. Case study on LibriSpeech

Accuracy and hardware requirements of these 8 implementations, evaluated on the LibriSpeech task

4.1 Evaluation of model complexity

Definition of the metrics used for model complexity

4.2 Comparison of ASR systems in terms of model complexity

Complexity of the models, computed as the number of parameters, operations and activations

4.3 Comparison of ASR systems in terms of performance

Transcription accuracy on the LibriSpeech dataset

4.4 Trade-offs between ASR performance and hardware

Accuracy vs. hardware requirements: trade-off analysis

5. Conclusion

Paper summary; achieved goals; main conclusions drawn from the analysis of these 8 systems
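Sections 4.1 and 4.2 measure model complexity as the number of parameters, operations and activations. The sketch below makes these three quantities concrete for a single 1D convolutional layer. It rests on stated assumptions: "parameters" means learned weights, "operations" means multiply-accumulates (MACs), and "activations" means the elements of the layer's output tensor. The table above does not reproduce the paper's exact definitions. The function names are hypothetical. The layer size (256 channels, kernel width 33, 100 output frames) is illustrative and is not taken from the paper. The sketch contrasts a standard convolution with a time-channel separable one (a per-channel convolution over time followed by a 1x1 pointwise convolution), the building block the table attributes to QuartzNet.

```python
# Hypothetical helpers: per-layer complexity of a 1D convolution, assuming
# parameters = learned weights, operations = multiply-accumulates (MACs),
# activations = number of output elements.

def conv1d_params(c_in, c_out, k, bias=False):
    """Standard 1D conv: each output channel has a k-wide kernel over all input channels."""
    return c_in * c_out * k + (c_out if bias else 0)

def tc_separable_params(c_in, c_out, k, bias=False):
    """Time-channel separable conv: depthwise (per-channel) conv over time, then 1x1 pointwise conv."""
    depthwise = c_in * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

def conv1d_macs(c_in, c_out, k, t_out):
    """MACs for a standard 1D conv producing t_out output frames (bias adds ignored)."""
    return c_in * c_out * k * t_out

def tc_separable_macs(c_in, c_out, k, t_out):
    """MACs for the depthwise + pointwise pair over t_out output frames."""
    return (c_in * k + c_in * c_out) * t_out

def activations(c_out, t_out):
    """Output elements stored for one layer (single utterance, batch size 1)."""
    return c_out * t_out

if __name__ == "__main__":
    c, k, t = 256, 33, 100  # illustrative layer size, not from the paper
    print("standard params: ", conv1d_params(c, c, k))
    print("separable params:", tc_separable_params(c, c, k))
    print("standard MACs:   ", conv1d_macs(c, c, k, t))
    print("separable MACs:  ", tc_separable_macs(c, c, k, t))
    print("activations:     ", activations(c, t))
```

For this illustrative layer, the standard convolution has 2,162,688 weights, while the separable version has 73,984, roughly 29 times fewer. Both layers produce the same number of activations. This is why separable blocks reduce parameters and operations without shrinking the intermediate feature maps.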