Fig. 1
From: Performance vs. hardware requirements in state-of-the-art automatic speech recognition
Pipeline (top) vs. end-to-end (bottom) ASR. In pipeline ASR, feature extraction is mandatory, and different output representations are obtained during decoding. In end-to-end ASR, the raw waveform is transformed directly into text. In both cases, an additional language model can be used for rescoring.