
Table 2 SOTA results on the test-other set of LibriSpeech, with the quantity of data used. Some self-supervised results come from the experiments in [7]

From: Deep neural networks for automatic speech processing: a survey from large corpora to limited data

                                                    Quantity of data used
Model type                                          Pre-training   Training   WER
End-to-end supervised [12]                          -              960 h      8.29
Hybrid model [11]                                   -                         5.7
Wav2Vec 2.0 using Conformers and SpecAugment [13]   60k h                     2.6
Wav2Vec using BERT XXL [14]                         60k h                     2.5
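The WER column reports word error rate: the word-level edit distance between the recognizer's hypothesis and the reference transcript, divided by the number of reference words. As a reminder of how the metric is computed, here is a minimal sketch (the function name and example strings are illustrative, not from the survey; ASR toolkits ship their own scorers):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6
# reference words: WER = 2/6, i.e. 33.3 when expressed as a percentage.
print(round(100 * wer("the cat sat on the mat", "the cat sit on mat"), 1))
```

The table's figures follow this convention, reported as percentages over the test-other evaluation set.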