Deep neural networks for automatic speech processing: a survey from large corpora to limited data

EURASIP Journal on Audio, Speech, and Music Processing

Table 2 State-of-the-art (SOTA) results on the LibriSpeech test-other set, with the quantity of data used by each model. Some self-supervised results come from the experiments reported in [7].

Model type	Quantity of data used		WER (%)
	Pre-training	Training
End-to-end supervised [12]	-	960 h	8.29
Hybrid model [11]	-		5.7
Wav2Vec2.0 using conformers and spec augment [13]	60k h		2.6
Wav2Vec using BERT XXL [14]	60k h		2.5
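
For readers less familiar with the metric, the word error rate (WER) reported above is conventionally the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The following is a minimal sketch of that computation, not the scoring code used by any of the cited systems. The function names `word_error_rate` and `relative_wer_reduction` are hypothetical, and text normalisation (casing, punctuation) is assumed to have been applied beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent with respect to a baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy transcripts (assumed for illustration): one substitution, one deletion.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Comparing two rows of Table 2: end-to-end supervised [12] vs. [14].
    print(relative_wer_reduction(8.29, 2.5))
```

The second helper only restates the gap between two rows of the table as a relative figure; it says nothing about why the gap exists, since the systems differ in both architecture and the amount of pre-training data.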