From: Deep neural networks for automatic speech processing: a survey from large corpora to limited data
| Model type | Pre-training data | Training data | WER |
|---|---|---|---|
| Pase+ [8] | 50 h | 960 h | 16.62 |
| Wav2Vec2.0 [9] | 960 h | 960 h | 4.79 |
| Wav2Vec2.0 [9] | 60k h | 960 h | 3.10 |
| HuBERT [10] | 60k h | 960 h | 2.94 |
| Hybrid model [11] | - | 960 h | 2.7 |
| End-to-end supervised [12] | - | 960 h | 2.44 |
| Wav2Vec2.0 using conformers and SpecAugment [13] | 60k h | 960 h | 1.4 |
| Wav2Vec using BERT XXL [14] | 60k h | 960 h | |
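The WER figures in the table are word error rates: the word-level edit distance (substitutions, insertions, deletions) between the reference transcript and the system's hypothesis, divided by the number of reference words. A minimal sketch of that computation, with illustrative sentences that are not drawn from the cited papers:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table: d[i][j] is the edit distance between
    # the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub,           # substitution (or match)
                          d[i - 1][j] + 1,   # deletion
                          d[i][j - 1] + 1)   # insertion
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") over 6 reference words: WER = 1/6
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

Scores such as those in the table are typically aggregated over a whole test set (total edits over total reference words) rather than averaged per sentence, so a WER of 2.94 means roughly 3 word errors per 100 reference words.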