From: Deep neural networks for automatic speech processing: a survey from large corpora to limited data
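| Model type | Quantity of data used: pre-training | Quantity of data used: training | WER (%) |
|---|---|---|---|
| End-to-end supervised [12] | - | 960h | 8.29 |
| Hybrid model [11] | - | 960h | 5.7 |
| Wav2Vec2.0 using conformers and SpecAugment [13] | 60k h | 960h | 2.6 |
| Wav2Vec using BERT XXL [14] | 60k h | 960h | 2.5 |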
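The WER column follows the standard definition WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the reference transcript into the system output, and N is the number of reference words. The sketch below computes it with a word-level Levenshtein distance. It assumes plain whitespace tokenization with no text normalization, and the function name `word_error_rate` is hypothetical. It does not reproduce the scoring setup of the cited works.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: 100 * (S + D + I) / N, via word-level edit distance.

    Assumption: words are obtained by splitting on whitespace; no casing or
    punctuation normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row kept at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr

    return 100.0 * prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the errors are divided by the reference length only, WER can exceed 100% when the hypothesis contains many insertions.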