Table 1 SOTA results on the LibriSpeech test-clean set and the quantity of data used. Some self-supervised results are taken from [7]

From: Deep neural networks for automatic speech processing: a survey from large corpora to limited data

| Model type | Pre-training data | Training data | WER (%) |
|---|---|---|---|
| Pase+ [8] | 50 h | 960 h | 16.62 |
| Wav2Vec2.0 [9] | 960 h | | 4.79 |
| Wav2Vec2.0 [9] | 60k h | | 3.10 |
| HuBERT [10] | 960 h | | 4.79 |
| HuBERT [10] | 60k h | | 2.94 |
| Hybrid model [11] | - | | 2.7 |
| End-to-end supervised [12] | - | | 2.44 |
| Wav2Vec2.0 with Conformer and SpecAugment [13] | 60k h | | 1.4 |
| Wav2Vec using BERT XXL [14] | 60k h | | 1.4 |
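
Since every entry in the table is a word error rate, a minimal reference implementation may help make the metric concrete. The sketch below is not from the survey (the function name and example strings are illustrative); it computes WER as the word-level Levenshtein distance between a reference and a hypothesis transcript, normalized by the reference length:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for the word-level Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # cost of deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # cost of inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # match or substitution
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match / substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# Illustrative example: one substitution and one deletion against a six-word reference.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 2/6 ≈ 0.333
```

Multiplying the result by 100 gives percentages like those reported in the table; in practice, evaluation is usually done with an established scorer (e.g., the jiwer package or Kaldi's compute-wer) rather than hand-rolled code.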