From: Deep neural networks for automatic speech processing: a survey from large corpora to limited data
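| Model type | Quantity of data used: pre-training | Quantity of data used: training | Accuracy |
|---|---|---|---|
| Pase+ [8] | 50h | 12h | 57.86 |
| Wav2Vec2.0 [9] | 960h | | 63.43 |
| Wav2Vec2.0 [9] | 60k h | | 65.64 |
| HuBERT [10] | | | 64.92 |
| HuBERT [10] | | | 67.62 |
| Multitask approach [15] | - | + labels for the other task | 81.6 |
| DAAN [16] | 1 billion words for lexical | | 82.7 |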