Table 4. State-of-the-art results on VoxCeleb1 and the quantity of data used. Self-supervised results are taken from the experiments in [7].

From: Deep neural networks for automatic speech processing: a survey from large corpora to limited data

                   Quantity of data used
Model type         Pre-training    Training    Accuracy
Pase+ [8]          50h             350h        37.99
Wav2Vec2.0 [9]     960h            350h        75.18
                   60k h           350h        86.14
HuBERT [10]        960h            350h        81.42
                   60k h           350h        90.33
AutoSpeech [17]    -               350h        87.66