Table 3. State-of-the-art results on IEMOCAP using 4 emotions (happiness, neutral, anger, and sadness), with the quantity of data used. Self-supervised results come from the experiments in [7].

From: Deep neural networks for automatic speech processing: a survey from large corpora to limited data

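| Model type | Quantity of data used: pre-training | Quantity of data used: training | Accuracy (%) |
|---|---|---|---|
| Pase+ [8] | 50 h | 12 h | 57.86 |
| Wav2Vec2.0 [9] | 960 h | 12 h | 63.43 |
| Wav2Vec2.0 [9] | 60k h | 12 h | 65.64 |
| HuBERT [10] | 960 h | 12 h | 64.92 |
| HuBERT [10] | 60k h | 12 h | 67.62 |
| Multitask approach [15] | - | + labels for the other task | 81.6 |
| DAAN [16] | 1 billion words for lexical | | 82.7 |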