Table 4 Performance of the Kaldi baseline and the MESR systems in terms of WER (%), based on three mask estimation methods and different values of α, in the clean condition

From: Robust emotional speech recognition based on binaural model and emotional auditory mask in noisy environments

Emotional states | Kaldi baseline | MESR baseline | PNCC-mask | MFCC-mask
α = 0 | α = 0.5 | α = 0 | α = 0.5
Anger | 11.31 | 11.97 | 4.10 | 19.34 | 5.08
Disgust | 13.92 | 13.92 | 13.19 | 21.25 | 15.02
Fear | 13.00 | 14.10 | 8.97 | 27.11 | 11.36
Happiness | 12.52 | 12.34 | 9.48 | 29.86 | 8.94
Sadness | 11.60 | 8.94 | 9.70 | 17.30 | 9.70
Average | 12.47 | 12.25 | 9.08 | 22.97 | 10.02
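
The Average row appears to be the arithmetic mean of the five emotional states in each value column. A minimal sketch of that check follows, assuming the values are read in the order they appear in the table. The names `rows`, `reported_average`, and `column_means` are hypothetical and serve only this illustration; columns are referred to by position, not by header.

```python
# WER (%) per emotional state, listed in the order the five value
# columns appear in the table (columns are identified by position only).
rows = {
    "Anger":     [11.31, 11.97, 4.10, 19.34, 5.08],
    "Disgust":   [13.92, 13.92, 13.19, 21.25, 15.02],
    "Fear":      [13.00, 14.10, 8.97, 27.11, 11.36],
    "Happiness": [12.52, 12.34, 9.48, 29.86, 8.94],
    "Sadness":   [11.60, 8.94, 9.70, 17.30, 9.70],
}

# The Average row as printed in the table.
reported_average = [12.47, 12.25, 9.08, 22.97, 10.02]


def column_means(table):
    """Return the arithmetic mean of each value column."""
    columns = zip(*table.values())  # transpose rows into columns
    return [sum(col) / len(col) for col in columns]


means = column_means(rows)
for position, (computed, reported) in enumerate(zip(means, reported_average), start=1):
    print(f"column {position}: computed {computed:.3f}, reported {reported:.2f}")
```

Under this assumption, every computed mean agrees with the printed Average row to within 0.01 percentage points.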