Table 5 Recognition performance of the proposed GMM-HMM EASR system, in terms of WER (%), for CREMA-D. The WER values are obtained by applying different warping methods to various acoustic features extracted from different emotional utterances. The symbol * marks statistically significant cases (i.e., p value < 0.05).

From: Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition

Columns Anger to Sad are the emotional states.

| Feature type | Warping type | Anger | Disgust | Fear | Happy | Sad | Average WER |
|---|---|---|---|---|---|---|---|
| MFCC | No Warping | 8.11 | 5.54 | 11.20 | 6.39 | 6.50 | 7.55 |
| MFCC | Filterbank Warping | 7.80 | 5.09 | 9.69* | 5.40* | 6.06 | 6.81 |
| MFCC | DCT Warping | 7.78* | 5.47 | 9.20* | 4.92* | 5.71* | 6.62 |
| MFCC | Filterbank & DCT Warping | 7.78* | 5.44 | 9.25* | 4.67* | 5.65* | 6.56 |
| M-MFCC | No Warping | 8.11 | 6.53 | 10.33 | 6.42 | 6.97 | 7.67 |
| M-MFCC | Filterbank Warping | 7.38* | 6.26 | 9.32* | 6.06 | 6.77 | 7.16 |
| M-MFCC | DCT Warping | 7.28 | 6.35 | 9.50* | 5.79 | 6.09* | 7.00 |
| M-MFCC | Filterbank & DCT Warping | 7.42* | 6.16 | 9.32* | 5.90 | 6.50 | 7.06 |
| ExpoLog | No Warping | 8.25 | 7.25 | 12.32 | 7.88 | 9.63 | 9.07 |
| ExpoLog | Filterbank Warping | 7.17* | 6.63 | 11.22* | 6.49* | 8.75 | 8.05 |
| ExpoLog | DCT Warping | 7.47 | 6.87 | 11.48* | 6.41* | 8.73 | 8.19 |
| ExpoLog | Filterbank & DCT Warping | 6.97* | 7.00 | 10.56* | 6.11* | 8.52* | 7.83 |
| GFCC | No Warping | 55.73 | 23.07 | 44.39 | 39.48 | 27.97 | 38.13 |
| GFCC | Filterbank Warping | 54.95 | 22.33 | 43.28* | 38.36* | 26.89* | 37.16 |
| GFCC | DCT Warping | 55.33 | 22.68 | 43.67* | 35.85* | 27.97* | 37.10 |
| GFCC | Filterbank & DCT Warping | 55.99 | 23.02 | 44.31 | 36.82* | 29.32 | 37.89 |
| PNCC | No Warping | 4.49 | 1.96 | 6.25 | 2.17 | 2.26 | 3.43 |
| PNCC | Filterbank Warping | 3.97* | 1.83 | 5.96 | 1.71* | 1.99* | 3.09 |
| PNCC | DCT Warping | 4.52 | 2.03 | 6.75 | 3.08 | 2.37 | 3.75 |
| PNCC | Filterbank & DCT Warping | 4.44 | 1.98 | 6.21 | 2.56 | 2.09 | 3.46 |
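
The Average WER column appears consistent with an unweighted mean of the five per-emotion WERs. The following minimal Python sketch reproduces that arithmetic for two MFCC rows copied from the table. The function names `average_wer` and `relative_reduction` are illustrative and are not taken from the article, and the relative reduction is a derived figure that the table itself does not report.

```python
# WER (%) per emotion in the order Anger, Disgust, Fear, Happy, Sad,
# copied from the MFCC rows of Table 5.
mfcc_no_warping = [8.11, 5.54, 11.20, 6.39, 6.50]
mfcc_fb_dct_warping = [7.78, 5.44, 9.25, 4.67, 5.65]


def average_wer(wers):
    """Unweighted mean of per-emotion WERs, rounded to two decimals.

    Assumption: the table's Average WER is an unweighted mean; the
    article page does not state the averaging scheme explicitly.
    """
    return round(sum(wers) / len(wers), 2)


def relative_reduction(baseline, warped):
    """Relative WER reduction (%) of `warped` with respect to `baseline`.

    This is a derived measure for illustration, not a table column.
    """
    return 100.0 * (baseline - warped) / baseline


baseline_avg = average_wer(mfcc_no_warping)
warped_avg = average_wer(mfcc_fb_dct_warping)

print("MFCC, No Warping average WER:", baseline_avg)
print("MFCC, Filterbank & DCT Warping average WER:", warped_avg)
print("Relative reduction (%):",
      round(relative_reduction(baseline_avg, warped_avg), 1))
```

The same two helpers can be applied to any other pair of rows by swapping in the corresponding five values.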