Table 6 The performance of the proposed DNN-HMM EASR system (in terms of WER (%)) for CREMA-D by applying different warping methods to various acoustic features and emotional states. The average WER values are given in the last column. The symbol * shows statistically significant cases (i.e., p value < 0.05)

From: Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition

| Feature type | Warping type | Anger | Disgust | Fear | Happy | Sad | Average WER |
|---|---|---|---|---|---|---|---|
| MFCC | No Warping | 3.23 | 3.34 | 3.39 | 1.84 | 1.58 | 2.68 |
| MFCC | Filterbank Warping | 3.54 | 3.39 | 4.05* | 2.17 | 1.65 | 2.96 |
| MFCC | DCT Warping | 2.91 | 3.22 | 3.21 | 1.74 | 1.58 | 2.53 |
| MFCC | Filterbank & DCT Warping | 3.05 | 3.29 | 2.89 | 1.56 | 1.46 | 2.45 |
| M-MFCC | No Warping | 3.21 | 3.54 | 4.28 | 1.82 | 1.99 | 2.97 |
| M-MFCC | Filterbank Warping | 3.36 | 3.68 | 4.38 | 2.28* | 2.05 | 3.15 |
| M-MFCC | DCT Warping | 3.08 | 3.78 | 3.58* | 1.59 | 1.94 | 2.79 |
| M-MFCC | Filterbank & DCT Warping | 3.10 | 3.93 | 3.54* | 1.49* | 1.77 | 2.77 |
| ExpoLog | No Warping | 2.65 | 4.03 | 3.86 | 1.76 | 1.77 | 2.81 |
| ExpoLog | Filterbank Warping | 2.33* | 3.91 | 3.73 | 1.42* | 1.78 | 2.63 |
| ExpoLog | DCT Warping | 2.48 | 3.76 | 3.96 | 1.61 | 1.75 | 2.71 |
| ExpoLog | Filterbank & DCT Warping | 2.30 | 3.73 | 3.86 | 1.52 | 1.92 | 2.67 |
| GFCC | No Warping | 22.75 | 5.93 | 11.67 | 6.93 | 3.03 | 10.06 |
| GFCC | Filterbank Warping | 22.42 | 5.79 | 11.03 | 6.16* | 2.91 | 9.66 |
| GFCC | DCT Warping | 25.75* | 5.69 | 9.94* | 5.47* | 3.35 | 10.04 |
| GFCC | Filterbank & DCT Warping | 25.79* | 5.73 | 9.89* | 4.97* | 3.45* | 9.97 |
| PNCC | No Warping | 3.31 | 2.69 | 2.15 | 1.37 | 1.13 | 2.13 |
| PNCC | Filterbank Warping | 3.21 | 2.60 | 2.27 | 1.05* | 1.04 | 2.03 |
| PNCC | DCT Warping | 2.90 | 2.89 | 2.40 | 1.04 | 1.03 | 2.05 |
| PNCC | Filterbank & DCT Warping | 2.65 | 2.85 | 2.45 | 1.19 | 1.01 | 2.03 |
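
The last column can be reproduced from the per-emotion values. The sketch below is illustrative only and uses the PNCC rows. It assumes that the average WER is the unweighted arithmetic mean over the five emotional states, which is consistent with the tabulated figures but is not stated explicitly in the caption. The names `pncc_wer` and `average_wer` are hypothetical and do not come from the paper.

```python
# WER (%) per emotional state, copied from the PNCC rows of the table above.
# Column order: Anger, Disgust, Fear, Happy, Sad.
pncc_wer = {
    "No Warping": [3.31, 2.69, 2.15, 1.37, 1.13],
    "Filterbank Warping": [3.21, 2.60, 2.27, 1.05, 1.04],
    "DCT Warping": [2.90, 2.89, 2.40, 1.04, 1.03],
    "Filterbank & DCT Warping": [2.65, 2.85, 2.45, 1.19, 1.01],
}


def average_wer(values):
    """Unweighted mean over the emotional states (assumed averaging rule)."""
    return sum(values) / len(values)


# Average WER for each warping type, to be compared with the last column.
averages = {name: average_wer(values) for name, values in pncc_wer.items()}

for name, avg in averages.items():
    print(f"{name}: average WER = {avg:.2f}%")
```

Under this assumption the computed means agree with the Average WER column to two decimal places.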