
Table 4 Recognition performance of the proposed GMM-HMM EASR system on the Persian ESD, in terms of word error rate (WER, %). The WER values are obtained by applying different warping methods to various acoustic features extracted from utterances in different emotional states. An asterisk (*) marks statistically significant cases (p < 0.05).

From: Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition

| Feature type | Warping type | Anger | Disgust | Fear | Happy | Sad | Average WER |
|---|---|---|---|---|---|---|---|
| MFCC | No Warping | 42.30 | 24.54 | 36.08 | 29.70 | 26.43 | 31.81 |
| MFCC | Filterbank Warping | 42.13 | 23.81 | 34.25 | 30.05 | 21.48* | 30.34 |
| MFCC | DCT Warping | 28.20* | 22.34 | 31.32* | 18.78* | 18.25* | 23.78 |
| MFCC | Filterbank & DCT Warping | 27.05* | 21.61* | 32.42* | 20.21* | 15.97* | 23.45 |
| M-MFCC | No Warping | 36.07 | 19.41 | 27.11 | 24.15 | 20.15 | 25.38 |
| M-MFCC | Filterbank Warping | 34.26 | 17.95 | 25.82* | 23.79 | 20.72 | 24.51 |
| M-MFCC | DCT Warping | 17.21* | 16.30 | 21.61* | 16.10* | 19.01 | 18.05 |
| M-MFCC | Filterbank & DCT Warping | 22.79* | 15.93 | 24.73* | 17.35* | 20.53 | 20.27 |
| ExpoLog | No Warping | 37.87 | 16.12 | 26.01 | 26.83 | 20.34 | 25.43 |
| ExpoLog | Filterbank Warping | 30.49* | 13.55* | 15.02* | 22.72* | 15.78* | 19.51 |
| ExpoLog | DCT Warping | 37.54 | 14.65 | 25.64 | 27.55 | 20.34* | 25.13 |
| ExpoLog | Filterbank & DCT Warping | 32.46* | 13.92* | 16.12* | 28.98 | 19.01* | 22.10 |
| GFCC | No Warping | 40.66 | 44.51 | 48.72 | 44.90 | 27.38 | 41.23 |
| GFCC | Filterbank Warping | 39.67 | 43.77 | 47.44 | 44.01 | 27.19* | 40.42 |
| GFCC | DCT Warping | 26.89* | 39.74* | 43.77 | 39.53 | 26.81 | 35.35 |
| GFCC | Filterbank & DCT Warping | 28.36* | 40.84* | 45.60 | 40.79 | 29.09* | 36.94 |
| PNCC | No Warping | 3.93 | 5.13 | 5.31 | 6.80 | 5.70 | 5.37 |
| PNCC | Filterbank Warping | 3.93 | 5.13 | 4.58 | 6.62 | 5.70 | 5.19 |
| PNCC | DCT Warping | 4.10 | 5.13 | 4.40* | 6.26 | 4.56 | 4.89 |
| PNCC | Filterbank & DCT Warping | 3.93* | 5.31* | 4.95 | 6.44 | 5.32* | 5.19 |
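Every entry in Table 4 is a word error rate. The table does not state how utterances were scored, so the following is only a minimal sketch, assuming the conventional definition WER = (S + D + I) / N × 100, where S, D and I are the substituted, deleted and inserted words in a minimum-edit-distance (Levenshtein) alignment of the recognizer output against the reference transcript, and N is the number of reference words. The function name `word_error_rate` is hypothetical and is not part of the authors' system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / N * 100.

    Assumes whitespace-tokenized transcripts; this is a generic sketch,
    not the scoring tool used for Table 4.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

The example prints roughly 33.3 (2 errors out of 6 reference words). Because insertions count as errors, WER can exceed 100%, and lower is better: a warping method helps a given feature when its row falls below that feature's No Warping row.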
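The Average WER column is consistent with an unweighted mean over the five emotional states: averaging each row's five values reproduces the reported figure, up to small rounding differences (for example, ExpoLog with DCT warping gives 25.14 against the reported 25.13). The sketch below assumes that definition and uses the MFCC rows to compute each warping method's average together with its relative WER reduction against No Warping. The relative-reduction measure and the helper names `average_wer` and `relative_reduction` are illustrative additions, not quantities reported in the table.

```python
from statistics import mean

EMOTIONS = ("Anger", "Disgust", "Fear", "Happy", "Sad")

# Per-emotion WER (%) copied from the MFCC rows of Table 4, in EMOTIONS order.
MFCC_WER = {
    "No Warping": (42.30, 24.54, 36.08, 29.70, 26.43),
    "Filterbank Warping": (42.13, 23.81, 34.25, 30.05, 21.48),
    "DCT Warping": (28.20, 22.34, 31.32, 18.78, 18.25),
    "Filterbank & DCT Warping": (27.05, 21.61, 32.42, 20.21, 15.97),
}


def average_wer(per_emotion):
    """Unweighted mean over the emotional states, rounded like the table (2 decimals).

    Assumption: the table's Average WER gives each emotion equal weight.
    """
    return round(mean(per_emotion), 2)


def relative_reduction(baseline, warped):
    """Relative WER reduction (%) of a warped system against the unwarped baseline."""
    return 100.0 * (baseline - warped) / baseline


baseline = average_wer(MFCC_WER["No Warping"])
for method, values in MFCC_WER.items():
    avg = average_wer(values)
    print(f"{method:<26} avg={avg:6.2f}  rel. reduction={relative_reduction(baseline, avg):5.1f}%")
```

Replacing `MFCC_WER` with another feature's rows gives the same comparison for M-MFCC, ExpoLog, GFCC or PNCC. Note that an equal-weight mean can differ from a pooled WER over all test words if the emotional subsets contain different numbers of words.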