Table 3. The effect of modifying the normalized reference frequency, λ0, on the recognition performance of the proposed GMM-HMM EASR system (in terms of WER (%)) for CREMA-D. The WER values are obtained by applying different warping methods to various acoustic features extracted from different emotional utterances.

From: Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition

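Per-emotion columns (Anger to Sad) are WER (%) for each emotional state.

| Feature type | Warping type | λ0 | Anger | Disgust | Fear | Happy | Sad | Average WER |
|---|---|---|---|---|---|---|---|---|
| MFCC | DCT Warping | 0 | 8.11 | 5.54 | 11.20 | 6.39 | 6.50 | 7.55 |
| MFCC | DCT Warping | 0.4 | 7.78 | 5.47 | 9.20 | 4.92 | 5.71 | 6.62 |
| MFCC | DCT Warping | 0.7 | 7.19 | 5.74 | 10.06 | 4.98 | 6.18 | 6.83 |
| MFCC | Filterbank & DCT Warping | 0 | 7.80 | 5.09 | 9.69 | 5.40 | 6.06 | 6.81 |
| MFCC | Filterbank & DCT Warping | 0.4 | 7.78 | 5.44 | 9.25 | 4.67 | 5.65 | 6.56 |
| MFCC | Filterbank & DCT Warping | 0.7 | 7.60 | 6.70 | 10.91 | 5.60 | 7.20 | 7.60 |
| M-MFCC | DCT Warping | 0 | 8.11 | 6.53 | 10.33 | 6.42 | 6.97 | 7.67 |
| M-MFCC | DCT Warping | 0.4 | 7.28 | 6.35 | 9.50 | 5.79 | 6.09 | 7.00 |
| M-MFCC | DCT Warping | 0.7 | 7.67 | 6.25 | 9.32 | 5.84 | 6.85 | 7.19 |
| M-MFCC | Filterbank & DCT Warping | 0 | 7.38 | 6.26 | 9.32 | 6.06 | 6.77 | 7.16 |
| M-MFCC | Filterbank & DCT Warping | 0.4 | 7.42 | 6.16 | 9.32 | 5.90 | 6.50 | 7.06 |
| M-MFCC | Filterbank & DCT Warping | 0.7 | 8.33 | 6.85 | 10.19 | 6.19 | 7.88 | 7.89 |
| ExpoLog | DCT Warping | 0 | 8.25 | 7.25 | 12.32 | 7.88 | 9.63 | 9.07 |
| ExpoLog | DCT Warping | 0.4 | 7.47 | 6.87 | 11.48 | 6.41 | 8.73 | 8.19 |
| ExpoLog | DCT Warping | 0.7 | 7.52 | 7.17 | 11.38 | 5.85 | 8.53 | 8.09 |
| ExpoLog | Filterbank & DCT Warping | 0 | 7.17 | 6.63 | 11.22 | 6.49 | 8.75 | 8.05 |
| ExpoLog | Filterbank & DCT Warping | 0.4 | 6.97 | 7.00 | 10.56 | 6.11 | 8.52 | 7.83 |
| ExpoLog | Filterbank & DCT Warping | 0.7 | 7.22 | 7.47 | 10.33 | 5.94 | 9.10 | 8.01 |
| GFCC | DCT Warping | 0 | 55.73 | 23.07 | 44.39 | 39.48 | 27.97 | 38.13 |
| GFCC | DCT Warping | 0.4 | 55.33 | 22.68 | 43.67 | 35.85 | 27.97 | 37.10 |
| GFCC | DCT Warping | 0.7 | 56.64 | 24.46 | 45.28 | 37.09 | 30.07 | 38.71 |
| GFCC | Filterbank & DCT Warping | 0 | 54.95 | 22.33 | 43.28 | 38.36 | 26.89 | 37.16 |
| GFCC | Filterbank & DCT Warping | 0.4 | 55.99 | 23.02 | 44.31 | 36.82 | 29.32 | 37.89 |
| GFCC | Filterbank & DCT Warping | 0.7 | 57.76 | 25.49 | 46.39 | 38.69 | 32.09 | 40.08 |
| PNCC | DCT Warping | 0 | 4.49 | 1.96 | 6.25 | 2.17 | 2.26 | 3.43 |
| PNCC | DCT Warping | 0.4 | 4.52 | 2.03 | 6.75 | 3.08 | 2.37 | 3.75 |
| PNCC | DCT Warping | 0.7 | 4.52 | 2.03 | 6.75 | 3.08 | 2.37 | 3.75 |
| PNCC | Filterbank & DCT Warping | 0 | 3.97 | 1.83 | 5.96 | 1.71 | 1.99 | 3.09 |
| PNCC | Filterbank & DCT Warping | 0.4 | 4.44 | 1.98 | 6.21 | 2.56 | 2.09 | 3.46 |
| PNCC | Filterbank & DCT Warping | 0.7 | 4.44 | 1.98 | 6.21 | 2.56 | 2.09 | 3.46 |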
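
The Average WER column appears to be the unweighted mean of the five per-emotion WERs, rounded to two decimals. The table does not state this, so it is an assumption, but the reported figures are consistent with it. The Python sketch below checks it for the three MFCC rows with DCT warping. The names `average_wer` and `mfcc_dct_rows` are illustrative and do not come from the article.

```python
from statistics import mean

EMOTIONS = ("Anger", "Disgust", "Fear", "Happy", "Sad")


def average_wer(per_emotion_wer):
    """Unweighted mean of per-emotion WERs (%), rounded to two decimals.

    Assumption: every emotion contributes equally to the average.
    """
    if len(per_emotion_wer) != len(EMOTIONS):
        raise ValueError("expected one WER value per emotional state")
    return round(mean(per_emotion_wer), 2)


# MFCC features with DCT warping, copied from the table:
# lambda0 -> (per-emotion WERs in EMOTIONS order, reported Average WER)
mfcc_dct_rows = {
    0.0: ([8.11, 5.54, 11.20, 6.39, 6.50], 7.55),
    0.4: ([7.78, 5.47, 9.20, 4.92, 5.71], 6.62),
    0.7: ([7.19, 5.74, 10.06, 4.98, 6.18], 6.83),
}

for lam0, (wers, reported) in mfcc_dct_rows.items():
    computed = average_wer(wers)
    print(f"lambda0 = {lam0}: computed {computed:.2f}, reported {reported:.2f}")
```

The same check can be applied to any other row by passing its five per-emotion values in the column order of the table.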