Table 6. Performance of the proposed DNN-HMM EASR system (WER, %) on CREMA-D when different warping methods are applied to various acoustic features and emotional states. The average WER values are given in the last column. An asterisk (*) marks statistically significant cases (i.e., p value < 0.05).

From: Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition

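Emotional states: Anger, Disgust, Fear, Happy, Sad

| Feature type | Warping type | Anger | Disgust | Fear | Happy | Sad | Average WER |
|---|---|---|---|---|---|---|---|
| MFCC | No Warping | 3.23 | 3.34 | 3.39 | 1.84 | 1.58 | 2.68 |
| MFCC | Filterbank Warping | 3.54 | 3.39 | 4.05* | 2.17 | 1.65 | 2.96 |
| MFCC | DCT Warping | 2.91 | 3.22 | 3.21 | 1.74 | 1.58 | 2.53 |
| MFCC | Filterbank & DCT Warping | 3.05 | 3.29 | 2.89 | 1.56 | 1.46 | 2.45 |
| M-MFCC | No Warping | 3.21 | 3.54 | 4.28 | 1.82 | 1.99 | 2.97 |
| M-MFCC | Filterbank Warping | 3.36 | 3.68 | 4.38 | 2.28* | 2.05 | 3.15 |
| M-MFCC | DCT Warping | 3.08 | 3.78 | 3.58* | 1.59 | 1.94 | 2.79 |
| M-MFCC | Filterbank & DCT Warping | 3.10 | 3.93 | 3.54* | 1.49* | 1.77 | 2.77 |
| ExpoLog | No Warping | 2.65 | 4.03 | 3.86 | 1.76 | 1.77 | 2.81 |
| ExpoLog | Filterbank Warping | 2.33* | 3.91 | 3.73 | 1.42* | 1.78 | 2.63 |
| ExpoLog | DCT Warping | 2.48 | 3.76 | 3.96 | 1.61 | 1.75 | 2.71 |
| ExpoLog | Filterbank & DCT Warping | 2.30 | 3.73 | 3.86 | 1.52 | 1.92 | 2.67 |
| GFCC | No Warping | 22.75 | 5.93 | 11.67 | 6.93 | 3.03 | 10.06 |
| GFCC | Filterbank Warping | 22.42 | 5.79 | 11.03 | 6.16* | 2.91 | 9.66 |
| GFCC | DCT Warping | 25.75* | 5.69 | 9.94* | 5.47* | 3.35 | 10.04 |
| GFCC | Filterbank & DCT Warping | 25.79* | 5.73 | 9.89* | 4.97* | 3.45* | 9.97 |
| PNCC | No Warping | 3.31 | 2.69 | 2.15 | 1.37 | 1.13 | 2.13 |
| PNCC | Filterbank Warping | 3.21 | 2.60 | 2.27 | 1.05* | 1.04 | 2.03 |
| PNCC | DCT Warping | 2.90 | 2.89 | 2.40 | 1.04 | 1.03 | 2.05 |
| PNCC | Filterbank & DCT Warping | 2.65 | 2.85 | 2.45 | 1.19 | 1.01 | 2.03 |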
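
The Average WER column can be cross-checked against the per-emotion values. The sketch below uses the MFCC rows from the table and assumes the average is the unweighted mean over the five emotional states. That assumption is inferred from the numbers and is not stated in the caption. The helper names `average_wer` and `relative_reduction` are hypothetical and used only for illustration.

```python
from statistics import mean

# MFCC rows copied from Table 6: per-emotion WER (%) in the order
# Anger, Disgust, Fear, Happy, Sad, followed by the reported Average WER.
MFCC_ROWS = {
    "No Warping": ([3.23, 3.34, 3.39, 1.84, 1.58], 2.68),
    "Filterbank Warping": ([3.54, 3.39, 4.05, 2.17, 1.65], 2.96),
    "DCT Warping": ([2.91, 3.22, 3.21, 1.74, 1.58], 2.53),
    "Filterbank & DCT Warping": ([3.05, 3.29, 2.89, 1.56, 1.46], 2.45),
}


def average_wer(per_emotion):
    """Unweighted mean WER over emotional states (assumed averaging rule)."""
    return mean(per_emotion)


def relative_reduction(baseline, system):
    """Relative WER reduction in percent; positive means fewer errors."""
    return 100.0 * (baseline - system) / baseline


if __name__ == "__main__":
    baseline = average_wer(MFCC_ROWS["No Warping"][0])
    for name, (wers, reported) in MFCC_ROWS.items():
        avg = average_wer(wers)
        change = relative_reduction(baseline, avg)
        print(f"{name:26s} mean={avg:.2f} reported={reported:.2f} "
              f"relative reduction vs. no warping={change:+.1f}%")
```

A positive relative reduction means the warping method lowers the average WER compared with the unwarped MFCC baseline, and a negative value means it raises it.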