Table 4. Recognition performance of the proposed GMM-HMM EASR system, in terms of WER (%), for Persian ESD. The WER values are obtained by applying different warping methods to various acoustic features extracted from utterances in different emotional states. An asterisk (*) marks statistically significant cases (i.e., p value < 0.05).

From: Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition

Columns Anger through Sad are the emotional states; all values are WER (%).

| Feature type | Warping type | Anger | Disgust | Fear | Happy | Sad | Average WER |
|---|---|---|---|---|---|---|---|
| MFCC | No Warping | 42.30 | 24.54 | 36.08 | 29.70 | 26.43 | 31.81 |
| | Filterbank Warping | 42.13 | 23.81 | 34.25 | 30.05 | 21.48* | 30.34 |
| | DCT Warping | 28.20* | 22.34 | 31.32* | 18.78* | 18.25* | 23.78 |
| | Filterbank & DCT Warping | 27.05* | 21.61* | 32.42* | 20.21* | 15.97* | 23.45 |
| M-MFCC | No Warping | 36.07 | 19.41 | 27.11 | 24.15 | 20.15 | 25.38 |
| | Filterbank Warping | 34.26 | 17.95 | 25.82* | 23.79 | 20.72 | 24.51 |
| | DCT Warping | 17.21* | 16.30 | 21.61* | 16.10* | 19.01 | 18.05 |
| | Filterbank & DCT Warping | 22.79* | 15.93 | 24.73* | 17.35* | 20.53 | 20.27 |
| ExpoLog | No Warping | 37.87 | 16.12 | 26.01 | 26.83 | 20.34 | 25.43 |
| | Filterbank Warping | 30.49* | 13.55* | 15.02* | 22.72* | 15.78* | 19.51 |
| | DCT Warping | 37.54 | 14.65 | 25.64 | 27.55 | 20.34* | 25.13 |
| | Filterbank & DCT Warping | 32.46* | 13.92* | 16.12* | 28.98 | 19.01* | 22.10 |
| GFCC | No Warping | 40.66 | 44.51 | 48.72 | 44.90 | 27.38 | 41.23 |
| | Filterbank Warping | 39.67 | 43.77 | 47.44 | 44.01 | 27.19* | 40.42 |
| | DCT Warping | 26.89* | 39.74* | 43.77 | 39.53 | 26.81 | 35.35 |
| | Filterbank & DCT Warping | 28.36* | 40.84* | 45.60 | 40.79 | 29.09* | 36.94 |
| PNCC | No Warping | 3.93 | 5.13 | 5.31 | 6.80 | 5.70 | 5.37 |
| | Filterbank Warping | 3.93 | 5.13 | 4.58 | 6.62 | 5.70 | 5.19 |
| | DCT Warping | 4.10 | 5.13 | 4.40* | 6.26 | 4.56 | 4.89 |
| | Filterbank & DCT Warping | 3.93* | 5.31* | 4.95 | 6.44 | 5.32* | 5.19 |
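
The Average WER values appear consistent with an unweighted mean of the five per-emotion WERs. The sketch below checks this for the MFCC rows and computes the relative reduction against the unwarped baseline. It is an illustration only: the names `mfcc_wer`, `reported_average`, `mean_wer`, and `relative_reduction` are hypothetical, and the unweighted averaging is an assumption inferred from the numbers, not something the table states.

```python
# Per-emotion WER (%) for the MFCC rows of Table 4, in the order
# Anger, Disgust, Fear, Happy, Sad. Significance asterisks are dropped.
mfcc_wer = {
    "No Warping": [42.30, 24.54, 36.08, 29.70, 26.43],
    "Filterbank Warping": [42.13, 23.81, 34.25, 30.05, 21.48],
    "DCT Warping": [28.20, 22.34, 31.32, 18.78, 18.25],
    "Filterbank & DCT Warping": [27.05, 21.61, 32.42, 20.21, 15.97],
}

# "Average WER" column as printed in the table for the same rows.
reported_average = {
    "No Warping": 31.81,
    "Filterbank Warping": 30.34,
    "DCT Warping": 23.78,
    "Filterbank & DCT Warping": 23.45,
}


def mean_wer(values):
    """Unweighted mean of per-emotion WERs (assumed averaging scheme)."""
    return sum(values) / len(values)


def relative_reduction(baseline, warped):
    """Relative WER reduction in percent with respect to the baseline."""
    return 100.0 * (baseline - warped) / baseline


baseline_avg = mean_wer(mfcc_wer["No Warping"])
for name, values in mfcc_wer.items():
    avg = mean_wer(values)
    # Compare the recomputed mean with the printed value (rounding tolerance).
    matches = abs(avg - reported_average[name]) < 0.01
    gain = relative_reduction(baseline_avg, avg)
    print(f"{name}: mean={avg:.2f} matches_table={matches} "
          f"relative_reduction={gain:.1f}%")
```

The same check can be repeated for the other feature types by swapping in their rows. A small rounding tolerance is needed because the printed averages may have been computed from unrounded per-emotion values.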