Table 5. Performance of the MESR system in terms of WER (%), based on the PNCC-mask, for noisy emotional speech. For comparison, the mean values of the recognition rates for the Kaldi system are included (see Table 3).

From: Robust emotional speech recognition based on binaural model and emotional auditory mask in noisy environments

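| Noise type | Emotional state | SNR −5 dB | SNR 0 dB | SNR 5 dB | SNR 10 dB | SNR 20 dB | SNR 30 dB |
|---|---|---|---|---|---|---|---|
| Babble | An. | 93 | 89 | 73 | 50 | 13 | 7 |
| Babble | Dis. | 92 | 86 | 75 | 59 | 20 | 11 |
| Babble | Fe. | 92 | 88 | 72 | 46 | 14 | 8 |
| Babble | Ha. | 94 | 90 | 79 | 58 | 26 | 11 |
| Babble | Sad. | 93 | 89 | 78 | 52 | 18 | 11 |
| Babble | Mean | 92.8 | 88.4 | 75.4 | 53 | 18.2 | 9.6 |
| Babble | Kaldi mean | 94.26 | 91.22 | 84.11 | 69.63 | 35.65 | 20.48 |
| White | An. | 92 | 92 | 83 | 62 | 21 | 9 |
| White | Dis. | 94 | 92 | 90 | 73 | 35 | 16 |
| White | Fe. | 92 | 89 | 82 | 70 | 27 | 12 |
| White | Ha. | 93 | 91 | 84 | 69 | 31 | 15 |
| White | Sad. | 94 | 91 | 86 | 74 | 32 | 14 |
| White | Mean | 93 | 91 | 85 | 69.6 | 29.2 | 13.2 |
| White | Kaldi mean | 96.83 | 92.90 | 86.65 | 77.41 | 51.82 | 29.69 |
| SSN | An. | 93 | 91 | 78 | 57 | 16 | 8 |
| SSN | Dis. | 93 | 90 | 86 | 69 | 24 | 13 |
| SSN | Fe. | 92 | 90 | 82 | 58 | 16 | 10 |
| SSN | Ha. | 94 | 92 | 83 | 66 | 27 | 13 |
| SSN | Sad. | 95 | 90 | 85 | 71 | 22 | 13 |
| SSN | Mean | 93.4 | 90.6 | 82.8 | 64.2 | 21 | 11.4 |
| SSN | Kaldi mean | 98.50 | 94.58 | 88.53 | 76.04 | 39.93 | 20.93 |
| Factory | An. | 93 | 90 | 83 | 65 | 21 | 9 |
| Factory | Dis. | 91 | 91 | 88 | 76 | 32 | 15 |
| Factory | Fe. | 95 | 94 | 89 | 68 | 25 | 11 |
| Factory | Ha. | 95 | 93 | 87 | 72 | 33 | 15 |
| Factory | Sad. | 92 | 91 | 89 | 79 | 36 | 16 |
| Factory | Mean | 93.2 | 91.8 | 87.2 | 72 | 29.4 | 13.2 |
| Factory | Kaldi mean | 96.67 | 92.97 | 86.55 | 72.98 | 39.62 | 21.30 |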
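
The Mean rows in the table above are the per-SNR averages of the five emotional states, and they can be set against the Kaldi mean rows directly. The following is a minimal sketch of that arithmetic for the babble-noise block only, with the values copied from the table. The names `column_means` and `absolute_difference` are hypothetical helpers introduced here for illustration; they are not part of the MESR or Kaldi systems.

```python
# WER (%) for babble noise, copied from the table; one list entry per SNR.
snr_db = [-5, 0, 5, 10, 20, 30]
babble_wer = {
    "An.":  [93, 89, 73, 50, 13, 7],
    "Dis.": [92, 86, 75, 59, 20, 11],
    "Fe.":  [92, 88, 72, 46, 14, 8],
    "Ha.":  [94, 90, 79, 58, 26, 11],
    "Sad.": [93, 89, 78, 52, 18, 11],
}
# "Kaldi mean" row for babble noise, also copied from the table.
kaldi_mean = [94.26, 91.22, 84.11, 69.63, 35.65, 20.48]


def column_means(rows):
    """Average each SNR column over the emotional states (hypothetical helper)."""
    columns = zip(*rows.values())
    return [round(sum(col) / len(rows), 2) for col in columns]


def absolute_difference(baseline, system):
    """Baseline WER minus system WER, in percentage points (hypothetical helper)."""
    return [round(b - s, 2) for b, s in zip(baseline, system)]


mesr_mean = column_means(babble_wer)
difference = absolute_difference(kaldi_mean, mesr_mean)

for snr, mean, diff in zip(snr_db, mesr_mean, difference):
    print(f"SNR {snr:>3} dB: MESR mean WER {mean:5.1f} %, "
          f"Kaldi mean minus MESR mean {diff:5.2f} points")
```

The same two helpers apply unchanged to the white, SSN, and factory blocks once their rows are substituted for `babble_wer` and `kaldi_mean`.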