Table 3 Performance of the Kaldi system in terms of WER (%) on noisy emotional speech, with the system trained on neutral speech

From: Robust emotional speech recognition based on binaural model and emotional auditory mask in noisy environments

Noise type  Emotional state  SNR
                             −5     0      5      10     20     30
Babble      An.              94.43  91.64  86.23  65.08  35.74  26.56
Babble      Dis.             93.04  87.91  75.09  62.27  35.9   17.95
Babble      Fe.              95.05  91.76  87.91  75.64  38.46  20.33
Babble      Ha.              93.74  90.88  81.4   68.34  34.88  19.32
Babble      Sad.             95.06  93.92  89.92  76.81  33.27  18.25
Babble      Mean             94.26  91.22  84.11  69.63  35.65  20.48
White       An.              97.38  93.77  83.44  70.82  40     25.74
White       Dis.             94.69  91.39  86.63  76.92  56.96  29.12
White       Fe.              98.35  94.51  90.66  85.53  53.48  28.75
White       Ha.              97.14  90.52  83.54  70.13  43.83  26.65
White       Sad.             96.58  94.3   88.97  83.65  64.83  38.21
White       Mean             96.83  92.90  86.65  77.41  51.82  29.69
SSN         An.              98.36  94.75  88.03  71.8   39.02  24.75
SSN         Dis.             96.34  89.74  84.62  69.05  36.63  16.12
SSN         Fe.              99.27  97.07  91.21  80.4   43.41  23.81
SSN         Ha.              99.46  95.53  86.4   75.13  36.49  20.75
SSN         Sad.             99.05  95.82  92.4   83.84  44.11  19.2
SSN         Mean             98.50  94.58  88.53  76.04  39.93  20.93
Factory     An.              96.56  91.31  84.1   66.07  36.72  25.74
Factory     Dis.             94.14  91.58  83.7   68.86  38.83  19.78
Factory     Fe.              98.53  95.79  91.94  81.32  44.87  23.44
Factory     Ha.              96.78  93.56  84.97  70.3   37.57  16.99
Factory     Sad.             97.34  92.59  88.02  78.33  40.11  20.53
Factory     Mean             96.67  92.97  86.55  72.98  39.62  21.30
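
The Mean entries for each noise type can be checked against the per-emotion values. The sketch below does this for the Babble rows only, under the assumption that each Mean is an unweighted arithmetic average over the five emotional states; the source does not state how the means were computed. The names `babble_wer`, `reported_mean` and `unweighted_mean` are illustrative and do not come from the paper.

```python
# Per-emotion WER (%) for Babble noise, copied from Table 3.
# Keys are SNR values; each list is ordered An., Dis., Fe., Ha., Sad.
babble_wer = {
    -5: [94.43, 93.04, 95.05, 93.74, 95.06],
    0: [91.64, 87.91, 91.76, 90.88, 93.92],
    5: [86.23, 75.09, 87.91, 81.40, 89.92],
    10: [65.08, 62.27, 75.64, 68.34, 76.81],
    20: [35.74, 35.90, 38.46, 34.88, 33.27],
    30: [26.56, 17.95, 20.33, 19.32, 18.25],
}

# Mean row for Babble noise as reported in Table 3.
reported_mean = {-5: 94.26, 0: 91.22, 5: 84.11, 10: 69.63, 20: 35.65, 30: 20.48}


def unweighted_mean(values):
    """Arithmetic mean; assumes every emotional state has equal weight."""
    return sum(values) / len(values)


computed_mean = {snr: unweighted_mean(wers) for snr, wers in babble_wer.items()}

for snr in sorted(computed_mean):
    diff = computed_mean[snr] - reported_mean[snr]
    print(f"SNR {snr:>3}: computed {computed_mean[snr]:.2f}, "
          f"reported {reported_mean[snr]:.2f}, difference {diff:+.3f}")
```

If the assumption holds, every difference stays within rounding of two decimal places. The same check can be repeated for the White, SSN and Factory rows by swapping in their values.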