Table 3 WER (%) of the Kaldi system on noisy emotional speech when trained with neutral speech

From: Robust emotional speech recognition based on binaural model and emotional auditory mask in noisy environments

Noise type  Emotional state  WER (%) at SNR (dB)
                             −5      0       5       10      20      30
Babble      An.              94.43   91.64   86.23   65.08   35.74   26.56
Babble      Dis.             93.04   87.91   75.09   62.27   35.9    17.95
Babble      Fe.              95.05   91.76   87.91   75.64   38.46   20.33
Babble      Ha.              93.74   90.88   81.4    68.34   34.88   19.32
Babble      Sad.             95.06   93.92   89.92   76.81   33.27   18.25
Babble      Mean             94.26   91.22   84.11   69.63   35.65   20.48
White       An.              97.38   93.77   83.44   70.82   40      25.74
White       Dis.             94.69   91.39   86.63   76.92   56.96   29.12
White       Fe.              98.35   94.51   90.66   85.53   53.48   28.75
White       Ha.              97.14   90.52   83.54   70.13   43.83   26.65
White       Sad.             96.58   94.3    88.97   83.65   64.83   38.21
White       Mean             96.83   92.90   86.65   77.41   51.82   29.69
SSN         An.              98.36   94.75   88.03   71.8    39.02   24.75
SSN         Dis.             96.34   89.74   84.62   69.05   36.63   16.12
SSN         Fe.              99.27   97.07   91.21   80.4    43.41   23.81
SSN         Ha.              99.46   95.53   86.4    75.13   36.49   20.75
SSN         Sad.             99.05   95.82   92.4    83.84   44.11   19.2
SSN         Mean             98.50   94.58   88.53   76.04   39.93   20.93
Factory     An.              96.56   91.31   84.1    66.07   36.72   25.74
Factory     Dis.             94.14   91.58   83.7    68.86   38.83   19.78
Factory     Fe.              98.53   95.79   91.94   81.32   44.87   23.44
Factory     Ha.              96.78   93.56   84.97   70.3    37.57   16.99
Factory     Sad.             97.34   92.59   88.02   78.33   40.11   20.53
Factory     Mean             96.67   92.97   86.55   72.98   39.62   21.30
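The Mean row in each noise block appears to be the unweighted average of the five emotional-state rows (An., Dis., Fe., Ha., Sad.). The printed values are consistent with that reading, but the table itself does not say so. The Python sketch below works under that assumption. It recomputes every Mean row from the per-emotion WERs and flags any cell that differs from the printed value by more than two-decimal rounding (0.005). The names TABLE, REPORTED_MEAN, mean_row and check_means are hypothetical and exist only for this check.

```python
from statistics import fmean

# SNR levels (dB) in column order, as in Table 3.
SNR_DB = [-5, 0, 5, 10, 20, 30]

# Per-emotion WER (%) copied from Table 3; each list follows SNR_DB.
TABLE = {
    "Babble": {
        "An.": [94.43, 91.64, 86.23, 65.08, 35.74, 26.56],
        "Dis.": [93.04, 87.91, 75.09, 62.27, 35.9, 17.95],
        "Fe.": [95.05, 91.76, 87.91, 75.64, 38.46, 20.33],
        "Ha.": [93.74, 90.88, 81.4, 68.34, 34.88, 19.32],
        "Sad.": [95.06, 93.92, 89.92, 76.81, 33.27, 18.25],
    },
    "White": {
        "An.": [97.38, 93.77, 83.44, 70.82, 40.0, 25.74],
        "Dis.": [94.69, 91.39, 86.63, 76.92, 56.96, 29.12],
        "Fe.": [98.35, 94.51, 90.66, 85.53, 53.48, 28.75],
        "Ha.": [97.14, 90.52, 83.54, 70.13, 43.83, 26.65],
        "Sad.": [96.58, 94.3, 88.97, 83.65, 64.83, 38.21],
    },
    "SSN": {
        "An.": [98.36, 94.75, 88.03, 71.8, 39.02, 24.75],
        "Dis.": [96.34, 89.74, 84.62, 69.05, 36.63, 16.12],
        "Fe.": [99.27, 97.07, 91.21, 80.4, 43.41, 23.81],
        "Ha.": [99.46, 95.53, 86.4, 75.13, 36.49, 20.75],
        "Sad.": [99.05, 95.82, 92.4, 83.84, 44.11, 19.2],
    },
    "Factory": {
        "An.": [96.56, 91.31, 84.1, 66.07, 36.72, 25.74],
        "Dis.": [94.14, 91.58, 83.7, 68.86, 38.83, 19.78],
        "Fe.": [98.53, 95.79, 91.94, 81.32, 44.87, 23.44],
        "Ha.": [96.78, 93.56, 84.97, 70.3, 37.57, 16.99],
        "Sad.": [97.34, 92.59, 88.02, 78.33, 40.11, 20.53],
    },
}

# "Mean" rows exactly as printed in Table 3.
REPORTED_MEAN = {
    "Babble": [94.26, 91.22, 84.11, 69.63, 35.65, 20.48],
    "White": [96.83, 92.90, 86.65, 77.41, 51.82, 29.69],
    "SSN": [98.50, 94.58, 88.53, 76.04, 39.93, 20.93],
    "Factory": [96.67, 92.97, 86.55, 72.98, 39.62, 21.30],
}


def mean_row(rows):
    """Unweighted mean over the emotional states, one value per SNR column."""
    return [fmean(column) for column in zip(*rows.values())]


def check_means(tol=0.005):
    """Return {noise: [(snr_db, computed, reported), ...]} for cells outside tol."""
    mismatches = {}
    for noise, rows in TABLE.items():
        for snr, got, want in zip(SNR_DB, mean_row(rows), REPORTED_MEAN[noise]):
            if abs(got - want) > tol:
                mismatches.setdefault(noise, []).append((snr, got, want))
    return mismatches


if __name__ == "__main__":
    for noise, rows in TABLE.items():
        print(noise, [f"{m:.2f}" for m in mean_row(rows)])
    print("mismatches:", check_means())
```

With the values transcribed above, check_means() should return an empty dictionary. That result would mean every printed Mean matches the average of the five emotional states to two decimals.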