
Table 5 Performance of the MESR system in terms of WER (%), based on the PNCC-mask, for noisy emotional speech. For comparison, the corresponding mean values for the Kaldi system are included (see Table 3)

From: Robust emotional speech recognition based on binaural model and emotional auditory mask in noisy environments

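| Noise type | Emotional state | SNR −5 dB | SNR 0 dB | SNR 5 dB | SNR 10 dB | SNR 20 dB | SNR 30 dB |
|---|---|---|---|---|---|---|---|
| Babble | An. | 93 | 89 | 73 | 50 | 13 | 7 |
| Babble | Dis. | 92 | 86 | 75 | 59 | 20 | 11 |
| Babble | Fe. | 92 | 88 | 72 | 46 | 14 | 8 |
| Babble | Ha. | 94 | 90 | 79 | 58 | 26 | 11 |
| Babble | Sad. | 93 | 89 | 78 | 52 | 18 | 11 |
| Babble | Mean | 92.8 | 88.4 | 75.4 | 53 | 18.2 | 9.6 |
| Babble | Kaldi mean | 94.26 | 91.22 | 84.11 | 69.63 | 35.65 | 20.48 |
| White | An. | 92 | 92 | 83 | 62 | 21 | 9 |
| White | Dis. | 94 | 92 | 90 | 73 | 35 | 16 |
| White | Fe. | 92 | 89 | 82 | 70 | 27 | 12 |
| White | Ha. | 93 | 91 | 84 | 69 | 31 | 15 |
| White | Sad. | 94 | 91 | 86 | 74 | 32 | 14 |
| White | Mean | 93 | 91 | 85 | 69.6 | 29.2 | 13.2 |
| White | Kaldi mean | 96.83 | 92.90 | 86.65 | 77.41 | 51.82 | 29.69 |
| SSN | An. | 93 | 91 | 78 | 57 | 16 | 8 |
| SSN | Dis. | 93 | 90 | 86 | 69 | 24 | 13 |
| SSN | Fe. | 92 | 90 | 82 | 58 | 16 | 10 |
| SSN | Ha. | 94 | 92 | 83 | 66 | 27 | 13 |
| SSN | Sad. | 95 | 90 | 85 | 71 | 22 | 13 |
| SSN | Mean | 93.4 | 90.6 | 82.8 | 64.2 | 21 | 11.4 |
| SSN | Kaldi mean | 98.50 | 94.58 | 88.53 | 76.04 | 39.93 | 20.93 |
| Factory | An. | 93 | 90 | 83 | 65 | 21 | 9 |
| Factory | Dis. | 91 | 91 | 88 | 76 | 32 | 15 |
| Factory | Fe. | 95 | 94 | 89 | 68 | 25 | 11 |
| Factory | Ha. | 95 | 93 | 87 | 72 | 33 | 15 |
| Factory | Sad. | 92 | 91 | 89 | 79 | 36 | 16 |
| Factory | Mean | 93.2 | 91.8 | 87.2 | 72 | 29.4 | 13.2 |
| Factory | Kaldi mean | 96.67 | 92.97 | 86.55 | 72.98 | 39.62 | 21.30 |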
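
The "Mean" rows appear to be plain arithmetic averages of the five emotional-state rows at each SNR. The sketch below recomputes that row for the babble-noise block and reports the gap to the Kaldi mean. It is only an illustrative check, not the authors' evaluation code: the names `babble_wer`, `kaldi_mean`, and `column_means` are hypothetical, the SNR values are assumed to be in dB, and the Kaldi means are assumed to be on the same WER (%) scale as the MESR values.

```python
# WER (%) for babble noise, copied from the table.
# Columns correspond to SNR = -5, 0, 5, 10, 20, 30 (assumed to be dB).
snr_levels = [-5, 0, 5, 10, 20, 30]

babble_wer = {
    "An.":  [93, 89, 73, 50, 13, 7],
    "Dis.": [92, 86, 75, 59, 20, 11],
    "Fe.":  [92, 88, 72, 46, 14, 8],
    "Ha.":  [94, 90, 79, 58, 26, 11],
    "Sad.": [93, 89, 78, 52, 18, 11],
}

# "Kaldi mean" row for babble noise, also copied from the table.
kaldi_mean = [94.26, 91.22, 84.11, 69.63, 35.65, 20.48]


def column_means(rows):
    """Average the per-emotion WERs at each SNR (one value per column)."""
    columns = zip(*rows.values())
    return [round(sum(col) / len(col), 1) for col in columns]


mesr_mean = column_means(babble_wer)

# Absolute WER difference in percentage points (Kaldi mean minus MESR mean).
gap = [round(k - m, 2) for k, m in zip(kaldi_mean, mesr_mean)]

for snr, m, g in zip(snr_levels, mesr_mean, gap):
    print(f"SNR {snr:>3}: MESR mean WER = {m:5.1f} %, gap to Kaldi mean = {g:5.2f} points")
```

For the babble block, the recomputed means match the table's "Mean" row, and the gap to the Kaldi mean is largest at the intermediate SNRs of 10 and 20.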