EURASIP Journal on Audio, Speech, and Music Processing

Table 3 Performance of the Kaldi system in terms of WER (%) on noisy emotional speech, with models trained on neutral speech

From: Robust emotional speech recognition based on binaural model and emotional auditory mask in noisy environments
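WER in the caption is the usual word error rate. It is the number of word substitutions (S), deletions (D), and insertions (I) needed to turn the recognizer output into the reference transcript, divided by the number of reference words N and expressed as a percentage: WER = 100 × (S + D + I) / N. Kaldi's scoring reports this same ratio. The sketch below computes it with a word-level Levenshtein distance. The function name `word_error_rate` is a hypothetical illustration and not the code used to produce this table. Because insertions are counted, WER is not capped at 100%.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, 100 * (S + D + I) / N, via word-level edit distance.

    Hypothetical helper for illustration; not the article's scoring code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return 100.0 * dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deletion out of six reference words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 2))  # → 16.67
    # One substitution plus two insertions against a one-word reference.
    print(word_error_rate("yes", "no no no"))  # → 300.0
```

The second call shows why values near or above 100% are possible at very low SNR: insertions are charged against the fixed reference length N.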

Noise type	Emotional state	SNR (dB)
		−5	0	5	10	20	30
Babble	An.	94.43	91.64	86.23	65.08	35.74	26.56
	Dis.	93.04	87.91	75.09	62.27	35.90	17.95
	Fe.	95.05	91.76	87.91	75.64	38.46	20.33
	Ha.	93.74	90.88	81.40	68.34	34.88	19.32
	Sad.	95.06	93.92	89.92	76.81	33.27	18.25
	Mean	94.26	91.22	84.11	69.63	35.65	20.48
White	An.	97.38	93.77	83.44	70.82	40.00	25.74
	Dis.	94.69	91.39	86.63	76.92	56.96	29.12
	Fe.	98.35	94.51	90.66	85.53	53.48	28.75
	Ha.	97.14	90.52	83.54	70.13	43.83	26.65
	Sad.	96.58	94.30	88.97	83.65	64.83	38.21
	Mean	96.83	92.90	86.65	77.41	51.82	29.69
SSN	An.	98.36	94.75	88.03	71.80	39.02	24.75
	Dis.	96.34	89.74	84.62	69.05	36.63	16.12
	Fe.	99.27	97.07	91.21	80.40	43.41	23.81
	Ha.	99.46	95.53	86.40	75.13	36.49	20.75
	Sad.	99.05	95.82	92.40	83.84	44.11	19.20
	Mean	98.50	94.58	88.53	76.04	39.93	20.93
Factory	An.	96.56	91.31	84.10	66.07	36.72	25.74
	Dis.	94.14	91.58	83.7	68.86	38.83	19.78
	Fe.	98.53	95.79	91.94	81.32	44.87	23.44
	Ha.	96.78	93.56	84.97	70.30	37.57	16.99
	Sad.	97.34	92.59	88.02	78.33	40.11	20.53
	Mean	96.67	92.97	86.55	72.98	39.62	21.30
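The page does not say how the Mean rows are computed. However, each one agrees, to within two-decimal rounding, with the plain arithmetic mean of the five emotional-state rows in the same SNR column. For example, the Babble mean at 20 dB is (35.74 + 35.90 + 38.46 + 34.88 + 33.27) / 5 = 35.65. The sketch below re-checks every Mean entry against that assumption, using the values exactly as printed in Table 3. The names `TABLE3`, `REPORTED_MEAN`, `column_means`, and `check_means` are hypothetical.

```python
from math import isclose

# SNR columns of Table 3, in dB.
SNRS_DB = [-5, 0, 5, 10, 20, 30]

# Per-emotion WER (%) copied from Table 3.
TABLE3 = {
    "Babble": {
        "An.": [94.43, 91.64, 86.23, 65.08, 35.74, 26.56],
        "Dis.": [93.04, 87.91, 75.09, 62.27, 35.90, 17.95],
        "Fe.": [95.05, 91.76, 87.91, 75.64, 38.46, 20.33],
        "Ha.": [93.74, 90.88, 81.40, 68.34, 34.88, 19.32],
        "Sad.": [95.06, 93.92, 89.92, 76.81, 33.27, 18.25],
    },
    "White": {
        "An.": [97.38, 93.77, 83.44, 70.82, 40.00, 25.74],
        "Dis.": [94.69, 91.39, 86.63, 76.92, 56.96, 29.12],
        "Fe.": [98.35, 94.51, 90.66, 85.53, 53.48, 28.75],
        "Ha.": [97.14, 90.52, 83.54, 70.13, 43.83, 26.65],
        "Sad.": [96.58, 94.30, 88.97, 83.65, 64.83, 38.21],
    },
    "SSN": {
        "An.": [98.36, 94.75, 88.03, 71.80, 39.02, 24.75],
        "Dis.": [96.34, 89.74, 84.62, 69.05, 36.63, 16.12],
        "Fe.": [99.27, 97.07, 91.21, 80.40, 43.41, 23.81],
        "Ha.": [99.46, 95.53, 86.40, 75.13, 36.49, 20.75],
        "Sad.": [99.05, 95.82, 92.40, 83.84, 44.11, 19.20],
    },
    "Factory": {
        "An.": [96.56, 91.31, 84.10, 66.07, 36.72, 25.74],
        "Dis.": [94.14, 91.58, 83.70, 68.86, 38.83, 19.78],
        "Fe.": [98.53, 95.79, 91.94, 81.32, 44.87, 23.44],
        "Ha.": [96.78, 93.56, 84.97, 70.30, 37.57, 16.99],
        "Sad.": [97.34, 92.59, 88.02, 78.33, 40.11, 20.53],
    },
}

# "Mean" rows as printed in Table 3.
REPORTED_MEAN = {
    "Babble": [94.26, 91.22, 84.11, 69.63, 35.65, 20.48],
    "White": [96.83, 92.90, 86.65, 77.41, 51.82, 29.69],
    "SSN": [98.50, 94.58, 88.53, 76.04, 39.93, 20.93],
    "Factory": [96.67, 92.97, 86.55, 72.98, 39.62, 21.30],
}


def column_means(rows):
    """Average each SNR column over the emotional-state rows."""
    rows = list(rows)
    return [sum(col) / len(rows) for col in zip(*rows)]


def check_means(table, reported, tol=0.005):
    """Return (noise, snr_db, computed, reported) for entries outside rounding tolerance."""
    mismatches = []
    for noise, by_emotion in table.items():
        means = column_means(by_emotion.values())
        for snr, got, want in zip(SNRS_DB, means, reported[noise]):
            if not isclose(got, want, abs_tol=tol):
                mismatches.append((noise, snr, got, want))
    return mismatches


if __name__ == "__main__":
    print(check_means(TABLE3, REPORTED_MEAN))  # → []
```

A tolerance of 0.005 is half of the last printed digit. Any larger gap would point to a transcription error or a differently weighted mean.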
