Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition

EURASIP Journal on Audio, Speech, and Music Processing

Table 3 Experimental results using simulated noisy reverberant data (RIR = ‘office’)

NN conf.	SNR	Frame sel. type	Speaker identification rate (%)
			Left context only (L)					Left+right context (L+R)					Left+short right context (L+sR)
			Frame sel.	Training data				Frame sel.	Training data				Frame sel.	Training data
				1u	5u	10u	15u		1u	5u	10u	15u		1u	5u	10u	15u
Multiple NNs	20 dB	Linear	3-1-0	*53.0*	59.0	63.5	61.8	–	–	–	–	–	–	–	–	–	–
			7-1-0	38.9	60.5	62.9	64.6	3-1-3	42.3	*64.9*	*66.6*	65.4	–	–	–	–	–
			15-1-0	15.3	40.7	55.1	60.5	7-1-7	24.7	50.8	61.8	65.7	7-1-3	27.8	58.6	65.8	67.0
		Skip1	3-1-0	48.6	58.8	63.4	62.2	–	–	–	–	–	–	–	–	–	–
			7-1-0	32.1	60.5	61.8	62.9	3-1-3	46.3	63.1	66.0	67.0	–	–	–	–	–
			–	–	–	–	–	7-1-7	22.7	45.8	57.3	62.9	7-1-3	27.6	54.1	66.2	*67.1*
	10 dB	Linear	3-1-0	20.7	34.8	32.0	35.7	–	–	–	–	–	–	–	–	–	–
			7-1-0	18.3	34.1	37.6	38.4	3-1-3	25.6	*37.4*	38.6	41.1	–	–	–	–	–
			15-1-0	3.2	20.4	31.7	33.9	7-1-7	6.1	25.2	36.9	41.0	7-1-3	10.1	32.3	*40.8*	*42.8*
		Skip1	3-1-0	*31.9*	32.1	34.1	35.1	–	–	–	–	–	–	–	–	–	–
			7-1-0	13.2	32.0	36.5	37.0	3-1-3	20.7	37.3	39.8	41.3	–	–	–	–	–
			–	–	–	–	–	7-1-7	6.2	19.8	31.4	37.2	7-1-3	8.1	32.5	37.5	41.2