Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition

EURASIP Journal on Audio, Speech, and Music Processing

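Table 2 Experimental results using simulated reverberant data. Values between asterisks are the highest speaker identification rate in each training-data column (1u, 5u, 10u, 15u) within each NN configuration and RIR block.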

NN conf.	RIR	Frame sel. type	Speaker identification rate (%)
			Left context only (L)					Left+right context (L+R)					Left+short right context (L+sR)
			Frame sel.	Training data				Frame sel.	Training data				Frame sel.	Training data
				1u	5u	10u	15u		1u	5u	10u	15u		1u	5u	10u	15u
Multiple NNs	Office	Linear	3-1-0	71.6	76.7	76.8	77.0	–	–	–	–	–	–	–	–	–	–
			7-1-0	59.9	79.6	81.7	81.9	3-1-3	70.0	82.3	82.7	83.1	–	–	–	–	–
			15-1-0	33.4	65.3	78.8	80.9	7-1-7	55.5	76.9	83.3	85.1	7-1-3	56.2	81.3	85.4	85.8
		Skip1	3-1-0	*74.4*	79.5	79.4	80.1	–	–	–	–	–	–	–	–	–	–
			7-1-0	57.1	81.3	82.4	84.0	3-1-3	69.2	*83.8*	*85.8*	86.1	–	–	–	–	–
			–	–	–	–	–	7-1-7	52.7	72.0	82.2	85.0	7-1-3	59.1	83.1	85.7	*87.1*
	Livingroom	Linear	3-1-0	60.1	69.6	70.6	70.8	–	–	–	–	–	–	–	–	–	–
			7-1-0	52.3	76.0	78.1	78.9	3-1-3	52.3	75.4	75.8	75.7	–	–	–	–	–
			15-1-0	23.6	58.4	72.2	76.5	7-1-7	35.5	62.4	74.4	78.4	7-1-3	32.5	74.1	79.4	81.1
		Skip1	3-1-0	*63.2*	74.0	74.1	74.5	–	–	–	–	–	–	–	–	–	–
			7-1-0	39.8	75.5	78.8	79.7	3-1-3	52.4	*77.8*	79.5	79.1	–	–	–	–	–
			–	–	–	–	–	7-1-7	25.6	61.3	74.6	79.2	7-1-3	32.5	72.2	*79.6*	*82.1*
Single NN	Livingroom	Linear	3-1-0	64.9	71.6	70.6	71.3	–	–	–	–	–	–	–	–	–	–
			7-1-0	71.8	75.4	75.4	75.4	3-1-3	72.0	75.4	74.9	74.0	–	–	–	–	–
			15-1-0	70.5	77.0	77.2	77.7	7-1-7	73.6	77.5	79.2	78.4	7-1-3	*76.9*	77.8	78.8	78.6
		Skip1	3-1-0	71.1	73.4	74.3	74.4	–	–	–	–	–	–	–	–	–	–
			7-1-0	72.6	75.9	76.1	76.2	3-1-3	73.7	76.2	76.2	77.0	–	–	–	–	–
			–	–	–	–	–	7-1-7	71.4	79.3	79.5	*79.9*	7-1-3	74.6	*79.6*	*79.7*	79.7
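The frame-selection labels in Table 2 appear to follow a left-center-right pattern. For example, "7-1-3" would be 7 left context frames, the current frame, and 3 right context frames. This reading is consistent with the "Left context only" group, which uses only x-1-0 labels. The sketch below shows how such a context window could be spliced into a network input. It is a minimal illustration under stated assumptions, not the authors' implementation. Three things are assumed rather than taken from the table: "Skip1" is read as skipping one frame between consecutive selected context frames (stride 2), edge frames are padded by repetition, and the function names `parse_config` and `splice_frames` are hypothetical.

```python
import numpy as np


def parse_config(config: str) -> tuple[int, int, int]:
    """Parse a label such as '7-1-3' into (left, center, right) frame counts.

    Assumption: the label format is left-center-right, as Table 2 suggests.
    """
    left, center, right = (int(part) for part in config.split("-"))
    return left, center, right


def splice_frames(features: np.ndarray, config: str, skip: int = 0) -> np.ndarray:
    """Stack each frame with its selected left/right context frames.

    features: array of shape (num_frames, feat_dim).
    skip: frames skipped between selected context frames
          (0 ~ 'Linear', 1 ~ 'Skip1' under the assumed reading).
    Frames beyond the utterance edges are replaced by the first/last frame.
    Returns an array of shape (num_frames, num_context * feat_dim).
    """
    left, center, right = parse_config(config)
    if center != 1:
        raise ValueError("this sketch assumes exactly one center frame")
    stride = skip + 1
    # Relative frame offsets, e.g. '3-1-0' with skip=0 -> [-3, -2, -1, 0].
    offsets = (
        [-stride * k for k in range(left, 0, -1)]
        + [0]
        + [stride * k for k in range(1, right + 1)]
    )
    num_frames = features.shape[0]
    # Absolute indices for every (frame, offset) pair, clipped for edge replication.
    idx = np.clip(
        np.arange(num_frames)[:, None] + np.array(offsets)[None, :],
        0,
        num_frames - 1,
    )
    # features[idx] has shape (num_frames, len(offsets), feat_dim);
    # flatten the context axis into a single input vector per frame.
    return features[idx].reshape(num_frames, -1)


if __name__ == "__main__":
    feats = np.arange(10, dtype=float).reshape(10, 1)  # 10 frames, 1-dim features
    print(splice_frames(feats, "3-1-0")[5])          # [2. 3. 4. 5.]
    print(splice_frames(feats, "3-1-0", skip=1)[5])  # [0. 1. 3. 5.]
    print(splice_frames(feats, "7-1-3").shape)       # (10, 11)
```

Under this reading, the stride-2 selection lets a "3-1-0" window with skip=1 reach six frames back while keeping the input dimension the same as the linear "3-1-0" window. This is one plausible motivation for comparing the Linear and Skip1 rows at equal context sizes.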