Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition

EURASIP Journal on Audio, Speech, and Music Processing

Table 9 Experimental results using environment models (GMMs) as the NN selector

Method	Dataset		Speaker identification rate (%)
			P01	P03	P05	P02	P04	Avg. known	Avg. unknown	Avg. all
GMM32 + Prop. (24 NNs) + CMN	P01/3/5	1s.5u	88.6	90.2	92.8	93.4	90.0	90.5	91.7	91.0
GMM32 + Prop. (24 NNs) + CMN	P01/3/5	3s.15u	89.5	93.3	94.3	95.3	91.0	92.4	93.2	92.7
GMM32 + Prop. (12 NNs) + CMN	P01/3/5	1s.5u	89.4	92.3	92.4	93.5	92.5	91.4	93.0	92.0
GMM32 + Prop. (12 NNs) + CMN	P01/3/5	3s.15u	91.7	93.5	93.8	96.5	93.5	93.0	95.0	93.8
GMM32 + Prop. (6 NNs) + CMN	P01/3/5	1s.5u	90.4	90.9	91.9	94.3	93.7	91.1	94.0	92.2
GMM32 + Prop. (6 NNs) + CMN	P01/3/5	3s.15u	92.0	92.7	92.0	96.7	94.2	92.2	95.4	93.5

The known environments are P01, P03, and P05; the unknown environments are P02 and P04. The experiments used the first testing scheme and skip1 7-1-0 frame selection. Bold text marks the best average performance for each training data setting (1s.5u or 3s.15u).
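
The three average columns of Table 9 can be recomputed from the per-environment rates. The sketch below is illustrative only: it assumes each average is an unweighted mean over the environments it covers (known: P01, P03, P05; unknown: P02, P04; all: the five together), which agrees with the reported values to within rounding. The names `rows`, `mean`, and `averages` are hypothetical helpers introduced here, not part of the original work.

```python
# Per-environment speaker identification rates (%) copied from Table 9.
# Keys are (number of NNs, training data setting).
KNOWN = ("P01", "P03", "P05")    # environments seen in training
UNKNOWN = ("P02", "P04")         # environments not seen in training

rows = {
    ("24 NNs", "1s.5u"):  {"P01": 88.6, "P03": 90.2, "P05": 92.8, "P02": 93.4, "P04": 90.0},
    ("24 NNs", "3s.15u"): {"P01": 89.5, "P03": 93.3, "P05": 94.3, "P02": 95.3, "P04": 91.0},
    ("12 NNs", "1s.5u"):  {"P01": 89.4, "P03": 92.3, "P05": 92.4, "P02": 93.5, "P04": 92.5},
    ("12 NNs", "3s.15u"): {"P01": 91.7, "P03": 93.5, "P05": 93.8, "P02": 96.5, "P04": 93.5},
    ("6 NNs", "1s.5u"):   {"P01": 90.4, "P03": 90.9, "P05": 91.9, "P02": 94.3, "P04": 93.7},
    ("6 NNs", "3s.15u"):  {"P01": 92.0, "P03": 92.7, "P05": 92.0, "P02": 96.7, "P04": 94.2},
}


def mean(values):
    """Unweighted arithmetic mean (assumed averaging rule)."""
    values = list(values)
    return sum(values) / len(values)


def averages(rates):
    """Return (avg. known, avg. unknown, avg. all) for one table row."""
    known = mean(rates[env] for env in KNOWN)
    unknown = mean(rates[env] for env in UNKNOWN)
    overall = mean(rates[env] for env in KNOWN + UNKNOWN)
    return known, unknown, overall


if __name__ == "__main__":
    for (nns, train), rates in rows.items():
        known, unknown, overall = averages(rates)
        print(f"{nns:7s} {train:7s} "
              f"known={known:.2f} unknown={unknown:.2f} all={overall:.2f}")
```

Comparing the printed values with the table is a quick consistency check on the extracted numbers; any difference larger than rounding error would point to a transcription problem in a row.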