Table 3 Experimental results using simulated noisy reverberant data (RIR = ‘office’)

From: Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition

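Speaker identification rate (%) for each amount of training data (1u, 5u, 10u, 15u). Context: L = left context only; L+R = left+right context; L+sR = left+short right context.

| NN conf. | SNR | Frame sel. type | Context | Frame sel. | 1u | 5u | 10u | 15u |
|---|---|---|---|---|---|---|---|---|
| Multiple NNs | 20 dB | Linear | L | 3-1-0 | 53.0 | 59.0 | 63.5 | 61.8 |
| | | | L | 7-1-0 | 38.9 | 60.5 | 62.9 | 64.6 |
| | | | L | 15-1-0 | 15.3 | 40.7 | 55.1 | 60.5 |
| | | | L+R | 3-1-3 | 42.3 | 64.9 | 66.6 | 65.4 |
| | | | L+R | 7-1-7 | 24.7 | 50.8 | 61.8 | 65.7 |
| | | | L+sR | 7-1-3 | 27.8 | 58.6 | 65.8 | 67.0 |
| | | Skip1 | L | 3-1-0 | 48.6 | 58.8 | 63.4 | 62.2 |
| | | | L | 7-1-0 | 32.1 | 60.5 | 61.8 | 62.9 |
| | | | L+R | 3-1-3 | 46.3 | 63.1 | 66.0 | 67.0 |
| | | | L+R | 7-1-7 | 22.7 | 45.8 | 57.3 | 62.9 |
| | | | L+sR | 7-1-3 | 27.6 | 54.1 | 66.2 | 67.1 |
| | 10 dB | Linear | L | 3-1-0 | 20.7 | 34.8 | 32.0 | 35.7 |
| | | | L | 7-1-0 | 18.3 | 34.1 | 37.6 | 38.4 |
| | | | L | 15-1-0 | 3.2 | 20.4 | 31.7 | 33.9 |
| | | | L+R | 3-1-3 | 25.6 | 37.4 | 38.6 | 41.1 |
| | | | L+R | 7-1-7 | 6.1 | 25.2 | 36.9 | 41.0 |
| | | | L+sR | 7-1-3 | 10.1 | 32.3 | 40.8 | 42.8 |
| | | Skip1 | L | 3-1-0 | 31.9 | 32.1 | 34.1 | 35.1 |
| | | | L | 7-1-0 | 13.2 | 32.0 | 36.5 | 37.0 |
| | | | L+R | 3-1-3 | 20.7 | 37.3 | 39.8 | 41.3 |
| | | | L+R | 7-1-7 | 6.2 | 19.8 | 31.4 | 37.2 |
| | | | L+sR | 7-1-3 | 8.1 | 32.5 | 37.5 | 41.2 |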
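The "Frame sel." labels follow an a-1-b pattern. The sketch below shows one way such a context window could be assembled for a feature-mapping network. It rests on assumptions this table does not state: a-1-b is read as a left-context frames, one current frame and b right-context frames; "Linear" is read as selecting consecutive neighbours; and "Skip1" is read as leaving one frame out between selected neighbours. Edge clamping and the helper names (parse_frame_sel, context_indices, stack_context) are hypothetical, chosen only for illustration.

```python
import numpy as np


def parse_frame_sel(spec: str) -> tuple[int, int]:
    """Parse a 'left-1-right' label such as '7-1-3' into (left, right)."""
    left, center, right = (int(x) for x in spec.split("-"))
    if center != 1:
        raise ValueError(f"expected a single center frame, got {spec!r}")
    return left, right


def context_indices(t: int, left: int, right: int, skip: int, n_frames: int) -> list[int]:
    """Frame indices fed to the network for center frame t.

    skip=0 selects consecutive neighbours (assumed meaning of "Linear");
    skip=1 leaves one frame out between selected neighbours (assumed "Skip1").
    Indices outside [0, n_frames) are clamped to the nearest edge frame
    (a hypothetical padding choice).
    """
    step = skip + 1
    offsets = (
        [-step * k for k in range(left, 0, -1)]
        + [0]
        + [step * k for k in range(1, right + 1)]
    )
    return [min(max(t + o, 0), n_frames - 1) for o in offsets]


def stack_context(features: np.ndarray, spec: str, t: int, skip: int = 0) -> np.ndarray:
    """Concatenate the selected rows of a (frames x dims) matrix into one vector."""
    left, right = parse_frame_sel(spec)
    idx = context_indices(t, left, right, skip, features.shape[0])
    return features[idx].reshape(-1)


if __name__ == "__main__":
    feats = np.arange(40, dtype=float).reshape(20, 2)  # 20 frames, 2 dims each
    left, right = parse_frame_sel("7-1-3")
    print(context_indices(10, left, right, skip=0, n_frames=20))  # Linear
    print(context_indices(10, left, right, skip=1, n_frames=20))  # Skip1
    print(stack_context(feats, "3-1-0", t=10).shape)
```

Under these assumptions, a 7-1-3 window feeds 11 frames to the network with either selection type, so the input dimension is unchanged, but with Skip1 those 11 frames cover 21 frames of signal instead of 11.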