Table 6 Experimental results using position-specific training data

From: Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition

Method                   Dataset      Speaker identification rate (%)
                                      P01    P03    P05    P02    P04    Avg. known  Avg. unknown  Avg. all
Proposed (24 NNs) + CMN  P01 1s.5u    85.8*  86.0   85.9   87.1   85.5   85.8        86.1          86.1
                         P01 3s.15u   91.8*  91.8   92.0   93.2   90.0   91.8        91.8          91.8
                         P03 1s.5u    84.0   86.8*  85.2   88.1   88.6   86.8        86.5          86.5
                         P03 3s.15u   90.3   92.7*  93.5   94.8   92.8   92.7        92.9          92.8
                         P05 1s.5u    88.8   91.2   92.7*  93.3   89.7   92.7        90.8          91.1
                         P05 3s.15u   90.3   93.0   95.0*  96.2   91.7   95.0        92.8          93.2
Proposed (12 NNs) + CMN  P01 1s.5u    88.8*  88.6   88.6   89.9   90.4   88.8        89.4          89.3
                         P01 3s.15u   93.5*  94.3   94.2   95.0   93.3   93.5        94.2          94.1
                         P03 1s.5u    87.8   89.6*  89.3   90.7   92.3   89.6        90.0          89.9
                         P03 3s.15u   91.5   94.7*  94.3   95.0   94.2   94.7        93.8          93.9
                         P05 1s.5u    89.9   92.5   92.4*  94.2   92.7   92.4        92.3          92.3
                         P05 3s.15u   91.5   93.2   92.8*  96.5   92.7   92.8        93.5          93.3
Proposed (6 NNs) + CMN   P01 1s.5u    89.5*  88.4   87.9   90.9   91.7   89.5        89.7          89.7
                         P01 3s.15u   92.2*  91.7   91.0   94.7   93.3   92.2        92.7          92.6
                         P03 1s.5u    89.1   89.7*  88.4   91.0   92.7   89.7        90.3          90.2
                         P03 3s.15u   91.5   92.0*  92.0   95.0   93.8   92.0        93.0          92.9
                         P05 1s.5u    90.1   91.4   91.7*  94.7   93.4   91.7        92.4          92.3
                         P05 3s.15u   92.5   93.2   92.3*  96.3   94.2   92.3        94.0          93.7
1. The known environments are P01, P03, and P05; the unknown environments are P02 and P04. The experiments used the first testing scheme and skip1 7-1-0 frame selection. Asterisks (*) mark the known positions (matched conditions). Bold text marks the best average performance for each amount of training data.
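Every average in the table can be recomputed from the five per-position rates in its row. The Python sketch below does that under an interpretation inferred from the numbers rather than stated in the source: for a model trained at one position, Avg. known appears to be the rate at that starred position, Avg. unknown the mean over the other four test positions (not only P02 and P04), and Avg. all the mean over all five. The names `ROWS`, `recompute`, `check_rows`, and `TOLERANCE` are hypothetical, and the 0.1 tolerance assumes each printed rate was rounded to one decimal place.

```python
"""Recompute the averages in Table 6 from the per-position rates.

Assumption (inferred from the numbers, not stated in the table): for a model
trained at one position, "Avg. known" is the rate at that matched position,
"Avg. unknown" is the mean over the other four test positions, and
"Avg. all" is the mean over all five positions.
"""

POSITIONS = ("P01", "P03", "P05", "P02", "P04")  # column order used in Table 6

# (method, training position, training data, five rates, reported averages)
ROWS = [
    # Proposed (24 NNs) + CMN
    ("24 NNs", "P01", "1s.5u", (85.8, 86.0, 85.9, 87.1, 85.5), (85.8, 86.1, 86.1)),
    ("24 NNs", "P01", "3s.15u", (91.8, 91.8, 92.0, 93.2, 90.0), (91.8, 91.8, 91.8)),
    ("24 NNs", "P03", "1s.5u", (84.0, 86.8, 85.2, 88.1, 88.6), (86.8, 86.5, 86.5)),
    ("24 NNs", "P03", "3s.15u", (90.3, 92.7, 93.5, 94.8, 92.8), (92.7, 92.9, 92.8)),
    ("24 NNs", "P05", "1s.5u", (88.8, 91.2, 92.7, 93.3, 89.7), (92.7, 90.8, 91.1)),
    ("24 NNs", "P05", "3s.15u", (90.3, 93.0, 95.0, 96.2, 91.7), (95.0, 92.8, 93.2)),
    # Proposed (12 NNs) + CMN
    ("12 NNs", "P01", "1s.5u", (88.8, 88.6, 88.6, 89.9, 90.4), (88.8, 89.4, 89.3)),
    ("12 NNs", "P01", "3s.15u", (93.5, 94.3, 94.2, 95.0, 93.3), (93.5, 94.2, 94.1)),
    ("12 NNs", "P03", "1s.5u", (87.8, 89.6, 89.3, 90.7, 92.3), (89.6, 90.0, 89.9)),
    ("12 NNs", "P03", "3s.15u", (91.5, 94.7, 94.3, 95.0, 94.2), (94.7, 93.8, 93.9)),
    ("12 NNs", "P05", "1s.5u", (89.9, 92.5, 92.4, 94.2, 92.7), (92.4, 92.3, 92.3)),
    ("12 NNs", "P05", "3s.15u", (91.5, 93.2, 92.8, 96.5, 92.7), (92.8, 93.5, 93.3)),
    # Proposed (6 NNs) + CMN
    ("6 NNs", "P01", "1s.5u", (89.5, 88.4, 87.9, 90.9, 91.7), (89.5, 89.7, 89.7)),
    ("6 NNs", "P01", "3s.15u", (92.2, 91.7, 91.0, 94.7, 93.3), (92.2, 92.7, 92.6)),
    ("6 NNs", "P03", "1s.5u", (89.1, 89.7, 88.4, 91.0, 92.7), (89.7, 90.3, 90.2)),
    ("6 NNs", "P03", "3s.15u", (91.5, 92.0, 92.0, 95.0, 93.8), (92.0, 93.0, 92.9)),
    ("6 NNs", "P05", "1s.5u", (90.1, 91.4, 91.7, 94.7, 93.4), (91.7, 92.4, 92.3)),
    ("6 NNs", "P05", "3s.15u", (92.5, 93.2, 92.3, 96.3, 94.2), (92.3, 94.0, 93.7)),
]

# Each printed rate is rounded to 0.1, so a recomputed mean can drift from the
# printed average by up to 0.05 (input rounding) + 0.05 (output rounding).
TOLERANCE = 0.1 + 1e-9


def recompute(train_pos, rates):
    """Return (known, unknown, all) averages for one row."""
    i = POSITIONS.index(train_pos)
    known = rates[i]  # matched condition: test position == training position
    others = [r for j, r in enumerate(rates) if j != i]
    unknown = sum(others) / len(others)  # mean over the four mismatched positions
    overall = sum(rates) / len(rates)  # mean over all five positions
    return known, unknown, overall


def check_rows(rows):
    """Yield (row label, recomputed, reported, consistent?) for every row."""
    for method, pos, data, rates, reported in rows:
        computed = recompute(pos, rates)
        ok = all(abs(c - r) <= TOLERANCE for c, r in zip(computed, reported))
        yield f"{method} {pos} {data}", computed, reported, ok


if __name__ == "__main__":
    for label, computed, reported, ok in check_rows(ROWS):
        shown = " ".join(f"{v:.3f}" for v in computed)
        print(f"{label:18s} recomputed {shown}  reported {reported}  {'ok' if ok else 'MISMATCH'}")
```

Under the 0.1 tolerance every row is consistent. Tightening it to 0.05 would flag only the 6 NNs, P03, 3s.15u row, whose recomputed Avg. unknown is 93.075 against a printed 93.0, a gap that rounding of the per-position rates can account for.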