Speaker identification rate (%). L: left context only; L+R: left+right context; L+sR: left+short right context. For each context type, "frame sel." is the frame-selection configuration, and 1u, 5u, 10u and 15u are the amounts of training data.

| NN conf. | RIR | Frame sel. type | L frame sel. | L 1u | L 5u | L 10u | L 15u | L+R frame sel. | L+R 1u | L+R 5u | L+R 10u | L+R 15u | L+sR frame sel. | L+sR 1u | L+sR 5u | L+sR 10u | L+sR 15u |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Multiple NNs | 20 dB | Linear | 3-1-0 | 53.0 | 59.0 | 63.5 | 61.8 | – | – | – | – | – | – | – | – | – | – |
| | | | 7-1-0 | 38.9 | 60.5 | 62.9 | 64.6 | 3-1-3 | 42.3 | 64.9 | 66.6 | 65.4 | – | – | – | – | – |
| | | | 15-1-0 | 15.3 | 40.7 | 55.1 | 60.5 | 7-1-7 | 24.7 | 50.8 | 61.8 | 65.7 | 7-1-3 | 27.8 | 58.6 | 65.8 | 67.0 |
| | | Skip1 | 3-1-0 | 48.6 | 58.8 | 63.4 | 62.2 | – | – | – | – | – | – | – | – | – | – |
| | | | 7-1-0 | 32.1 | 60.5 | 61.8 | 62.9 | 3-1-3 | 46.3 | 63.1 | 66.0 | 67.0 | – | – | – | – | – |
| | | | – | – | – | – | – | 7-1-7 | 22.7 | 45.8 | 57.3 | 62.9 | 7-1-3 | 27.6 | 54.1 | 66.2 | 67.1 |
| | 10 dB | Linear | 3-1-0 | 20.7 | 34.8 | 32.0 | 35.7 | – | – | – | – | – | – | – | – | – | – |
| | | | 7-1-0 | 18.3 | 34.1 | 37.6 | 38.4 | 3-1-3 | 25.6 | 37.4 | 38.6 | 41.1 | – | – | – | – | – |
| | | | 15-1-0 | 3.2 | 20.4 | 31.7 | 33.9 | 7-1-7 | 6.1 | 25.2 | 36.9 | 41.0 | 7-1-3 | 10.1 | 32.3 | 40.8 | 42.8 |
| | | Skip1 | 3-1-0 | 31.9 | 32.1 | 34.1 | 35.1 | – | – | – | – | – | – | – | – | – | – |
| | | | 7-1-0 | 13.2 | 32.0 | 36.5 | 37.0 | 3-1-3 | 20.7 | 37.3 | 39.8 | 41.3 | – | – | – | – | – |
| | | | – | – | – | – | – | 7-1-7 | 6.2 | 19.8 | 31.4 | 37.2 | 7-1-3 | 8.1 | 32.5 | 37.5 | 41.2 |