From: Enhancement of speech dynamics for voice activity detection using DNN
AUC (%), mean ± standard deviation

Noise | SNR (dB) | Proposed | Ramirez | Kinnunen | Sohn | Segbroeck |
---|---|---|---|---|---|---|
Clean | | 99.06 ± 0.13 | 71.03 ± 1.02 | 95.65 ± 0.43 | 88.48 ± 1.45 | 84.57 ± 1.21 |
White | 10 | 97.91 ± 0.28 | 74.09 ± 1.50 | 93.65 ± 1.05 | 93.32 ± 0.50 | 79.63 ± 0.24 |
White | 5 | 97.44 ± 0.43 | 73.57 ± 1.36 | 93.02 ± 1.31 | 87.84 ± 0.50 | 77.75 ± 0.34 |
White | 0 | 96.59 ± 0.50 | 72.33 ± 1.24 | 91.06 ± 1.59 | 77.34 ± 1.14 | 75.21 ± 0.67 |
White | −5 | 94.69 ± 0.66 | 68.97 ± 1.07 | 83.85 ± 1.28 | 66.79 ± 1.85 | 71.89 ± 0.77 |
Babble | 10 | 96.84 ± 0.60 | 68.84 ± 1.32 | 87.71 ± 0.91 | 87.56 ± 0.87 | 81.25 ± 0.35 |
Babble | 5 | 95.19 ± 0.71 | 67.28 ± 0.71 | 84.19 ± 0.80 | 79.97 ± 0.70 | 79.05 ± 0.69 |
Babble | 0 | 91.30 ± 0.74 | 63.62 ± 0.91 | 76.59 ± 0.99 | 70.05 ± 0.94 | 72.99 ± 1.13 |
Babble | −5 | 83.20 ± 0.87 | 59.37 ± 1.01 | 66.73 ± 1.52 | 60.33 ± 0.93 | 62.71 ± 1.25 |
Factory | 10 | 97.25 ± 0.39 | 70.35 ± 1.85 | 88.12 ± 1.73 | 88.04 ± 0.67 | 81.19 ± 0.96 |
Factory | 5 | 95.96 ± 0.43 | 67.54 ± 1.57 | 84.42 ± 1.67 | 79.55 ± 0.85 | 78.99 ± 1.06 |
Factory | 0 | 93.18 ± 0.46 | 62.78 ± 1.53 | 77.70 ± 1.07 | 67.15 ± 0.82 | 74.67 ± 0.87 |
Factory | −5 | 85.91 ± 0.29 | 57.81 ± 1.71 | 66.38 ± 0.76 | 56.28 ± 0.56 | 67.23 ± 0.52 |
Car | 10 | 99.02 ± 0.11 | 69.06 ± 1.60 | 94.62 ± 0.36 | 91.56 ± 1.29 | 84.46 ± 0.96 |
Car | 5 | 98.94 ± 0.11 | 68.27 ± 1.32 | 93.64 ± 0.80 | 92.15 ± 0.83 | 84.38 ± 0.96 |
Car | 0 | 98.79 ± 0.09 | 68.50 ± 1.02 | 92.42 ± 0.40 | 92.41 ± 0.34 | 84.08 ± 0.91 |
Car | −5 | 98.40 ± 0.05 | 68.77 ± 1.87 | 90.16 ± 0.17 | 91.81 ± 0.10 | 83.49 ± 1.06 |
Pink | 10 | 97.79 ± 0.39 | 73.11 ± 1.67 | 90.37 ± 1.33 | 90.54 ± 0.40 | 81.00 ± 0.94 |
Pink | 5 | 96.82 ± 0.59 | 72.36 ± 1.63 | 88.87 ± 1.85 | 82.51 ± 1.39 | 78.96 ± 1.21 |
Pink | 0 | 95.26 ± 0.70 | 70.32 ± 1.53 | 84.13 ± 1.76 | 71.70 ± 1.70 | 76.10 ± 1.31 |
Pink | −5 | 91.56 ± 1.03 | 65.69 ± 1.40 | 74.46 ± 1.36 | 62.81 ± 1.82 | 71.78 ± 1.04 |
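
As a reading aid for the "mean ± standard deviation" entries, the sketch below shows one way such a cell could be produced from per-run detector scores. It is an illustration only, not the authors' evaluation code: the function names `roc_auc` and `summarize`, the toy labels and scores, and the choice of the sample standard deviation over repeated runs are all assumptions not stated in the table.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking (Mann-Whitney) form.

    labels: 1 for speech frames, 0 for non-speech frames.
    scores: detector output, where a higher value means more speech-like.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one frame of each class")
    # Count speech/non-speech pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(auc_values):
    """Format per-run AUCs (fractions in [0, 1]) as 'mean ± std' in percent."""
    pct = [100.0 * a for a in auc_values]
    # Assumption: sample standard deviation across runs.
    return f"{mean(pct):.2f} ± {stdev(pct):.2f}"


# Toy, made-up runs (labels, scores); these are not data from the table.
runs = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
    ([1, 0, 1, 0], [0.7, 0.2, 0.1, 0.5]),
]
aucs = [roc_auc(y, s) for y, s in runs]
print(summarize(aucs))  # prints "75.00 ± 25.00"
```

The pairwise form is quadratic in the number of frames, so it suits small illustrations; for full recordings a sort-based AUC routine would be the more practical choice.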