
Table 5 AUC (%) comparison between the proposed method and other methods (Ramirez et al. [8], Kinnunen et al. [11], Sohn et al. [12], and Segbroeck et al. [44])

From: Enhancement of speech dynamics for voice activity detection using DNN

AUC (%), mean ± standard deviation

| Noise | SNR (dB) | Proposed | Ramirez | Kinnunen | Sohn | Segbroeck |
|---|---|---|---|---|---|---|
| Clean | | *99.06 ± 0.13* | 71.03 ± 1.02 | 95.65 ± 0.43 | 88.48 ± 1.45 | 84.57 ± 1.21 |
| White | 10 | *97.91 ± 0.28* | 74.09 ± 1.50 | 93.65 ± 1.05 | 93.32 ± 0.50 | 79.63 ± 0.24 |
| White | 5 | *97.44 ± 0.43* | 73.57 ± 1.36 | 93.02 ± 1.31 | 87.84 ± 0.50 | 77.75 ± 0.34 |
| White | 0 | *96.59 ± 0.50* | 72.33 ± 1.24 | 91.06 ± 1.59 | 77.34 ± 1.14 | 75.21 ± 0.67 |
| White | −5 | *94.69 ± 0.66* | 68.97 ± 1.07 | 83.85 ± 1.28 | 66.79 ± 1.85 | 71.89 ± 0.77 |
| Babble | 10 | *96.84 ± 0.60* | 68.84 ± 1.32 | 87.71 ± 0.91 | 87.56 ± 0.87 | 81.25 ± 0.35 |
| Babble | 5 | *95.19 ± 0.71* | 67.28 ± 0.71 | 84.19 ± 0.80 | 79.97 ± 0.70 | 79.05 ± 0.69 |
| Babble | 0 | *91.30 ± 0.74* | 63.62 ± 0.91 | 76.59 ± 0.99 | 70.05 ± 0.94 | 72.99 ± 1.13 |
| Babble | −5 | *83.20 ± 0.87* | 59.37 ± 1.01 | 66.73 ± 1.52 | 60.33 ± 0.93 | 62.71 ± 1.25 |
| Factory | 10 | *97.25 ± 0.39* | 70.35 ± 1.85 | 88.12 ± 1.73 | 88.04 ± 0.67 | 81.19 ± 0.96 |
| Factory | 5 | *95.96 ± 0.43* | 67.54 ± 1.57 | 84.42 ± 1.67 | 79.55 ± 0.85 | 78.99 ± 1.06 |
| Factory | 0 | *93.18 ± 0.46* | 62.78 ± 1.53 | 77.70 ± 1.07 | 67.15 ± 0.82 | 74.67 ± 0.87 |
| Factory | −5 | *85.91 ± 0.29* | 57.81 ± 1.71 | 66.38 ± 0.76 | 56.28 ± 0.56 | 67.23 ± 0.52 |
| Car | 10 | *99.02 ± 0.11* | 69.06 ± 1.60 | 94.62 ± 0.36 | 91.56 ± 1.29 | 84.46 ± 0.96 |
| Car | 5 | *98.94 ± 0.11* | 68.27 ± 1.32 | 93.64 ± 0.80 | 92.15 ± 0.83 | 84.38 ± 0.96 |
| Car | 0 | *98.79 ± 0.09* | 68.50 ± 1.02 | 92.42 ± 0.40 | 92.41 ± 0.34 | 84.08 ± 0.91 |
| Car | −5 | *98.40 ± 0.05* | 68.77 ± 1.87 | 90.16 ± 0.17 | 91.81 ± 0.10 | 83.49 ± 1.06 |
| Pink | 10 | *97.79 ± 0.39* | 73.11 ± 1.67 | 90.37 ± 1.33 | 90.54 ± 0.40 | 81.00 ± 0.94 |
| Pink | 5 | *96.82 ± 0.59* | 72.36 ± 1.63 | 88.87 ± 1.85 | 82.51 ± 1.39 | 78.96 ± 1.21 |
| Pink | 0 | *95.26 ± 0.70* | 70.32 ± 1.53 | 84.13 ± 1.76 | 71.70 ± 1.70 | 76.10 ± 1.31 |
| Pink | −5 | *91.56 ± 1.03* | 65.69 ± 1.40 | 74.46 ± 1.36 | 62.81 ± 1.82 | 71.78 ± 1.04 |
The numbers in italics indicate the best results.
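
For readers who want to see how an entry of the form "mean ± standard deviation" can be produced, the following is a minimal sketch and not the authors' evaluation code. It assumes each trial yields per-frame speech scores and binary speech/non-speech labels, computes AUC through the rank-sum (Mann-Whitney) identity, and summarizes repeated trials with the sample standard deviation. The names `auc_percent` and `summarize`, the toy scores, and the choice of sample rather than population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev


def auc_percent(scores, labels):
    """Area under the ROC curve in percent, via the rank-sum identity.

    scores: per-frame speech scores (higher means more speech-like).
    labels: 1 for speech frames, 0 for non-speech frames.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one speech and one non-speech frame")
    # Count pairs where a speech frame outscores a non-speech frame;
    # ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return 100.0 * wins / (len(pos) * len(neg))


def summarize(auc_values):
    """Format AUC values from repeated trials as 'mean ±std' (sample std)."""
    return f"{mean(auc_values):.2f} ±{stdev(auc_values):.2f}"


# Hypothetical toy trials (made-up scores, not data from the paper).
trials = [
    ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),
    ([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]),
    ([0.7, 0.6, 0.5, 0.1], [1, 1, 0, 0]),
]
aucs = [auc_percent(s, y) for s, y in trials]
print(summarize(aucs))
```

The pairwise count is quadratic in the number of frames, which is fine for a small illustration; for full-length recordings a sort-based rank computation or a library routine would be the more practical choice.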