Table 4 AUC (%) comparison between DNN-based VAD methods using log power spectra and MFCCs

From: Enhancement of speech dynamics for voice activity detection using DNN

Values are AUC (%), mean ± standard deviation.

| Noise | SNR (dB) | Log power spectra | MFCCs | MFCCs + Δ + ΔΔ |
|---|---|---|---|---|
| Clean | | *98.72 ± 0.20* | 98.18 ± 0.08 | 97.79 ± 0.41 |
| White | 10 | *97.51 ± 0.49* | 96.10 ± 0.59 | 96.91 ± 0.57 |
| | 5 | *97.27 ± 0.48* | 93.99 ± 0.85 | 95.11 ± 0.97 |
| | 0 | *96.14 ± 0.76* | 89.58 ± 1.46 | 90.85 ± 1.73 |
| | −5 | *93.88 ± 1.10* | 81.43 ± 1.11 | 82.42 ± 1.53 |
| Babble | 10 | *96.50 ± 0.55* | 92.71 ± 0.92 | 93.51 ± 0.77 |
| | 5 | *94.26 ± 0.66* | 87.24 ± 1.03 | 87.73 ± 0.89 |
| | 0 | *88.88 ± 0.74* | 77.78 ± 0.99 | 77.86 ± 0.82 |
| | −5 | *78.10 ± 1.10* | 65.72 ± 1.40 | 65.59 ± 1.40 |
| Factory | 10 | *96.80 ± 0.60* | 95.16 ± 0.79 | 96.04 ± 0.74 |
| | 5 | *95.14 ± 0.72* | 91.60 ± 1.23 | 92.55 ± 1.06 |
| | 0 | *91.17 ± 0.45* | 84.19 ± 1.23 | 84.81 ± 1.13 |
| | −5 | *80.49 ± 1.54* | 72.40 ± 1.14 | 72.70 ± 0.72 |
| Car | 10 | *98.83 ± 0.15* | 98.34 ± 0.23 | 98.26 ± 0.32 |
| | 5 | *98.75 ± 0.16* | 98.22 ± 0.34 | 98.23 ± 0.35 |
| | 0 | *98.56 ± 0.16* | 97.91 ± 0.44 | 98.08 ± 0.40 |
| | −5 | *98.06 ± 0.02* | 97.27 ± 0.54 | 97.70 ± 0.46 |
| Pink | 10 | *97.20 ± 0.66* | 95.91 ± 0.81 | 96.64 ± 0.62 |
| | 5 | *96.28 ± 0.79* | 93.31 ± 1.00 | 94.28 ± 0.99 |
| | 0 | *94.06 ± 0.95* | 87.96 ± 1.54 | 88.91 ± 1.40 |
| | −5 | *88.01 ± 1.54* | 78.30 ± 1.81 | 79.02 ± 1.22 |
Note: The numbers in italics indicate the best results.