
Table 2. AUC (%) comparison between the proposed method and DNN-based VAD methods using speech period candidates and log power spectra as the baseline for unknown SNR environments (7, 3, −3, and −7 dB)

From: Enhancement of speech dynamics for voice activity detection using DNN

AUC (%), mean ± standard deviation

| Noise | SNR (dB) | Proposed | Log power spectra | Speech period candidates |
|---|---|---|---|---|
| White | 7 | 97.57 ± 0.41 | 97.44 ± 0.53 | 96.82 ± 0.44 |
| White | 3 | 97.10 ± 0.56 | 96.92 ± 0.53 | 96.25 ± 0.55 |
| White | −3 | 95.72 ± 0.69 | 95.04 ± 0.89 | 94.36 ± 0.88 |
| White | −7 | 93.31 ± 0.81 | 92.43 ± 1.01 | 91.77 ± 0.69 |
| Babble | 7 | 95.87 ± 0.58 | 95.36 ± 0.73 | 95.20 ± 0.68 |
| Babble | 3 | 93.77 ± 0.54 | 92.49 ± 0.62 | 93.26 ± 0.69 |
| Babble | −3 | 86.90 ± 0.95 | 83.37 ± 1.14 | 86.10 ± 1.01 |
| Babble | −7 | 78.49 ± 1.00 | 73.11 ± 0.86 | 77.55 ± 0.81 |
| Factory | 7 | 96.50 ± 0.50 | 96.11 ± 0.53 | 95.88 ± 0.73 |
| Factory | 3 | 95.00 ± 0.54 | 94.11 ± 0.59 | 94.43 ± 0.71 |
| Factory | −3 | 89.05 ± 0.34 | 85.00 ± 0.56 | 88.40 ± 0.48 |
| Factory | −7 | 80.49 ± 0.72 | 72.66 ± 1.54 | 79.45 ± 0.80 |
| Car | 7 | 98.99 ± 0.15 | 98.81 ± 0.17 | 97.51 ± 0.40 |
| Car | 3 | 98.92 ± 0.16 | 98.71 ± 0.18 | 97.29 ± 0.42 |
| Car | −3 | 98.66 ± 0.19 | 98.39 ± 0.23 | 96.69 ± 0.37 |
| Car | −7 | 98.10 ± 0.42 | 97.74 ± 0.54 | 95.89 ± 0.43 |
| Pink | 7 | 97.20 ± 0.50 | 96.64 ± 0.66 | 96.31 ± 0.71 |
| Pink | 3 | 96.21 ± 0.60 | 95.48 ± 0.67 | 95.46 ± 0.60 |
| Pink | −3 | 93.57 ± 0.84 | 91.34 ± 0.74 | 92.22 ± 0.69 |
| Pink | −7 | 89.32 ± 0.65 | 84.92 ± 0.66 | 86.93 ± 0.50 |
Note: The original table marks the best result in each row in italics. The highest mean AUC in every row is that of the proposed method.
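To make the gap between the methods easier to read at the hardest condition, the following minimal sketch computes how many AUC percentage points the proposed method gains over each baseline at −7 dB. The mean values are copied from Table 2. The names `AUC_AT_MINUS_7_DB` and `auc_gains` are illustrative and do not come from the paper, and the sketch ignores the reported standard deviations, so it does not say whether a gain is statistically significant.

```python
# Mean AUC (%) values copied from Table 2 for the -7 dB condition only.
# Tuple order: (proposed, log power spectra, speech period candidates).
# The dictionary and function names are hypothetical, chosen for illustration.
AUC_AT_MINUS_7_DB = {
    "White": (93.31, 92.43, 91.77),
    "Babble": (78.49, 73.11, 77.55),
    "Factory": (80.49, 72.66, 79.45),
    "Car": (98.10, 97.74, 95.89),
    "Pink": (89.32, 84.92, 86.93),
}


def auc_gains(table):
    """Return {noise: (gain over log power spectra, gain over candidates)}.

    Gains are differences of mean AUC in percentage points, rounded to
    two decimals to match the precision of the table.
    """
    gains = {}
    for noise, (proposed, log_power, candidates) in table.items():
        gains[noise] = (
            round(proposed - log_power, 2),
            round(proposed - candidates, 2),
        )
    return gains


gains = auc_gains(AUC_AT_MINUS_7_DB)
for noise, (vs_log_power, vs_candidates) in gains.items():
    print(
        f"{noise:8s} +{vs_log_power:.2f} vs log power spectra, "
        f"+{vs_candidates:.2f} vs speech period candidates"
    )
```

On these values, the largest gain over the log power spectra baseline occurs for factory noise (7.83 points), and the largest gain over speech period candidates occurs for pink noise (2.39 points).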