
Table 2. AUC (%) comparison between the proposed method and the baseline DNN-based VAD methods using speech period candidates and log power spectra, for unknown SNR environments (7, 3, −3, and −7 dB)

From: Enhancement of speech dynamics for voice activity detection using DNN

  

AUC (%), mean ± standard deviation

| Noise | SNR (dB) | Proposed | Log power spectra | Speech period candidates |
|---|---|---|---|---|
| White | 7 | *97.57 ± 0.41* | 97.44 ± 0.53 | 96.82 ± 0.44 |
| | 3 | *97.10 ± 0.56* | 96.92 ± 0.53 | 96.25 ± 0.55 |
| | −3 | *95.72 ± 0.69* | 95.04 ± 0.89 | 94.36 ± 0.88 |
| | −7 | *93.31 ± 0.81* | 92.43 ± 1.01 | 91.77 ± 0.69 |
| Babble | 7 | *95.87 ± 0.58* | 95.36 ± 0.73 | 95.20 ± 0.68 |
| | 3 | *93.77 ± 0.54* | 92.49 ± 0.62 | 93.26 ± 0.69 |
| | −3 | *86.90 ± 0.95* | 83.37 ± 1.14 | 86.10 ± 1.01 |
| | −7 | *78.49 ± 1.00* | 73.11 ± 0.86 | 77.55 ± 0.81 |
| Factory | 7 | *96.50 ± 0.50* | 96.11 ± 0.53 | 95.88 ± 0.73 |
| | 3 | *95.00 ± 0.54* | 94.11 ± 0.59 | 94.43 ± 0.71 |
| | −3 | *89.05 ± 0.34* | 85.00 ± 0.56 | 88.40 ± 0.48 |
| | −7 | *80.49 ± 0.72* | 72.66 ± 1.54 | 79.45 ± 0.80 |
| Car | 7 | *98.99 ± 0.15* | 98.81 ± 0.17 | 97.51 ± 0.40 |
| | 3 | *98.92 ± 0.16* | 98.71 ± 0.18 | 97.29 ± 0.42 |
| | −3 | *98.66 ± 0.19* | 98.39 ± 0.23 | 96.69 ± 0.37 |
| | −7 | *98.10 ± 0.42* | 97.74 ± 0.54 | 95.89 ± 0.43 |
| Pink | 7 | *97.20 ± 0.50* | 96.64 ± 0.66 | 96.31 ± 0.71 |
| | 3 | *96.21 ± 0.60* | 95.48 ± 0.67 | 95.46 ± 0.60 |
| | −3 | *93.57 ± 0.84* | 91.34 ± 0.74 | 92.22 ± 0.69 |
| | −7 | *89.32 ± 0.65* | 84.92 ± 0.66 | 86.93 ± 0.50 |

The numbers in italics indicate the best results.