Table 1 AUC (%) comparison between the proposed method and DNN-based VAD methods using speech period candidates and log power spectra as the baseline

From: Enhancement of speech dynamics for voice activity detection using DNN

  

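AUC (%), mean ± standard deviation

| Noise | SNR (dB) | Proposed | Log power spectra | Speech period candidates |
|---|---|---|---|---|
| Clean | | *99.06 ± 0.13* | 98.72 ± 0.20 | 98.10 ± 0.39 |
| White | 10 | *97.91 ± 0.28* | 97.51 ± 0.49 | 97.06 ± 0.54 |
| White | 5 | *97.44 ± 0.43* | 97.27 ± 0.48 | 96.64 ± 0.46 |
| White | 0 | *96.59 ± 0.50* | 96.14 ± 0.76 | 95.44 ± 0.57 |
| White | −5 | *94.69 ± 0.66* | 93.88 ± 1.10 | 93.40 ± 0.60 |
| Babble | 10 | *96.84 ± 0.60* | 96.50 ± 0.55 | 96.19 ± 0.68 |
| Babble | 5 | *95.19 ± 0.71* | 94.26 ± 0.66 | 94.59 ± 0.92 |
| Babble | 0 | *91.30 ± 0.74* | 88.88 ± 0.74 | 90.42 ± 0.53 |
| Babble | −5 | *83.20 ± 0.87* | 78.10 ± 1.10 | 81.85 ± 0.85 |
| Factory | 10 | *97.25 ± 0.39* | 96.80 ± 0.60 | 96.60 ± 0.56 |
| Factory | 5 | *95.96 ± 0.43* | 95.14 ± 0.72 | 95.48 ± 0.77 |
| Factory | 0 | *93.18 ± 0.46* | 91.17 ± 0.45 | 92.53 ± 0.67 |
| Factory | −5 | *85.91 ± 0.29* | 80.49 ± 1.54 | 84.57 ± 0.83 |
| Car | 10 | *99.02 ± 0.11* | 98.83 ± 0.15 | 97.60 ± 0.45 |
| Car | 5 | *98.94 ± 0.11* | 98.75 ± 0.16 | 97.37 ± 0.45 |
| Car | 0 | *98.79 ± 0.09* | 98.56 ± 0.16 | 97.02 ± 0.41 |
| Car | −5 | *98.40 ± 0.05* | 98.06 ± 0.02 | 96.36 ± 0.32 |
| Pink | 10 | *97.79 ± 0.39* | 97.20 ± 0.66 | 96.86 ± 0.73 |
| Pink | 5 | *96.82 ± 0.59* | 96.28 ± 0.79 | 95.98 ± 0.74 |
| Pink | 0 | *95.26 ± 0.70* | 94.06 ± 0.95 | 94.26 ± 0.89 |
| Pink | −5 | *91.56 ± 1.03* | 88.01 ± 1.54 | 89.91 ± 1.20 |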

  1. The numbers in italics indicate the best results
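
For readers who want to reproduce entries of the form "mean ± standard deviation", the following is a minimal sketch of how an AUC value can be computed from frame-level scores and then summarized over several runs. It is not the authors' evaluation code. The function names `roc_auc` and `summarize` are hypothetical, the label convention (1 for speech, 0 for non-speech) is an assumption, and the use of the sample standard deviation is also an assumption, since the table does not say how the spread was computed.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for speech frames, 0 for non-speech frames (assumed convention).
    scores: per-frame detector outputs; higher means more speech-like.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one speech and one non-speech frame")
    # Count (speech, non-speech) pairs where the speech frame scores higher;
    # a tie counts as half a win. This is O(len(pos) * len(neg)).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(aucs):
    """Format AUC fractions in [0, 1] as 'mean ± std' in percent.

    Uses the sample standard deviation (n - 1 denominator), an assumption.
    """
    pct = [100.0 * a for a in aucs]
    return f"{mean(pct):.2f} ± {stdev(pct):.2f}"


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(summarize([0.99, 0.98, 0.97]))
```

The pairwise count is quadratic in the number of frames, so it is only suited to small illustrations; a sort-based rank computation would be preferable for full test sets.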