Table 4 AUC (%) comparison between DNN-based VAD methods using log power spectra and MFCCs

From: Enhancement of speech dynamics for voice activity detection using DNN

  

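AUC (%), mean ± standard deviation

| Noise | SNR (dB) | Log power spectra | MFCCs | MFCCs + Δ + ΔΔ |
|---|---|---|---|---|
| Clean | | *98.72 ± 0.20* | 98.18 ± 0.08 | 97.79 ± 0.41 |
| White | 10 | *97.51 ± 0.49* | 96.10 ± 0.59 | 96.91 ± 0.57 |
| White | 5 | *97.27 ± 0.48* | 93.99 ± 0.85 | 95.11 ± 0.97 |
| White | 0 | *96.14 ± 0.76* | 89.58 ± 1.46 | 90.85 ± 1.73 |
| White | −5 | *93.88 ± 1.10* | 81.43 ± 1.11 | 82.42 ± 1.53 |
| Babble | 10 | *96.50 ± 0.55* | 92.71 ± 0.92 | 93.51 ± 0.77 |
| Babble | 5 | *94.26 ± 0.66* | 87.24 ± 1.03 | 87.73 ± 0.89 |
| Babble | 0 | *88.88 ± 0.74* | 77.78 ± 0.99 | 77.86 ± 0.82 |
| Babble | −5 | *78.10 ± 1.10* | 65.72 ± 1.40 | 65.59 ± 1.40 |
| Factory | 10 | *96.80 ± 0.60* | 95.16 ± 0.79 | 96.04 ± 0.74 |
| Factory | 5 | *95.14 ± 0.72* | 91.60 ± 1.23 | 92.55 ± 1.06 |
| Factory | 0 | *91.17 ± 0.45* | 84.19 ± 1.23 | 84.81 ± 1.13 |
| Factory | −5 | *80.49 ± 1.54* | 72.40 ± 1.14 | 72.70 ± 0.72 |
| Car | 10 | *98.83 ± 0.15* | 98.34 ± 0.23 | 98.26 ± 0.32 |
| Car | 5 | *98.75 ± 0.16* | 98.22 ± 0.34 | 98.23 ± 0.35 |
| Car | 0 | *98.56 ± 0.16* | 97.91 ± 0.44 | 98.08 ± 0.40 |
| Car | −5 | *98.06 ± 0.02* | 97.27 ± 0.54 | 97.70 ± 0.46 |
| Pink | 10 | *97.20 ± 0.66* | 95.91 ± 0.81 | 96.64 ± 0.62 |
| Pink | 5 | *96.28 ± 0.79* | 93.31 ± 1.00 | 94.28 ± 0.99 |
| Pink | 0 | *94.06 ± 0.95* | 87.96 ± 1.54 | 88.91 ± 1.40 |
| Pink | −5 | *88.01 ± 1.54* | 78.30 ± 1.81 | 79.02 ± 1.22 |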

The numbers in italics indicate the best results.