AUC optimization for deep learning-based voice activity detection

EURASIP Journal on Audio, Speech, and Music Processing

Table 3 AUC results of the compared VADs with the BLSTM model and the STFT acoustic feature on the English Noisy-CHiME-4 test dataset

Noise type	SNR	MCE	MMSE	MaxAUC_sigm	MaxAUC_hinge
Babble	−10 dB	0.5163	0.5270	0.5428	0.5383
	−5 dB	0.5636	0.5761	0.6010	0.5940
	0 dB	0.6491	0.6567	0.6867	0.6787
	5 dB	0.7466	0.7499	0.7716	0.7641
	10 dB	0.8227	0.8241	0.8362	0.8283
	15 dB	0.8703	0.8696	0.8765	0.8699
	20 dB	0.8977	0.8974	0.9003	0.8978
Factory	−10 dB	0.6024	0.6031	0.6066	0.6089
	−5 dB	0.6864	0.6830	0.6898	0.6923
	0 dB	0.7659	0.7610	0.7653	0.7685
	5 dB	0.8243	0.8196	0.8204	0.8240
	10 dB	0.8617	0.8580	0.8573	0.8599
	15 dB	0.8862	0.8826	0.8811	0.8824
	20 dB	0.9033	0.8995	0.8977	0.8984
Volvo	−10 dB	0.8562	0.8432	0.8752	0.8702
	−5 dB	0.8871	0.8780	0.8996	0.8961
	0 dB	0.9062	0.9010	0.9137	0.9107
	5 dB	0.9166	0.9136	0.9223	0.9182
	10 dB	0.9220	0.9194	0.9261	0.9214
	15 dB	0.9248	0.9227	0.9277	0.9229
	20 dB	0.9264	0.9248	0.9287	0.9241
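
As a reading aid for the AUC values in the table, the following is a minimal Python sketch of how an AUC score can be computed from frame-level VAD outputs, using the standard pairwise-ranking (Wilcoxon-Mann-Whitney) definition: the fraction of (speech, non-speech) frame pairs in which the speech frame receives the higher score. The function name `auc_from_scores` and the toy scores and labels are hypothetical and are not taken from the paper; the evaluation code used to produce the table may differ.

```python
def auc_from_scores(scores, labels):
    """Return the AUC of `scores` against binary `labels` (1 = speech, 0 = non-speech).

    Uses the pairwise-ranking definition: the fraction of (speech, non-speech)
    pairs in which the speech frame has the higher score. Ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one speech and one non-speech frame")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0   # pair ranked correctly
            elif p == n:
                correct += 0.5   # tie counts as half
    return correct / (len(pos) * len(neg))


# Hypothetical toy example: six frames, three speech and three non-speech.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)
print(f"AUC = {toy_auc:.4f}")
```

In the toy example, 8 of the 9 speech/non-speech pairs are ranked correctly, because the speech frame scored 0.35 falls below the non-speech frame scored 0.6. The double loop is quadratic in the number of frames and is meant only to make the definition explicit; a sort-based computation would be preferable for full test sets.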