AUC optimization for deep learning-based voice activity detection

EURASIP Journal on Audio, Speech, and Music Processing

Table 7 AUC results of the VADs trained with the MCE, MaxAUC_hinge, and hybrid losses on the Noisy-CHiME-4 test set, with the Chinese Noisy-THCHS-30 dataset used as the training set

Noise type	SNR	MCE	MaxAUC_hinge	Hybrid loss
Babble	−10 dB	0.5752	0.5799	0.5808
	−5 dB	0.6442	0.6565	0.6579
	0 dB	0.7272	0.7441	0.7422
	5 dB	0.7900	0.8057	0.8030
	10 dB	0.8246	0.8390	0.8403
	15 dB	0.8420	0.8467	0.8616
	20 dB	0.8487	0.8628	0.8715
Factory	−10 dB	0.5992	0.6011	0.6041
	−5 dB	0.6743	0.6822	0.6799
	0 dB	0.7340	0.7474	0.7350
	5 dB	0.7791	0.7929	0.7806
	10 dB	0.8142	0.8285	0.8175
	15 dB	0.8373	0.8503	0.8488
	20 dB	0.8474	0.8646	0.8663
Volvo	−10 dB	0.7571	0.7858	0.7862
	−5 dB	0.7933	0.8270	0.8305
	0 dB	0.8244	0.8534	0.8572
	5 dB	0.8350	0.8602	0.8595
	10 dB	0.8374	0.8602	0.8613
	15 dB	0.8423	0.8589	0.8620
	20 dB	0.8476	0.8593	0.8638
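To make the metric reported in Table 7 concrete, the following is a minimal Python sketch that assumes AUC is computed over frame-level speech scores. It computes AUC as the Wilcoxon-Mann-Whitney statistic, i.e., the fraction of (speech, non-speech) frame pairs in which the speech frame receives the higher score. It also includes a pairwise hinge surrogate of the kind AUC-maximization losses commonly build on. The function names, the `margin` parameter, and the exact form of that surrogate are hypothetical choices made for illustration. They are not necessarily the MaxAUC_hinge or hybrid losses used in the paper.

```python
from typing import List, Sequence, Tuple


def _split_by_label(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate frame scores into speech (label 1) and non-speech (label 0) lists."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one speech and one non-speech frame")
    return pos, neg


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (speech, non-speech) pairs in which the speech frame scores higher.
    A tied pair counts as half a correct ordering."""
    pos, neg = _split_by_label(scores, labels)
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def pairwise_hinge_auc_loss(scores: Sequence[float], labels: Sequence[int], margin: float = 1.0) -> float:
    """Hypothetical pairwise hinge surrogate of (1 - AUC): each
    (speech, non-speech) pair is penalized by max(0, margin - (s_pos - s_neg)).
    Illustrative only; not necessarily the paper's MaxAUC_hinge loss."""
    pos, neg = _split_by_label(scores, labels)
    total = 0.0
    for p in pos:
        for n in neg:
            total += max(0.0, margin - (p - n))
    return total / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy frame-level speech scores and ground-truth labels (1 = speech).
    scores = [0.9, 0.8, 0.3, 0.6, 0.2]
    labels = [1, 1, 0, 1, 0]
    # Every speech frame outranks every non-speech frame, so AUC is 1.0.
    print(auc(scores, labels))
    # The hinge surrogate is still positive because the score gaps are below the margin.
    print(pairwise_hinge_auc_loss(scores, labels))
```

The toy example shows why a hinge surrogate is used for training rather than AUC itself: AUC depends only on the ordering of scores and is piecewise constant, whereas the hinge term keeps producing a gradient until each speech/non-speech pair is separated by the margin. The double loop costs O(PN) for P speech and N non-speech frames, so for full utterances a rank-based computation of the same statistic is preferable.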