AUC optimization for deep learning-based voice activity detection

EURASIP Journal on Audio, Speech, and Music Processing

Table 4. AUC results of the compared VAD methods with the BLSTM model and the STFT acoustic feature on the Chinese Noisy-THCHS-30 test dataset

Noise type	SNR	MCE	MMSE	MaxAUC_sigm	MaxAUC_hinge
Babble	−10 dB	0.5226	0.5268	0.5315	0.5308
	−5 dB	0.5826	0.5918	0.5944	0.5944
	0 dB	0.6800	0.6901	0.6897	0.6943
	5 dB	0.7787	0.7834	0.7853	0.7915
	10 dB	0.8484	0.8481	0.8520	0.8563
	15 dB	0.8870	0.8864	0.8893	0.8928
	20 dB	0.9096	0.9099	0.9116	0.9140
Factory	−10 dB	0.6247	0.6238	0.6300	0.6420
	−5 dB	0.7177	0.7168	0.7216	0.7314
	0 dB	0.7962	0.7948	0.7976	0.8030
	5 dB	0.8483	0.8471	0.8483	0.8511
	10 dB	0.8805	0.8798	0.8810	0.8828
	15 dB	0.9031	0.9020	0.9033	0.9053
	20 dB	0.9198	0.9175	0.9188	0.9223
Volvo	−10 dB	0.8851	0.8753	0.8848	0.8845
	−5 dB	0.9077	0.8984	0.9095	0.9101
	0 dB	0.9208	0.9145	0.9234	0.9252
	5 dB	0.9292	0.9257	0.9313	0.9332
	10 dB	0.9352	0.9328	0.9353	0.9382
	15 dB	0.9384	0.9361	0.9368	0.9412
	20 dB	0.9398	0.9374	0.9375	0.9425
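
The AUC values in the table can be read as the probability that a randomly chosen speech frame receives a higher detector score than a randomly chosen non-speech frame. The following is a minimal sketch of that pairwise computation, not the authors' evaluation code. The function name `auc_from_scores` and the inputs `scores` (frame-level soft outputs of a VAD) and `labels` (1 for speech, 0 for non-speech) are hypothetical names introduced here for illustration.

```python
import numpy as np


def auc_from_scores(scores, labels):
    """Pairwise (Mann-Whitney) estimate of AUC; ties count as half a win.

    `scores` and `labels` are hypothetical inputs: one soft VAD output and
    one ground-truth label (1 = speech, 0 = non-speech) per frame.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    pos = scores[labels]    # scores of speech frames
    neg = scores[~labels]   # scores of non-speech frames
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one speech and one non-speech frame")

    # Compare every speech frame with every non-speech frame.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)


if __name__ == "__main__":
    # Toy example: three of the four speech/non-speech pairs are ranked correctly.
    print(auc_from_scores([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0]))
```

The all-pairs comparison uses memory proportional to the number of speech frames times the number of non-speech frames, so for a full test set a rank-based formulation would be the more practical choice.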