Table 2. AUC results of the compared VADs, each using a feedforward neural network and STFT acoustic features, on the English Noisy-CHiME-4 test dataset. For brevity, each VAD is referred to by the name of its training objective.

From: AUC optimization for deep learning-based voice activity detection

| Noise type | SNR | MCE | MMSE | MaxAUCsigm | MaxAUChinge |
|---|---|---|---|---|---|
| Babble | −10 dB | 0.5319 | 0.5381 | 0.5631 | 0.5561 |
| | −5 dB | 0.6006 | 0.6097 | 0.6450 | 0.6359 |
| | 0 dB | 0.7092 | 0.7109 | 0.7431 | 0.7363 |
| | 5 dB | 0.8036 | 0.8046 | 0.8226 | 0.8187 |
| | 10 dB | 0.8652 | 0.8673 | 0.8762 | 0.8726 |
| | 15 dB | 0.9028 | 0.9021 | 0.9071 | 0.9044 |
| | 20 dB | 0.9208 | 0.9191 | 0.9214 | 0.9204 |
| Factory | −10 dB | 0.6321 | 0.6303 | 0.6399 | 0.6400 |
| | −5 dB | 0.7275 | 0.7260 | 0.7314 | 0.7341 |
| | 0 dB | 0.8078 | 0.8072 | 0.8071 | 0.8114 |
| | 5 dB | 0.8616 | 0.8611 | 0.8587 | 0.8628 |
| | 10 dB | 0.8967 | 0.8955 | 0.8936 | 0.8968 |
| | 15 dB | 0.9162 | 0.9139 | 0.9132 | 0.9151 |
| | 20 dB | 0.9263 | 0.9235 | 0.9236 | 0.9247 |
| Volvo | −10 dB | 0.8910 | 0.8793 | 0.9002 | 0.8968 |
| | −5 dB | 0.9109 | 0.9042 | 0.9136 | 0.9132 |
| | 0 dB | 0.9217 | 0.9177 | 0.9214 | 0.9218 |
| | 5 dB | 0.9276 | 0.9242 | 0.9260 | 0.9260 |
| | 10 dB | 0.9311 | 0.9275 | 0.9285 | 0.9280 |
| | 15 dB | 0.9329 | 0.9292 | 0.9299 | 0.9292 |
| | 20 dB | 0.9338 | 0.9302 | 0.9306 | 0.9301 |
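
The AUC values reported here can be read as the probability that a randomly chosen speech frame receives a higher score than a randomly chosen non-speech frame. The sketch below illustrates that pairwise definition only; it is not the authors' evaluation code. The function name `auc`, the label convention (1 for speech, 0 for non-speech), and the toy scores are assumptions made for this example.

```python
def auc(scores, labels):
    """Empirical AUC from per-frame scores and binary labels.

    Counts the fraction of (speech, non-speech) frame pairs in which the
    speech frame has the higher score; tied pairs count as half.
    Assumed convention: label 1 = speech, label 0 = non-speech.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one frame of each class")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0   # pair ranked correctly
            elif p == n:
                correct += 0.5   # tie: half credit
    return correct / (len(pos) * len(neg))


# Toy example (hypothetical scores): one of four pairs is misranked.
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_labels = [1, 1, 0, 0]
print(auc(toy_scores, toy_labels))
```

This quadratic pairwise loop is written for clarity; on full test sets a rank-based computation gives the same value far more cheaply.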