EURASIP Journal on Audio, Speech, and Music Processing

Table 7 The effect of VAD and speech separation

From: Anchor voiceprint recognition in live streaming via RawNet-SA and gated recurrent unit

Models	CN-Celeb-T EER	CN-Celeb-T DCF08	CN-Celeb-T DCF10	CN-Celeb-T-VAD EER	CN-Celeb-T-VAD DCF08	CN-Celeb-T-VAD DCF10
RawNet2	17.25%	0.58	0.89	16.28%	0.60	0.86
RawNet2*	17.30%	0.60	0.90	16.51%	0.61	0.87
RawNet-MHSA	15.34%	0.56	0.86	15.16%	0.57	0.85
RawNet-all-SA	15.51%	0.57	0.91	15.18%	0.59	0.87
RawNet-origin-SA*	16.14%	0.58	0.87	15.89%	0.60	0.87
RawNet-SA	15.04%	0.56	0.87	14.81%	0.58	0.86

“*” denotes that the network is initialized with the trained RawNet2 parameters
