EURASIP Journal on Audio, Speech, and Music Processing

Table 7 Speech recognition accuracy of converted speech produced by different voice conversion (VC) methods, measured by word error rate (WER) and character error rate (CER); lower is better

From: W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision

Method	WER (↓)	CER (↓)
VQ-VAE	20.21	10.30
CTC-VQ-VAE	2.99	1.08
FragmentVC	72.85	46.10
W2VC	1.63	0.57
Ground truth	1.30	0.48
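
For readers unfamiliar with the two metrics, the following is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein edit distance between a reference transcript and the recognizer's hypothesis, divided by the reference length, over words or characters respectively. This page does not state which ASR system or scoring tool produced the numbers above, so the function names (`edit_distance`, `error_rate`, `wer`, `cer`), the whitespace tokenization, the removal of spaces before CER scoring, and the scaling to percent are all illustrative assumptions, not the authors' evaluation pipeline.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, insertions, and deletions each cost 1.
    """
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length, in percent."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate; tokens are whitespace-separated words (assumption)."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate; spaces are dropped first (assumption)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    return error_rate(ref_chars, hyp_chars)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER: {wer(ref, hyp):.2f}  CER: {cer(ref, hyp):.2f}")
```

Corpus-level scores are usually obtained by summing edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance rates; the sketch above scores a single pair only.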
