
Table 7 Speech recognition accuracy of the converted speech produced by different voice conversion (VC) methods, measured by word error rate (WER) and character error rate (CER)

From: W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision

| Method | WER (%) ↓ | CER (%) ↓ |
|---|---|---|
| VQ-VAE | 20.21 | 10.30 |
| CTC-VQ-VAE | 2.99 | 1.08 |
| FragmentVC | 72.85 | 46.10 |
| W2VC | 1.63 | 0.57 |
| Ground truth | 1.30 | 0.48 |