Table 2 The MOS results with 95% confidence intervals showing the impact of VQ-VAE, CTC-VQ-VAE, FragmentVC, and the proposed method on speech similarity

From: W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision

| Method     | Intra-gender  | Inter-gender  | Average       |
|------------|---------------|---------------|---------------|
| VQ-VAE     | 1.97 ± 0.2736 | 1.91 ± 0.2576 | 1.94 ± 0.2113 |
| CTC-VQ-VAE | 3.22 ± 0.2897 | 3.23 ± 0.2812 | 3.22 ± 0.2476 |
| FragmentVC | 2.34 ± 0.3666 | 2.42 ± 0.3878 | 2.38 ± 0.2635 |
| W2VC       | 3.46 ± 0.2571 | 3.78 ± 0.2859 | 3.62 ± 0.2153 |
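
Each entry in the table has the form mean ± half-width of a 95% confidence interval. The table does not state how the intervals were computed, so the following is only a minimal sketch of one common approach, assuming a normal approximation (z = 1.96) over independent listener ratings. The function name `mos_with_ci` and the ratings list are hypothetical and are not data from the paper.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for a list of opinion scores.

    Assumption: a normal-approximation 95% interval, z * s / sqrt(n),
    where s is the sample standard deviation. The paper may have used
    a different procedure (for example a t-distribution).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate spread")
    mean = statistics.fmean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1)
    half_width = z * sd / math.sqrt(n)
    return mean, half_width


# Hypothetical 1-5 similarity ratings, for illustration only.
ratings = [4, 3, 4, 5, 3, 4, 4, 3]
mean, hw = mos_with_ci(ratings)
print(f"{mean:.2f} ± {hw:.4f}")
```

With only a handful of ratings, a t-based interval would be wider than this normal approximation, so the sketch is best read as an illustration of the reporting format rather than a reproduction of the table's numbers.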