
Table 4 MOS results (with 95% confidence intervals) for naturalness and similarity, and MCD, of speech generated using the WLMR-based HiFi-GAN vocoder and the MelGAN vocoder

From: W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision

| Vocoder | Nat. (Intra) | Nat. (Inter) | Nat. (Avg) | Sim. (Intra) | Sim. (Inter) | Sim. (Avg) | MCD |
|---|---|---|---|---|---|---|---|
| MelGAN | 2.27 ± 0.2364 | 1.98 ± 0.3334 | 2.13 ± 0.2692 | 2.45 ± 0.3133 | 1.83 ± 0.4296 | 2.14 ± 0.2589 | 9.508 |
| HiFi-GAN | 4.42 ± 0.1737 | 4.48 ± 0.1632 | 4.45 ± 0.1970 | 3.46 ± 0.2571 | 3.78 ± 0.2859 | 3.62 ± 0.2153 | 8.901 |
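The table does not say how the confidence intervals or the MCD values were computed. The sketch below shows one common way to obtain both kinds of figure, and it rests on several assumptions. The interval is a normal-approximation 95% interval (z = 1.96) over individual listener ratings. MCD follows the usual definition, (10 / ln 10) · sqrt(2 · Σ_d (c_d − c'_d)²), averaged over frames. It is computed on mel-cepstra that are already time-aligned, and the 0th coefficient is excluded. The function names `mos_with_ci` and `mel_cepstral_distortion` are hypothetical and do not come from the paper.

```python
import math

import numpy as np


def mos_with_ci(scores, z=1.96):
    """Return (mean, half_width) for a set of opinion scores.

    Assumption: a normal-approximation interval, mean ± z * s / sqrt(n),
    where s is the sample standard deviation. The paper may use another method.
    """
    x = np.asarray(scores, dtype=float)
    if x.size < 2:
        raise ValueError("need at least two ratings to estimate a spread")
    mean = float(x.mean())
    # Standard error = sample std (ddof=1) / sqrt(n); scale by z for the half-width.
    half_width = z * float(x.std(ddof=1)) / math.sqrt(x.size)
    return mean, half_width


def mel_cepstral_distortion(ref_mcep, gen_mcep, skip_c0=True):
    """Frame-averaged MCD between two mel-cepstral sequences of shape (frames, dims).

    Assumption: the sequences are already time-aligned (e.g. by DTW upstream),
    so frame i of one corresponds to frame i of the other.
    """
    ref = np.asarray(ref_mcep, dtype=float)
    gen = np.asarray(gen_mcep, dtype=float)
    if ref.shape != gen.shape:
        raise ValueError("sequences must be time-aligned with equal shapes")
    if skip_c0:
        # c0 mostly encodes frame energy and is commonly left out of MCD.
        ref, gen = ref[:, 1:], gen[:, 1:]
    # Squared Euclidean distance between cepstral vectors, per frame.
    diff_sq = np.sum((ref - gen) ** 2, axis=1)
    per_frame = (10.0 / math.log(10.0)) * np.sqrt(2.0 * diff_sq)
    return float(per_frame.mean())
```

For example, `mos_with_ci([4, 5, 4, 5, 4, 4, 5, 5])` returns a mean of 4.5 with a half-width of roughly 0.37. Identical cepstral sequences give an MCD of 0. In the table, a lower MCD indicates a closer spectral match to the reference.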