Table 5 MOS with 95% confidence intervals, showing the impact of individual network modules on naturalness (Nat.) and similarity (Sim.)

From: W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision

| Method | Nat. Intra | Nat. Inter | Nat. Avg | Sim. Intra | Sim. Inter | Sim. Avg |
| --- | --- | --- | --- | --- | --- | --- |
| W2VC | 4.42 ± 0.1737 | 4.48 ± 0.1632 | 4.45 ± 0.1970 | 3.46 ± 0.2571 | 3.78 ± 0.2859 | 3.62 ± 0.2153 |
| w/o CTC | 4.32 ± 0.1925 | 4.27 ± 0.2437 | 4.30 ± 0.2512 | 3.43 ± 0.3270 | 3.62 ± 0.3727 | 3.53 ± 0.1470 |
| w/o GRL | 4.22 ± 0.2431 | 4.32 ± 0.1930 | 4.27 ± 0.2281 | 3.41 ± 0.2372 | 3.77 ± 0.2202 | 3.59 ± 0.2881 |
| w/o CTC+GRL | 3.45 ± 0.2856 | 3.48 ± 0.2849 | 3.47 ± 0.2763 | 2.96 ± 0.3016 | 2.96 ± 0.2962 | 2.96 ± 0.2978 |
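
Each cell above has the form "mean ± half-width". The table does not state how the 95% intervals were computed, so the following is only a minimal sketch of one common approach: a normal approximation, with half-width 1.96 · s / √n, where s is the sample standard deviation of the listener ratings. The function name `mos_with_ci` and the ratings are hypothetical illustrations, not the authors' code or data.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for a list of opinion scores.

    Assumption: normal approximation, half_width = z * s / sqrt(n),
    where s is the sample standard deviation. The paper may have used
    a different procedure (for example a t-distribution).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(n)
    return mean, half_width


# Hypothetical 1-5 ratings for one system; not data from the paper.
example_ratings = [5, 4, 4, 5, 3, 4, 5, 4]
mos, ci = mos_with_ci(example_ratings)
print(f"{mos:.2f} ± {ci:.4f}")  # prints "4.25 ± 0.4900"
```

With small listener panels, a t-distribution quantile in place of the fixed z = 1.96 would give slightly wider intervals.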