Table 7 Subjective comparison results, with standard deviation (std dev), for monolingual conversion in the seen-to-seen scenario. Results are listed as MOS/std dev

From: U2-VC: one-shot voice conversion using two-level nested U-structure

MOS (similarity)/std dev
            SF2TF      SF2TM      SM2TF      SM2TM      Average
AdaIN-VC    2.00/0.30  2.04/0.37  2.04/0.30  2.19/0.40  2.07
AGAIN-VC    2.92/0.46  2.76/0.40  2.87/0.45  3.40/0.59  2.99
U2-VC       3.30/0.41  3.24/0.40  3.28/0.38  4.02/0.42  3.46

MOS (naturalness)/std dev
            SF2TF      SF2TM      SM2TF      SM2TM      Average
AdaIN-VC    2.07/0.77  2.01/0.35  2.10/0.32  2.21/0.46  2.10
AGAIN-VC    3.38/0.65  3.18/0.49  3.12/0.41  3.62/0.43  3.33
U2-VC       3.91/0.56  3.92/0.47  3.78/0.31  4.16/0.32  3.94

SF/SM: source female/male speaker; TF/TM: target female/male speaker. The Average columns report the mean MOS only, without a std dev.
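The Average columns are consistent with an unweighted mean of the four gender-pair conditions (SF2TF, SF2TM, SM2TF, SM2TM). The short sketch below recomputes them from the per-condition MOS values, assuming that definition. The names `TABLE7` and `average_mos` are hypothetical, and exact decimal arithmetic with half-up rounding is used so that a value like 3.325 rounds to 3.33 as printed. Standard deviations are not averaged because pooling them would require the per-condition sample sizes, which the table does not give.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-condition MOS values copied from Table 7 (order: SF2TF, SF2TM, SM2TF, SM2TM).
# Std devs are omitted; only the MOS part of each "MOS/std dev" cell is kept.
TABLE7 = {
    "similarity": {
        "AdaIN-VC": ["2.00", "2.04", "2.04", "2.19"],
        "AGAIN-VC": ["2.92", "2.76", "2.87", "3.40"],
        "U2-VC": ["3.30", "3.24", "3.28", "4.02"],
    },
    "naturalness": {
        "AdaIN-VC": ["2.07", "2.01", "2.10", "2.21"],
        "AGAIN-VC": ["3.38", "3.18", "3.12", "3.62"],
        "U2-VC": ["3.91", "3.92", "3.78", "4.16"],
    },
}


def average_mos(scores):
    """Unweighted mean of per-condition MOS values, rounded half-up to 2 decimals.

    Assumption: the table's Average column is this plain mean over conditions.
    """
    values = [Decimal(s) for s in scores]
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for metric, systems in TABLE7.items():
        for system, scores in systems.items():
            print(f"{metric:12s} {system:9s} {average_mos(scores)}")
```

Each recomputed value matches the corresponding Average entry in the table, which supports (but does not prove) the unweighted-mean reading.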