Table 4 Subjective evaluation results of the ablation study on architecture in the seen-to-seen conversion scenario. “AGAIN-VC” denotes the network with neither the U2-Net structure nor SaAdaIN. “U2-VC” denotes the network with both the U2-Net structure and SaAdaIN.

From: U2-VC: one-shot voice conversion using two-level nested U-structure

Method                    MOS (similarity)                             MOS (naturalness)
                          SF2TF    SF2TM    SM2TF    SM2TM    Average  SF2TF    SF2TM    SM2TF    SM2TM    Average
AGAIN-VC                  3.50     3.38     3.00     3.30     3.30     3.25     3.38     3.00     3.63     3.32
w/o SaAdaIN, with U2-Net  3.13     3.15     3.13     3.13     3.14     3.25     3.75     3.13     3.70     3.44
w/o U2-Net, with SaAdaIN  3.25     3.25     3.00     3.25     3.19     3.50     3.37     3.10     3.67     3.41
U2-VC                     3.63     3.38     3.39     3.69     3.53     4.00     4.13     3.69     3.91     3.93