Table 5 Subjective evaluation results for the ablation-study architectures in the unseen-to-unseen conversion scenario. “AGAIN-VC” denotes the network with neither the U2-Net structure nor SaAdaIN. “U2-VC” denotes the network with both the U2-Net structure and SaAdaIN.

From: U2-VC: one-shot voice conversion using two-level nested U-structure

                           MOS (similarity)                        MOS (naturalness)
Method                     SF2TF  SF2TM  SM2TF  SM2TM  Average     SF2TF  SF2TM  SM2TF  SM2TM  Average
AGAIN-VC                   3.13   2.89   2.75   3.00   2.94        3.00   3.00   3.13   3.50   3.16
w/o SaAdaIN, with U2-Net   3.25   3.00   2.63   3.13   3.00        3.50   3.13   3.50   3.62   3.44
w/o U2-Net, with SaAdaIN   3.13   3.13   3.00   3.10   3.09        3.25   3.00   3.15   3.62   3.26
U2-VC                      3.25   3.25   3.23   3.13   3.22        3.75   3.88   3.80   3.88   3.83
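
The Average columns appear to be the arithmetic mean of the four per-pair scores in each group. The sketch below recomputes them from the values in the table. It assumes rounding to two decimals with ties rounded up, which is a guess about how the reported figures were rounded and is not stated in the source. The names `TABLE5` and `average_mos` are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-pair MOS values copied from Table 5, in the order SF2TF, SF2TM, SM2TF, SM2TM.
# Each entry holds (similarity scores, naturalness scores).
TABLE5 = {
    "AGAIN-VC": (["3.13", "2.89", "2.75", "3.00"], ["3.00", "3.00", "3.13", "3.50"]),
    "w/o SaAdaIN, with U2-Net": (["3.25", "3.00", "2.63", "3.13"], ["3.50", "3.13", "3.50", "3.62"]),
    "w/o U2-Net, with SaAdaIN": (["3.13", "3.13", "3.00", "3.10"], ["3.25", "3.00", "3.15", "3.62"]),
    "U2-VC": (["3.25", "3.25", "3.23", "3.13"], ["3.75", "3.88", "3.80", "3.88"]),
}


def average_mos(scores):
    """Mean of the given scores, rounded to two decimals (half-up is an assumption)."""
    total = sum(Decimal(s) for s in scores)
    return (total / len(scores)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for method, (similarity, naturalness) in TABLE5.items():
    print(f"{method}: similarity {average_mos(similarity)}, naturalness {average_mos(naturalness)}")
```

Decimal arithmetic is used here instead of floats so that values landing exactly on a rounding boundary are rounded predictably.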