Table 2 Objective evaluation results of the ablation study on architecture in the seen-to-seen conversion scenario. “AGAIN-VC” denotes the network with neither the U2-Net structure nor SaAdaIN. “U2-VC” denotes the network with both the U2-Net structure and SaAdaIN.

From: U2-VC: one-shot voice conversion using two-level nested U-structure

                          MCD (dB)                               Predicted MOS by NISQA
                          SF2TF  SF2TM  SM2TF  SM2TM  Average    SF2TF  SF2TM  SM2TF  SM2TM  Average
AGAIN-VC                  6.33   6.07   6.32   6.33   6.26       3.87   3.63   3.93   4.02   3.86
w/o SaAdaIN, with U2-Net  6.35   6.13   6.36   6.42   6.32       3.97   3.88   3.96   4.02   3.96
w/o U2-Net, with SaAdaIN  6.34   6.04   6.23   6.31   6.23       4.01   3.83   3.99   3.99   3.96
U2-VC                     6.36   6.11   6.32   6.39   6.29       4.13   3.93   4.14   4.05   4.06
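
As a quick consistency check on the Average columns, the sketch below recomputes each row's mean from its four per-pair scores and reports the gap to the tabulated average. It assumes that each Average is the plain arithmetic mean of the four conversion pairs, which the table does not state. The names `TABLE_2` and `average_gaps` are hypothetical and introduced only for illustration; the numbers are copied from the table. Because the published values are rounded to two decimals, gaps of up to about 0.005 are expected and do not indicate an error.

```python
# Scores copied from Table 2, per system and metric:
# (SF2TF, SF2TM, SM2TF, SM2TM, reported Average).
TABLE_2 = {
    "AGAIN-VC": {
        "mcd": (6.33, 6.07, 6.32, 6.33, 6.26),
        "mos": (3.87, 3.63, 3.93, 4.02, 3.86),
    },
    "w/o SaAdaIN, with U2-Net": {
        "mcd": (6.35, 6.13, 6.36, 6.42, 6.32),
        "mos": (3.97, 3.88, 3.96, 4.02, 3.96),
    },
    "w/o U2-Net, with SaAdaIN": {
        "mcd": (6.34, 6.04, 6.23, 6.31, 6.23),
        "mos": (4.01, 3.83, 3.99, 3.99, 3.96),
    },
    "U2-VC": {
        "mcd": (6.36, 6.11, 6.32, 6.39, 6.29),
        "mos": (4.13, 3.93, 4.14, 4.05, 4.06),
    },
}


def average_gaps(table):
    """Return |mean of the four pair scores - reported average| per (system, metric).

    Assumption: the reported average is an unweighted mean over the four pairs.
    """
    gaps = {}
    for system, metrics in table.items():
        for metric, (*pairs, reported) in metrics.items():
            gaps[(system, metric)] = abs(sum(pairs) / len(pairs) - reported)
    return gaps


if __name__ == "__main__":
    for (system, metric), gap in average_gaps(TABLE_2).items():
        print(f"{system:26s} {metric}: gap = {gap:.4f}")
```

Under that assumption, every gap stays within rounding precision, so the Average columns are consistent with the per-pair scores as printed.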