Table 5 Subjective evaluation results for the ablation-study architectures in the unseen-to-unseen conversion scenario. “AGAIN-VC” denotes the network with neither the U2-Net structure nor SaAdaIN; “U2-VC” denotes the network with both the U2-Net structure and SaAdaIN

From: U2-VC: one-shot voice conversion using two-level nested U-structure

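| Architecture | MOS (similarity) SF2TF | MOS (similarity) SF2TM | MOS (similarity) SM2TF | MOS (similarity) SM2TM | MOS (similarity) Average | MOS (naturalness) SF2TF | MOS (naturalness) SF2TM | MOS (naturalness) SM2TF | MOS (naturalness) SM2TM | MOS (naturalness) Average |
|---|---|---|---|---|---|---|---|---|---|---|
| AGAIN-VC | 3.13 | 2.89 | 2.75 | 3.00 | 2.94 | 3.00 | 3.00 | 3.13 | 3.50 | 3.16 |
| w/o SaAdaIN, with U2-Net | 3.25 | 3.00 | 2.63 | 3.13 | 3.00 | 3.50 | 3.13 | 3.50 | 3.62 | 3.44 |
| w/o U2-Net, with SaAdaIN | 3.13 | 3.13 | 3.00 | 3.10 | 3.09 | 3.25 | 3.00 | 3.15 | 3.62 | 3.26 |
| U2-VC | 3.25 | 3.25 | 3.23 | 3.13 | 3.22 | 3.75 | 3.88 | 3.80 | 3.88 | 3.83 |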
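
The Average columns can be checked against the per-direction scores. The sketch below is an illustration only: it assumes each Average is the unweighted mean of the four conversion directions (SF2TF, SF2TM, SM2TF, SM2TM), rounded half-up to two decimals, which the table does not state explicitly. The names `SCORES` and `mean_mos` are hypothetical and not taken from the paper.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-direction MOS values copied from Table 5,
# in the order SF2TF, SF2TM, SM2TF, SM2TM.
SCORES = {
    "AGAIN-VC": {
        "similarity": ["3.13", "2.89", "2.75", "3.00"],
        "naturalness": ["3.00", "3.00", "3.13", "3.50"],
    },
    "w/o SaAdaIN, with U2-Net": {
        "similarity": ["3.25", "3.00", "2.63", "3.13"],
        "naturalness": ["3.50", "3.13", "3.50", "3.62"],
    },
    "w/o U2-Net, with SaAdaIN": {
        "similarity": ["3.13", "3.13", "3.00", "3.10"],
        "naturalness": ["3.25", "3.00", "3.15", "3.62"],
    },
    "U2-VC": {
        "similarity": ["3.25", "3.25", "3.23", "3.13"],
        "naturalness": ["3.75", "3.88", "3.80", "3.88"],
    },
}


def mean_mos(values):
    """Unweighted mean of MOS strings, rounded half-up to two decimals.

    Decimal is used instead of float so that ties such as 3.255
    round the way a reader would expect (assumed rounding rule).
    """
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for name, metrics in SCORES.items():
        sim = mean_mos(metrics["similarity"])
        nat = mean_mos(metrics["naturalness"])
        print(f"{name}: similarity {sim}, naturalness {nat}")
```

Under these assumptions the recomputed means agree with every value in the two Average columns of the table.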