
Table 4 Subjective evaluation results of the architecture ablation study in the seen-to-seen conversion scenario. “AGAIN-VC” denotes the network with neither the U2-Net structure nor SaAdaIN. “U2-VC” denotes the network with both the U2-Net structure and SaAdaIN.

From: U2-VC: one-shot voice conversion using two-level nested U-structure

 

Model                     MOS (similarity)                     MOS (naturalness)
                          SF2TF  SF2TM  SM2TF  SM2TM  Average  SF2TF  SF2TM  SM2TF  SM2TM  Average
AGAIN-VC                  3.50   3.38   3.00   3.30   3.30     3.25   3.38   3.00   3.63   3.32
w/o SaAdaIN, with U2-Net  3.13   3.15   3.13   3.13   3.14     3.25   3.75   3.13   3.70   3.44
w/o U2-Net, with SaAdaIN  3.25   3.25   3.00   3.25   3.19     3.50   3.37   3.10   3.67   3.41
U2-VC                     3.63   3.38   3.39   3.69   3.53     4.00   4.13   3.69   3.91   3.93