Table 2 Objective evaluation results of the ablation study on architecture in the seen-to-seen conversion scenario. “AGAIN-VC” denotes the network with neither the U2-Net structure nor SaAdaIN. “U2-VC” denotes the network with both the U2-Net structure and SaAdaIN

From: U2-VC: one-shot voice conversion using two-level nested U-structure

 

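| Method | MCD (dB): SF2TF | MCD (dB): SF2TM | MCD (dB): SM2TF | MCD (dB): SM2TM | MCD (dB): Average | Predicted MOS by NISQA: SF2TF | Predicted MOS by NISQA: SF2TM | Predicted MOS by NISQA: SM2TF | Predicted MOS by NISQA: SM2TM | Predicted MOS by NISQA: Average |
|---|---|---|---|---|---|---|---|---|---|---|
| AGAIN-VC | 6.33 | 6.07 | 6.32 | 6.33 | 6.26 | 3.87 | 3.63 | 3.93 | 4.02 | 3.86 |
| w/o SaAdaIN, with U2-Net | 6.35 | 6.13 | 6.36 | 6.42 | 6.32 | 3.97 | 3.88 | 3.96 | 4.02 | 3.96 |
| w/o U2-Net, with SaAdaIN | 6.34 | 6.04 | 6.23 | 6.31 | 6.23 | 4.01 | 3.83 | 3.99 | 3.99 | 3.96 |
| U2-VC | 6.36 | 6.11 | 6.32 | 6.39 | 6.29 | 4.13 | 3.93 | 4.14 | 4.05 | 4.06 |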