Table 3 Objective evaluation results of the architecture ablation study in the unseen-to-unseen conversion scenario. “AGAIN-VC” denotes the network with neither the U2-Net structure nor SaAdaIN; “U2-VC” denotes the network with both the U2-Net structure and SaAdaIN

From: U2-VC: one-shot voice conversion using two-level nested U-structure

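|  | MCD (dB): SF2TF | MCD (dB): SF2TM | MCD (dB): SM2TF | MCD (dB): SM2TM | MCD (dB): Average | Predicted MOS by NISQA: SF2TF | Predicted MOS by NISQA: SF2TM | Predicted MOS by NISQA: SM2TF | Predicted MOS by NISQA: SM2TM | Predicted MOS by NISQA: Average |
|---|---|---|---|---|---|---|---|---|---|---|
| AGAIN-VC | 5.95 | 6.03 | 5.96 | 6.02 | 5.99 | 3.71 | 3.75 | 3.82 | 3.93 | 3.80 |
| w/o SaAdaIN, with U2-Net | 6.11 | 6.16 | 6.19 | 6.20 | 6.17 | 3.85 | 3.91 | 3.81 | 3.97 | 3.89 |
| w/o U2-Net, with SaAdaIN | 6.01 | 6.02 | 5.96 | 6.01 | 6.05 | 3.88 | 3.74 | 3.83 | 3.89 | 3.84 |
| U2-VC | 6.01 | 6.09 | 6.02 | 6.03 | 6.04 | 4.00 | 3.95 | 3.85 | 3.97 | 3.94 |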
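
The Average columns in the table above can be cross-checked against the four per-direction values in each row. The following is a minimal sketch, not the authors' evaluation code: the dictionary names, the `row_average` helper, and the choice of half-up rounding to two decimals are assumptions made here for illustration. The numbers are copied from the table.

```python
from decimal import Decimal, ROUND_HALF_UP

# Column order for every row: SF2TF, SF2TM, SM2TF, SM2TM.
# Values are kept as strings so Decimal parses them exactly.
MCD = {
    "AGAIN-VC": ("5.95", "6.03", "5.96", "6.02"),
    "w/o SaAdaIN, with U2-Net": ("6.11", "6.16", "6.19", "6.20"),
    "w/o U2-Net, with SaAdaIN": ("6.01", "6.02", "5.96", "6.01"),
    "U2-VC": ("6.01", "6.09", "6.02", "6.03"),
}
MOS = {
    "AGAIN-VC": ("3.71", "3.75", "3.82", "3.93"),
    "w/o SaAdaIN, with U2-Net": ("3.85", "3.91", "3.81", "3.97"),
    "w/o U2-Net, with SaAdaIN": ("3.88", "3.74", "3.83", "3.89"),
    "U2-VC": ("4.00", "3.95", "3.85", "3.97"),
}


def row_average(values):
    """Arithmetic mean of one table row, rounded half-up to two decimals.

    The rounding mode is an assumption; the source does not state it.
    """
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for name in MCD:
        print(f"{name}: MCD avg {row_average(MCD[name])}, "
              f"MOS avg {row_average(MOS[name])}")
```

Under these assumptions the recomputed means agree with the listed Average entries in every row except one: for “w/o U2-Net, with SaAdaIN”, the four MCD values average 6.00 dB, whereas the table lists 6.05. The sketch cannot tell which entry in that row is the source of the difference.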