From: U2-VC: one-shot voice conversion using two-level nested U-structure
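Method | Similarity MOS, SF2TF | Similarity MOS, SF2TM | Similarity MOS, SM2TF | Similarity MOS, SM2TM | Similarity MOS, Average | Naturalness MOS, SF2TF | Naturalness MOS, SF2TM | Naturalness MOS, SM2TF | Naturalness MOS, SM2TM | Naturalness MOS, Average |
---|---|---|---|---|---|---|---|---|---|---|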
AGAIN-VC | 3.50 | 3.38 | 3.00 | 3.30 | 3.30 | 3.25 | 3.38 | 3.00 | 3.63 | 3.32 |
w/o SaAdaIN, with U2-Net | 3.13 | 3.15 | 3.13 | 3.13 | 3.14 | 3.25 | 3.75 | 3.13 | 3.70 | 3.44 |
w/o U2-Net, with SaAdaIN | 3.25 | 3.25 | 3.00 | 3.25 | 3.19 | 3.50 | 3.37 | 3.10 | 3.67 | 3.41 |
U2-VC | 3.63 | 3.38 | 3.39 | 3.69 | 3.53 | 4.00 | 4.13 | 3.69 | 3.91 | 3.93 |
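
The Average columns can be cross-checked against the four per-direction scores. The snippet below is a minimal sketch that assumes each Average is the unweighted mean of the four conversion directions, which the table itself does not state. The names `rows`, `mean`, and `check_averages` are illustrative only. Recomputed means may differ from the reported averages in the second decimal place, presumably because the per-direction scores shown here are already rounded.

```python
# Scores copied from the table above. Each entry holds the four per-direction
# scores (SF2TF, SF2TM, SM2TF, SM2TM) followed by the reported average.
rows = {
    "AGAIN-VC": {
        "similarity": ([3.50, 3.38, 3.00, 3.30], 3.30),
        "naturalness": ([3.25, 3.38, 3.00, 3.63], 3.32),
    },
    "w/o SaAdaIN, with U2-Net": {
        "similarity": ([3.13, 3.15, 3.13, 3.13], 3.14),
        "naturalness": ([3.25, 3.75, 3.13, 3.70], 3.44),
    },
    "w/o U2-Net, with SaAdaIN": {
        "similarity": ([3.25, 3.25, 3.00, 3.25], 3.19),
        "naturalness": ([3.50, 3.37, 3.10, 3.67], 3.41),
    },
    "U2-VC": {
        "similarity": ([3.63, 3.38, 3.39, 3.69], 3.53),
        "naturalness": ([4.00, 4.13, 3.69, 3.91], 3.93),
    },
}


def mean(values):
    """Unweighted arithmetic mean (assumed definition of the Average column)."""
    return sum(values) / len(values)


def check_averages(table):
    """Return {(method, metric): (recomputed_mean, reported_average)}."""
    result = {}
    for method, metrics in table.items():
        for metric, (scores, reported) in metrics.items():
            result[(method, metric)] = (mean(scores), reported)
    return result


if __name__ == "__main__":
    for (method, metric), (recomputed, reported) in check_averages(rows).items():
        print(f"{method:26s} {metric:12s} "
              f"recomputed={recomputed:.3f} reported={reported:.2f}")
```

Under this assumption every recomputed mean lies within 0.02 of the reported average, and U2-VC has the highest reported average on both metrics.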