From: U2-VC: one-shot voice conversion using two-level nested U-structure
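Method | Similarity MOS, SF2TF | Similarity MOS, SF2TM | Similarity MOS, SM2TF | Similarity MOS, SM2TM | Similarity MOS, Average | Naturalness MOS, SF2TF | Naturalness MOS, SF2TM | Naturalness MOS, SM2TF | Naturalness MOS, SM2TM | Naturalness MOS, Average |
---|---|---|---|---|---|---|---|---|---|---|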
AGAIN-VC | 3.13 | 2.89 | 2.75 | 3.00 | 2.94 | 3.00 | 3.00 | 3.13 | 3.50 | 3.16 |
w/o SaAdaIN, with U2-Net | 3.25 | 3.00 | 2.63 | 3.13 | 3.00 | 3.50 | 3.13 | 3.50 | 3.62 | 3.44 |
w/o U2-Net, with SaAdaIN | 3.13 | 3.13 | 3.00 | 3.10 | 3.09 | 3.25 | 3.00 | 3.15 | 3.62 | 3.26 |
U2-VC | 3.25 | 3.25 | 3.23 | 3.13 | 3.22 | 3.75 | 3.88 | 3.80 | 3.88 | 3.83 |