Fig. 10 | EURASIP Journal on Audio, Speech, and Music Processing

Fig. 10

From: Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation

MUSHRA naturalness scores for all single-speaker and multi-speaker models. M-MN: TTS model trained with 12 h of target language data; M-MN 30: TTS model trained from scratch with only 30 min of target language data; M SEJ: sequentially trained single-speaker model; M MEJ: simultaneously trained multi-speaker model; TL: cross-lingual transfer learning; DA: data augmentation; DA D: data augmentation method with additional fine-tuning
