Fig. 2 | EURASIP Journal on Audio, Speech, and Music Processing

From: A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept

Fig. 2

Example of a song excerpt synthesised with transposed scores S0, S4, and S7. The phonemes from the phonetic transcription of the lyrics are represented below the input score S, together with their durations, which are (i) predicted from the lyrics by the NLP module when computing the singing prosodic target for the US block (see Fig. 1), or (ii) those of the retrieved speech units when generating the expression controls. At the bottom, the phoneme durations have been time-scaled to fit the note durations. The crosses represent the F0 values of the singing prosodic targets obtained from S0, S4, and S7. The pitch contours (time-scaled) of the retrieved speech units are depicted as dashed grey lines. Finally, the solid blue lines represent the singing pitch contours generated by the expression control generation module. The score-driven US configuration Sxp and T=100 bpm have been used for this example.
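As a minimal illustrative sketch (not the authors' implementation), the two operations visible in the figure, proportionally time-scaling phoneme durations to fit a note duration and transposing an F0 target by a number of semitones (as in S0, S4, and S7), could look as follows; all function names and example values are assumptions for illustration only.

```python
# Sketch of the two operations described in the caption; names and
# values are illustrative assumptions, not the paper's implementation.
from typing import List


def scale_durations_to_note(phoneme_durations: List[float],
                            note_duration: float) -> List[float]:
    """Stretch or compress phoneme durations so their sum equals the note duration."""
    total = sum(phoneme_durations)
    if total <= 0.0:
        raise ValueError("phoneme durations must sum to a positive value")
    factor = note_duration / total
    return [d * factor for d in phoneme_durations]


def transpose_f0(f0_hz: float, semitones: int) -> float:
    """Shift an F0 target by a number of semitones (equal temperament)."""
    return f0_hz * 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    # Hypothetical phoneme durations (seconds) for one syllable, fitted to a
    # quarter note at T = 100 bpm (60 / 100 = 0.6 s).
    durations = [0.08, 0.22, 0.10]
    print(scale_durations_to_note(durations, 0.6))

    # The same 220 Hz target under the S0, S4, and S7 transpositions.
    for shift in (0, 4, 7):
        print(f"S{shift}: {transpose_f0(220.0, shift):.1f} Hz")
```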
