Figure 7 | EURASIP Journal on Audio, Speech, and Music Processing

Figure 7

From: Acoustic-visual synthesis technique using bimodal unit-selection

Visual trajectories. First visual principal component (in z-scored units) for the sentence ‘Le caractère de cette femme est moins calme’ when only the acoustic join cost is minimized (a); when only the visual cost is minimized (b); when both acoustic and visual costs are minimized using non-optimized weights (c); when both are minimized using optimized weights, without processing at the visual joins (d); and when both are minimized using the optimized weights, after processing the visual joins (e). The corrected details are marked with circles. (f) Original recorded trajectory (dashed) compared with the synthesized trajectory (solid) from (e). In (f), the durations of the diphones were adjusted to make this comparison possible. Horizontal axes denote time in seconds. The boundaries between diphones are marked: dashed lines indicate that the two diphones occur consecutively in the corpus and were extracted ‘as is’ from it; solid lines indicate otherwise. SAMPA labels for the diphones are shown.
