Figure 4 | EURASIP Journal on Audio, Speech, and Music Processing


From: Lip-Synching Using Speaker-Specific Articulation, Shape and Appearance Models


The phasing model of the PHMM predicts phasing relations between the acoustic onsets of the phones (bottom) and the onsets of the context-dependent phone HMMs that generate the frames of the gestural score (top). In this example, the onsets of the gestures characterizing the last two sounds precede the corresponding acoustic onsets. An average delay between the observed gestural onset and the acoustic onset is computed and stored for each context-dependent phone HMM. This delay is optimized with an iterative procedure described in Section 4.3 and illustrated in Figure 5.
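The per-phone average delay described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the data layout (tuples of phone label, gestural onset, acoustic onset) and the function name are assumptions, and the sign convention (negative delay means the gesture leads the audio) is chosen for this example.

```python
from collections import defaultdict

def average_onset_delays(alignments):
    """Average gestural-minus-acoustic onset delay per context-dependent
    phone HMM. `alignments` is an iterable of
    (phone_label, gestural_onset_sec, acoustic_onset_sec) tuples
    (hypothetical layout; the paper's actual data structures are not
    shown in the caption). Returns {phone_label: mean_delay_sec},
    where a negative value means the gesture precedes the audio."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phone, gestural_onset, acoustic_onset in alignments:
        sums[phone] += gestural_onset - acoustic_onset
        counts[phone] += 1
    return {phone: sums[phone] / counts[phone] for phone in sums}

# Toy example: gestures for phone "a+b" lead their acoustic onsets.
delays = average_onset_delays([
    ("a+b", 0.10, 0.12),
    ("a+b", 0.20, 0.26),
    ("b+c", 0.40, 0.35),
])
# delays["a+b"] is -0.04: the gesture leads the audio by 40 ms on average.
```

In the paper this stored delay is only an initial estimate; it is then refined by the iterative procedure of Section 4.3.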
