Figure 5 | EURASIP Journal on Audio, Speech, and Music Processing

Figure 5

From: Lip-Synching Using Speaker-Specific Articulation, Shape and Appearance Models

Training consists of iteratively refining the context-dependent phasing model and the HMMs (solid lines and dark blocks). The phasing model computes the average delay between the acoustic boundaries and the HMM boundaries obtained by aligning the current context-dependent HMMs with the training utterances. Synthesis then consists of forced alignment of the selected HMMs with the boundaries predicted by the phasing model (dotted lines and light blocks).
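
The averaging step described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `train_phasing_model` and `predict_boundaries`, the representation of a context as a plain string label, the sign convention (HMM boundary minus acoustic boundary), and the zero-delay fallback for unseen contexts are all assumptions made for illustration. The HMM re-estimation and forced-alignment steps are not shown.

```python
from collections import defaultdict


def train_phasing_model(samples):
    """Average the delay per context.

    samples: iterable of (context, acoustic_boundary, hmm_boundary),
    with both boundary times in seconds. The delay is taken as
    hmm_boundary - acoustic_boundary (an assumed sign convention).
    Returns a dict mapping each context to its mean delay.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for context, acoustic_t, hmm_t in samples:
        totals[context] += hmm_t - acoustic_t
        counts[context] += 1
    return {context: totals[context] / counts[context] for context in totals}


def predict_boundaries(model, acoustic_boundaries, default_delay=0.0):
    """Shift each acoustic boundary by the mean delay of its context.

    acoustic_boundaries: iterable of (context, acoustic_boundary).
    Contexts missing from the model fall back to default_delay
    (a hypothetical choice, not taken from the source).
    """
    return [t + model.get(context, default_delay)
            for context, t in acoustic_boundaries]


# Hypothetical toy data: context labels and times are invented.
training_samples = [
    ("a-b", 1.0, 1.25),
    ("a-b", 2.0, 2.5),
    ("b-a", 3.0, 2.75),
]
phasing_model = train_phasing_model(training_samples)
predicted = predict_boundaries(phasing_model, [("a-b", 4.0), ("x-y", 5.0)])
```

In an iterative training loop of the kind the caption describes, the HMM boundaries passed to `train_phasing_model` would be recomputed after each re-alignment of the current HMMs, and the resulting per-context delays would supply the boundaries used at synthesis time.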
