EURASIP Journal on Audio, Speech, and Music Processing

Table 3 Comparison of the SpeechDat telephone-quality (TF), SpeeCon narrow-band (NB), and SpeeCon wide-band (WB) recognisers. Results are given as the percentage of correctly classified frames for phonemes (ph) and visemes (vi), and as phoneme accuracy.

From: SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support

Database	SpeechDat	SpeeCon	SpeeCon
Data size (ca. hours)	200	40	40
Speakers (#)	5000	550	550
Speech quality	TF	NB	WB
Sampling rate (kHz)	8	8	16
Correct frames (% ph)	54.2	65.2	68.7
Correct frames (% vi)	59.3	69.0	74.5
Accuracy (% ph)	56.5	62.2	63.2
