EURASIP Journal on Audio, Speech, and Music Processing

Table 1 UERs for the different language models (without FP loops, with FP loops, and with FP positions) and the different acoustic models (trained on native speech, triphone; retrained on non-native speech, triphone and monophone). All setups used the baseline canonical lexicon. The columns 0, 5, 10, and 15 give the phonetic distance to the reference transcription at which a decoding result is still classified as correct.

From: Optimizing Automatic Speech Recognition for Low-Proficient Non-Native Speakers

Acoustic model (AM)	Language model (LM)	0	5	10	15
Native (tri)	without loops	28.9	28.4	26.1	24.6
Native (tri)	with loops	14.9	14.6	12.6	11.0
Native (tri)	with positions	14.7	14.4	13.1	12.0
Non-native (tri)	without loops	22.4	22.0	19.9	18.4
Non-native (tri)	with loops	10.0	9.7	7.9	6.9
Non-native (tri)	with positions	9.4	9.1	7.8	7.1
Non-native (mono)	with loops	11.9	11.5	9.3	8.1
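
The threshold columns can be read as follows: an utterance counts as an error only when its decoding result lies further from the reference transcription than the allowed phonetic distance. The Python sketch below illustrates that reading. It is not the authors' scoring code: the function name `uer_by_threshold`, the inclusive boundary (a distance equal to the threshold counts as correct), the expression of the rate in percent, and the example distances are all assumptions made for illustration, and the phonetic distance measure itself is not defined on this page.

```python
from typing import Dict, Iterable, Sequence


def uer_by_threshold(
    distances: Sequence[float],
    thresholds: Iterable[float] = (0, 5, 10, 15),
) -> Dict[float, float]:
    """Return an error rate (in percent) for each distance threshold.

    Assumption: an utterance is an error when its phonetic distance to the
    reference transcription is strictly greater than the threshold.
    """
    if not distances:
        raise ValueError("at least one utterance distance is required")

    total = len(distances)
    rates: Dict[float, float] = {}
    for threshold in thresholds:
        # Count utterances that fall outside the tolerated distance.
        errors = sum(1 for d in distances if d > threshold)
        rates[threshold] = 100.0 * errors / total
    return rates


# Hypothetical per-utterance distances, not data from the article.
example_distances = [0, 0, 3, 7, 12, 20, 0, 0, 5, 16]
example_rates = uer_by_threshold(example_distances)
```

Under these assumptions the rate can only stay the same or fall as the threshold grows, which matches the left-to-right trend in every row of the table.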
