Table 1 UERs for the different language models (without FP loops, with FP loops, and with FP positions) and the different acoustic models (trained on native speech, triphone; retrained on non-native speech, triphone and monophone). All setups used the baseline canonical lexicon. The columns 0, 5, 10, and 15 give the phonetic distance to the reference transcription at which a decoding result is classified as correct.

From: Optimizing Automatic Speech Recognition for Low-Proficient Non-Native Speakers

Acoustic model       Language model     0       5       10      15
Native (tri)         without loops      28.9    28.4    26.1    24.6
Native (tri)         with loops         14.9    14.6    12.6    11.0
Native (tri)         with positions     14.7    14.4    13.1    12.0
Non-native (tri)     without loops      22.4    22.0    19.9    18.4
Non-native (tri)     with loops         10.0    9.7     7.9     6.9
Non-native (tri)     with positions     9.4     9.1     7.8     7.1
Non-native (mono)    with loops         11.9    11.5    9.3     8.1
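
The threshold columns can be illustrated with a minimal sketch. It assumes that the error rate is the percentage of utterances classified as incorrect, and that a decoding result counts as correct when its phonetic distance to the reference transcription is at or below the threshold. The function name and the distance values are hypothetical and are not taken from the paper.

```python
def error_rate_at_threshold(distances, threshold):
    """Return the percentage of utterances whose distance exceeds the threshold.

    Assumption: a result is 'correct' when distance <= threshold.
    """
    if not distances:
        raise ValueError("distances must not be empty")
    errors = sum(1 for d in distances if d > threshold)
    return 100.0 * errors / len(distances)


# Hypothetical per-utterance phonetic distances (illustrative only).
example_distances = [0, 0, 3, 7, 12, 20, 0, 5, 15, 30]

# One error rate per threshold column of the table.
rates = {t: error_rate_at_threshold(example_distances, t) for t in (0, 5, 10, 15)}
for threshold, rate in rates.items():
    print(f"threshold {threshold:>2}: {rate:.1f}")
```

Under this assumption the error rate can only fall or stay the same as the threshold grows, which matches the left-to-right trend in every row of the table.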