Table 1. UERs for three language models (without FP loops, with FP loops, and with FP positions) and three acoustic models (triphone trained on native speech, and triphone and monophone retrained on non-native speech). All setups used the baseline canonical lexicon. The columns 0, 5, 10, and 15 give the phonetic distance to the reference transcription up to which a decoding result is classified as correct.

From: Optimizing Automatic Speech Recognition for Low-Proficient Non-Native Speakers

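| AM                | LM             | 0    | 5    | 10   | 15   |
|-------------------|----------------|------|------|------|------|
| Native (tri)      | without loops  | 28.9 | 28.4 | 26.1 | 24.6 |
| Native (tri)      | with loops     | 14.9 | 14.6 | 12.6 | 11.0 |
| Native (tri)      | with positions | 14.7 | 14.4 | 13.1 | 12.0 |
| Non-native (tri)  | without loops  | 22.4 | 22.0 | 19.9 | 18.4 |
| Non-native (tri)  | with loops     | 10.0 | 9.7  | 7.9  | 6.9  |
| Non-native (tri)  | with positions | 9.4  | 9.1  | 7.8  | 7.1  |
| Non-native (mono) | with loops     | 11.9 | 11.5 | 9.3  | 8.1  |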
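
The threshold columns can be read as follows: a decoding result counts as correct when its phonetic distance to the reference transcription does not exceed the column value, and the error rate is the share of utterances that fail this check. The sketch below illustrates that reading only. The function name `uer_at_threshold`, the "less than or equal" comparison, the percentage scale, and the example distances are all assumptions for illustration, not taken from the article.

```python
from typing import Sequence


def uer_at_threshold(distances: Sequence[float], threshold: float) -> float:
    """Return the percentage of utterances classified as errors.

    Assumption: an utterance is correct when the phonetic distance between
    its decoding result and the reference is <= threshold.
    """
    if not distances:
        raise ValueError("need at least one utterance")
    errors = sum(1 for d in distances if d > threshold)
    return 100.0 * errors / len(distances)


# Hypothetical per-utterance phonetic distances (not data from the article).
example_distances = [0, 0, 3, 7, 12, 20, 0, 0]

# One error rate per threshold column of the table.
uer_by_threshold = {
    t: uer_at_threshold(example_distances, t) for t in (0, 5, 10, 15)
}
```

Under this reading the error rate can only stay equal or fall as the threshold grows, which matches the left-to-right trend in every row of the table.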