
Table 3. Test WER of different models for online recognition on the WSJ dataset

From: Segment boundary detection directed attention for online end-to-end speech recognition

Model                             #Param (M)   WER (%)
Hard alignment with RL [14]       2.7          27.0
CTC [52]                          –            22.7
Hard monotonic attention [16]     2.9          17.4
MoChA [17]                        3.0          15.0 ± 0.6
Soft attention* [17]              2.9          14.6 ± 0.3
Proposed method (BPE)             6.6          15.5 ± 0.4
Soft attention (BPE)*             5.4          15.3 ± 0.3
Soft attention bigger-E (BPE)*    6.7          15.1 ± 0.3
Soft attention bigger-D (BPE)*    6.7          15.0 ± 0.3
Proposed method (Char)            6.6          22.2 ± 0.5
Soft attention bigger-D (Char)*   6.7          16.3 ± 0.3

Note: All models use a unidirectional encoder; * marks offline attention models.
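Table 3 ranks the models by WER (%). To make explicit what that column measures, here is a minimal Python sketch of the textbook word-level definition, WER = (S + D + I) / N × 100, where S, D and I are the substitutions, deletions and insertions in a minimum-edit-distance alignment and N is the number of reference words. This is an illustration under that standard definition, not the paper's scoring pipeline (which is not described here). It assumes plain whitespace tokenization with no text normalization, and the function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn the reference word sequence into the hypothesis."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Utterance-level WER in percent (whitespace tokenization assumed)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Test-set WER in percent: total errors over total reference words,
    not the mean of per-utterance WERs."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += edit_distance(ref, hypothesis.split())
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words")
    return 100.0 * errors / words


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the metric is defined on words, subword (BPE) or character outputs would first be joined back into words before scoring, which is what lets the BPE and Char rows be compared on the same scale.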