From: Segment boundary detection directed attention for online end-to-end speech recognition
Model | Parameters (M) | WER (%) |
---|---|---|
Hard alignment with RL [14] | 2.7 | 27.0 |
CTC [52] | – | 22.7 |
Hard monotonic attention [16] | 2.9 | 17.4 |
MoChA [17] | 3.0 | 15.0 ±0.6 |
Soft attention* [17] | 2.9 | 14.6 ±0.3 |
Proposed method (BPE) | 6.6 | 15.5 ±0.4 |
Soft attention (BPE)* | 5.4 | 15.3 ±0.3 |
Soft attention bigger-E (BPE)* | 6.7 | 15.1 ±0.3 |
Soft attention bigger-D (BPE)* | 6.7 | 15.0 ±0.3 |
Proposed method (Char) | 6.6 | 22.2 ±0.5 |
Soft attention bigger-D (Char)* | 6.7 | 16.3 ±0.3 |