From: A new joint CTC-attention-based speech recognition model with multi-level multi-head attention
| System type | Method | PER (%) |
|---|---|---|
| Referenced traditional systems | Kaldi's DNN-HMM | 18.5 |
| | Hierarchical maxout CNN [26] | 16.5 |
| Referenced end-to-end systems | Hierarchical CNNs with CTC [27] | 18.2 |
| | RNN transducer initialized with CTC + weight noise [28] | 17.7 |
| | fMLLR + attention + weight noise [3] | 17.6 |
| | fMLLR + RNN + CRF [29] | 17.3 |
| Our end-to-end systems | Transferred high-level features + joint CTC-attention + RNN-LM (P0) [17] | 16.59 |
| | P0 + multi-level location-based attention (P1) | 16.42 |
| | P0 + multi-head location-based attention (P2) | 16.51 |
| | P0 + multi-level multi-head location-based attention (P3) | 16.34 |