
Table 2 Comparisons on the TIMIT core test set

From: A new joint CTC-attention-based speech recognition model with multi-level multi-head attention

 

| Methods | PER (%) |
| --- | --- |
| **Referenced traditional systems** | |
| Kaldi's DNN-HMM | 18.5 |
| Hierarchical maxout CNN [26] | 16.5 |
| **Referenced end-to-end systems** | |
| Hierarchical CNNs with CTC [27] | 18.2 |
| RNN transducer initialized with CTC + weight noise [28] | 17.7 |
| fMLLR + attention + weight noise [3] | 17.6 |
| fMLLR + RNN + CRF [29] | 17.3 |
| **Our end-to-end systems** | |
| Transferred high-level features + joint CTC-attention + RNN-LM (P0) [17] | 16.59 |
| P0 + multi-level location-based attention (P1) | 16.42 |
| P0 + multi-head location-based attention (P2) | 16.51 |
| P0 + multi-level multi-head location-based attention (P3) | 16.34 |