
Table 6 Comparisons on LibriSpeech “test clean”

From: A new joint CTC-attention-based speech recognition model with multi-level multi-head attention

 

| Methods | WER (%) |
| --- | --- |
| **Referenced traditional systems** | |
| IBM CAPIO [40] | 3.19 |
| 17-layer TDNN + iVectors [41] | 3.80 |
| **Referenced end-to-end systems** | |
| End-to-end CNN on the waveform + conv LM [42] | 3.44 |
| Deep Speech 2 (extra 11,940 h of labeled English data) [34] | 5.33 |
| **Our end-to-end systems** | |
| VGG + BLSTM + add attention + word-LM (baseline/J0) | 4.3 |
| High-level features + joint CTC-attention + word-LM (J1) | 4.0 |
| J1 + multi-level location-based attention (J2) | 3.8 |
| J1 + multi-head location-based attention (J3) | 3.8 |
| J1 + multi-level multi-head location-based attention (J4) | 3.6 |
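As a quick sanity check on the table's figures, the relative WER reduction from the baseline (J0, 4.3%) to the best proposed system (J4, 3.6%) can be computed directly. The snippet below is an illustrative calculation over the numbers above, not part of the paper; the dictionary keys are shorthand labels for the systems in the table.

```python
# WER (%) on LibriSpeech "test clean" for the end-to-end systems in Table 6.
wer = {"J0": 4.3, "J1": 4.0, "J2": 3.8, "J3": 3.8, "J4": 3.6}

def relative_reduction(baseline: float, improved: float) -> float:
    """Relative WER reduction, in percent, of `improved` over `baseline`."""
    return 100.0 * (baseline - improved) / baseline

best = min(wer, key=wer.get)  # system with the lowest WER
print(best, round(relative_reduction(wer["J0"], wer[best]), 1))  # J4 16.3
```

So J4 (multi-level multi-head location-based attention) yields roughly a 16.3% relative WER reduction over the J0 baseline, while still trailing the strongest referenced traditional system (IBM CAPIO, 3.19%).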