Table 4. Phone error rates (PERs) of different speech recognition systems on the TIMIT core set

From: Towards end-to-end speech recognition with transfer learning

| System | PER (%) |
| --- | --- |
| Kaldi’s DNN-HMM | 18.5 |
| Hierarchical maxout CNN [30] | 16.5 |
| Raw speech + WaveNet [19] | 18.8 |
| filterbank + CTC + weight noise [31] | 18.4 |
| hierarchical CNNs with CTC [32] | 18.2 |
| Raw speech + complex ConvNets [33] | 18.0 |
| RNN transducer initialized with CTC + weight noise [31] | 17.7 |
| fMLLR + Attention + weight noise [6] | 17.6 |
| fMLLR + RNN + CRF [14] | 17.3 |
| filterbank + CTC5 | 18.66 |
| filterbank + att4 | 20.49 |
| filterbank + CTC4 + att4 | 19.85 |
| filterbank + CTC5 + att4 | 19.14 |
| 4langAdaptBN + CTC2 + att2 | 18.63 |
| 4langAdaptBN + CTC3 + att2 | 18.28 |
| 4langAdaptCNMF + CTC2 + att2 | 17.70 |
| 4langAdaptCNMF + CTC3 + att2 | 16.96 |
| 4langAdaptCNMF + CTC3 + att2 + RNN-LM | 16.59 |