From: Towards end-to-end speech recognition with transfer learning
System | Phone error rate, PER (%) |
---|---|
Kaldi’s DNN-HMM | 18.5 |
Hierarchical maxout CNN [30] | 16.5 |
Raw speech + WaveNet [19] | 18.8 |
Filterbank + CTC + weight noise [31] | 18.4 |
Hierarchical CNNs with CTC [32] | 18.2 |
Raw speech + complex ConvNets [33] | 18.0 |
RNN transducer initialized with CTC + weight noise [31] | 17.7 |
fMLLR + Attention + weight noise [6] | 17.6 |
fMLLR + RNN + CRF [14] | 17.3 |
Filterbank + CTC5 | 18.66 |
Filterbank + att4 | 20.49 |
Filterbank + CTC4 + att4 | 19.85 |
Filterbank + CTC5 + att4 | 19.14 |
4langAdaptBN + CTC2 + att2 | 18.63 |
4langAdaptBN + CTC3 + att2 | 18.28 |
4langAdaptCNMF + CTC2 + att2 | 17.70 |
4langAdaptCNMF + CTC3 + att2 | 16.96 |
4langAdaptCNMF + CTC3 + att2 + RNN-LM | 16.59 |