From: Advanced recurrent network-based hybrid acoustic models for low resource speech recognition

WER by model and language

Model | 101 Cantonese | 104 Pashto | 107 Vietnamese | 202 Swahili | 204 Tamil | 302 Kazakh | 404 Georgian |
---|---|---|---|---|---|---|---|
DNN-MBN | 36.1 | 44.2 | 44.7 | 38.9 | 61.3 | 48.8 | 45 |
LSTM-MBN | 35.7 | 44.9 | 45 | 39.6 | 61.3 | 49 | 45.2 |
BLSTM-MBN | 34.7 | 43.8 | 43.6 | 38 | 60.6 | 47.8 | 44 |
LW-BLSTM-MBN | 34.7 | 43.9 | 43.7 | 38 | 60.6 | 47.7 | 44 |
LW-BrLSTM-MBN | 34.4 | 43.5 | 43.1 | 37.4 | 60.1 | 47.2 | 43.3 |
LW-BGRU-MBN | 34.1 | 43.1 | 42.7 | 37 | 59.7 | 46.7 | 42.9 |
LW-BrGRU-MBN | 34.1 | 42.7 | 42.7 | 36.8 | 59.2 | 46.2 | 41.7 |
DNN-fbank | 44.8 | 51.2 | 53.1 | 46.2 | 66.7 | 54.1 | 50.5 |
LSTM-fbank | 40.7 | 50.5 | 47.8 | 42.5 | 65 | 52.9 | 48.9 |
BLSTM-fbank | 39.5 | 48.3 | 45.8 | 41 | 63.7 | 50.3 | 46.6 |
LW-BLSTM-fbank | 39.6 | 48.3 | 45.9 | 41.1 | 63.7 | 50.2 | 46.7 |
LW-BrLSTM-fbank | 39.2 | 47.9 | 45.3 | 40.6 | 62.9 | 49.5 | 45.7 |
LW-BGRU-fbank | 38.7 | 47.4 | 44.8 | 40 | 62.8 | 49 | 45.4 |
LW-BrGRU-fbank | 38.5 | 47 | 44.3 | 39.5 | 62.2 | 48.5 | 44.1 |
CNN-fbank [21] | 43.6 | 51.5 | 52.5 | – | 67.2 | – | – |
CMNN-fbank [21] | 41.7 | 49.3 | 49.9 | – | 64.2 | – | – |
RMNN-fbank [21] | 39 | 48.1 | 45.7 | – | 63.4 | – | – |
DNN + LW-BLSTM + LW-BGRU + LW-BrGRU | 33 | 41.5 | 41.3 | 35.7 | 58.2 | 44.6 | 40.6 |
DNN + LW-BLSTM + LW-BrLSTM + LW-BGRU + LW-BrGRU | 32.8 | 41.2 | 41 | 35.5 | 57.9 | 44.3 | 40.2 |
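
One common way to read a table like this is relative WER reduction against a baseline row. The sketch below is only an illustration and is not taken from the paper: the WER values are copied from four rows of the table, while the helper names `relative_reduction` and `compare` are hypothetical, and the metric is assumed to be 100 × (baseline − system) / baseline.

```python
# WER values copied from the table above, in the same column order.
LANGUAGES = [
    "101 Cantonese", "104 Pashto", "107 Vietnamese", "202 Swahili",
    "204 Tamil", "302 Kazakh", "404 Georgian",
]

WER = {
    "DNN-MBN": [36.1, 44.2, 44.7, 38.9, 61.3, 48.8, 45.0],
    "LW-BrGRU-MBN": [34.1, 42.7, 42.7, 36.8, 59.2, 46.2, 41.7],
    "DNN-fbank": [44.8, 51.2, 53.1, 46.2, 66.7, 54.1, 50.5],
    "LW-BrGRU-fbank": [38.5, 47.0, 44.3, 39.5, 62.2, 48.5, 44.1],
}


def relative_reduction(baseline: float, system: float) -> float:
    """Relative WER reduction in percent (assumed definition, see lead-in)."""
    return 100.0 * (baseline - system) / baseline


def compare(baseline_name: str, system_name: str) -> dict:
    """Per-language relative reduction of one table row against another."""
    pairs = zip(LANGUAGES, WER[baseline_name], WER[system_name])
    return {lang: relative_reduction(b, s) for lang, b, s in pairs}


if __name__ == "__main__":
    for base, system in [("DNN-MBN", "LW-BrGRU-MBN"),
                         ("DNN-fbank", "LW-BrGRU-fbank")]:
        print(f"{system} vs {base}")
        for lang, gain in compare(base, system).items():
            print(f"  {lang}: {gain:.1f}% relative")
```

Relative rather than absolute differences are used here because the baseline WER varies widely across languages, so absolute gaps are not directly comparable between columns.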