Table 13 WER (%) across all languages

From: Advanced recurrent network-based hybrid acoustic models for low resource speech recognition

| Model | 101 Cantonese | 104 Pashto | 107 Vietnamese | 202 Swahili | 204 Tamil | 302 Kazakh | 404 Georgian |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DNN-MBN | 36.1 | 44.2 | 44.7 | 38.9 | 61.3 | 48.8 | 45 |
| LSTM-MBN | 35.7 | 44.9 | 45 | 39.6 | 61.3 | 49 | 45.2 |
| BLSTM-MBN | 34.7 | 43.8 | 43.6 | 38 | 60.6 | 47.8 | 44 |
| LW-BLSTM-MBN | 34.7 | 43.9 | 43.7 | 38 | 60.6 | 47.7 | 44 |
| LW-BrLSTM-MBN | 34.4 | 43.5 | 43.1 | 37.4 | 60.1 | 47.2 | 43.3 |
| LW-BGRU-MBN | 34.1 | 43.1 | 42.7 | 37 | 59.7 | 46.7 | 42.9 |
| LW-BrGRU-MBN | 34.1 | 42.7 | 42.7 | 36.8 | 59.2 | 46.2 | 41.7 |
| DNN-fbank | 44.8 | 51.2 | 53.1 | 46.2 | 66.7 | 54.1 | 50.5 |
| LSTM-fbank | 40.7 | 50.5 | 47.8 | 42.5 | 65 | 52.9 | 48.9 |
| BLSTM-fbank | 39.5 | 48.3 | 45.8 | 41 | 63.7 | 50.3 | 46.6 |
| LW-BLSTM-fbank | 39.6 | 48.3 | 45.9 | 41.1 | 63.7 | 50.2 | 46.7 |
| LW-BrLSTM-fbank | 39.2 | 47.9 | 45.3 | 40.6 | 62.9 | 49.5 | 45.7 |
| LW-BGRU-fbank | 38.7 | 47.4 | 44.8 | 40 | 62.8 | 49 | 45.4 |
| LW-BrGRU-fbank | 38.5 | 47 | 44.3 | 39.5 | 62.2 | 48.5 | 44.1 |
| CNN-fbank [21] | 43.6 | 51.5 | 52.5 | - | 67.2 | - | - |
| CMNN-fbank [21] | 41.7 | 49.3 | 49.9 | - | 64.2 | - | - |
| RMNN-fbank [21] | 39 | 48.1 | 45.7 | - | 63.4 | - | - |
| DNN + LW-BLSTM LW-BGRU + LW-BrGRU | 33 | 41.5 | 41.3 | 35.7 | 58.2 | 44.6 | 40.6 |
| DNN + LW-BLSTM LW-BrLSTM + LW-BGRU + LW-BrGRU | 32.8 | 41.2 | 41 | 35.5 | 57.9 | 44.3 | 40.2 |