Table 14 Training time per epoch (hours) and decoding real time factor (RTF) of all models for all languages

From: Advanced recurrent network-based hybrid acoustic models for low resource speech recognition

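Columns 101 Cantonese through 404 Georgian give training time per epoch (hours); the last column gives decoding RTF.

| Model | 101 Cantonese | 104 Pashto | 107 Vietnamese | 202 Swahili | 204 Tamil | 302 Kazakh | 404 Georgian | Decoding RTF |
|---|---|---|---|---|---|---|---|---|
| DNN-fbank | 2.35 | 1.32 | 1.22 | 0.73 | 1.04 | 0.69 | 0.70 | 1.65 |
| LSTM-fbank | 5.33 | 3.17 | 3.22 | 2.03 | 2.93 | 1.99 | 1.99 | 2.02 |
| BLSTM-fbank | 12.79 | 8.40 | 7.69 | 4.92 | 7.00 | 4.85 | 4.88 | 2.77 |
| LW-BLSTM-fbank | 8.01 | 4.71 | 4.83 | 3.06 | 4.37 | 3.03 | 3.04 | 2.31 |
| LW-BrLSTM-fbank | 9.08 | 5.26 | 5.47 | 3.40 | 4.99 | 3.43 | 3.48 | 2.41 |
| LW-BGRU-fbank | 7.10 | 4.11 | 4.21 | 2.75 | 3.83 | 2.69 | 2.72 | 2.27 |
| LW-BrGRU-fbank | 8.03 | 4.68 | 4.92 | 3.13 | 4.34 | 3.06 | 3.10 | 2.36 |
| CNN-fbank [21] | 7.07 | 4.61 | 4.22 | | | | | 3.70 |
| CMNN-fbank [21] | 7.09 | 4.62 | 4.30 | | | | | 3.13 |
| RMNN-fbank [21] | 4.78 | 2.93 | 3.00 | | | | | 2.19 |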
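
As a reading aid, the following minimal Python sketch expresses the 101 Cantonese training times from Table 14 as multiples of the DNN-fbank baseline. It also includes the standard definition of the real-time factor (processing time divided by audio duration). The function names `relative_cost` and `real_time_factor` are hypothetical, and this page does not state how the RTF values were measured, so the second function is only the generic definition, not necessarily the authors' exact procedure.

```python
# 101 Cantonese training time per epoch (hours), copied from Table 14.
cantonese_hours = {
    "DNN-fbank": 2.35,
    "LSTM-fbank": 5.33,
    "BLSTM-fbank": 12.79,
    "LW-BLSTM-fbank": 8.01,
    "LW-BrLSTM-fbank": 9.08,
    "LW-BGRU-fbank": 7.10,
    "LW-BrGRU-fbank": 8.03,
}


def relative_cost(hours, baseline="DNN-fbank"):
    """Return each model's training time as a multiple of the baseline's."""
    base = hours[baseline]
    return {model: t / base for model, t in hours.items()}


def real_time_factor(decode_seconds, audio_seconds):
    """Generic RTF definition: processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


ratios = relative_cost(cantonese_hours)
for model, ratio in ratios.items():
    # Print each model with its cost relative to DNN-fbank.
    print(f"{model:16s} {ratio:.2f}x")
```

On these numbers, BLSTM-fbank costs roughly 5.4 times the DNN-fbank training time per epoch on Cantonese, while LW-BLSTM-fbank costs roughly 3.4 times.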