Table 14 Training time per epoch (hours) and decoding real time factor (RTF) of all models for all languages

From: Advanced recurrent network-based hybrid acoustic models for low resource speech recognition

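Columns 101 Cantonese through 404 Georgian give training time per epoch (hours); the last column gives the decoding RTF.

| Model | 101 Cantonese | 104 Pashto | 107 Vietnamese | 202 Swahili | 204 Tamil | 302 Kazakh | 404 Georgian | Decoding RTF |
|---|---|---|---|---|---|---|---|---|
| DNN-fbank | 2.35 | 1.32 | 1.22 | 0.73 | 1.04 | 0.69 | 0.70 | 1.65 |
| LSTM-fbank | 5.33 | 3.17 | 3.22 | 2.03 | 2.93 | 1.99 | 1.99 | 2.02 |
| BLSTM-fbank | 12.79 | 8.40 | 7.69 | 4.92 | 7.00 | 4.85 | 4.88 | 2.77 |
| LW-BLSTM-fbank | 8.01 | 4.71 | 4.83 | 3.06 | 4.37 | 3.03 | 3.04 | 2.31 |
| LW-BrLSTM-fbank | 9.08 | 5.26 | 5.47 | 3.40 | 4.99 | 3.43 | 3.48 | 2.41 |
| LW-BGRU-fbank | 7.10 | 4.11 | 4.21 | 2.75 | 3.83 | 2.69 | 2.72 | 2.27 |
| LW-BrGRU-fbank | 8.03 | 4.68 | 4.92 | 3.13 | 4.34 | 3.06 | 3.10 | 2.36 |
| CNN-fbank [21] | 7.07 | 4.61 | 4.22 | – | 3.70 | – | – | – |
| CMNN-fbank [21] | 7.09 | 4.62 | 4.30 | – | 3.13 | – | – | – |
| RMNN-fbank [21] | 4.78 | 2.93 | 3.00 | – | 2.19 | – | – | – |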
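
As a reading aid for the Decoding RTF column: the real time factor is conventionally the time spent decoding divided by the duration of the audio being decoded, so a value above 1 means decoding runs slower than real time. The following is a minimal sketch under that conventional definition; the table does not state how the RTF was measured, and the function name and example numbers are hypothetical, not taken from the paper.

```python
def real_time_factor(decoding_seconds: float, audio_seconds: float) -> float:
    """Return decoding time divided by audio duration (conventional RTF)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decoding_seconds / audio_seconds


# Hypothetical numbers: 165 s of decoding time for 100 s of audio.
rtf = real_time_factor(165.0, 100.0)
print(f"RTF = {rtf:.2f}")
```

Under this definition, an RTF of 1.65 would correspond to spending 1.65 seconds of compute for every second of speech.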