Table 13 WER (%) across all languages

From: Advanced recurrent network-based hybrid acoustic models for low resource speech recognition

| Model | 101 Cantonese | 104 Pashto | 107 Vietnamese | 202 Swahili | 204 Tamil | 302 Kazakh | 404 Georgian |
|---|---|---|---|---|---|---|---|
| DNN-MBN | 36.1 | 44.2 | 44.7 | 38.9 | 61.3 | 48.8 | 45 |
| LSTM-MBN | 35.7 | 44.9 | 45 | 39.6 | 61.3 | 49 | 45.2 |
| BLSTM-MBN | 34.7 | 43.8 | 43.6 | 38 | 60.6 | 47.8 | 44 |
| LW-BLSTM-MBN | 34.7 | 43.9 | 43.7 | 38 | 60.6 | 47.7 | 44 |
| LW-BrLSTM-MBN | 34.4 | 43.5 | 43.1 | 37.4 | 60.1 | 47.2 | 43.3 |
| LW-BGRU-MBN | 34.1 | 43.1 | 42.7 | 37 | 59.7 | 46.7 | 42.9 |
| LW-BrGRU-MBN | 34.1 | 42.7 | 42.7 | 36.8 | 59.2 | 46.2 | 41.7 |
| DNN-fbank | 44.8 | 51.2 | 53.1 | 46.2 | 66.7 | 54.1 | 50.5 |
| LSTM-fbank | 40.7 | 50.5 | 47.8 | 42.5 | 65 | 52.9 | 48.9 |
| BLSTM-fbank | 39.5 | 48.3 | 45.8 | 41 | 63.7 | 50.3 | 46.6 |
| LW-BLSTM-fbank | 39.6 | 48.3 | 45.9 | 41.1 | 63.7 | 50.2 | 46.7 |
| LW-BrLSTM-fbank | 39.2 | 47.9 | 45.3 | 40.6 | 62.9 | 49.5 | 45.7 |
| LW-BGRU-fbank | 38.7 | 47.4 | 44.8 | 40 | 62.8 | 49 | 45.4 |
| LW-BrGRU-fbank | 38.5 | 47 | 44.3 | 39.5 | 62.2 | 48.5 | 44.1 |
| CNN-fbank [21] | 43.6 | 51.5 | 52.5 | – | 67.2 | – | – |
| CMNN-fbank [21] | 41.7 | 49.3 | 49.9 | – | 64.2 | – | – |
| RMNN-fbank [21] | 39 | 48.1 | 45.7 | – | 63.4 | – | – |
| DNN + LW-BLSTM + LW-BGRU + LW-BrGRU | 33 | 41.5 | 41.3 | 35.7 | 58.2 | 44.6 | 40.6 |
| DNN + LW-BLSTM + LW-BrLSTM + LW-BGRU + LW-BrGRU | 32.8 | 41.2 | 41 | 35.5 | 57.9 | 44.3 | 40.2 |

All values are WER (%). A dash (–) marks a language with no reported result.