Table 12 Comparison of the WER and model size of each subword tokenization method at n-gram \(=\) 3. The relative reduction with respect to the baseline word model is also shown as a percentage

From: Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling

| Segmentation | WER (%) | WER reduction | Model size (MB) | Size reduction |
| --- | --- | --- | --- | --- |
| Word (baseline) | 27.4 | – | 123 | – |
| Morfessor | 11.7 | \(\downarrow 57\%\) | 104 | \(\downarrow 15\%\) |
| BPE | 13.7 | \(\downarrow 50\%\) | 90 | \(\downarrow 26\%\) |
| Unigram | 12.6 | \(\downarrow 54\%\) | 108 | \(\downarrow 12\%\) |
| Syllable | 14.7 | \(\downarrow 46\%\) | 94 | \(\downarrow 23\%\) |
| S-BPE | 11.4 | \(\downarrow 58\%\) | 110 | \(\downarrow 11\%\) |
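As a quick sanity check, the relative reductions in the table follow the usual formula \((\text{baseline} - \text{value}) / \text{baseline} \times 100\). A minimal Python sketch (the variable names are illustrative, not from the paper; size percentages may differ by a point since the MB figures are themselves rounded):

```python
def relative_reduction(baseline, value):
    """Percent reduction of `value` with respect to `baseline`, rounded."""
    return round((baseline - value) / baseline * 100)

# WER values from Table 12; baseline is the word-level model.
baseline_wer = 27.4
wers = {"Morfessor": 11.7, "BPE": 13.7, "Unigram": 12.6,
        "Syllable": 14.7, "S-BPE": 11.4}

for name, wer in wers.items():
    print(f"{name}: \u2193{relative_reduction(baseline_wer, wer)}%")
```

Running this reproduces the WER reduction column (57%, 50%, 54%, 46%, 58%).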