From: Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling
Segmentation    Lexicon size
Word            79,947
Morfessor       10,545
BPE             9,986
Unigram         19,564
Syllable        6,279
S-BPE           15,926