From: Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling
| Tokenization | Minimum | Maximum | Mean |
|---|---|---|---|
| Word | 5 | 14 | 6.4 |
| Morfessor | 6 | 29 | 11.7 |
| BPE | | 26 | 8.5 |
| Unigram | | | 10.1 |
| Syllable | 8 | 49 | 19.9 |
| S-BPE | | 25 | 8.1 |