Table 7 Details of the phonemic language corpus content

From: Statistical analysis of orthographic and phonemic language corpus for word-based and phoneme-based Polish language modelling

No.  Component type        No. of unique components  No. of components in the corpus
1    single phonemes       37                        1,263,248,497
2    2-phoneme sequences   1,096                     1,032,922,921
3    3-phoneme sequences   17,340                    823,393,519
4    4-phoneme sequences   128,766                   644,597,673
5    5-phoneme sequences   402,529                   483,987,550
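
The two count columns can be read as the number of distinct n-phoneme sequences and the total number of occurrences of such sequences. The following is a minimal sketch of how counts of this kind might be computed, not the authors' procedure. The function name `ngram_stats`, the toy input, and the choice to count sequences only within a single transcription unit (never across unit boundaries) are all assumptions made for illustration; the table itself does not state how boundaries were handled.

```python
from collections import Counter


def ngram_stats(transcriptions, n):
    """Return (unique, total) counts of n-phoneme sequences.

    Assumption: sequences are counted within each transcription unit only,
    so no sequence spans two units. The source table does not specify this.
    """
    counts = Counter()
    for phonemes in transcriptions:
        # A unit with L phonemes contributes max(L - n + 1, 0) sequences.
        for i in range(len(phonemes) - n + 1):
            counts[tuple(phonemes[i:i + n])] += 1
    return len(counts), sum(counts.values())


# Toy, made-up input: each item is a list of phoneme symbols.
toy = [["k", "o", "t"], ["k", "o", "t", "a"], ["t", "o"]]

for n in range(1, 6):
    unique, total = ngram_stats(toy, n)
    print(n, unique, total)
```

Under this within-unit assumption the total count can only shrink as n grows, because every unit shorter than n contributes nothing. The totals in the table follow that same downward pattern.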