Table 7 Details of the phonemic language corpus content

From: Statistical analysis of orthographic and phonemic language corpus for word-based and phoneme-based Polish language modelling

No.  Component type        No. of unique components  No. of components in the corpus
1    single phonemes       37                        1,263,248,497
2    2-phoneme sequences   1,096                     1,032,922,921
3    3-phoneme sequences   17,340                    823,393,519
4    4-phoneme sequences   128,766                   644,597,673
5    5-phoneme sequences   402,529                   483,987,550
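
The two count columns can be illustrated with a minimal sketch: for each sequence length n, one figure is the number of distinct n-phoneme sequences and the other is the total number of occurrences. The function name `ngram_stats`, the toy corpus, and the choice to count sequences only inside words (not across word boundaries) are assumptions made for illustration; the table itself does not state how the counting was done.

```python
from collections import Counter


def ngram_stats(words, n):
    """Return (unique, total) counts of n-phoneme sequences.

    `words` is an iterable of words, each given as a list of phoneme symbols.
    Assumption: sequences are counted within words only.
    """
    counts = Counter()
    for phonemes in words:
        # Slide a window of length n over the phonemes of a single word.
        for i in range(len(phonemes) - n + 1):
            counts[tuple(phonemes[i:i + n])] += 1
    return len(counts), sum(counts.values())


# Hypothetical toy corpus, not data from the article.
toy_corpus = [["k", "o", "t"], ["o", "k", "o"], ["t", "o"]]

for n in range(1, 4):
    unique, total = ngram_stats(toy_corpus, n)
    print(f"{n}-phoneme sequences: {unique} unique, {total} in the corpus")
```

Under this assumption the total falls as n grows, because words shorter than n contribute no sequences, which matches the direction of the trend in the last column of the table.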