Table 1 Structure of the NCP corpus [20]

From: Statistical analysis of orthographic and phonemic language corpus for word-based and phoneme-based Polish language modelling

Type of a text source                   Percentage of the NCP corpus size
Daily newspapers                        50.0%
Classic literature                      16.0%
Non-fiction literature                  5.5%
Specialized periodicals and journals    5.5%
Scientific and educational texts        2.0%
Other written texts                     3.0%
Other books                             1.0%
Transcripts of conversations            10.0%
Internet texts                          7.0%