Table 1. Structure of the NCP corpus [20]

From: Statistical analysis of orthographic and phonemic language corpus for word-based and phoneme-based Polish language modelling

Type of text source                     Share of the NCP corpus
Daily newspapers                        50.0%
Classic literature                      16.0%
Non-fiction literature                  5.5%
Specialized periodicals and journals    5.5%
Scientific and educational texts        2.0%
Other written texts                     3.0%
Other books                             1.0%
Transcripts of conversations            10.0%
Internet texts                          7.0%
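To reuse the composition in Table 1, for example when drawing a training sample with the same source mix, the percentages can be turned into per-category word counts. The sketch below is illustrative only: the total word budget (`total_words`) is a hypothetical parameter, and the function name `allocate` and its largest-remainder rounding (which makes the counts add up exactly to the budget) are assumptions, not the method used in the source paper.

```python
import math

# Shares (in percent) of each text-source type in the NCP corpus, taken from Table 1.
NCP_STRUCTURE = {
    "Daily newspapers": 50.0,
    "Classic literature": 16.0,
    "Non-fiction literature": 5.5,
    "Specialized periodicals and journals": 5.5,
    "Scientific and educational texts": 2.0,
    "Other written texts": 3.0,
    "Other books": 1.0,
    "Transcripts of conversations": 10.0,
    "Internet texts": 7.0,
}


def allocate(total_words, shares):
    """Hypothetical helper: split a word budget proportionally to percentage shares.

    Uses largest-remainder rounding so the integer counts sum to total_words.
    """
    if not math.isclose(sum(shares.values()), 100.0):
        raise ValueError("shares must sum to 100%")
    # Exact (fractional) quota for each source type.
    quotas = {k: total_words * v / 100.0 for k, v in shares.items()}
    # Floor every quota, then give the leftover words to the largest remainders.
    counts = {k: math.floor(q) for k, q in quotas.items()}
    leftover = total_words - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


if __name__ == "__main__":
    for source, n in allocate(1000, NCP_STRUCTURE).items():
        print(f"{source:<40}{n}")
```

For instance, `allocate(1000, NCP_STRUCTURE)` assigns 500 words to daily newspapers and 100 to transcripts of conversations, mirroring the 50.0% and 10.0% shares in the table.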