From: Classification of heterogeneous text data for robust domain-specific language modeling
Period | Dec 2011 | Jul 2012 | Dec 2012 | Apr 2013 | May 2013 |
---|---|---|---|---|---|
No. of pronunciation variants | 475,156 | 475,357 | 474,456 | 474,453 | 474,453 |
No. of unique word forms | 326,299 | 326,295 | 325,555 | 325,555 | 325,555 |
No. of words under classes | 97,471 | 97,680 | 97,678 | 97,678 | 97,678 |
No. of classes of words | 20 | 22 | 22 | 22 | 22 |
No. of transparent words | 4 | 5 | 5 | 5 | 5 |
Vocabulary extension | • | • | • | • | - |
Word classes extension | • | • | - | - | - |
Adding new text data | • | - | - | • | • |
Additional text processing | • | - | • | • | • |
Filled pause modeling | - | • | • | • | • |
New text classification | • | - | - | - | • |