
Table 6 Differences in the text processing and language modeling during the recent time periods

From: Classification of heterogeneous text data for robust domain-specific language modeling

| Period | Dec 2011 | Jul 2012 | Dec 2012 | Apr 2013 | May 2013 |
|---|---|---|---|---|---|
| No. of pronunciation variants | 475,156 | 475,357 | 474,456 | 474,453 | 474,453 |
| No. of unique word forms | 326,299 | 326,295 | 325,555 | 325,555 | 325,555 |
| No. of words under classes | 97,471 | 97,680 | 97,678 | 97,678 | 97,678 |
| No. of classes of words | 20 | 22 | 22 | 22 | 22 |
| No. of transparent words | 4 | 5 | 5 | 5 | 5 |

Vocabulary extension

-

Word classes extension

-

-

-

Adding new text data

-

-

Additional text processing

-

Filled pause modeling

-

New text classification

-

-

-

• Change was performed.
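The differences between consecutive periods can be read directly off the numeric rows of the table. The minimal Python sketch below uses counts transcribed from those rows and computes the change from one period to the next. The names `TABLE_6`, `PERIODS`, and `period_deltas` are illustrative only and do not come from the source. The sketch covers only the count rows; it does not cover the change-indicator rows.

```python
# Counts transcribed from the numeric rows of Table 6 (one value per period).
# TABLE_6, PERIODS and period_deltas are hypothetical names for illustration.
PERIODS = ["Dec 2011", "Jul 2012", "Dec 2012", "Apr 2013", "May 2013"]

TABLE_6 = {
    "No. of pronunciation variants": [475156, 475357, 474456, 474453, 474453],
    "No. of unique word forms": [326299, 326295, 325555, 325555, 325555],
    "No. of words under classes": [97471, 97680, 97678, 97678, 97678],
    "No. of classes of words": [20, 22, 22, 22, 22],
    "No. of transparent words": [4, 5, 5, 5, 5],
}


def period_deltas(values):
    """Return the change between each pair of consecutive periods."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


if __name__ == "__main__":
    for row, values in TABLE_6.items():
        deltas = period_deltas(values)
        # Pair each delta with the two periods it spans, e.g. "Dec 2011 -> Jul 2012: +201".
        steps = ", ".join(
            f"{a} -> {b}: {d:+d}" for a, b, d in zip(PERIODS, PERIODS[1:], deltas)
        )
        print(f"{row}: {steps}")
```

For example, the pronunciation lexicon grows by 201 variants between Dec 2011 and Jul 2012, then shrinks by 901 by Dec 2012. All five counts stay constant from Apr 2013 to May 2013.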