Table 5 Model perplexity for particular language models computed on development data

From: Classification of heterogeneous text data for robust domain-specific language modeling

| Similarity/weighting | tf-idf | Okapi | Ltu |
| --- | --- | --- | --- |
| In-domain data set | | | |
| Bhattacharyya coefficient | 14.1223 | 15.7542 | 17.2876 |
| Jaccard correlation index | 14.0815 | 14.8402 | 17.2872 |
| Jensen-Shannon divergence | 15.0343 | 15.4863 | 17.2878 |
| Out-of-domain data set | | | |
| Bhattacharyya coefficient | 90.6770 | 25.7417 | 183.670 |
| Jaccard correlation index | 75.0398 | 20.7094 | 162.901 |
| Jensen-Shannon divergence | 99.8450 | 24.3595 | 187.167 |