Table 5 Model perplexity for particular language models computed on development data

From: Classification of heterogeneous text data for robust domain-specific language modeling

Similarity / weighting        tf-idf    Okapi     Ltu
In-domain data set
  Bhattacharyya coefficient   14.1223   15.7542   17.2876
  Jaccard correlation index   14.0815   14.8402   17.2872
  Jensen-Shannon divergence   15.0343   15.4863   17.2878
Out-of-domain data set
  Bhattacharyya coefficient   90.6770   25.7417   183.670
  Jaccard correlation index   75.0398   20.7094   162.901
  Jensen-Shannon divergence   99.8450   24.3595   187.167