From: Classification of heterogeneous text data for robust domain-specific language modeling
Similarity/weighting | tf-idf | Okapi | Ltu |
---|---|---|---|
In-domain data set | | | |
Bhattacharyya coefficient | 14.1223 | 15.7542 | 17.2876 |
Jaccard correlation index | 14.0815 | 14.8402 | 17.2872 |
Jensen-Shannon divergence | 15.0343 | 15.4863 | 17.2878 |
Out-of-domain data set | | | |
Bhattacharyya coefficient | 90.6770 | 25.7417 | 183.670 |
Jaccard correlation index | 75.0398 | 20.7094 | 162.901 |
Jensen-Shannon divergence | 99.8450 | 24.3595 | 187.167 |