From: Classification of heterogeneous text data for robust domain-specific language modeling
All conditions: speaker adaptation: no; evaluation set: gender-balanced.

| PPL | Text classification: weighting | Text classification: similarity | APD1+APD2 250 h (table mic.), Acc % | APD1+APD2 250 h (table mic.), Corr % | APD1+APD2 250 h (close-talk mic.), Acc % | APD1+APD2 250 h (close-talk mic.), Corr % | APD1+APD2+PAR 340 h, Acc % | APD1+APD2+PAR 340 h, Corr % | APD1+APD2+PAR+BN 520 h, Acc % | APD1+APD2+PAR+BN 520 h, Corr % |
|---|---|---|---|---|---|---|---|---|---|---|
| 40.4302 | Reference language model | | 91.84 | 93.08 | 93.61 | 94.51 | 94.36 | 95.13 | 94.06 | 94.89 |
| 36.0428 | tf-idf | Bhattacharyya | 92.44 | 93.64 | 93.99 | 94.85 | 94.70 | 95.46 | 94.36 | 95.13 |
| 35.9444 | tf-idf | Jaccard index | 92.46 | 93.65 | 93.97 | 94.85 | 94.72 | 95.47 | 94.37 | 95.16 |
| 38.1756 | tf-idf | Jensen-Shannon | 92.23 | 93.39 | 93.78 | 94.70 | 94.50 | 95.25 | 94.21 | 94.99 |
| 38.1289 | Okapi | Bhattacharyya | 92.17 | 93.34 | 93.77 | 94.65 | 94.61 | 95.34 | 94.27 | 95.02 |
| 39.9782 | Okapi | Jaccard index | 92.10 | 93.31 | 93.60 | 94.54 | 94.48 | 95.21 | 94.11 | 94.89 |
| 39.2267 | Okapi | Jensen-Shannon | 92.27 | 93.42 | 93.77 | 94.67 | 94.61 | 95.36 | 94.18 | 94.95 |
| 40.1325 | Ltu | Bhattacharyya | 91.86 | 93.12 | 93.57 | 94.51 | 94.42 | 95.16 | 94.05 | 94.87 |
| 40.1439 | Ltu | Jaccard index | 91.87 | 93.12 | 93.56 | 94.50 | 94.40 | 95.16 | 94.04 | 94.87 |
| 40.1319 | Ltu | Jensen-Shannon | 91.87 | 93.12 | 93.57 | 94.51 | 94.42 | 95.16 | 94.05 | 94.87 |