From: Room-localized speech activity detection in multi-microphone smart homes
Approach | Method | DIRHA-sim Recall | DIRHA-sim Precision | DIRHA-sim F-score | DIRHA-real Recall | DIRHA-real Precision | DIRHA-real F-score |
---|---|---|---|---|---|---|---|
Single-stage | Best RI | 92.22 | 19.49 | 32.18 | 92.71 | 16.25 | 27.66 |
Single-stage | MFCC/GMM | 89.87 | 41.60 | 56.87 | 88.02 | 57.06 | 69.24 |
Single-stage | Sohn’s | 73.17 | 17.33 | 28.02 | 73.40 | 17.71 | 28.53 |
Two-stage baselines | MFCC/GMM | 72.07 | 61.08 | 66.12 | 78.94 | 76.87 | 77.89 |
Two-stage baselines | Sohn’s | 43.14 | 21.96 | 29.11 | 46.39 | 22.26 | 30.08 |
Proposed | Seg (R=5) | 82.16 | 77.35 | 79.68 | 88.27 | 89.30 | 88.78 |
Proposed | Win (R=5) | 83.09 | 78.96 | 80.98 | 86.51 | 88.87 | 87.68 |
Proposed | Win (R=4) | 84.65 | 86.10 | 85.37 | 86.51 | 94.03 | 90.11 |
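
The F-score columns are consistent with the standard harmonic mean of recall and precision, F = 2PR / (P + R). The following minimal sketch checks this on a few rows copied from the table. The function name `f_score` and the choice of rows are illustrative assumptions, not part of the original work, and the last digit may differ slightly for some entries, presumably because the reported values were computed from unrounded recall and precision.

```python
def f_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (both given in the same units)."""
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


# (recall, precision, reported F-score), copied from the table above.
rows = {
    "Single-stage, Best RI, DIRHA-sim": (92.22, 19.49, 32.18),
    "Two-stage baseline, MFCC/GMM, DIRHA-real": (78.94, 76.87, 77.89),
    "Proposed, Win (R=4), DIRHA-real": (86.51, 94.03, 90.11),
}

for name, (recall, precision, reported) in rows.items():
    computed = f_score(recall, precision)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

The check also makes the trade-off in the table easier to read: the single-stage Best RI system has the highest recall but a low F-score because its precision is low, while the proposed systems balance the two.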