From: Frequency-dependent auto-pooling function for weakly supervised sound event detection
Method type | Method | Parameters (×10 k) | Audio tagging: F-score | Audio tagging: AUC | Audio tagging: mAP | Sound event detection: F-score | Sound event detection: AUC | Sound event detection: mAP | Error rate: ER | Error rate: D | Error rate: I |
---|---|---|---|---|---|---|---|---|---|---|---|
MIL | Attention [20] | 54.15 | 0.671 | 0.923 | 0.723 | 0.341 | 0.861 | 0.348 | 1.574 | 0.885 | 0.689 |
MIL | TALNet [19] | 94.06 | 0.646 | 0.911 | 0.687 | 0.397 | 0.849 | 0.390 | 1.339 | 0.865 | 0.474 |
Source separation-based | VGG-GWRP [24] | 58.76 | 0.572 | 0.923 | 0.635 | 0.429 | 0.803 | 0.372 | 1.991 | 0.780 | 1.210 |
Source separation-based | VGG-AP | 58.76 | 0.538 | 0.909 | 0.639 | 0.352 | 0.823 | 0.362 | 1.886 | 0.844 | 1.061 |
Source separation-based | VGG-FAP | 58.76 | 0.590 | 0.923 | 0.672 | 0.407 | 0.848 | 0.385 | 1.776 | 0.823 | 0.952 |
Source separation-based | DDC-GWRP | 28.84 | 0.626 | 0.931 | 0.689 | 0.468 | 0.808 | 0.404 | 1.850 | 0.813 | 1.037 |
Source separation-based | DDC-AP | 28.84 | 0.573 | 0.919 | 0.684 | 0.382 | 0.845 | 0.398 | 1.831 | 0.853 | 0.978 |
Source separation-based | DDC-FAP | 29.10 | 0.633 | 0.931 | 0.719 | 0.446 | 0.868 | 0.427 | 1.689 | 0.845 | 0.844 |