From: Exploiting spectro-temporal locality in deep learning based acoustic event detection
System | Frame-score | AED accuracy |
---|---|---|
Best single-resolution DNN model [8] | | |
20 frames/patch, 10 ms frames | 69.80 % | 54.82 % |
Multi-resolution DNN models | | |
10 frames/patch | 71.80 % | 56.95 % |
20 frames/patch | 72.54 % | 57.03 % |
30 frames/patch | 70.15 % | 54.01 % |
Best-performing CNN models | | |
No pooling, 9×9 filters (40), 30 frames/patch | 76.41 % | 61.38 % |
1×2 pooling, 9×9 filters (20), 30 frames/patch | 75.20 % | 60.85 % |
1×2 pooling, 5×5 filters (20), 30 frames/patch | 75.11 % | 60.85 % |