From: Frequency-dependent auto-pooling function for weakly supervised sound event detection
Block (filter size, channels, dilation rate) | Output shape (channels × frequency × frame) |
---|---|
Input log-mel spectrogram | 1×64×311 |
DDC-block(3×3, 32, 2) | 32×64×311 |
DDC-block(3×3, 64, 2) | 64×64×311 |
DDC-block(3×3, 128, 2) | 128×64×311 |
DDC-block(3×3, 128, 2) | 128×64×311 |
CNN (1×1, K, 1) | K×64×311 |
Global pooling function | K×1 |
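
The shapes in the table can be checked with a short shape trace. This is only a sketch: the table does not describe the internals of a DDC-block, so the code assumes each block behaves, as far as shape is concerned, like a stride-1 dilated convolution with "same" padding. The function names (`conv_out`, `same_padding`, `trace_shapes`) and the value K = 10 in the usage line are hypothetical and chosen for illustration.

```python
def conv_out(size, kernel, dilation, padding, stride=1):
    """Standard output-length formula for a strided, dilated convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def same_padding(kernel, dilation):
    # Padding that preserves the input size for odd kernels at stride 1 (assumption).
    return dilation * (kernel - 1) // 2


def trace_shapes(k_classes, freq=64, frames=311):
    # (name, kernel size, output channels, dilation rate), copied from the table rows.
    layers = [
        ("DDC-block", 3, 32, 2),
        ("DDC-block", 3, 64, 2),
        ("DDC-block", 3, 128, 2),
        ("DDC-block", 3, 128, 2),
        ("CNN", 1, k_classes, 1),
    ]
    shapes = [("Input log-mel spectrogram", (1, freq, frames))]
    for name, kernel, channels, dilation in layers:
        pad = same_padding(kernel, dilation)
        freq = conv_out(freq, kernel, dilation, pad)
        frames = conv_out(frames, kernel, dilation, pad)
        shapes.append((name, (channels, freq, frames)))
    # Global pooling collapses the frequency and frame axes to one value per class.
    shapes.append(("Global pooling function", (k_classes, 1)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes(k_classes=10):  # K = 10 is a placeholder
        print(f"{name:28s} {'×'.join(str(d) for d in shape)}")
```

Under these assumptions, a 3×3 kernel with dilation rate 2 needs a padding of 2 on each side to keep the 64×311 frequency-frame grid unchanged, which matches the constant spatial size in every row before the pooling step.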