Table 1 Model architecture

From: Frequency-dependent auto-pooling function for weakly supervised sound event detection

Block (filter, channels, dilation)    Output shape (channels × frequency × frame)
Input log-mel spectrogram             1×64×311
DDC-block(3×3, 32, 2)                 32×64×311
DDC-block(3×3, 64, 2)                 64×64×311
DDC-block(3×3, 128, 2)                128×64×311
DDC-block(3×3, 128, 2)                128×64×311
CNN (1×1, K, 1)                       K×64×311
Global pooling function               K×1
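
The output shapes in Table 1 can be checked with the standard convolution size formula. The following is a minimal sketch, not the authors' implementation. It assumes stride 1 and size-preserving padding of dilation × (kernel − 1) / 2, which the table does not state but which the constant 64×311 shapes imply. It also treats each DDC-block as a single dilated convolution for shape purposes and takes K as a free parameter (presumably the number of event classes). The helper names conv_out, same_padding, and trace_shapes are hypothetical.

```python
def conv_out(size, kernel, dilation, padding, stride=1):
    """Output length along one axis of a convolution (standard formula)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def same_padding(kernel, dilation):
    """Assumed padding: keeps the size unchanged at stride 1 for odd kernels."""
    return dilation * (kernel - 1) // 2


def trace_shapes(num_classes, freq=64, frames=311):
    """Return (block name, output shape) pairs for the rows of Table 1."""
    # (name, kernel size, output channels, dilation), copied from Table 1.
    layers = [
        ("DDC-block", 3, 32, 2),
        ("DDC-block", 3, 64, 2),
        ("DDC-block", 3, 128, 2),
        ("DDC-block", 3, 128, 2),
        ("CNN", 1, num_classes, 1),
    ]
    shapes = [("Input", (1, freq, frames))]
    for name, kernel, channels, dilation in layers:
        pad = same_padding(kernel, dilation)
        freq = conv_out(freq, kernel, dilation, pad)
        frames = conv_out(frames, kernel, dilation, pad)
        shapes.append((name, (channels, freq, frames)))
    # Global pooling collapses the frequency and frame axes to one value per class.
    shapes.append(("Global pooling", (num_classes, 1)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes(num_classes=10):
        print(name, "×".join(str(n) for n in shape))
```

With a 3×3 filter and dilation 2, the assumed padding is 2 on each side, so the frequency and frame axes stay at 64 and 311 through every block; only the channel count changes until the pooling step.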