Frequency-dependent auto-pooling function for weakly supervised sound event detection

EURASIP Journal on Audio, Speech, and Music Processing

Table 1 Model architecture

Block (filter size, channels, dilation rate)	Output shape (channels × frequency × frame)
Input log-mel spectrogram	1×64×311
DDC-block(3×3, 32, 2)	32×64×311
DDC-block(3×3, 64, 2)	64×64×311
DDC-block(3×3, 128, 2)	128×64×311
DDC-block(3×3, 128, 2)	128×64×311
CNN (1×1, K, 1)	K×64×311
Global pooling function	K×1
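
The constant 64×311 frequency-by-frame size in Table 1 follows if every convolution uses stride 1 and "same" padding. Neither setting is stated in the table, so both are assumptions here. The sketch below traces the shapes with the standard convolution output-size formula. It models each DDC-block only by its listed filter size, channel count, and dilation rate, not by its internal structure. The helper names (`conv_out_len`, `same_padding`, `trace_shapes`) and the example value K = 10 are hypothetical.

```python
def conv_out_len(n, kernel, dilation, padding, stride=1):
    """Standard output-length formula for one convolution axis."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def same_padding(kernel, dilation):
    """Padding that preserves length at stride 1 (assumed, not given in Table 1)."""
    return dilation * (kernel - 1) // 2


def trace_shapes(num_classes, freq=64, frames=311):
    """Return (block name, output shape) pairs for the rows of Table 1."""
    # (name, filter size, output channels, dilation rate), copied from Table 1.
    layers = [
        ("DDC-block", 3, 32, 2),
        ("DDC-block", 3, 64, 2),
        ("DDC-block", 3, 128, 2),
        ("DDC-block", 3, 128, 2),
        ("CNN", 1, num_classes, 1),
    ]
    shapes = [("Input log-mel spectrogram", (1, freq, frames))]
    f, t = freq, frames
    for name, kernel, channels, dilation in layers:
        pad = same_padding(kernel, dilation)
        f = conv_out_len(f, kernel, dilation, pad)
        t = conv_out_len(t, kernel, dilation, pad)
        shapes.append((name, (channels, f, t)))
    # The global pooling function reduces each class map to one value (K×1).
    shapes.append(("Global pooling function", (num_classes, 1)))
    return shapes


if __name__ == "__main__":
    # K = 10 is an illustrative class count, not a value from the paper.
    for name, shape in trace_shapes(num_classes=10):
        print(f"{name:28s} {'×'.join(str(s) for s in shape)}")
```

With a 3×3 filter and dilation rate 2, the assumed padding is 2 on each side, so each axis keeps its length: 64 + 2·2 − 2·(3 − 1) = 64, and likewise for the 311 frames.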