Table 1 Model architecture

From: Frequency-dependent auto-pooling function for weakly supervised sound event detection

| Block (filter, channels, dilated) | Output shape (channels × frequency × frame) |
|---|---|
| Input log-mel spectrogram | 1×64×311 |
| DDC-block(3×3, 32, 2) | 32×64×311 |
| DDC-block(3×3, 64, 2) | 64×64×311 |
| DDC-block(3×3, 128, 2) | 128×64×311 |
| DDC-block(3×3, 128, 2) | 128×64×311 |
| CNN (1×1, K, 1) | K×64×311 |
| Global pooling function | K×1 |
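
The output shapes in Table 1 can be checked with the standard convolution output-size formula. The sketch below is only a shape trace, not the authors' implementation: it assumes stride 1 and "same"-style padding of dilation × (kernel − 1) / 2 on both axes, treats each DDC-block as a single 3×3 convolution with dilation 2 (the block's internal structure is not given in this table), and uses a hypothetical K = 10. The names `conv_out`, `trace_shapes`, and `LAYERS` are illustrative.

```python
K = 10  # hypothetical number of output channels; the table only calls it K

# (block name, kernel size, output channels, dilation), read off Table 1.
LAYERS = [
    ("DDC-block", 3, 32, 2),
    ("DDC-block", 3, 64, 2),
    ("DDC-block", 3, 128, 2),
    ("DDC-block", 3, 128, 2),
    ("CNN", 1, K, 1),
]


def conv_out(size, kernel, dilation, stride=1, padding=None):
    """Output length along one axis of a convolution (standard formula)."""
    if padding is None:
        # Assumption: "same"-style padding for an odd kernel at stride 1.
        padding = dilation * (kernel - 1) // 2
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def trace_shapes(input_shape, layers):
    """Return the (channels, frequency, frame) shape after each block."""
    _, freq, frames = input_shape
    shapes = [input_shape]
    for _name, kernel, out_channels, dilation in layers:
        freq = conv_out(freq, kernel, dilation)
        frames = conv_out(frames, kernel, dilation)
        shapes.append((out_channels, freq, frames))
    return shapes


shapes = trace_shapes((1, 64, 311), LAYERS)
for shape in shapes:
    print("×".join(str(dim) for dim in shape))

# The global pooling function reduces the frequency and frame axes,
# leaving one value per channel (K×1 in the table).
pooled_shape = (shapes[-1][0], 1)
print("×".join(str(dim) for dim in pooled_shape))
```

Under these assumptions the frequency and frame axes stay at 64 and 311 through every block, and only the channel count changes, which matches the table.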