Table 1 Experimental setup parameters

From: Exploiting spectro-temporal locality in deep learning based acoustic event detection

Deep neural networks settings
  FFT resolutions (multi-resolution): 10 ms (129 bins), 20 ms (257 bins), 30 ms (257 bins), 40 ms (513 bins), 50 ms (513 bins), 60 ms (513 bins)
  Patch lengths: 10, 20, and 30 frames
Convolutional neural networks settings
  Filter shapes (CNN): 5 × 5, 7 × 7, 9 × 9 (bins × frames)
  Number of filters (CNN): 10, 20, and 40
  Pooling (CNN): 1 × 1 (no pooling), 2 × 1 (frequency pooling), 1 × 2 (time pooling), 2 × 2 (both axes)
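
The table does not say how each window length maps to its bin count. As a rough sketch, the listed values are consistent with a 16 kHz sampling rate and an FFT size rounded up to the next power of two, with the one-sided spectrum keeping n_fft / 2 + 1 bins. Both the sampling rate and the rounding rule are assumptions made here for illustration, not settings stated in the table, and the function name fft_bins is hypothetical.

```python
def fft_bins(window_ms, sample_rate_hz=16000):
    """Return the number of one-sided FFT bins for a window length in ms.

    Assumptions (not stated in the table): a 16 kHz sampling rate and
    an FFT size equal to the next power of two >= the window length.
    """
    window_samples = sample_rate_hz * window_ms // 1000
    n_fft = 1
    while n_fft < window_samples:
        n_fft *= 2
    # A real-valued signal yields n_fft / 2 + 1 distinct frequency bins.
    return n_fft // 2 + 1


for ms in (10, 20, 30, 40, 50, 60):
    print(f"{ms} ms -> {fft_bins(ms)} bins")
```

Under these assumptions, 20 ms and 30 ms windows share a 512-point FFT, and 40 ms to 60 ms windows share a 1024-point FFT, which would explain why several resolutions in the table have the same number of bins.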