Table 1 Experimental setup parameters

From: Exploiting spectro-temporal locality in deep learning based acoustic event detection

Deep neural networks settings

FFT resolutions (multi-resolution): 10 ms (129 bins), 20 ms (257 bins), 30 ms (257 bins), 40 ms (513 bins), 50 ms (513 bins), 60 ms (513 bins)

Patch lengths: 10, 20, and 30 frames

Convolutional neural networks settings

Filter shapes (CNN): 5 × 5, 7 × 7, 9 × 9 (bins × frames)

Number of filters (CNN): 10, 20, and 40 filters

Pooling (CNN): 1 × 1 (no pooling), 2 × 1 (frequency pooling), 1 × 2 (time pooling), 2 × 2 (both axes)
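
The bin counts listed for the FFT resolutions can be related to the window durations with a short calculation. The sketch below is one consistent reading, not a statement of the authors' configuration: it assumes a 16 kHz sampling rate and an FFT size equal to the next power of two at or above the window length in samples. Neither assumption is stated in the table, and the names `SAMPLE_RATE_HZ`, `next_power_of_two`, and `fft_bins` are introduced here for illustration only.

```python
# Sketch: relate each analysis-window duration in Table 1 to its bin count.
# ASSUMPTIONS (not stated in the table): 16 kHz sampling rate, and an FFT
# size equal to the next power of two at or above the window length.

SAMPLE_RATE_HZ = 16_000  # assumed, not given in the table


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def fft_bins(window_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Return the number of non-redundant bins of a real-input FFT."""
    window_samples = sample_rate_hz * window_ms // 1000
    n_fft = next_power_of_two(window_samples)
    # A real-valued signal of length n_fft yields n_fft // 2 + 1 unique bins.
    return n_fft // 2 + 1


bins_by_window = {ms: fft_bins(ms) for ms in (10, 20, 30, 40, 50, 60)}
for ms, bins in bins_by_window.items():
    print(f"{ms} ms -> {bins} bins")
```

Under these assumptions the computed values agree with the table: 129 bins for 10 ms, 257 bins for 20 and 30 ms, and 513 bins for 40, 50, and 60 ms. This also explains why neighbouring window lengths share a bin count: they round up to the same FFT size.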