Fig. 6 | EURASIP Journal on Audio, Speech, and Music Processing

Fig. 6

From: Improved capsule routing for weakly labeled sound event detection

The proposed neural network structure, which consists of three parts. (1) Feature extraction: parallel convolutional layers with different kernel sizes. (2) Capsule layer: the outputs of the convolutional layers are fed into two capsule layers. (3) Recurrent layer: a bidirectional GRU and one fully connected (FC) layer learn temporal context information and estimate event activity probabilities.
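
The three-part structure in the caption can be traced as a sequence of tensor shapes. The sketch below is only shape bookkeeping, not the authors' implementation. Every number in it is a hypothetical placeholder: the kernel sizes, channel counts, capsule dimension, GRU width, frame count and number of events. The caption does not state how the parallel branches are merged or whether the time resolution is preserved, so the channel-wise concatenation and the "same" padding are also assumptions.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    frames: int    # number of time steps
    features: int  # feature width per time step


def parallel_conv(inp, kernel_sizes, channels_per_branch):
    """(1) Feature extraction: one convolution branch per kernel size.

    Assumption: each branch uses 'same' padding (frame count unchanged)
    and the branch outputs are concatenated along the channel axis.
    """
    return Shape(inp.frames, channels_per_branch * len(kernel_sizes))


def capsule_layers(inp, num_capsules, capsule_dim):
    """(2) Two capsule layers; only the second layer's output is tracked.

    Assumption: one output capsule per event class, flattened per frame.
    """
    return Shape(inp.frames, num_capsules * capsule_dim)


def recurrent_head(inp, gru_units, num_events):
    """(3) Bidirectional GRU followed by one FC layer."""
    # Forward and backward hidden states are concatenated per frame.
    bigru = Shape(inp.frames, 2 * gru_units)
    # The FC layer maps each frame to one activity probability per event.
    return Shape(bigru.frames, num_events)


def trace(frames=240, mel_bins=64, num_events=10):
    """Propagate a hypothetical input shape through the three stages."""
    x = Shape(frames, mel_bins)
    x = parallel_conv(x, kernel_sizes=(3, 5, 7), channels_per_branch=32)
    x = capsule_layers(x, num_capsules=num_events, capsule_dim=16)
    return recurrent_head(x, gru_units=64, num_events=num_events)
```

Calling `trace()` yields one row per input frame and one column per event class, which matches the caption's description of the last stage as estimating event activity probabilities over time.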
