Fig. 6
From: Improved capsule routing for weakly labeled sound event detection

The proposed neural network structure consists of three parts. (1) Feature extraction: parallel convolutional layers with different kernel sizes. (2) Capsule layer: the outputs of the convolutional layers are fed into two capsule layers. (3) Recurrent layer: a bidirectional GRU and one fully connected (FC) layer are used to learn temporal context information and estimate event activity probabilities.
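
To make the data flow through the three parts easier to follow, the sketch below traces tensor shapes through one possible reading of the structure. It is only an illustration. Every number in `Config` (frame count, mel bins, kernel sizes, channel counts, capsule dimensions, GRU width, number of event classes) is a hypothetical value and not taken from the paper. The names `Config` and `trace_shapes` are invented here. The sketch also assumes "same" padding with no pooling and that the parallel branches are concatenated along the channel axis, which the caption does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    # All values are illustrative assumptions, not the paper's settings.
    n_frames: int = 240            # time frames in the input spectrogram
    n_mels: int = 64               # frequency bins
    kernel_sizes: tuple = (3, 5, 7)  # one parallel conv branch per size
    conv_channels: int = 32        # output channels of each branch
    primary_caps_dim: int = 8      # vector length of first-layer capsules
    event_caps_dim: int = 16       # vector length of second-layer capsules
    gru_hidden: int = 64           # hidden units per GRU direction
    n_events: int = 10             # number of sound event classes


def trace_shapes(cfg: Config) -> dict:
    """Return the assumed tensor shape after each stage of the network."""
    shapes = {}
    shapes["input"] = (1, cfg.n_frames, cfg.n_mels)  # (channels, time, freq)

    # (1) Feature extraction: parallel conv branches. With "same" padding
    # (assumed), every branch keeps the time and frequency sizes, so the
    # branches can be concatenated along the channel axis (also assumed).
    total_channels = cfg.conv_channels * len(cfg.kernel_sizes)
    shapes["features"] = (total_channels, cfg.n_frames, cfg.n_mels)

    # (2) Capsule layers: per frame, group the channel*freq values into
    # short vectors (first capsule layer), then route them to one capsule
    # per event class (second capsule layer).
    per_frame = total_channels * cfg.n_mels
    if per_frame % cfg.primary_caps_dim != 0:
        raise ValueError("features per frame must divide into capsules")
    n_primary = per_frame // cfg.primary_caps_dim
    shapes["primary_caps"] = (cfg.n_frames, n_primary, cfg.primary_caps_dim)
    shapes["event_caps"] = (cfg.n_frames, cfg.n_events, cfg.event_caps_dim)

    # (3) Recurrent layer: flatten the capsules of each frame, run a
    # bidirectional GRU (forward and backward states concatenated), then
    # an FC layer that yields one activity probability per event and frame.
    shapes["gru_in"] = (cfg.n_frames, cfg.n_events * cfg.event_caps_dim)
    shapes["gru_out"] = (cfg.n_frames, 2 * cfg.gru_hidden)
    shapes["probs"] = (cfg.n_frames, cfg.n_events)
    return shapes


if __name__ == "__main__":
    for stage, shape in trace_shapes(Config()).items():
        print(f"{stage:>12}: {shape}")
```

Under these assumptions the output keeps one row per time frame, which is what frame-level event activity estimation needs. A real implementation would likely add pooling along frequency and possibly time, which changes the intermediate sizes but not the overall flow.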