From: Sound event triage: detecting sound events considering priority of classes
SED model | |
Network architecture | 3 CNN + 1 BiGRU + 2 FC |
# of channels in CNN layers | 64, 64, 64 |
Filter size (\(T\times F\)) | \(3\times 3\) |
Pooling size (\(T\times F\)) | \(8\times 1\), \(2\times 1\), \(2\times 1\) (max pooling) |
# of units in BiGRU layer | 64 |
# of units in FC layers | 32 |
# of units in output layer | 10 |
MLPs for each \(\boldsymbol{\mu }\) and \(\boldsymbol{\sigma }\) | |
Network architecture | 3 FC |
# of units in FC layers | 64, 256, 128 |
# of units in output layer | 64 |
Optimizer | Adam [29] |
Activation functions | leaky ReLU |
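
The settings above can be written down as a small configuration, which also makes the combined effect of the three pooling layers easy to check. This is only an illustrative sketch: the dictionary keys, the helper names `cumulative_pooling` and `pooled_shape`, and the 512-by-64 input size are assumptions introduced here and do not come from the table. The pooling sizes are taken in the \(T\times F\) order exactly as the table labels them, and the convolutions are assumed to preserve the input size.

```python
from math import prod

# Hyperparameters transcribed from the table; the key names are hypothetical.
SED_MODEL = {
    "cnn_channels": (64, 64, 64),
    "filter_size": (3, 3),                   # (T, F), as labelled in the table
    "pool_sizes": ((8, 1), (2, 1), (2, 1)),  # (T, F) per CNN layer, max pooling
    "bigru_units": 64,
    "fc_units": 32,
    "output_units": 10,
}
MU_SIGMA_MLP = {
    "fc_units": (64, 256, 128),
    "output_units": 64,
}


def cumulative_pooling(pool_sizes):
    """Return the total downsampling factor along each of the two axes."""
    first = prod(p[0] for p in pool_sizes)
    second = prod(p[1] for p in pool_sizes)
    return first, second


def pooled_shape(input_shape, pool_sizes):
    """Shape after all pooling layers, assuming size-preserving convolutions."""
    f_first, f_second = cumulative_pooling(pool_sizes)
    return input_shape[0] // f_first, input_shape[1] // f_second


if __name__ == "__main__":
    # Hypothetical 512-by-64 input, ordered the same way as the pooling sizes.
    print(cumulative_pooling(SED_MODEL["pool_sizes"]))
    print(pooled_shape((512, 64), SED_MODEL["pool_sizes"]))
```

Under these assumptions the three pooling layers together reduce the first axis by a factor of 32 and leave the second axis unchanged.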