Table 1 Different multiple instance learning (MIL) pooling methods for foreground (FG) localization, where \(\bar{y} = [y_{1}, y_{2}, \ldots, y_{N}]\)

From: Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices

Pooling method | Pooling operation | Instance label
Max pooling | \(y = \max_{i} y_{i}\) | \(y_{i}\)
Average pooling | \(y = \frac{1}{N} \sum_{i} y_{i}\) | \(y_{i}\)
Attention (softmax) | \(y = \sum_{i} a_{i} y_{i}\); \(a_{i} = \mathrm{softmax}(w^{T} f(\bar{y}))\); \(f(\bar{y}) = \tanh(V \bar{y}^{T}) \odot \mathrm{sigmoid}(U \bar{y}^{T})\) | \(a_{i} y_{i}\)
Attention (sigmoid) | \(y = \frac{1}{N} \sum_{i} a_{i} y_{i}\); \(a_{i} = \mathrm{sigmoid}(w^{T} f(\bar{y}))\) | \(a_{i} y_{i}\)
Hybrid (attention + max pooling) | \(y = \max_{i} (a_{i} y_{i})\); \(a_{i} = \mathrm{sigmoid}(w^{T} f(\bar{y}))\) | \(a_{i} y_{i}\)
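
The pooling operations in Table 1 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The table does not state the shapes of \(w\), \(V\) and \(U\), so the sketch assumes that each scalar instance score \(y_{i}\) is gated on its own and that all three parameters are vectors of a hypothetical hidden size L. The function names are likewise illustrative. Each function returns the bag-level prediction together with the per-instance labels from the last column of the table.

```python
import numpy as np


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def _attention_logits(y, w, V, U):
    # Assumption (not stated in the table): V, U and w are vectors of a
    # hypothetical hidden size L, and every instance score y_i is gated
    # separately: f_i = tanh(V * y_i) ⊙ sigmoid(U * y_i), logit_i = w^T f_i.
    gated = np.tanh(np.outer(V, y)) * _sigmoid(np.outer(U, y))  # shape (L, N)
    return w @ gated  # shape (N,)


def max_pool(y):
    # Bag label is the largest instance score; instance labels are y_i.
    return np.max(y), y


def average_pool(y):
    # Bag label is the mean instance score; instance labels are y_i.
    return np.mean(y), y


def attention_softmax_pool(y, w, V, U):
    logits = _attention_logits(y, w, V, U)
    e = np.exp(logits - np.max(logits))  # shift for numerical stability
    a = e / e.sum()                      # weights sum to 1 over the bag
    inst = a * y
    return inst.sum(), inst


def attention_sigmoid_pool(y, w, V, U):
    a = _sigmoid(_attention_logits(y, w, V, U))  # each weight lies in (0, 1)
    inst = a * y
    return inst.mean(), inst


def hybrid_pool(y, w, V, U):
    # Sigmoid attention weights followed by max pooling over a_i * y_i.
    a = _sigmoid(_attention_logits(y, w, V, U))
    inst = a * y
    return inst.max(), inst


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.array([0.1, 0.9, 0.4])  # instance-level scores for one bag
    L = 4                          # hypothetical hidden size
    w, V, U = rng.normal(size=(3, L))
    for name, (bag, inst) in {
        "max": max_pool(y),
        "average": average_pool(y),
        "attention (softmax)": attention_softmax_pool(y, w, V, U),
        "attention (sigmoid)": attention_sigmoid_pool(y, w, V, U),
        "hybrid": hybrid_pool(y, w, V, U),
    }.items():
        print(name, float(bag), inst)
```

With all-zero attention parameters the softmax weights become uniform, so softmax attention reduces to average pooling, while the sigmoid weights all equal 0.5 and simply halve the instance scores. This is a quick sanity check for any implementation of the table.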