Table 1 Different multiple instance learning (MIL) pooling methods for foreground (FG) localization, where \(\bar{y} = [y_{1}, y_{2}, \ldots, y_{N}]\)

From: Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices

| Pooling method | Pooling operation | Instance label |
| --- | --- | --- |
| Max pooling | \(y = \max_{i} y_{i}\) | \(y_{i}\) |
| Average pooling | \(y = \frac{1}{N} \sum_{i} y_{i}\) | \(y_{i}\) |
| Attention (softmax) | \(y = \sum_{i} a_{i} y_{i}\), \(a_{i} = \mathrm{softmax}(w^{T} f(\bar{y}))\), \(f(\bar{y}) = \tanh(V \bar{y}^{T}) \odot \mathrm{sigmoid}(U \bar{y}^{T})\) | \(a_{i} y_{i}\) |
| Attention (sigmoid) | \(y = \frac{1}{N} \sum_{i} a_{i} y_{i}\), \(a_{i} = \mathrm{sigmoid}(w^{T} f(\bar{y}))\) | \(a_{i} y_{i}\) |
| Hybrid (attention + max pooling) | \(y = \max_{i} (a_{i} y_{i})\), \(a_{i} = \mathrm{sigmoid}(w^{T} f(\bar{y}))\) | \(a_{i} y_{i}\) |
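
The five pooling operations in Table 1 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. The function names `mil_pool` and `gated_scores` are hypothetical, and the parameter shapes are an assumption: each instance score \(y_{i}\) is treated as a scalar, and \(V\), \(U\), and \(w\) are taken to be vectors of a common length \(L\), so that one attention logit is produced per instance. The table itself does not specify these dimensions.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def gated_scores(y, V, U, w):
    """Per-instance attention logits w^T f(y).

    Assumed shapes (not given in the table): y is (N,), V, U, w are (L,).
    Column i of f is tanh(V * y_i) * sigmoid(U * y_i).
    """
    f = np.tanh(np.outer(V, y)) * sigmoid(np.outer(U, y))  # shape (L, N)
    return w @ f  # shape (N,)


def mil_pool(y, method, V=None, U=None, w=None):
    """Return (bag-level score, instance labels) for one pooling method."""
    y = np.asarray(y, dtype=float)
    if method == "max":
        return y.max(), y
    if method == "average":
        return y.mean(), y
    if method not in ("attention_softmax", "attention_sigmoid", "hybrid"):
        raise ValueError(f"unknown pooling method: {method}")

    s = gated_scores(y, V, U, w)
    if method == "attention_softmax":
        inst = softmax(s) * y          # instance labels a_i * y_i
        return inst.sum(), inst
    inst = sigmoid(s) * y              # instance labels a_i * y_i
    if method == "attention_sigmoid":
        return inst.mean(), inst       # (1/N) * sum_i a_i y_i
    return inst.max(), inst            # hybrid: max_i (a_i y_i)
```

For example, `mil_pool([0.1, 0.9, 0.4], "max")` returns a bag score of 0.9 with the raw scores as instance labels. With `w` set to all zeros every attention logit is zero, so the softmax variant reduces to average pooling, which is a convenient sanity check on the attention rows.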