Table 4 Foreground localization results for frame-level (instance-level) approaches on DS

From: Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices

| MIL model | % speech detected at 1% FAR | Bag-level F1 score |
| --- | --- | --- |
| Max pooling | 93.7 | 0.76 |
| Average pooling | 88.0 | 0.74 |
| Attention (softmax) | 12.7 | 0.78 |
| Attention (sigmoid) | 68.5 | 0.78 |
| Hybrid | 90.1 | 0.73 |
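
The "% speech detected at 1% FAR" column can be read as a detection rate at a fixed operating point. The sketch below shows one plausible way to compute such a figure from frame-level scores. It assumes that FAR is the false-alarm rate on non-speech frames and that the threshold is picked so that at most 1% of those frames score above it. The table does not state the exact procedure, and the function name `detection_rate_at_far` and its parameters are hypothetical.

```python
import numpy as np


def detection_rate_at_far(scores, labels, target_far=0.01):
    """Percentage of positive frames detected at a fixed false-alarm rate.

    Hypothetical helper, not taken from the source: `scores` are per-frame
    model outputs (higher means more speech-like) and `labels` are 1 for
    foreground speech frames, 0 otherwise.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    if labels.all() or not labels.any():
        raise ValueError("need at least one positive and one negative frame")

    # Negative-frame scores, sorted from highest to lowest.
    neg = np.sort(scores[~labels])[::-1]

    # Largest number of false alarms allowed under the target rate.
    k = int(np.floor(target_far * neg.size))

    # Frames count as detections only if they score strictly above the
    # (k + 1)-th highest negative score, so at most k negatives pass.
    threshold = neg[k]

    detected = scores[labels] > threshold
    return 100.0 * detected.mean()


# Toy data: 100 non-speech frames and 4 speech frames.
neg_scores = np.arange(100) / 100
pos_scores = np.array([0.985, 0.5, 0.995, 0.97])
toy_scores = np.concatenate([neg_scores, pos_scores])
toy_labels = np.concatenate([np.zeros(100), np.ones(4)])
toy_rate = detection_rate_at_far(toy_scores, toy_labels)
```

Under this reading, a model can score well on bag-level F1 while detecting few speech frames at the 1% operating point, as the softmax-attention row shows: the two columns measure different things.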