Table 4 Foreground localization results for frame-level (instance-level) approaches on DS

From: Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices

MIL model              % speech detected at 1% FAR    Bag-level F1 score
Max pooling            93.7                           0.76
Average pooling        88.0                           0.74
Attention (softmax)    12.7                           0.78
Attention (sigmoid)    68.5                           0.78
Hybrid                 90.1                           0.73
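
The "% speech detected at 1% FAR" column can be read as a detection rate measured at a fixed operating point. The following is a minimal sketch of how such a figure might be computed. It assumes that FAR stands for false alarm rate, meaning the fraction of non-speech frames flagged as speech, and that a frame is flagged when its score is strictly above a threshold. The function name `detection_rate_at_far` and the toy scores are hypothetical and are not taken from the paper.

```python
def detection_rate_at_far(scores, labels, target_far=0.01):
    """Fraction of positive frames detected at the strictest threshold whose
    false alarm rate on negative frames does not exceed target_far.

    scores: per-frame scores (higher means more speech-like); assumed input.
    labels: per-frame ground truth, 1 for foreground speech, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative frame")

    # Number of negative frames that may score above the threshold.
    allowed = min(int(target_far * len(neg)), len(neg) - 1)

    # Threshold is the score of the first negative that must not fire.
    threshold = neg[allowed]

    # A frame counts as detected when its score is strictly above the threshold.
    detected = sum(1 for s in pos if s > threshold)
    return detected / len(pos)


# Toy example: 4 speech frames and 4 non-speech frames, 25% FAR allowed.
toy_scores = [0.95, 0.5, 0.25, 0.35, 0.1, 0.2, 0.3, 0.9]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(detection_rate_at_far(toy_scores, toy_labels, target_far=0.25))  # 0.75
```

With the threshold fixed by the non-speech frames alone, a model can reach a good bag-level F1 score and still detect few frames at this operating point, which is the kind of gap visible in the softmax attention row.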