
Table 3 Foreground detection results on utterance (bag) level approaches

From: Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices

 

                      Divorce study                               Aging study
Approach              Precision  Recall     F1 score   AUC        Precision  Recall     F1 score   AUC
VGGish slimmer [11]   0.34       0.96       0.5        0.71       0.38       0.94       0.54       0.66
Log-Mels              0.78       0.68       0.72       0.8        0.67       0.73       0.7        0.81
SAD embeddings        0.82       0.82       0.82       0.87       0.77       0.81       0.79       0.87
MIL (Log-Mels)        0.71       0.7        0.7        0.79       0.64       0.66       0.65       0.77
MIL (SAD emb)         0.83*      0.79       0.81       0.86       0.66       0.83       0.73       0.85

* McNemar’s test, p ≪ 0.01
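
As a reading aid, the F1 scores in the table can be checked against the precision and recall columns. The sketch below assumes F1 is the usual harmonic mean of precision and recall, which the table itself does not state. The function name `f1_score` and the dictionary `divorce_rows` are illustrative, not taken from the paper. Because the table values are rounded to two decimals, recomputed scores may differ from the reported ones by about 0.01.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the Divorce study columns.
divorce_rows = {
    "VGGish slimmer": (0.34, 0.96, 0.5),
    "Log-Mels": (0.78, 0.68, 0.72),
    "SAD embeddings": (0.82, 0.82, 0.82),
    "MIL (Log-Mels)": (0.71, 0.7, 0.7),
    "MIL (SAD emb)": (0.83, 0.79, 0.81),
}

for name, (p, r, reported) in divorce_rows.items():
    # Print the recomputed value next to the value reported in the table.
    print(f"{name}: recomputed {f1_score(p, r):.3f}, reported {reported}")
```

The same check can be run on the Aging study columns by swapping in those values.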