Table 3 Foreground detection results on utterance (bag) level approaches

From: Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices

                     Divorce study                       Aging study
                     Precision  Recall  F1 score  AUC    Precision  Recall  F1 score  AUC
VGGish slimmer [11]  0.34       0.96    0.5       0.71   0.38       0.94    0.54      0.66
Log-Mels             0.78       0.68    0.72      0.8    0.67       0.73    0.7       0.81
SAD embeddings       0.82       0.82    0.82      0.87   0.77       0.81    0.79      0.87
MIL (Log-Mels)       0.71       0.7     0.7       0.79   0.64       0.66    0.65      0.77
MIL (SAD emb)        0.83*      0.79    0.81      0.86   0.66       0.83    0.73      0.85
* McNemar’s test, p < 0.01
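
As a reading aid, the F1 column can be cross-checked against the precision and recall columns. The sketch below assumes the table uses the standard definition of F1 as the harmonic mean of precision and recall, which the table itself does not state; the helper name `f1_score` is hypothetical. Because the reported precision and recall are already rounded to two decimals, a recomputed F1 can differ from the reported one by about 0.01 for some rows.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the Divorce study columns of the table.
rows = {
    "VGGish slimmer": (0.34, 0.96, 0.5),
    "SAD embeddings": (0.82, 0.82, 0.82),
    "MIL (SAD emb)": (0.83, 0.79, 0.81),
}

for name, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    print(f"{name}: computed F1 = {computed:.3f}, reported F1 = {reported}")
```

The VGGish slimmer row shows why F1 is useful next to recall alone: a recall of 0.96 paired with a precision of 0.34 still gives an F1 of only about 0.5.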