EURASIP Journal on Audio, Speech, and Music Processing

Table 3 Foreground detection results for utterance-level (bag-level) approaches

From: Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices

	Divorce study				Aging study
	Precision	Recall	F1 score	AUC	Precision	Recall	F1 score	AUC
VGGish slimmer [11]	0.34	0.96	0.5	0.71	0.38	0.94	0.54	0.66
Log-Mels	0.78	0.68	0.72	0.8	0.67	0.73	0.7	0.81
SAD embeddings	0.82	0.82	0.82	0.87	0.77	0.81	0.79	0.87
MIL (Log-Mels)	0.71	0.7	0.7	0.79	0.64	0.66	0.65	0.77
MIL (SAD emb)	0.83*	0.79	0.81	0.86	0.66	0.83	0.73	0.85

*McNemar’s test, p ≪ 0.01
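
The F1 scores in the table can be cross-checked against the precision and recall columns. The sketch below assumes that the reported F1 score is the usual harmonic mean of precision and recall, which the table itself does not state. The helper name `f1_score` and the dictionary `divorce_rows` are illustrative, and the values are copied from the Divorce study columns. Because the published precision and recall are rounded to two decimals, the recomputed values can differ from the reported ones in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the Divorce study columns of Table 3.
divorce_rows = {
    "VGGish slimmer": (0.34, 0.96, 0.50),
    "Log-Mels": (0.78, 0.68, 0.72),
    "SAD embeddings": (0.82, 0.82, 0.82),
    "MIL (Log-Mels)": (0.71, 0.70, 0.70),
    "MIL (SAD emb)": (0.83, 0.79, 0.81),
}

# Compare the recomputed F1 with the reported value for each approach.
for name, (p, r, reported) in divorce_rows.items():
    print(f"{name}: computed {f1_score(p, r):.3f}, reported {reported:.2f}")
```

The same check applies to the Aging study columns by swapping in those values.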
