Room-localized speech activity detection in multi-microphone smart homes

Giannoulis, Panagiotis; Potamianos, Gerasimos; Maragos, Petros

doi:10.1186/s13636-019-0158-8

EURASIP Journal on Audio, Speech, and Music Processing

Table 4. Performance of the room-discriminant features of Section 5.1 and their combinations, in conjunction with inter-room fusion (Section 5.2) and SVM modeling (Section 5.3), for the room-inside vs. room-outside speech classification task of the second stage of the proposed algorithm.


Set	SVM models	Feature (∙)	Recall			Precision			F-score
			\(f_{r,\mathcal{T}}^{(\bullet)}\)	\(f_{r,\text{avg},\mathcal{T}}^{(\bullet)}\)	\(f_{\text{home},\mathcal{T}}^{(\bullet)}\)	\(f_{r,\mathcal{T}}^{(\bullet)}\)	\(f_{r,\text{avg},\mathcal{T}}^{(\bullet)}\)	\(f_{\text{home},\mathcal{T}}^{(\bullet)}\)	\(f_{r,\mathcal{T}}^{(\bullet)}\)	\(f_{r,\text{avg},\mathcal{T}}^{(\bullet)}\)	\(f_{\text{home},\mathcal{T}}^{(\bullet)}\)
DIRHA-sim	Room-specific	(en)	63.97	37.93	40.06	50.51	86.03	86.92	56.45	52.65	54.84
DIRHA-sim	Room-specific	(coh)	47.46	87.41	88.66	67.90	77.01	76.05	55.87	81.88	81.87
DIRHA-sim	Room-specific	(ev)	82.89	90.81	90.38	78.01	74.85	76.28	80.37	82.06	82.74
DIRHA-sim	Room-specific	(ts)	71.91	86.00	89.35	52.21	74.46	79.28	60.50	79.82	84.01
DIRHA-sim	Room-specific	(srp)	76.76	79.85	79.25	53.94	56.44	60.94	63.36	66.13	68.90
DIRHA-sim	Room-specific	(ts,srp)	80.67	89.33	90.58	66.72	79.37	82.97	73.03	84.05	86.61
DIRHA-sim	Room-specific	(ts,srp,ev)	91.74	90.74	91.86	85.20	83.26	85.27	88.35	86.84	88.44
DIRHA-sim	Room-specific	(ts,srp,ev,coh)	90.62	90.42	92.27	83.65	84.96	85.80	86.99	87.61	88.92
DIRHA-sim	Room-specific	(en,coh,ev) [34]	89.48	87.65	90.37	78.90	81.16	81.69	83.86	84.28	85.81
DIRHA-sim	Room-specific	(all)	91.14	89.65	91.40	83.93	85.30	85.40	87.39	87.42	88.30
DIRHA-sim	Global		91.12	92.21	n/a	78.49	79.63	n/a	84.34	85.46	n/a
DIRHA-real	Room-specific	(en)	63.65	24.39	27.68	55.30	100.00	100.00	59.18	39.22	43.36
DIRHA-real	Room-specific	(coh)	5.61	71.35	78.99	100.00	61.67	57.22	10.62	66.16	66.71
DIRHA-real	Room-specific	(ev)	99.02	99.73	99.73	97.40	98.07	98.21	98.21	98.89	98.96
DIRHA-real	Room-specific	(ts)	68.94	97.44	97.94	81.42	95.25	93.41	74.67	96.33	95.62
DIRHA-real	Room-specific	(srp)	85.36	87.91	80.75	75.50	77.98	75.29	80.13	82.65	77.93
DIRHA-real	Room-specific	(ts,srp)	90.28	94.52	97.33	91.58	95.32	86.76	90.92	94.92	91.74
DIRHA-real	Room-specific	(ts,srp,ev)	99.90	98.82	97.81	99.82	97.87	97.24	99.86	98.34	97.53
DIRHA-real	Room-specific	(ts,srp,ev,coh)	98.52	98.99	98.11	99.94	98.37	87.09	99.23	98.68	92.27
DIRHA-real	Room-specific	(en,coh,ev) [34]	98.25	99.73	99.50	99.60	98.64	90.84	98.92	99.18	94.98
DIRHA-real	Room-specific	(all)	98.89	98.85	95.68	99.94	98.46	80.21	99.42	98.66	87.26
DIRHA-real	Global		99.33	100.00	n/a	100.00	99.84	n/a	99.66	99.92	n/a

Results are reported for R = 4 rooms of the DIRHA smart home (excluding the corridor) on the DIRHA-sim (top) and DIRHA-real (bottom) test sets, using ground-truth speech segment boundaries. All SVMs operate over entire segments.
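As a reading aid, the F-score columns can be cross-checked against the recall and precision columns. The sketch below assumes the F-score is the standard balanced F1 measure, i.e., the harmonic mean of recall and precision; the table does not state this explicitly, and the helper name `f_score` is hypothetical.

```python
def f_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (both given in percent).

    Assumption: the table's F-score is the balanced F1 measure.
    """
    if recall + precision == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * recall * precision / (recall + precision)


# DIRHA-sim, feature (en), first column of each metric group
print(round(f_score(63.97, 50.51), 2))

# DIRHA-real, feature (coh), first column of each metric group
print(round(f_score(5.61, 100.00), 2))
```

Under this assumption, the two calls agree with the tabulated F-scores of 56.45 and 10.62. Because the table reports recall and precision already rounded to two decimals, other rows may differ from a recomputed value by about 0.01.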
