Room-localized speech activity detection in multi-microphone smart homes

EURASIP Journal on Audio, Speech, and Music Processing

Table 3 Effect of the various design choices for the system’s first stage (discussed in Section 4.4) on room-localized SAD performance on the DIRHA-sim test set

Oper	\(\mathcal{M}\)	Classes \(\mathcal{J}\)	Recall	Precision	F-score
RI	\(\mathcal{M}_{\text{all}}\)	\(\{\text{sp}_{\text{all}}, \text{sil}_{\text{all}}\}\)	72.30	56.63	63.51
RL	\(\mathcal{M}_{r}\)	\(\{\text{sp}_{r}, \text{sil}_{\text{all}}\}\)	72.07	61.08	66.12
		\(\{\text{sp}_{r}, \text{sil}_{r}\}\)	71.20	60.39	65.35
		\(\{\text{sp}_{r}, \text{sp}_{\bar{r}}, \text{sil}_{\text{all}}\}\)	71.00	62.40	66.43

For consistency, the first stage is always followed by the second stage of the MFCC/GMM baseline of Section 6.1. RI denotes room-independent operation (“Oper”) of the first stage, and RL denotes room-localized operation.
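
As a reading aid, the relation between the three metric columns can be sketched as follows. The sketch assumes that the F-score is the balanced harmonic mean of recall and precision, which this excerpt does not state explicitly; the values in Table 3 are consistent with that assumption to within rounding. The names `f_score`, `TABLE_3`, and `max_deviation` are illustrative only and do not come from the paper.

```python
def f_score(recall, precision):
    """Balanced F-score (harmonic mean of recall and precision).

    Inputs and output share the same units (here, percent).
    Assumption: this is the F-score definition used in Table 3.
    """
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


# (recall, precision, reported F-score) copied from Table 3, in percent.
TABLE_3 = {
    "RI {sp_all, sil_all}": (72.30, 56.63, 63.51),
    "RL {sp_r, sil_all}": (72.07, 61.08, 66.12),
    "RL {sp_r, sil_r}": (71.20, 60.39, 65.35),
    "RL {sp_r, sp_rbar, sil_all}": (71.00, 62.40, 66.43),
}


def max_deviation(rows):
    """Largest absolute gap between recomputed and reported F-scores."""
    return max(abs(f_score(r, p) - f) for r, p, f in rows.values())


if __name__ == "__main__":
    for name, (r, p, f) in TABLE_3.items():
        print(f"{name}: recomputed {f_score(r, p):.2f}, reported {f:.2f}")
```

Because the recall and precision entries are themselves rounded to two decimals, a recomputed F-score may differ from the reported one in the last digit.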