Table 4 Performance of the room discriminant features of Section 5.1 and their combinations, in conjunction with inter-room fusion (Section 5.2) and SVM modeling (Section 5.3) for the room-inside vs. room-outside speech classification task of the second stage of the proposed algorithm

From: Room-localized speech activity detection in multi-microphone smart homes

| Set | SVM models | Feature (∙) | Recall \(f_{r,\mathcal{T}}^{(\bullet)}\) | Recall \(f_{r,\text{avg},\mathcal{T}}^{(\bullet)}\) | Recall \(f_{\text{home},\mathcal{T}}^{(\bullet)}\) | Precision \(f_{r,\mathcal{T}}^{(\bullet)}\) | Precision \(f_{r,\text{avg},\mathcal{T}}^{(\bullet)}\) | Precision \(f_{\text{home},\mathcal{T}}^{(\bullet)}\) | F-score \(f_{r,\mathcal{T}}^{(\bullet)}\) | F-score \(f_{r,\text{avg},\mathcal{T}}^{(\bullet)}\) | F-score \(f_{\text{home},\mathcal{T}}^{(\bullet)}\) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DIRHA-sim | Room-specific | (en) | 63.97 | 37.93 | 40.06 | 50.51 | 86.03 | 86.92 | 56.45 | 52.65 | 54.84 |
| DIRHA-sim | Room-specific | (coh) | 47.46 | 87.41 | 88.66 | 67.90 | 77.01 | 76.05 | 55.87 | 81.88 | 81.87 |
| DIRHA-sim | Room-specific | (ev) | 82.89 | 90.81 | 90.38 | 78.01 | 74.85 | 76.28 | 80.37 | 82.06 | 82.74 |
| DIRHA-sim | Room-specific | (ts) | 71.91 | 86.00 | 89.35 | 52.21 | 74.46 | 79.28 | 60.50 | 79.82 | 84.01 |
| DIRHA-sim | Room-specific | (srp) | 76.76 | 79.85 | 79.25 | 53.94 | 56.44 | 60.94 | 63.36 | 66.13 | 68.90 |
| DIRHA-sim | Room-specific | (ts,srp) | 80.67 | 89.33 | 90.58 | 66.72 | 79.37 | 82.97 | 73.03 | 84.05 | 86.61 |
| DIRHA-sim | Room-specific | (ts,srp,ev) | 91.74 | 90.74 | 91.86 | 85.20 | 83.26 | 85.27 | 88.35 | 86.84 | 88.44 |
| DIRHA-sim | Room-specific | (ts,srp,ev,coh) | 90.62 | 90.42 | 92.27 | 83.65 | 84.96 | 85.80 | 86.99 | 87.61 | 88.92 |
| DIRHA-sim | Room-specific | (en,coh,ev) [34] | 89.48 | 87.65 | 90.37 | 78.90 | 81.16 | 81.69 | 83.86 | 84.28 | 85.81 |
| DIRHA-sim | Room-specific | (all) | 91.14 | 89.65 | 91.40 | 83.93 | 85.30 | 85.40 | 87.39 | 87.42 | 88.30 |
| DIRHA-sim | Global | | 91.12 | 92.21 | n/a | 78.49 | 79.63 | n/a | 84.34 | 85.46 | n/a |
| DIRHA-real | Room-specific | (en) | 63.65 | 24.39 | 27.68 | 55.30 | 100.00 | 100.00 | 59.18 | 39.22 | 43.36 |
| DIRHA-real | Room-specific | (coh) | 5.61 | 71.35 | 78.99 | 100.00 | 61.67 | 57.22 | 10.62 | 66.16 | 66.71 |
| DIRHA-real | Room-specific | (ev) | 99.02 | 99.73 | 99.73 | 97.40 | 98.07 | 98.21 | 98.21 | 98.89 | 98.96 |
| DIRHA-real | Room-specific | (ts) | 68.94 | 97.44 | 97.94 | 81.42 | 95.25 | 93.41 | 74.67 | 96.33 | 95.62 |
| DIRHA-real | Room-specific | (srp) | 85.36 | 87.91 | 80.75 | 75.50 | 77.98 | 75.29 | 80.13 | 82.65 | 77.93 |
| DIRHA-real | Room-specific | (ts,srp) | 90.28 | 94.52 | 97.33 | 91.58 | 95.32 | 86.76 | 90.92 | 94.92 | 91.74 |
| DIRHA-real | Room-specific | (ts,srp,ev) | 99.90 | 98.82 | 97.81 | 99.82 | 97.87 | 97.24 | 99.86 | 98.34 | 97.53 |
| DIRHA-real | Room-specific | (ts,srp,ev,coh) | 98.52 | 98.99 | 98.11 | 99.94 | 98.37 | 87.09 | 99.23 | 98.68 | 92.27 |
| DIRHA-real | Room-specific | (en,coh,ev) [34] | 98.25 | 99.73 | 99.50 | 99.60 | 98.64 | 90.84 | 98.92 | 99.18 | 94.98 |
| DIRHA-real | Room-specific | (all) | 98.89 | 98.85 | 95.68 | 99.94 | 98.46 | 80.21 | 99.42 | 98.66 | 87.26 |
| DIRHA-real | Global | | 99.33 | 100.00 | n/a | 100.00 | 99.84 | n/a | 99.66 | 99.92 | n/a |
  1. Results are reported on R=4 rooms of the DIRHA smart home (excluding the corridor) on the DIRHA-sim (top) and DIRHA-real (bottom) test sets, using ground-truth speech segment boundaries. All SVMs operate over entire segments.
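
The F-score columns of the table can be cross-checked against the recall and precision columns. The sketch below assumes that the reported F-score is the balanced F-score (the harmonic mean of recall and precision) and that all values are percentages; neither is stated in the table itself, although the sampled rows are consistent with both. The function name `f_score` and the choice of rows are illustrative only.

```python
def f_score(recall, precision):
    """Balanced F-score: harmonic mean of recall and precision (same units as inputs)."""
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


# (recall, precision, reported F-score), copied from the first column of each
# metric group in the table (the room-specific feature variant).
rows = {
    "DIRHA-sim (en)": (63.97, 50.51, 56.45),
    "DIRHA-sim (ts,srp,ev)": (91.74, 85.20, 88.35),
    "DIRHA-real (coh)": (5.61, 100.00, 10.62),
}

# Largest gap between recomputed and reported F-scores over the sampled rows.
max_abs_diff = max(abs(f_score(r, p) - f) for r, p, f in rows.values())

for name, (r, p, f) in rows.items():
    print(f"{name}: computed {f_score(r, p):.2f}, reported {f:.2f}")
```

Because the table rounds recall and precision to two decimals, a recomputed F-score can differ from the reported one in the last digit, so a small tolerance is appropriate when comparing.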