
Table 4 Performance of the room discriminant features of Section 5.1 and their combinations, in conjunction with inter-room fusion (Section 5.2) and SVM modeling (Section 5.3) for the room-inside vs. room-outside speech classification task of the second stage of the proposed algorithm

From: Room-localized speech activity detection in multi-microphone smart homes

| Set | SVM models | Feature (∙) | Recall \(f_{r,\mathcal{T}}^{(\bullet)}\) | Recall \(f_{r,\text{avg},\mathcal{T}}^{(\bullet)}\) | Recall \(f_{\text{home},\mathcal{T}}^{(\bullet)}\) | Precision \(f_{r,\mathcal{T}}^{(\bullet)}\) | Precision \(f_{r,\text{avg},\mathcal{T}}^{(\bullet)}\) | Precision \(f_{\text{home},\mathcal{T}}^{(\bullet)}\) | F-score \(f_{r,\mathcal{T}}^{(\bullet)}\) | F-score \(f_{r,\text{avg},\mathcal{T}}^{(\bullet)}\) | F-score \(f_{\text{home},\mathcal{T}}^{(\bullet)}\) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DIRHA-sim | Room-specific | (en) | 63.97 | 37.93 | 40.06 | 50.51 | 86.03 | 86.92 | 56.45 | 52.65 | 54.84 |
| DIRHA-sim | Room-specific | (coh) | 47.46 | 87.41 | 88.66 | 67.90 | 77.01 | 76.05 | 55.87 | 81.88 | 81.87 |
| DIRHA-sim | Room-specific | (ev) | 82.89 | 90.81 | 90.38 | 78.01 | 74.85 | 76.28 | 80.37 | 82.06 | 82.74 |
| DIRHA-sim | Room-specific | (ts) | 71.91 | 86.00 | 89.35 | 52.21 | 74.46 | 79.28 | 60.50 | 79.82 | 84.01 |
| DIRHA-sim | Room-specific | (srp) | 76.76 | 79.85 | 79.25 | 53.94 | 56.44 | 60.94 | 63.36 | 66.13 | 68.90 |
| DIRHA-sim | Room-specific | (ts,srp) | 80.67 | 89.33 | 90.58 | 66.72 | 79.37 | 82.97 | 73.03 | 84.05 | 86.61 |
| DIRHA-sim | Room-specific | (ts,srp,ev) | 91.74 | 90.74 | 91.86 | 85.20 | 83.26 | 85.27 | 88.35 | 86.84 | 88.44 |
| DIRHA-sim | Room-specific | (ts,srp,ev,coh) | 90.62 | 90.42 | 92.27 | 83.65 | 84.96 | 85.80 | 86.99 | 87.61 | 88.92 |
| DIRHA-sim | Room-specific | (en,coh,ev)[34] | 89.48 | 87.65 | 90.37 | 78.90 | 81.16 | 81.69 | 83.86 | 84.28 | 85.81 |
| DIRHA-sim | Room-specific | (all) | 91.14 | 89.65 | 91.40 | 83.93 | 85.30 | 85.40 | 87.39 | 87.42 | 88.30 |
| DIRHA-sim | Global | | 91.12 | 92.21 | n/a | 78.49 | 79.63 | n/a | 84.34 | 85.46 | n/a |
| DIRHA-real | Room-specific | (en) | 63.65 | 24.39 | 27.68 | 55.30 | 100.00 | 100.00 | 59.18 | 39.22 | 43.36 |
| DIRHA-real | Room-specific | (coh) | 5.61 | 71.35 | 78.99 | 100.00 | 61.67 | 57.22 | 10.62 | 66.16 | 66.71 |
| DIRHA-real | Room-specific | (ev) | 99.02 | 99.73 | 99.73 | 97.40 | 98.07 | 98.21 | 98.21 | 98.89 | 98.96 |
| DIRHA-real | Room-specific | (ts) | 68.94 | 97.44 | 97.94 | 81.42 | 95.25 | 93.41 | 74.67 | 96.33 | 95.62 |
| DIRHA-real | Room-specific | (srp) | 85.36 | 87.91 | 80.75 | 75.50 | 77.98 | 75.29 | 80.13 | 82.65 | 77.93 |
| DIRHA-real | Room-specific | (ts,srp) | 90.28 | 94.52 | 97.33 | 91.58 | 95.32 | 86.76 | 90.92 | 94.92 | 91.74 |
| DIRHA-real | Room-specific | (ts,srp,ev) | 99.90 | 98.82 | 97.81 | 99.82 | 97.87 | 97.24 | 99.86 | 98.34 | 97.53 |
| DIRHA-real | Room-specific | (ts,srp,ev,coh) | 98.52 | 98.99 | 98.11 | 99.94 | 98.37 | 87.09 | 99.23 | 98.68 | 92.27 |
| DIRHA-real | Room-specific | (en,coh,ev)[34] | 98.25 | 99.73 | 99.50 | 99.60 | 98.64 | 90.84 | 98.92 | 99.18 | 94.98 |
| DIRHA-real | Room-specific | (all) | 98.89 | 98.85 | 95.68 | 99.94 | 98.46 | 80.21 | 99.42 | 98.66 | 87.26 |
| DIRHA-real | Global | | 99.33 | 100.00 | n/a | 100.00 | 99.84 | n/a | 99.66 | 99.92 | n/a |

  1. Results are reported on R=4 rooms of the DIRHA smart home (excluding the corridor) on the DIRHA-sim (top) and DIRHA-real (bottom) test sets using ground-truth speech segment boundaries. All SVMs operate over entire segments.
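
The F-score columns can be cross-checked against the recall and precision columns. The sketch below assumes the F-score is the balanced harmonic mean of recall and precision (F1); the table does not state the formula, but the reported values are consistent with it. The function name `f_score` and the selection of rows are illustrative only; the numbers are copied from the DIRHA-sim rows, first column of each metric group.

```python
def f_score(recall, precision):
    """Balanced F-score (harmonic mean) of recall and precision, both in percent."""
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


# (recall, precision, reported F-score) taken from the DIRHA-sim block of Table 4.
rows = {
    "(en)": (63.97, 50.51, 56.45),
    "(ts,srp,ev)": (91.74, 85.20, 88.35),
    "Global": (91.12, 78.49, 84.34),
}

for name, (recall, precision, reported) in rows.items():
    computed = f_score(recall, precision)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

Because the table lists recall and precision already rounded to two decimals, a recomputed F-score may differ from the reported one in the last digit.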