Table 6 Experimental results by using position-specific training data

From: Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition

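Columns P01 through Avg. all give the speaker identification rate (%).

| Method | Dataset | | P01 | P03 | P05 | P02 | P04 | Avg. known | Avg. unknown | Avg. all |
|---|---|---|---|---|---|---|---|---|---|---|
| Proposed (24 NNs) + CMN | P01 | 1s.5u | 85.8* | 86.0 | 85.9 | 87.1 | 85.5 | 85.8 | 86.1 | 86.1 |
| Proposed (24 NNs) + CMN | P01 | 3s.15u | 91.8* | 91.8 | 92.0 | 93.2 | 90.0 | 91.8 | 91.8 | 91.8 |
| Proposed (24 NNs) + CMN | P03 | 1s.5u | 84.0 | 86.8* | 85.2 | 88.1 | 88.6 | 86.8 | 86.5 | 86.5 |
| Proposed (24 NNs) + CMN | P03 | 3s.15u | 90.3 | 92.7* | 93.5 | 94.8 | 92.8 | 92.7 | 92.9 | 92.8 |
| Proposed (24 NNs) + CMN | P05 | 1s.5u | 88.8 | 91.2 | 92.7* | 93.3 | 89.7 | 92.7 | 90.8 | 91.1 |
| Proposed (24 NNs) + CMN | P05 | 3s.15u | 90.3 | 93.0 | 95.0* | 96.2 | 91.7 | 95.0 | 92.8 | 93.2 |
| Proposed (12 NNs) + CMN | P01 | 1s.5u | 88.8* | 88.6 | 88.6 | 89.9 | 90.4 | 88.8 | 89.4 | 89.3 |
| Proposed (12 NNs) + CMN | P01 | 3s.15u | 93.5* | 94.3 | 94.2 | 95.0 | 93.3 | 93.5 | 94.2 | 94.1 |
| Proposed (12 NNs) + CMN | P03 | 1s.5u | 87.8 | 89.6* | 89.3 | 90.7 | 92.3 | 89.6 | 90.0 | 89.9 |
| Proposed (12 NNs) + CMN | P03 | 3s.15u | 91.5 | 94.7* | 94.3 | 95.0 | 94.2 | 94.7 | 93.8 | 93.9 |
| Proposed (12 NNs) + CMN | P05 | 1s.5u | 89.9 | 92.5 | 92.4* | 94.2 | 92.7 | 92.4 | 92.3 | 92.3 |
| Proposed (12 NNs) + CMN | P05 | 3s.15u | 91.5 | 93.2 | 92.8* | 96.5 | 92.7 | 92.8 | 93.5 | 93.3 |
| Proposed (6 NNs) + CMN | P01 | 1s.5u | 89.5* | 88.4 | 87.9 | 90.9 | 91.7 | 89.5 | 89.7 | 89.7 |
| Proposed (6 NNs) + CMN | P01 | 3s.15u | 92.2* | 91.7 | 91.0 | 94.7 | 93.3 | 92.2 | 92.7 | 92.6 |
| Proposed (6 NNs) + CMN | P03 | 1s.5u | 89.1 | 89.7* | 88.4 | 91.0 | 92.7 | 89.7 | 90.3 | 90.2 |
| Proposed (6 NNs) + CMN | P03 | 3s.15u | 91.5 | 92.0* | 92.0 | 95.0 | 93.8 | 92.0 | 93.0 | 92.9 |
| Proposed (6 NNs) + CMN | P05 | 1s.5u | 90.1 | 91.4 | 91.7* | 94.7 | 93.4 | 91.7 | 92.4 | 92.3 |
| Proposed (6 NNs) + CMN | P05 | 3s.15u | 92.5 | 93.2 | 92.3* | 96.3 | 94.2 | 92.3 | 94.0 | 93.7 |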

  1. The known environments include P01, P03, and P05, while the unknown environments include P02 and P04. The experiments used the first testing scheme with skip1 7-1-0 frame selection. The asterisks (*) indicate known positions (matched conditions). The bold text represents the best average performance for each training data number.
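
The averaged columns can be cross-checked against the per-position rates. The sketch below does this for the first row (24 NNs, trained on P01, 1s.5u). It assumes a reading inferred from the numbers, not one stated in the table: "Avg. known" is the rate at the matched (asterisked) position, "Avg. unknown" is the mean over the other four positions, and "Avg. all" is the mean over all five. The function name `summarize` and the half-up rounding are illustrative choices.

```python
from decimal import Decimal, ROUND_HALF_UP


def summarize(rates, matched):
    """Recompute the three averaged columns for one table row.

    rates:   mapping of test position -> identification rate (as a string)
    matched: the training position (the asterisked cell in the row)

    Assumption: known = matched position only, unknown = all other positions.
    """
    values = {pos: Decimal(rate) for pos, rate in rates.items()}
    known = values[matched]
    others = [v for pos, v in values.items() if pos != matched]
    unknown = sum(others) / len(others)
    overall = sum(values.values()) / len(values)
    # Round to one decimal place, as in the table (half-up is an assumption).
    step = Decimal("0.1")
    return tuple(
        x.quantize(step, rounding=ROUND_HALF_UP)
        for x in (known, unknown, overall)
    )


# First row of the table: Proposed (24 NNs) + CMN, trained on P01, 1s.5u.
row = {"P01": "85.8", "P03": "86.0", "P05": "85.9", "P02": "87.1", "P04": "85.5"}
known, unknown, overall = summarize(row, "P01")
print(known, unknown, overall)  # 85.8 86.1 86.1
```

For this row the recomputed values match the reported 85.8, 86.1, and 86.1. Other rows can differ from the reported averages by 0.1 in the last digit, presumably because the published averages were computed from unrounded rates.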