Table 3 Experimental results by using simulated noisy reverberant data (RIR = ‘office’)

From: Single-channel dereverberation by feature mapping using cascade neural networks for robust distant speaker identification and speech recognition

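Speaker identification rate (%). Column groups: L = left context only, L+R = left+right context, L+sR = left+short right context. Within each group, the "Frame sel." column is followed by the results for training data of 1u, 5u, 10u, and 15u. A dash (–) marks a configuration with no reported result.

| NN conf. | RIR | Frame sel. type | L: Frame sel. | L: 1u | L: 5u | L: 10u | L: 15u | L+R: Frame sel. | L+R: 1u | L+R: 5u | L+R: 10u | L+R: 15u | L+sR: Frame sel. | L+sR: 1u | L+sR: 5u | L+sR: 10u | L+sR: 15u |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Multiple NNs | 20 dB | Linear | 3-1-0 | 53.0 | 59.0 | 63.5 | 61.8 | – | – | – | – | – | – | – | – | – | – |
| Multiple NNs | 20 dB | Linear | 7-1-0 | 38.9 | 60.5 | 62.9 | 64.6 | 3-1-3 | 42.3 | 64.9 | 66.6 | 65.4 | – | – | – | – | – |
| Multiple NNs | 20 dB | Linear | 15-1-0 | 15.3 | 40.7 | 55.1 | 60.5 | 7-1-7 | 24.7 | 50.8 | 61.8 | 65.7 | 7-1-3 | 27.8 | 58.6 | 65.8 | 67.0 |
| Multiple NNs | 20 dB | Skip1 | 3-1-0 | 48.6 | 58.8 | 63.4 | 62.2 | – | – | – | – | – | – | – | – | – | – |
| Multiple NNs | 20 dB | Skip1 | 7-1-0 | 32.1 | 60.5 | 61.8 | 62.9 | 3-1-3 | 46.3 | 63.1 | 66.0 | 67.0 | – | – | – | – | – |
| Multiple NNs | 20 dB | Skip1 | – | – | – | – | – | 7-1-7 | 22.7 | 45.8 | 57.3 | 62.9 | 7-1-3 | 27.6 | 54.1 | 66.2 | 67.1 |
| Multiple NNs | 10 dB | Linear | 3-1-0 | 20.7 | 34.8 | 32.0 | 35.7 | – | – | – | – | – | – | – | – | – | – |
| Multiple NNs | 10 dB | Linear | 7-1-0 | 18.3 | 34.1 | 37.6 | 38.4 | 3-1-3 | 25.6 | 37.4 | 38.6 | 41.1 | – | – | – | – | – |
| Multiple NNs | 10 dB | Linear | 15-1-0 | 3.2 | 20.4 | 31.7 | 33.9 | 7-1-7 | 6.1 | 25.2 | 36.9 | 41.0 | 7-1-3 | 10.1 | 32.3 | 40.8 | 42.8 |
| Multiple NNs | 10 dB | Skip1 | 3-1-0 | 31.9 | 32.1 | 34.1 | 35.1 | – | – | – | – | – | – | – | – | – | – |
| Multiple NNs | 10 dB | Skip1 | 7-1-0 | 13.2 | 32.0 | 36.5 | 37.0 | 3-1-3 | 20.7 | 37.3 | 39.8 | 41.3 | – | – | – | – | – |
| Multiple NNs | 10 dB | Skip1 | – | – | – | – | – | 7-1-7 | 6.2 | 19.8 | 31.4 | 37.2 | 7-1-3 | 8.1 | 32.5 | 37.5 | 41.2 |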