
Table 4 Comparison of the average PESQ, STOI, and SDR for test datasets with and without reverberation

From: Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement

 

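Columns 3 to 5 report results on the test dataset without RIR; columns 6 to 8 report results on the test dataset with RIR.

| Training dataset | Method | PESQ (test: no RIR) | STOI (%) (test: no RIR) | SDR (test: no RIR) | PESQ (test: RIR) | STOI (%) (test: RIR) | SDR (test: RIR) |
|---|---|---|---|---|---|---|---|
| No RIR | Unprocessed | 2.08 | 91.7 | 14.81 | 2.24 | 88.5 | 14.81 |
| | CRN | 2.55 | 93.8 | 19.29 | 2.18 | 88.7 | 15.76 |
| | MSTCN | 2.77 | 94.3 | 16.82 | 2.52 | 90.1 | 14.36 |
| | LSTM-IRM | 2.90 | 95.2 | 19.90 | 2.71 | 91.6 | 16.81 |
| | GCRN | 2.85 | 94.4 | 20.82 | 2.37 | 89.1 | 16.13 |
| | GaGNet | 2.98 | 94.9 | 21.04 | 2.47 | 89.5 | 16.55 |
| | Conv-TasNet | 2.99 | 95.0 | 21.50 | 2.44 | 89.3 | 16.31 |
| | DCCRN | 3.22 | 95.7 | 21.48 | 2.49 | 90.4 | 16.43 |
| | DPCRN | 3.19 | 95.6 | 21.53 | 2.71 | 91.6 | 17.53 |
| | SA-MSTCN\(^{1}\) | 3.38 | 96.1 | 21.45 | 2.74 | 91.4 | 17.21 |
| | SA-MSTCN\(^{2}\) | 3.41 | 96.2 | 21.95 | 2.71 | 91.3 | 17.24 |
| RIR | CRN | 2.43 | 93.3 | 18.75 | 2.59 | 90.7 | 18.39 |
| | MSTCN | 2.59 | 93.6 | 16.19 | 2.75 | 91.6 | 15.93 |
| | LSTM-IRM | 2.83 | 95.0 | 19.70 | 3.02 | 93.2 | 19.31 |
| | GCRN | 2.68 | 93.6 | 19.75 | 2.84 | 91.8 | 19.08 |
| | GaGNet | 2.69 | 93.8 | 19.87 | 2.86 | 91.6 | 19.49 |
| | Conv-TasNet | 2.93 | 94.8 | 21.08 | 3.03 | 92.5 | 20.22 |
| | DCCRN | 3.00 | 94.9 | 21.16 | 3.15 | 93.0 | 20.30 |
| | DPCRN | 2.98 | 94.9 | 20.58 | 3.24 | 93.3 | 20.14 |
| | SA-MSTCN\(^{1}\) | 3.24 | 95.7 | 20.99 | 3.44 | 94.3 | 20.61 |
| | SA-MSTCN\(^{2}\) | 3.26 | 95.8 | 21.30 | 3.47 | 94.3 | 20.83 |
| Half RIR and half no RIR | CRN | 2.50 | 93.5 | 19.02 | 2.58 | 90.7 | 18.40 |
| | MSTCN | 2.69 | 94.0 | 16.54 | 2.75 | 91.6 | 15.87 |
| | LSTM-IRM | 2.92 | 95.2 | 19.90 | 3.01 | 93.1 | 19.33 |
| | GCRN | 2.72 | 94.0 | 20.09 | 2.84 | 91.4 | 19.14 |
| | GaGNet | 2.91 | 94.0 | 20.87 | 2.84 | 91.3 | 19.22 |
| | Conv-TasNet | 2.94 | 94.8 | 21.23 | 3.02 | 92.4 | 20.18 |
| | DCCRN | 3.16 | 95.2 | 21.35 | 3.15 | 92.9 | 20.11 |
| | DPCRN | 3.09 | 95.2 | 20.93 | 3.20 | 93.1 | 20.03 |
| | SA-MSTCN\(^{1}\) | 3.32 | 95.9 | 21.26 | 3.42 | 94.3 | 20.55 |
| | SA-MSTCN\(^{2}\) | 3.36 | 96.0 | 21.41 | 3.46 | 94.3 | 20.79 |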
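
One way to read the table is as gains over the unprocessed signal. The sketch below is illustrative only: it copies the unprocessed and SA-MSTCN\(^{2}\) values for the reverberant (RIR) test set from the table and subtracts them. The variable names and the `gains` helper are hypothetical and are not part of the paper.

```python
# Values copied from Table 4, reverberant (RIR) test set.
# Each tuple is (PESQ, STOI in %, SDR).
unprocessed_rir = (2.24, 88.5, 14.81)

# SA-MSTCN^2 on the RIR test set, by training dataset.
sa_mstcn2_rir = {
    "No RIR": (2.71, 91.3, 17.24),
    "RIR": (3.47, 94.3, 20.83),
    "Half RIR and half no RIR": (3.46, 94.3, 20.79),
}


def gains(enhanced, baseline):
    """Return per-metric improvement of enhanced over baseline (hypothetical helper)."""
    return tuple(round(e - b, 2) for e, b in zip(enhanced, baseline))


# Improvement over the unprocessed signal for each training condition.
rir_gains = {train: gains(scores, unprocessed_rir)
             for train, scores in sa_mstcn2_rir.items()}

for train, (d_pesq, d_stoi, d_sdr) in rir_gains.items():
    print(f"{train}: PESQ +{d_pesq}, STOI +{d_stoi} points, SDR +{d_sdr}")
```

On these numbers, training with reverberant data (fully or half) gives clearly larger gains on the reverberant test set than training without RIR.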