EURASIP Journal on Audio, Speech, and Music Processing

Table 4 Comparison of the average PESQ, STOI, and SDR for test datasets with and without reverberation

From: Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement

Training dataset	Method	Test dataset: No RIR			Test dataset: RIR
		PESQ	STOI (%)	SDR	PESQ	STOI (%)	SDR
No RIR	Unprocessed	2.08	91.7	14.81	2.24	88.5	14.81
	CRN	2.55	93.8	19.29	2.18	88.7	15.76
	MSTCN	2.77	94.3	16.82	2.52	90.1	14.36
	LSTM-IRM	2.90	95.2	19.90	2.71	91.6	16.81
	GCRN	2.85	94.4	20.82	2.37	89.1	16.13
	GaGNet	2.98	94.9	21.04	2.47	89.5	16.55
	Conv-TasNet	2.99	95.0	21.50	2.44	89.3	16.31
	DCCRN	3.22	95.7	21.48	2.49	90.4	16.43
	DPCRN	3.19	95.6	21.53	2.71	91.6	17.53
	SA-MSTCN\(^{1}\)	3.38	96.1	21.45	2.74	91.4	17.21
	SA-MSTCN\(^{2}\)	3.41	96.2	21.95	2.71	91.3	17.24
RIR	CRN	2.43	93.3	18.75	2.59	90.7	18.39
	MSTCN	2.59	93.6	16.19	2.75	91.6	15.93
	LSTM-IRM	2.83	95.0	19.70	3.02	93.2	19.31
	GCRN	2.68	93.6	19.75	2.84	91.8	19.08
	GaGNet	2.69	93.8	19.87	2.86	91.6	19.49
	Conv-TasNet	2.93	94.8	21.08	3.03	92.5	20.22
	DCCRN	3.00	94.9	21.16	3.15	93.0	20.30
	DPCRN	2.98	94.9	20.58	3.24	93.3	20.14
	SA-MSTCN\(^{1}\)	3.24	95.7	20.99	3.44	94.3	20.61
	SA-MSTCN\(^{2}\)	3.26	95.8	21.30	3.47	94.3	20.83
Half RIR and half no RIR	CRN	2.50	93.5	19.02	2.58	90.7	18.40
	MSTCN	2.69	94.0	16.54	2.75	91.6	15.87
	LSTM-IRM	2.92	95.2	19.90	3.01	93.1	19.33
	GCRN	2.72	94.0	20.09	2.84	91.4	19.14
	GaGNet	2.91	94.0	20.87	2.84	91.3	19.22
	Conv-TasNet	2.94	94.8	21.23	3.02	92.4	20.18
	DCCRN	3.16	95.2	21.35	3.15	92.9	20.11
	DPCRN	3.09	95.2	20.93	3.20	93.1	20.03
	SA-MSTCN\(^{1}\)	3.32	95.9	21.26	3.42	94.3	20.55
	SA-MSTCN\(^{2}\)	3.36	96.0	21.41	3.46	94.3	20.79
