From: Nonlinear residual echo suppression based on dual-stream DPRNN
| Echo | Model | PESQ | SDR (dB) | STOI |
|---|---|---|---|---|
| Artificial speech | LAEC | 1.48 | −2.60 | 0.622 |
| | LSTM | 2.14 | 6.33 | 0.780 |
| | MSTasNet | 2.54 | 11.6 | 0.857 |
| | DSDPRNN_ty | 2.61 | 12.3 | 0.866 |
| | DSDPRNN_tx | 2.66 | 12.8 | 0.876 |
| | DSDPRNN_fy | 2.75 | 12.4 | 0.880 |
| | DSDPRNN_fx | 2.74 | 12.5 | 0.882 |
| Artificial music | LAEC | 1.48 | −2.90 | 0.634 |
| | LSTM | 2.08 | 5.46 | 0.755 |
| | MSTasNet | 2.43 | 10.7 | 0.830 |
| | DSDPRNN_ty | 2.50 | 11.5 | 0.842 |
| | DSDPRNN_tx | 2.61 | 12.6 | 0.865 |
| | DSDPRNN_fy | 2.62 | 11.4 | 0.857 |
| | DSDPRNN_fx | 2.64 | 11.6 | 0.863 |
| ER speech | LAEC | 1.61 | −2.05 | 0.697 |
| | LSTM | 2.13 | 4.85 | 0.799 |
| | MSTasNet | 2.66 | 11.6 | 0.890 |
| | DSDPRNN_ty | 2.68 | 11.7 | 0.892 |
| | DSDPRNN_tx | 2.62 | 11.5 | 0.887 |
| | DSDPRNN_fy | 2.77 | 11.3 | 0.904 |
| | DSDPRNN_fx | 2.66 | 10.6 | 0.895 |
| ER music | LAEC | 1.70 | −1.12 | 0.730 |
| | LSTM | 2.25 | 5.95 | 0.826 |
| | MSTasNet | 2.72 | 12.2 | 0.898 |
| | DSDPRNN_ty | 2.75 | 12.6 | 0.900 |
| | DSDPRNN_tx | 2.68 | 12.3 | 0.897 |
| | DSDPRNN_fy | 2.79 | 11.9 | 0.907 |
| | DSDPRNN_fx | 2.76 | 11.7 | 0.907 |
| LL speech | LAEC | 1.95 | 1.67 | 0.806 |
| | LSTM | 2.55 | 9.23 | 0.884 |
| | MSTasNet | 2.99 | 15.0 | 0.932 |
| | DSDPRNN_ty | 3.00 | 15.6 | 0.932 |
| | DSDPRNN_tx | 2.87 | 14.9 | 0.920 |
| | DSDPRNN_fy | 3.02 | 15.3 | 0.938 |
| | DSDPRNN_fx | 3.04 | 15.7 | 0.938 |
| LL music | LAEC | 1.97 | 2.16 | 0.820 |
| | LSTM | 2.60 | 9.07 | 0.889 |
| | MSTasNet | 3.04 | 15.6 | 0.934 |
| | DSDPRNN_ty | 3.07 | 16.0 | 0.935 |
| | DSDPRNN_tx | 2.89 | 14.8 | 0.921 |
| | DSDPRNN_fy | 3.12 | 15.8 | 0.944 |
| | DSDPRNN_fx | 3.13 | 16.0 | 0.943 |