From: Nonlinear residual echo suppression based on dual-stream DPRNN
Echo | Model | PESQ | SDR (dB) | STOI |
---|---|---|---|---|
Artificial speech | LAEC | 1.48 | −2.60 | 0.622 |
 | LSTM | 2.14 | 6.33 | 0.780 |
 | MSTasNet | 2.54 | 11.6 | 0.857 |
 | DSDPRNN_ty | 2.61 | 12.3 | 0.866 |
 | DSDPRNN_tx | 2.66 | 12.8 | 0.876 |
 | DSDPRNN_fy | 2.75 | 12.4 | 0.880 |
 | DSDPRNN_fx | 2.74 | 12.5 | 0.882 |
Artificial music | LAEC | 1.48 | −2.90 | 0.634 |
 | LSTM | 2.08 | 5.46 | 0.755 |
 | MSTasNet | 2.43 | 10.7 | 0.830 |
 | DSDPRNN_ty | 2.50 | 11.5 | 0.842 |
 | DSDPRNN_tx | 2.61 | 12.6 | 0.865 |
 | DSDPRNN_fy | 2.62 | 11.4 | 0.857 |
 | DSDPRNN_fx | 2.64 | 11.6 | 0.863 |
ER speech | LAEC | 1.61 | −2.05 | 0.697 |
 | LSTM | 2.13 | 4.85 | 0.799 |
 | MSTasNet | 2.66 | 11.6 | 0.890 |
 | DSDPRNN_ty | 2.68 | 11.7 | 0.892 |
 | DSDPRNN_tx | 2.62 | 11.5 | 0.887 |
 | DSDPRNN_fy | 2.77 | 11.3 | 0.904 |
 | DSDPRNN_fx | 2.66 | 10.6 | 0.895 |
ER music | LAEC | 1.70 | −1.12 | 0.730 |
 | LSTM | 2.25 | 5.95 | 0.826 |
 | MSTasNet | 2.72 | 12.2 | 0.898 |
 | DSDPRNN_ty | 2.75 | 12.6 | 0.900 |
 | DSDPRNN_tx | 2.68 | 12.3 | 0.897 |
 | DSDPRNN_fy | 2.79 | 11.9 | 0.907 |
 | DSDPRNN_fx | 2.76 | 11.7 | 0.907 |
LL speech | LAEC | 1.95 | 1.67 | 0.806 |
 | LSTM | 2.55 | 9.23 | 0.884 |
 | MSTasNet | 2.99 | 15.0 | 0.932 |
 | DSDPRNN_ty | 3.00 | 15.6 | 0.932 |
 | DSDPRNN_tx | 2.87 | 14.9 | 0.920 |
 | DSDPRNN_fy | 3.02 | 15.3 | 0.938 |
 | DSDPRNN_fx | 3.04 | 15.7 | 0.938 |
LL music | LAEC | 1.97 | 2.16 | 0.820 |
 | LSTM | 2.60 | 9.07 | 0.889 |
 | MSTasNet | 3.04 | 15.6 | 0.934 |
 | DSDPRNN_ty | 3.07 | 16.0 | 0.935 |
 | DSDPRNN_tx | 2.89 | 14.8 | 0.921 |
 | DSDPRNN_fy | 3.12 | 15.8 | 0.944 |
 | DSDPRNN_fx | 3.13 | 16.0 | 0.943 |