Table 3 Performance of the pre-trained model and the fine-tuned models with ER recorded echo

From: Nonlinear residual echo suppression based on dual-stream DPRNN

Echo               Model   PESQ    SDR (dB)  STOI
Artificial speech  LAEC    1.48    −2.60     0.622
                   Time    2.61    12.3      0.866
                   Time_1  2.56    12.2      0.865
                   Time_2  2.57    12.1      0.864
                   TF      2.75    12.4      0.880
                   TF_1    2.70    12.4      0.875
                   TF_2    2.69    12.3      0.875
Artificial music   LAEC    1.48    −2.90     0.634
                   Time    2.50    11.5      0.842
                   Time_1  2.44    11.4      0.841
                   Time_2  2.46    11.3      0.841
                   TF      2.62    11.4      0.857
                   TF_1    2.58    11.3      0.853
                   TF_2    2.57    11.3      0.852
ER speech          LAEC    1.61    −2.05     0.697
                   Time    2.68    11.7      0.892
                   Time_1  2.70*   12.0*     0.894*
                   Time_2  2.75*   12.5*     0.899*
                   TF      2.77    11.3      0.904
                   TF_1    2.80*   11.9*     0.905*
                   TF_2    2.88*   12.4*     0.912*
ER music           LAEC    1.70    −1.12     0.730
                   Time    2.75    12.6      0.900
                   Time_1  2.76*   12.8*     0.901*
                   Time_2  2.80*   13.0*     0.906*
                   TF      2.79    11.9      0.907
                   TF_1    2.83*   12.3*     0.908*
                   TF_2    2.91*   12.6*     0.914*
LL speech          LAEC    1.95    1.67      0.806
                   Time    3.00    15.6      0.932
                   Time_1  3.00    15.8      0.933
                   Time_2  3.03    16.1      0.935
                   TF      3.02    15.3      0.938
                   TF_1    3.08    15.8      0.939
                   TF_2    3.13    16.1      0.943
LL music           LAEC    1.97    2.16      0.820
                   Time    3.07    16.0      0.935
                   Time_1  3.03    16.1      0.936
                   Time_2  3.04    16.2      0.937
                   TF      3.12    15.8      0.944
                   TF_1    3.17    16.1      0.944
                   TF_2    3.18    16.2      0.946

* Shown in blue in the original table.