Table 4 Performance of the pre-trained model and the fine-tuned models with LL recorded echo

From: Nonlinear residual echo suppression based on dual-stream DPRNN

Echo Model PESQ SDR STOI
Artificial speech LAEC 1.48 −2.60 0.622
  Time 2.61 12.3 0.866
  Time_1 2.59 12.1 0.866
  Time_2 2.60 12.0 0.864
  TF 2.75 12.4 0.880
  TF_1 2.73 12.4 0.879
  TF_2 2.70 12.2 0.875
Artificial music LAEC 1.48 −2.90 0.634
  Time 2.50 11.5 0.842
  Time_1 2.47 11.4 0.842
  Time_2 2.47 11.2 0.840
  TF 2.62 11.4 0.857
  TF_1 2.61 11.4 0.856
  TF_2 2.58 11.2 0.852
ER speech LAEC 1.61 −2.05 0.697
  Time 2.68 11.7 0.892
  Time_1 2.70 11.9 0.894
  Time_2 2.72 11.8 0.894
  TF 2.77 11.3 0.904
  TF_1 2.81 11.7 0.906
  TF_2 2.83 11.6 0.908
ER music LAEC 1.70 −1.12 0.730
  Time 2.75 12.6 0.900
  Time_1 2.75 12.7 0.900
  Time_2 2.77 12.8 0.902
  TF 2.79 11.9 0.907
  TF_1 2.83 12.2 0.909
  TF_2 2.86 12.3 0.911
LL speech LAEC 1.95 1.67 0.806
  Time 3.00 15.6 0.932
  Time_1 3.02* 15.9* 0.934*
  Time_2 3.07* 16.3* 0.936*
  TF 3.02 15.3 0.938
  TF_1 3.08* 15.8* 0.940*
  TF_2 3.22* 16.5* 0.947*
LL music LAEC 1.97 2.16 0.820
  Time 3.07 16.0 0.935
  Time_1 3.07* 16.2* 0.936*
  Time_2 3.10* 16.4* 0.939*
  TF 3.12 15.8 0.944
  TF_1 3.17* 16.1* 0.945*
  TF_2 3.24* 16.5* 0.948*

* Values highlighted in blue in the original table.
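
To make the effect of fine-tuning on LL recorded echo easier to read off, the following minimal sketch computes per-metric differences between each fine-tuned row and its base row for the LL speech block. It assumes that Time and TF are the pre-trained models and that the _1 and _2 rows are their fine-tuned variants, which the caption suggests but the table alone does not state. The names `ll_speech` and `gains` are hypothetical and are used only for illustration.

```python
# Rows copied from the LL speech block of Table 4: (PESQ, SDR, STOI).
ll_speech = {
    "Time":   (3.00, 15.6, 0.932),
    "Time_1": (3.02, 15.9, 0.934),
    "Time_2": (3.07, 16.3, 0.936),
    "TF":     (3.02, 15.3, 0.938),
    "TF_1":   (3.08, 15.8, 0.940),
    "TF_2":   (3.22, 16.5, 0.947),
}


def gains(rows, base, variant):
    """Return per-metric differences (variant minus base), rounded to 3 decimals."""
    return tuple(round(v - b, 3) for b, v in zip(rows[base], rows[variant]))


# Assumption: "_1" and "_2" denote fine-tuned versions of the base model.
for base in ("Time", "TF"):
    for suffix in ("_1", "_2"):
        print(base + suffix, gains(ll_speech, base, base + suffix))
```

The same helper can be applied to the other echo blocks by copying their rows into a dictionary of the same shape.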