Table 5 SDRi/SI-SNRi (dB) performance of Conv-TasNet and DPCCN on Aishell2Mix test set under different SCT configurations

From: Heterogeneous separation consistency training for adaptation of unsupervised speech separation

| SCT | System | #Iter | CPS-1 | CPS-2 | Oracle |
|---|---|---|---|---|---|
| SCT-1 | Conv-TasNet | 1 | 5.14/4.63 | 5.47/4.90 | 5.98/5.39 |
| | | 2 | 5.45/4.94 | 5.99/5.39 | 6.18/5.57 |
| | DPCCN | 1 | 5.98/5.32 | 5.90/5.25 | 6.00/5.31 |
| | | 2 | 6.17/5.50 | 6.03/5.39 | 6.10/5.44 |
| SCT-2 | Conv-TasNet | 1 | 5.14/4.63 | 5.47/4.90 | 5.98/5.39 |
| | | 2 | 5.36/4.89 | 6.15/5.52 | 6.21/5.65 |
| | DPCCN | 1 | 6.05/5.52 | 6.48/5.82 | 6.79/6.19 |
| | | 2 | 5.49/5.05 | 6.43/5.81 | 6.45/5.91 |
| SCT-3 | Conv-TasNet | 1 | 5.14/4.63 | 5.47/4.90 | - |
| | | 2 | 5.43/4.93 | 5.77/5.24 | - |
| | DPCCN | 1 | 6.14/5.58 | 6.22/5.65 | - |
| | | 2 | 6.02/5.52 | 6.10/5.56 | - |

  1. “Oracle” means using the ground truth as the reference to calculate the SI-SNR of the separation outputs when selecting the pseudo ground truth. All source models are well pre-trained on Libri2Mix. The best settings of \(\{\alpha ,\beta \}\) in CPS-2 are \(\{5,5\}\) and \(\{8,5\}\) in the 1st and 2nd iterations, respectively, for all SCT variants. \(\eta\) is set to 5 for “Oracle” selection.
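
The “Oracle” selection in the note can be illustrated with a minimal sketch: compute the SI-SNR of each separated output against its ground-truth reference, and keep the output as pseudo ground truth only if the score is high enough. The function names `si_snr` and `oracle_select` are hypothetical, and reading \(\eta\) as a lower SI-SNR threshold in dB is an assumption made here for illustration; the note itself only states that \(\eta\) is set to 5.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR (dB) of `estimate` against `reference`.

    Both arguments are equal-length sequences of float samples.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)

    # Remove the mean from both signals.
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(a * b for a, b in zip(e, r))
    ref_energy = sum(b * b for b in r) + eps
    scale = dot / ref_energy
    target = [scale * b for b in r]

    # Whatever is left after removing the target component counts as noise.
    noise = [a - t for a, t in zip(e, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def oracle_select(estimates, references, eta=5.0):
    """Return indices of estimates whose SI-SNR exceeds `eta` (assumed threshold, dB)."""
    return [
        i
        for i, (est, ref) in enumerate(zip(estimates, references))
        if si_snr(est, ref) > eta
    ]
```

Under this reading, a rescaled copy of the reference would pass the selection, because SI-SNR ignores gain, while an output dominated by interference would be discarded. The table reports SDRi/SI-SNRi, which are improvements over the unprocessed mixture; this sketch computes only the raw SI-SNR used for selection.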