EURASIP Journal on Audio, Speech, and Music Processing

Table 7 Overall SDRi/SI-SNRi (dB) performance with different configurations

From: Heterogeneous separation consistency training for adaptation of unsupervised speech separation

Dataset	System	Baseline	SCT	Supervised
Aishell2Mix	Conv-TasNet	2.57/2.08	6.15/5.52	9.00/8.32
	DPCCN	5.78/5.09	6.48/5.82	8.86/8.14
WHAMR!	Conv-TasNet	6.83/6.45	8.48/8.06	11.03/10.59
	DPCCN	8.99/8.50	9.26/8.81	11.01/10.56

“Baseline” denotes a model trained on the source domain Libri2Mix and evaluated on the target domains Aishell2Mix and WHAMR!. “SCT” is the best adaptation configuration, i.e., SCT-2 with CPS-2. “Supervised” denotes a model trained with ground-truth labels.
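
To make the comparison in Table 7 easier to read, the following minimal sketch computes, for each row, how many dB SCT gains over the baseline and how many dB remain to the supervised result. The values are copied from the table; the names `RESULTS`, `sct_gain`, and `gap_to_supervised` are hypothetical helpers introduced only for this illustration and are not part of the article.

```python
# SDRi / SI-SNRi values in dB, copied from Table 7.
# Each entry maps (dataset, system) to (SDRi, SI-SNRi) pairs per configuration.
RESULTS = {
    ("Aishell2Mix", "Conv-TasNet"): {
        "baseline": (2.57, 2.08), "sct": (6.15, 5.52), "supervised": (9.00, 8.32)},
    ("Aishell2Mix", "DPCCN"): {
        "baseline": (5.78, 5.09), "sct": (6.48, 5.82), "supervised": (8.86, 8.14)},
    ("WHAMR!", "Conv-TasNet"): {
        "baseline": (6.83, 6.45), "sct": (8.48, 8.06), "supervised": (11.03, 10.59)},
    ("WHAMR!", "DPCCN"): {
        "baseline": (8.99, 8.50), "sct": (9.26, 8.81), "supervised": (11.01, 10.56)},
}


def sct_gain(row):
    """Return (SDRi gain, SI-SNRi gain) of SCT over the baseline, in dB."""
    return tuple(round(s - b, 2) for s, b in zip(row["sct"], row["baseline"]))


def gap_to_supervised(row):
    """Return (SDRi gap, SI-SNRi gap) between supervised and SCT, in dB."""
    return tuple(round(u - s, 2) for u, s in zip(row["supervised"], row["sct"]))


# Print one summary line per table row.
for (dataset, system), row in RESULTS.items():
    print(dataset, system, "gain:", sct_gain(row), "gap:", gap_to_supervised(row))
```

Read this way, the table values show larger SCT gains for Conv-TasNet than for DPCCN on both target datasets, with every configuration still below its supervised counterpart.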
