DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models

EURASIP Journal on Audio, Speech, and Music Processing

Table 1 Overview of the options for each system component considered in the experiments. Unless explicitly stated otherwise, the underlined default is used.

DOA estimates	Oracle, or estimated using the DNN approach from [43]
Initial masks	(proposed) DOA-based (Eq. 19), or oracle (Eq. 34), or random
STAs (Eq. 31)	Extracted from (proposed) DOA-based initial masks (Eq. 19), or extracted from oracle initial masks (Eq. 34), or none
DTAs	One mixture component for each direction (K = D + 1: DTAs are available, see Section 4.3), or one mixture component for each speaker (K = J + 1: DTAs are not available)
Permutation alignment	No manual alignment, or oracle alignment (as explained in Section 5.2.1)
Speaker separation	Mask-based MVDR beamforming (Eq. 12), or direct application of the masks (Eq. 7)
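The options in Table 1 can be read as one configuration record per experiment. The following is a minimal sketch of such a record, not the authors' code: all class, field, and method names (`ExperimentConfig`, `dtas_available`, `num_mixture_components`, and so on) are hypothetical and chosen only for illustration. No defaults are set, so every choice has to be made explicitly. The only logic taken from the table is the rule for the number of mixture components, K = D + 1 when DTAs are available and K = J + 1 otherwise.

```python
from dataclasses import dataclass
from enum import Enum


class DoaSource(Enum):
    ORACLE = "oracle"
    DNN_ESTIMATE = "dnn_estimate"  # DNN approach from [43]


class InitialMasks(Enum):
    DOA_BASED = "doa_based"  # proposed, Eq. 19
    ORACLE = "oracle"        # Eq. 34
    RANDOM = "random"


class StaSource(Enum):
    FROM_DOA_BASED_MASKS = "from_doa_based_masks"  # proposed, Eq. 19
    FROM_ORACLE_MASKS = "from_oracle_masks"        # Eq. 34
    NONE = "none"


class Separation(Enum):
    MVDR_BEAMFORMING = "mvdr_beamforming"  # mask-based, Eq. 12
    DIRECT_MASKING = "direct_masking"      # Eq. 7


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for one row choice per component of Table 1."""

    doa_source: DoaSource
    initial_masks: InitialMasks
    sta_source: StaSource
    dtas_available: bool
    oracle_permutation_alignment: bool
    separation: Separation

    def num_mixture_components(self, num_directions: int, num_speakers: int) -> int:
        """Return K as given in Table 1.

        K = D + 1 if DTAs are available (one component per direction),
        K = J + 1 otherwise (one component per speaker).
        """
        if self.dtas_available:
            return num_directions + 1
        return num_speakers + 1


# Example with illustrative values only (D = 4 directions, J = 2 speakers).
example_config = ExperimentConfig(
    doa_source=DoaSource.DNN_ESTIMATE,
    initial_masks=InitialMasks.DOA_BASED,
    sta_source=StaSource.FROM_DOA_BASED_MASKS,
    dtas_available=True,
    oracle_permutation_alignment=False,
    separation=Separation.MVDR_BEAMFORMING,
)
example_k = example_config.num_mixture_components(num_directions=4, num_speakers=2)
```

Keeping the K rule next to the `dtas_available` flag makes the coupling between the DTA option and the model size explicit. The values D = 4 and J = 2 are placeholders and do not come from the experiments.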