Dynamically localizing multiple speakers based on the time-frequency domain

EURASIP Journal on Audio, Speech, and Music Processing

Table 1 The TF-DOAnet multi-speaker localization algorithm

∙ Compute the instantaneous relative transfer function (iRTF) features from the multi-microphone recordings.
∙ Apply the U-net to classify each time-frequency (TF) bin into one of the candidate DOAs.
∙ Based on the U-net results, decide the locations of the active speakers at each time frame.
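
The three steps in Table 1 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions: the iRTF is taken as the per-bin ratio of each microphone's STFT to a reference microphone's STFT, with real and imaginary parts stacked as features; the U-net itself is not implemented and is represented only by its output, a hypothetical array `tf_probs` of per-bin DOA class probabilities; and the frame-level decision averages those probabilities over frequency and keeps the strongest classes above a threshold. The function names, `max_speakers`, and `threshold` are illustrative choices, not values from the paper.

```python
import numpy as np


def irtf_features(stft, ref=0, eps=1e-8):
    """Assumed iRTF features: ratio of each microphone to a reference microphone.

    stft: complex array of shape (mics, frames, freqs).
    Returns a real array of shape (frames, freqs, 2 * (mics - 1)).
    """
    ref_spec = stft[ref]
    others = np.delete(stft, ref, axis=0)
    # Regularized division: x / r computed as x * conj(r) / (|r|^2 + eps).
    ratio = others * np.conj(ref_spec) / (np.abs(ref_spec) ** 2 + eps)
    ratio = np.moveaxis(ratio, 0, -1)  # -> (frames, freqs, mics - 1)
    return np.concatenate([ratio.real, ratio.imag], axis=-1)


def frame_doas(tf_probs, max_speakers=2, threshold=0.2):
    """Assumed decision rule: average the TF-bin probabilities over frequency,
    then keep up to max_speakers DOA classes whose score reaches threshold.

    tf_probs: array of shape (frames, freqs, n_doas), standing in for the
    U-net output. Returns one sorted list of DOA class indices per frame.
    """
    frame_scores = tf_probs.mean(axis=1)  # -> (frames, n_doas)
    decisions = []
    for scores in frame_scores:
        top = np.argsort(scores)[::-1][:max_speakers]
        decisions.append(sorted(int(d) for d in top if scores[d] >= threshold))
    return decisions
```

In use, `irtf_features` would be applied to the STFT of the array recordings, the resulting feature map passed to the trained network, and `frame_doas` applied to the network's per-bin probabilities. Mapping the returned class indices to angles depends on the DOA grid chosen for training.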