Energy-labeled masks for a mixture of speech and crowd noise with music. (a) Cochleagram of an utterance by a female speaker, showing the energy of each T-F unit; brighter pixels indicate stronger energy. (b) Ideal binary mask, computed from the target and intrusion signals before mixing. (c) Cochleagram of the mixture. (d) Mask labeled using the conventional threshold. (e) Mask labeled using the proposed threshold selection method.
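
For reference, the ideal binary mask in panel (b) can be sketched as follows. This is a minimal illustration that assumes the commonly used definition, in which a T-F unit is labeled 1 when its local SNR (target energy over intrusion energy, in dB) exceeds a local criterion. The function name `ideal_binary_mask`, the 0 dB default for `lc_db`, and the toy energy values are assumptions made here for illustration and are not taken from the source.

```python
import numpy as np


def ideal_binary_mask(target_energy, intrusion_energy, lc_db=0.0):
    """Label each T-F unit 1 if its local SNR exceeds lc_db, else 0.

    Both inputs are arrays of premixed T-F unit energies with the same
    shape (for example, channels x frames of a cochleagram).
    lc_db is the local criterion in dB (assumed default: 0 dB).
    """
    target_energy = np.asarray(target_energy, dtype=float)
    intrusion_energy = np.asarray(intrusion_energy, dtype=float)
    eps = np.finfo(float).tiny  # guards against log of zero and division by zero
    local_snr_db = 10.0 * np.log10((target_energy + eps) / (intrusion_energy + eps))
    return (local_snr_db > lc_db).astype(np.uint8)


# Toy 2 x 2 energies (hypothetical values, not data from the figure).
target = np.array([[4.0, 1.0], [0.5, 9.0]])
intrusion = np.array([[1.0, 2.0], [0.5, 1.0]])
mask = ideal_binary_mask(target, intrusion)
```

Because the mask needs the target and intrusion energies separately, it can only be computed when the premixed signals are available, which is why it serves as the reference against which panels (d) and (e) are compared.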