Fig. 5 | EURASIP Journal on Audio, Speech, and Music Processing

Fig. 5

From: Explicit-memory multiresolution adaptive framework for speech and music separation

Median signal-to-distortion ratio (SDR) on the MUSDB18 database for the proposed audio separation system. Panels a, b, c, and d show the median SDR (in dB) for drums, bass, others, and vocals, respectively. The L1 streams consist of the parallel paths \(L_{11}\), \(L_{12}\), and \(L_{13}\) and the stream integrator \(L_{1-SI}\). The L2 streams consist of the parallel paths \(L_{21}\), \(L_{22}\), and \(L_{23}\) and the stream integrator \(L_{2-SI}\). The integrated system \(L_{1+2-SI}\) combines the complementary information from levels 1 and 2 after stream integration and systematically performs better than either \(L_{1-SI}\) or \(L_{2-SI}\). \(TD_{1+2-SI}\) shows the result with top-down feedback, or self-feedback, applied during inference, which yields an improvement on all tracks.
