Predominant audio source separation in polyphonic music

EURASIP Journal on Audio, Speech, and Music Processing

Table 2 Proposed architecture for the generator

Output size (channels × height × width)	Description
3 × 256 × 256	Input spectrogram
64 × 256 × 256	7 × 7 Conv, 64 filters, stride 1, pad 3
64 × 256 × 256	Instance normalization
64 × 256 × 256	ReLU
128 × 128 × 128	3 × 3 Conv, 128 filters, stride 2, pad 1
128 × 128 × 128	Instance normalization
128 × 128 × 128	ReLU
256 × 64 × 64	3 × 3 Conv, 256 filters, stride 2, pad 1
256 × 64 × 64	Instance normalization
256 × 64 × 64	ReLU
256 × 64 × 64	9 consecutive ResNet blocks, 256 filters
128 × 128 × 128	3 × 3 transposed Conv, 128 filters, stride 2, pad 1
128 × 128 × 128	Instance normalization
128 × 128 × 128	ReLU
64 × 256 × 256	3 × 3 transposed Conv, 64 filters, stride 2, pad 1
64 × 256 × 256	Instance normalization
64 × 256 × 256	ReLU
3 × 256 × 256	7 × 7 Conv, 3 filters, stride 1, pad 3
3 × 256 × 256	Instance normalization
3 × 256 × 256	Tanh
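The sizes in Table 2 follow from each layer's kernel size, stride and padding. The plain-Python sketch below recomputes them as a consistency check; it is not the authors' implementation. It assumes (none of this is stated in the table) that the two stages where the spatial size doubles are transposed convolutions with output padding 1, that each ResNet block uses 3 × 3 convolutions with stride 1 and padding 1, and that instance normalization, ReLU and Tanh leave the shape unchanged. The names `conv_out`, `tconv_out`, `LAYERS` and `trace` are hypothetical.

```python
"""Shape trace for the generator in Table 2 (illustrative sketch only).

Assumptions not given in the table: upsampling layers are transposed
convolutions with output padding 1; ResNet blocks use 3x3 convs with
stride 1 and pad 1; normalization/activation layers keep the shape.
"""


def conv_out(size: int, kernel: int, stride: int, pad: int) -> int:
    # Standard convolution: floor((H + 2p - k) / s) + 1
    return (size + 2 * pad - kernel) // stride + 1


def tconv_out(size: int, kernel: int, stride: int, pad: int, out_pad: int) -> int:
    # Transposed convolution: (H - 1) * s - 2p + k + output_padding
    return (size - 1) * stride - 2 * pad + kernel + out_pad


# (name, kind, out_channels, kernel, stride, pad, out_pad); hypothetical encoding
LAYERS = [
    ("7x7 conv", "conv", 64, 7, 1, 3, 0),
    ("3x3 conv, down", "conv", 128, 3, 2, 1, 0),
    ("3x3 conv, down", "conv", 256, 3, 2, 1, 0),
    ("9 ResNet blocks", "res", 256, 3, 1, 1, 0),  # assumed 3x3, stride 1, pad 1
    ("3x3 transposed conv, up", "tconv", 128, 3, 2, 1, 1),  # assumed out_pad 1
    ("3x3 transposed conv, up", "tconv", 64, 3, 2, 1, 1),  # assumed out_pad 1
    ("7x7 conv, output", "conv", 3, 7, 1, 3, 0),
]


def trace(channels: int = 3, size: int = 256) -> list:
    """Return the (C, H, W) shape after the input and after each layer."""
    shapes = [(channels, size, size)]
    for _name, kind, out_ch, k, s, p, op in LAYERS:
        if kind in ("conv", "res"):
            # ResNet blocks are shape-preserving convs, so the same formula applies
            size = conv_out(size, k, s, p)
        elif kind == "tconv":
            size = tconv_out(size, k, s, p, op)
        channels = out_ch
        shapes.append((channels, size, size))
    return shapes


if __name__ == "__main__":
    names = ["input spectrogram"] + [layer[0] for layer in LAYERS]
    for name, (c, h, w) in zip(names, trace()):
        print(f"{name:<26} {c} x {h} x {w}")
```

With kernel 3, stride 2 and padding 1, a transposed convolution maps 64 to 127 unless one extra row and column of output padding is added, which is why the sketch assumes output padding 1 to reach the 128 × 128 and 256 × 256 sizes listed in the table.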