Table 2 Proposed architecture for generator

From: Predominant audio source separation in polyphonic music

Output size        Description
-----------------  -------------------------------------------
3 × 256 × 256      Input spectrogram
64 × 256 × 256     7 × 7 Conv, 64 filters, stride 1, pad 3
64 × 256 × 256     Instance normalization
64 × 256 × 256     ReLU
128 × 128 × 128    3 × 3 Conv, 128 filters, stride 2, pad 1
128 × 128 × 128    Instance normalization
128 × 128 × 128    ReLU
256 × 64 × 64      3 × 3 Conv, 256 filters, stride 2, pad 1
256 × 64 × 64      Instance normalization
256 × 64 × 64      ReLU
256 × 64 × 64      9 consecutive Resnet blocks, 256 filters
128 × 128 × 128    3 × 3 Conv, 128 filters, stride 2, pad 1
128 × 128 × 128    Instance normalization
128 × 128 × 128    ReLU
64 × 256 × 256     3 × 3 Conv, 64 filters, stride 1, pad 3
64 × 256 × 256     Instance normalization
64 × 256 × 256     ReLU
3 × 256 × 256      7 × 7 Conv, stride 1, pad 3
3 × 256 × 256      Instance normalization
3 × 256 × 256      Tanh
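
The sizes in the table can be checked with standard convolution arithmetic. The sketch below traces the spatial size through each convolutional stage; normalization, ReLU, and Tanh rows are skipped because they leave the shape unchanged. Two things here are assumptions and not stated in the table: the two stages after the Resnet blocks are treated as transposed convolutions (3 × 3, stride 2, pad 1, output padding 1), since an ordinary stride-2 convolution cannot enlarge 64 × 64 to 128 × 128, and the Resnet blocks are assumed to preserve shape. The helper names `conv_out`, `deconv_out`, and `trace_generator` are hypothetical and introduced only for illustration.

```python
def conv_out(n, k, s, p):
    """Spatial size after a standard convolution (kernel k, stride s, pad p)."""
    return (n + 2 * p - k) // s + 1


def deconv_out(n, k, s, p, out_pad=0):
    """Spatial size after a transposed convolution with optional output padding."""
    return (n - 1) * s - 2 * p + k + out_pad


def trace_generator(size=256, in_channels=3):
    """Return the (channels, height, width) shape after each conv stage."""
    shapes = [(in_channels, size, size)]      # input spectrogram

    n = conv_out(size, k=7, s=1, p=3)         # 7 x 7 Conv, 64 filters
    shapes.append((64, n, n))
    n = conv_out(n, k=3, s=2, p=1)            # 3 x 3 Conv, 128 filters
    shapes.append((128, n, n))
    n = conv_out(n, k=3, s=2, p=1)            # 3 x 3 Conv, 256 filters
    shapes.append((256, n, n))

    # 9 Resnet blocks: assumed to keep both channels and spatial size.
    shapes.append((256, n, n))

    # Upsampling stages: ASSUMED transposed convs (stride 2, pad 1, out_pad 1).
    n = deconv_out(n, k=3, s=2, p=1, out_pad=1)
    shapes.append((128, n, n))
    n = deconv_out(n, k=3, s=2, p=1, out_pad=1)
    shapes.append((64, n, n))

    n = conv_out(n, k=7, s=1, p=3)            # final 7 x 7 Conv back to 3 channels
    shapes.append((in_channels, n, n))
    return shapes


if __name__ == "__main__":
    for shape in trace_generator():
        print(" x ".join(str(d) for d in shape))
```

Under these assumptions the traced shapes match every size listed in the table. Read literally, the row "3 × 3 Conv, 64 filters, stride 1, pad 3" applied to a 128 × 128 input would give `conv_out(128, 3, 1, 3)`, which is 132 and not 256, so that row's stride and padding may not reflect the layer actually used.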