From: Predominant audio source separation in polyphonic music
Output size | Description |
---|---|
3 × 256 × 256 | Input spectrogram |
64 × 128 × 128 | 4 × 4 Conv, 64 filters, stride 2, pad 1 |
64 × 128 × 128 | Leaky ReLU (\(\alpha\)=0.2) |
128 × 64 × 64 | 4 × 4 Conv, 128 filters, stride 2, pad 1 |
128 × 64 × 64 | Instance normalization |
128 × 64 × 64 | Leaky ReLU (\(\alpha\)= 0.2) |
256 × 32 × 32 | 4 × 4 Conv, 256 filters, stride 2, pad 1 |
256 × 32 × 32 | Instance normalization |
256 × 32 × 32 | Leaky ReLU (\(\alpha\)= 0.2) |
512 × 31 × 31 | 4 × 4 Conv, 512 filters, stride 1, pad 1 |
1 × 30 × 30 | 4 × 4 Conv, 1 filter, stride 1, pad 1 |
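
The spatial sizes in the table can be checked with the usual convolution output-size formula, \(\lfloor (n + 2p - k)/s \rfloor + 1\). The sketch below is only an illustration of that arithmetic, not the authors' code. It assumes that each convolution's filter count equals the channel count listed in the first column, that the final convolution has a single filter, and that no dilation is used. The names `conv_out`, `LAYERS`, and `trace_shapes` are hypothetical.

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Spatial output size of a convolution (floor division, no dilation)."""
    return (size + 2 * pad - kernel) // stride + 1


# (output channels, stride) for each 4 x 4 convolution; pad is 1 throughout.
# Assumption: output channels equal the number of filters in that layer.
LAYERS = [(64, 2), (128, 2), (256, 2), (512, 1), (1, 1)]


def trace_shapes(channels=3, size=256):
    """Return the (channels, height, width) shape after each convolution."""
    shapes = [(channels, size, size)]
    for out_channels, stride in LAYERS:
        size = conv_out(size, stride=stride)
        shapes.append((out_channels, size, size))
    return shapes


for shape in trace_shapes():
    print(" x ".join(str(d) for d in shape))
```

Instance normalization and Leaky ReLU act element-wise on the feature maps, so they leave the shapes unchanged and are omitted from the trace.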