From: Time-domain adaptive attention network for single-channel speech separation
Module | Layers | Input size | Output size |
---|---|---|---|
Encoder | Conv1d | [b, 1, 32000] | [b, 256, 15999] |
Encoder | GroupNorm | [b, 256, 15999] | [b, 256, 15999] |
Encoder | Conv1d | [b, 256, 15999] | [b, 64, 15999] |
Encoder | Segmentation | [b, 64, 15999] | [64, 200, b*162] |
Separator | LocalAttention | [64, 200, b*162] | [64, 200, b*162] |
Separator | GlobalAttention | [64, 200, b*162] | [64, 200, b*162] |
Decoder | OverlapAdd | [64, 200, b*162] | [b*C, 256, 15999] |
Decoder | Conv1d-Transpose | [b*C, 256, 15999] | [b, C, 32000] |
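
The sizes in the table can be cross-checked with a little arithmetic. The sketch below is illustrative only: the table does not state the encoder kernel size, the stride, or the chunk hop, so the values used here (kernel 4, stride 2, chunk length 200 with a hop of 100 and DPRNN-style zero padding) are assumptions chosen because they reproduce the listed shapes. The helper names are hypothetical.

```python
# Shape arithmetic only; no deep-learning library is required.
# ASSUMED hyperparameters (not given in the table): kernel=4, stride=2, hop=100.

def conv1d_out_len(length: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded 1-D convolution."""
    return (length - kernel) // stride + 1


def conv_transpose1d_out_len(length: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded 1-D transposed convolution."""
    return (length - 1) * stride + kernel


def num_chunks(length: int, chunk: int, hop: int) -> int:
    """Number of overlapping chunks after DPRNN-style padding (assumed scheme):
    pad the tail so the chunks tile evenly, then pad `hop` frames on each side."""
    rest = chunk - (hop + length % chunk) % chunk
    padded = length + rest + 2 * hop
    return (padded - chunk) // hop + 1


samples = 32000
frames = conv1d_out_len(samples, kernel=4, stride=2)               # encoder Conv1d
chunks = num_chunks(frames, chunk=200, hop=100)                    # Segmentation
restored = conv_transpose1d_out_len(frames, kernel=4, stride=2)    # decoder

print(frames, chunks, restored)
```

Under these assumed settings the three values agree with the table: 15999 encoder frames, 162 chunks per utterance, and 32000 output samples. Other kernel and stride combinations can also yield 15999 frames (kernel 3 with stride 2, for instance), but that pair would not invert to exactly 32000 samples in the decoder.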