From: Comparison of semi-supervised deep learning algorithms for audio classification
Layer | Architecture |
input | Log mel spectrogram |
conv1 | BasicBlock(32) |
Max pool | |
block1 | \(\left[\begin{array}{c} \text{BasicBlock(32)} \\ \text{BasicBlock(32)} \end{array}\right] \times 4\) |
block2 | \(\left[\begin{array}{c} \text{BasicBlock(64)} \\ \text{BasicBlock(64)} \end{array}\right] \times 4\) |
block3 | \(\left[\begin{array}{c} \text{BasicBlock(128)} \\ \text{BasicBlock(128)} \end{array}\right] \times 4\) |
Avg pool | |
ReLU | |
Linear | |
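
The bracket notation in the table can be read as "repeat this pair of blocks four times". The sketch below expands the rows into a flat, ordered layer list so the depth is easy to count. It only unrolls the notation: the internals of `BasicBlock` (kernel sizes, strides, normalization) are not given in this table, and the names `Layer`, `STAGES`, and `expand_architecture` are hypothetical helpers introduced for illustration.

```python
from typing import List, NamedTuple, Optional


class Layer(NamedTuple):
    stage: str                      # row name as it appears in the table
    op: str                         # operation named in the table
    channels: Optional[int] = None  # width in parentheses, if the table gives one


# (stage name, channel width, repeats of the two-block group), read off the table.
STAGES = [("block1", 32, 4), ("block2", 64, 4), ("block3", 128, 4)]


def expand_architecture() -> List[Layer]:
    """Unroll the table rows into an ordered list of layers."""
    layers = [
        Layer("input", "LogMelSpectrogram"),
        Layer("conv1", "BasicBlock", 32),
        Layer("Max pool", "MaxPool"),
    ]
    for name, width, repeats in STAGES:
        for _ in range(repeats):
            # Each bracketed group holds two BasicBlocks of the same width.
            layers.append(Layer(name, "BasicBlock", width))
            layers.append(Layer(name, "BasicBlock", width))
    layers += [
        Layer("Avg pool", "AvgPool"),
        Layer("ReLU", "ReLU"),
        Layer("Linear", "Linear"),
    ]
    return layers


layers = expand_architecture()
n_blocks = sum(1 for layer in layers if layer.op == "BasicBlock")
print(len(layers), n_blocks)  # 30 25
```

Under this reading, each of block1 to block3 contributes 2 × 4 = 8 BasicBlocks, which together with conv1 gives 25 BasicBlocks in total.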