Table 2. Architecture of Wide ResNet28-2. Downsampling is performed by the first layers in block2 and block3.

From: Comparison of semi-supervised deep learning algorithms for audio classification

| Layer | Architecture |
| --- | --- |
| input | Log mel spectrogram |
| conv1 | BasicBlock(32) |
| | Max pool |
| block1 | \(\left[\begin{array}{c} \text{BasicBlock(32)} \\ \text{BasicBlock(32)} \end{array}\right] \times 4\) |
| block2 | \(\left[\begin{array}{c} \text{BasicBlock(64)} \\ \text{BasicBlock(64)} \end{array}\right] \times 4\) |
| block3 | \(\left[\begin{array}{c} \text{BasicBlock(128)} \\ \text{BasicBlock(128)} \end{array}\right] \times 4\) |
| | Avg pool |
| | ReLU |
| | Linear |
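
The stage widths and repeat counts in the table match the usual Wide ResNet parameterization, where a network of depth d = 6N + 4 has N residual blocks per stage and a widening factor k scales the base widths 16, 32 and 64. For Wide ResNet28-2 this gives N = 4 and widths 32, 64 and 128. The sketch below derives that plan in plain Python. The function name `wrn_stage_plan` is hypothetical, and the stride of 2 for the downsampling stages is an assumption: the table says only that the first layers of block2 and block3 downsample, not by how much.

```python
def wrn_stage_plan(depth=28, widen_factor=2, downsample_stride=2):
    """Return (name, channels, num_blocks, first_stride) for each residual stage.

    Assumes the standard Wide ResNet relation depth = 6 * N + 4.
    downsample_stride=2 is an assumption; the table does not state the stride.
    """
    if (depth - 4) % 6 != 0:
        raise ValueError("depth must satisfy depth = 6 * N + 4")
    blocks_per_stage = (depth - 4) // 6
    base_widths = (16, 32, 64)  # base widths before widening

    plan = []
    for index, base in enumerate(base_widths, start=1):
        # Only block2 and block3 downsample, and only in their first layer.
        stride = 1 if index == 1 else downsample_stride
        plan.append((f"block{index}", base * widen_factor, blocks_per_stage, stride))
    return plan


for name, channels, blocks, stride in wrn_stage_plan():
    print(f"{name}: {blocks} blocks, {channels} channels, first stride {stride}")
```

Calling `wrn_stage_plan()` with the defaults yields four blocks per stage at 32, 64 and 128 channels, which agrees with the block1 to block3 rows of the table. The input, conv1, pooling, ReLU and linear rows are not modeled here.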