Table 2 Results (all-accuracy in %, train/test) of various versions of the Audio-SegNet network (kernel_size: 3 × 3, maxpooling_size: 2 × 2, upsampling_size: 2 × 2; "64 × 2" denotes 2 convolution layers with 64 output feature maps each)

From: Deep semantic learning for acoustic scene classification

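| Audio-SegNet | SegNet-L | SegNet-M | SegNet-S | Mini-SegNet |
| --- | --- | --- | --- | --- |
| Encoder | 64 × 2 | 64 × 2 | 64 × 2 | 64 × 1 |
| | 128 × 2 | 128 × 2 | 128 × 2 | 128 × 2 |
| | 256 × 3 | 196 × 2 | | |
| | 512 × 3 | | | |
| | 512 × 3 | | | |
| Decoder | 512 × 3 | 196 × 2 | 128 × 2 | 128 × 2 |
| | 512 × 3 | 128 × 2 | 64 × 2 | 64 × 1 |
| | 256 × 3 | 64 × 2 | | |
| | 128 × 2 | | | |
| | 64 × 2 | | | |
| Train params | 31,880,650 | 2,051,050 | 707,338 | 670,282 |
| Time (s)/epoch | 328 | 215 | 206 | 195 |
| All-accuracy (%, train/test) | 93.86/59.06 | 90.84/63.44 | 85.32/65.35 | 83.45/66.46 |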
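
The train/test accuracies in Table 2 can be compared directly by taking the difference between the two figures for each variant. The short sketch below does that arithmetic. The "gap" quantity and the names `results` and `accuracy_gap` are introduced here for illustration only and are not reported in the source; the accuracy values are copied from the table.

```python
# Train/test all-accuracy (%) per variant, copied from Table 2.
results = {
    "SegNet-L": (93.86, 59.06),
    "SegNet-M": (90.84, 63.44),
    "SegNet-S": (85.32, 65.35),
    "Mini-SegNet": (83.45, 66.46),
}


def accuracy_gap(train_acc, test_acc):
    """Return train minus test accuracy in percentage points (2 decimals).

    This is a derived, illustrative quantity, not a metric from the source.
    """
    return round(train_acc - test_acc, 2)


# Gap per variant, in the same order as the table columns.
gaps = {name: accuracy_gap(train, test) for name, (train, test) in results.items()}

for name, gap in gaps.items():
    print(f"{name:12s} train-test gap = {gap:.2f} points")
```

Under this reading, the gap narrows from 34.80 points for SegNet-L to 16.99 points for Mini-SegNet, while test accuracy rises from 59.06% to 66.46%. In this table, then, the smaller variants generalize better despite fitting the training data less closely.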