EURASIP Journal on Audio, Speech, and Music Processing

Table 2 Results (all-accuracy: %, train/test) of various versions of the Audio-SegNet network (kernel_size: 3 × 3, maxpooling_size: 2 × 2, upsampling_size: 2 × 2; 64 × 2 denotes 2 convolution layers with 64 output feature maps each)

From: Deep semantic learning for acoustic scene classification

Audio-SegNet	SegNet-L	SegNet-M	SegNet-S	Mini-SegNet
Encoder	64 × 2	64 × 2	64 × 2	64 × 1
	128 × 2	128 × 2	128 × 2	128 × 2
	256 × 3	196 × 2	-	-
	512 × 3	-	-	-
	512 × 3	-	-	-
Decoder	512 × 3	196 × 2	128 × 2	128 × 2
	512 × 3	128 × 2	64 × 2	64 × 1
	256 × 3	64 × 2	-	-
	128 × 2	-	-	-
	64 × 2	-	-	-
Train params	31,880,650	2,051,050	707,338	670,282
Time (s)/epoch	328	215	206	195
All-accuracy (%, train/test)	93.86/59.06	90.84/63.44	85.32/65.35	83.45/66.46
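
The all-accuracy row pairs a train and a test figure for each variant, so the difference between the two can be read off directly. The following is a minimal sketch of that arithmetic, not part of the original article: the dictionary name ACCURACY and the helper train_test_gap are hypothetical, and the numbers are copied from the last row of the table.

```python
# Train/test all-accuracy (%) copied from the last row of Table 2.
# ACCURACY and train_test_gap are illustrative names, not from the article.
ACCURACY = {
    "SegNet-L": (93.86, 59.06),
    "SegNet-M": (90.84, 63.44),
    "SegNet-S": (85.32, 65.35),
    "Mini-SegNet": (83.45, 66.46),
}


def train_test_gap(accuracy):
    """Return {variant: train accuracy minus test accuracy}, rounded to 2 decimals."""
    return {name: round(train - test, 2) for name, (train, test) in accuracy.items()}


gaps = train_test_gap(ACCURACY)
for name, gap in gaps.items():
    print(f"{name}: train/test gap = {gap:.2f} points")
```

On these figures the gap narrows as the network gets smaller, while the test accuracy rises from 59.06 to 66.46.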
