Fig. 3 | EURASIP Journal on Audio, Speech, and Music Processing

From: A depthwise separable convolutional neural network for keyword spotting on an embedded system

General architecture of the DS-CNN. The input feature map is passed to the first convolution layer, followed by batch normalization and ReLU activation. Each of the subsequent DS-convolution layers 1 to N consists of a depthwise convolution followed by batch normalization and ReLU activation, and then a pointwise convolution followed by another batch normalization and ReLU activation. After the last convolutional layer, the output passes through average pooling and a fully connected (FC) layer with softmax activation.
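The layer ordering described in the caption can be sketched as a dependency-free Python forward pass. This is a minimal illustration, not the authors' implementation. It assumes several details the caption does not give: zero "same" padding, stride 1 everywhere, 3x3 kernels, inference-mode batch normalization with placeholder statistics, random weights, and a hypothetical 10 x 8 single-channel input with 4 feature channels, N = 2 DS layers and 3 output classes. All function names are hypothetical.

```python
import math
import random

# Hypothetical helpers: tensors are nested lists shaped [channels][height][width].

def conv_single(x, k):
    """Zero-padded 'same', stride-1 2-D cross-correlation of one H x W map with one odd-sized kernel."""
    H, W, kh, kw = len(x), len(x[0]), len(k), len(k[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for a in range(kh):
                for b in range(kw):
                    ii, jj = i + a - kh // 2, j + b - kw // 2
                    if 0 <= ii < H and 0 <= jj < W:
                        out[i][j] += x[ii][jj] * k[a][b]
    return out

def add_maps(p, q):
    return [[u + v for u, v in zip(rp, rq)] for rp, rq in zip(p, q)]

def conv2d(x, w):
    """Standard convolution: w is [C_out][C_in][kh][kw]; each output channel sums over all input channels."""
    out = []
    for w_o in w:
        acc = conv_single(x[0], w_o[0])
        for x_c, k in zip(x[1:], w_o[1:]):
            acc = add_maps(acc, conv_single(x_c, k))
        out.append(acc)
    return out

def depthwise_conv(x, w):
    """Depthwise convolution: w is [C][kh][kw]; each channel is filtered by its own kernel, no channel mixing."""
    return [conv_single(x_c, k) for x_c, k in zip(x, w)]

def pointwise_conv(x, w):
    """Pointwise (1x1) convolution: w is [C_out][C_in]; mixes channels separately at each position."""
    H, W = len(x[0]), len(x[0][0])
    return [[[sum(w_o[c] * x[c][i][j] for c in range(len(x))) for j in range(W)]
             for i in range(H)] for w_o in w]

def batch_norm(x, eps=1e-5):
    """Inference-mode batch norm. Assumption: all channels share placeholder statistics
    (mean 0, var 1, gamma 1, beta 0) instead of learned per-channel values."""
    mean, var, gamma, beta = 0.0, 1.0, 1.0, 0.0
    return [[[gamma * (v - mean) / math.sqrt(var + eps) + beta for v in row] for row in ch] for ch in x]

def relu(x):
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]

def global_avg_pool(x):
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def ds_cnn_forward(x, params):
    # First (standard) convolution -> BN -> ReLU
    h = relu(batch_norm(conv2d(x, params["conv1"])))
    # DS-convolution layers 1..N: depthwise -> BN -> ReLU -> pointwise -> BN -> ReLU
    for dw, pw in params["ds_layers"]:
        h = relu(batch_norm(depthwise_conv(h, dw)))
        h = relu(batch_norm(pointwise_conv(h, pw)))
    # Average pooling -> fully connected layer -> softmax
    v = global_avg_pool(h)
    logits = [sum(wi * vi for wi, vi in zip(w_o, v)) + b
              for w_o, b in zip(params["fc_w"], params["fc_b"])]
    return softmax(logits)

def rand_tensor(*shape):
    if len(shape) == 1:
        return [random.uniform(-0.5, 0.5) for _ in range(shape[0])]
    return [rand_tensor(*shape[1:]) for _ in range(shape[0])]

# Hypothetical sizes: 1-channel 10 x 8 input, 4 feature channels, N = 2 DS layers, 3 classes.
random.seed(0)
C, N_LAYERS, N_CLASSES = 4, 2, 3
params = {
    "conv1": rand_tensor(C, 1, 3, 3),
    "ds_layers": [(rand_tensor(C, 3, 3), rand_tensor(C, C)) for _ in range(N_LAYERS)],
    "fc_w": rand_tensor(N_CLASSES, C),
    "fc_b": [0.0] * N_CLASSES,
}
features = rand_tensor(1, 10, 8)
probs = ds_cnn_forward(features, params)
print(probs)  # three class probabilities summing to 1
```

In this sketch the depthwise step never combines channels. The 1x1 pointwise step is the only place inside a DS-convolution layer where channels are mixed, and this is what separates it from the standard convolution used in the first layer.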

Back to article page