Fig. 4 | EURASIP Journal on Audio, Speech, and Music Processing

Fig. 4

From: A depthwise separable convolutional neural network for keyword spotting on an embedded system

Overview of a single depthwise separable convolutional layer, consisting of a depthwise convolution followed by a pointwise convolution. (1) The depthwise convolution applies a separate 2-dimensional filter to each channel of the input, extracting time-frequency patterns. (2) The pointwise convolution then applies a number of 1-dimensional filters to the output of the depthwise convolution across all channels.
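The two steps in the caption can be illustrated with a minimal NumPy sketch. It is not the article's implementation: the function names, the (channels, time, frequency) array layout, the stride of 1, and the absence of padding, bias, and activation are all assumptions made for illustration.

```python
import numpy as np


def depthwise_conv2d(x, dw_filters):
    """Step (1): filter each input channel with its own 2-D kernel.

    x:          array of shape (C, T, F), assumed layout channels x time x frequency
    dw_filters: array of shape (C, kt, kf), one kernel per channel
    Assumes stride 1 and no padding; computes cross-correlation, as is
    conventional in neural-network "convolution" layers.
    """
    c, t, f = x.shape
    _, kt, kf = dw_filters.shape
    out = np.zeros((c, t - kt + 1, f - kf + 1))
    for ch in range(c):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = x[ch, i:i + kt, j:j + kf]
                out[ch, i, j] = np.sum(patch * dw_filters[ch])
    return out


def pointwise_conv(x, pw_filters):
    """Step (2): combine channels at every time-frequency position.

    x:          array of shape (C, T, F)
    pw_filters: array of shape (N, C), one length-C filter per output channel
    Returns an array of shape (N, T, F).
    """
    return np.einsum("nc,ctf->ntf", pw_filters, x)


def depthwise_separable_conv(x, dw_filters, pw_filters):
    """Depthwise convolution followed by pointwise convolution."""
    return pointwise_conv(depthwise_conv2d(x, dw_filters), pw_filters)


# Toy example: 3 input channels, 5 time frames, 4 frequency bins.
x = np.ones((3, 5, 4))
dw = np.ones((3, 3, 3))   # one 3x3 kernel per input channel
pw = np.ones((2, 3))      # 2 output channels, each mixing the 3 input channels
y = depthwise_separable_conv(x, dw, pw)
print(y.shape)            # (2, 3, 2)
```

In this sketch the layer holds 3·3·3 + 2·3 = 33 weights, whereas a standard convolution with the same kernel size and channel counts would hold 3·3·3·2 = 54.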
