Fig. 2 | EURASIP Journal on Audio, Speech, and Music Processing

From: Components loss for neural networks in mask-based speech enhancement

Topology details of the CNN employed in Fig. 1 (adopted from [48, Fig. 6]). The operation Conv(f, h×w) stands for convolution, where f ∈ {F, 2F} is the number of filter kernels in the layer and (h×w) is the kernel size. The max-pooling and upsampling layers have a kernel size of (2×1), and the stride of the max-pooling layers is set to 2. The gray areas contain two symmetric procedures. All possible forward residual skip connections are added between layers with matching dimensions.
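
The dimension matching behind the skip connections can be illustrated with a small shape walk-through. This is only a sketch under stated assumptions, not the topology from the figure: the input size, the value of F, the layer names, the number of layers, and the use of "same" padding in the convolutions are all hypothetical, and the stride of 2 is assumed to act along the pooled axis only.

```python
# Hypothetical shape walk-through for a small encoder-decoder CNN.
# All sizes and layer names are assumed for illustration; they are not
# taken from the paper.

F = 4                      # assumed number of filter kernels
INPUT_SHAPE = (16, 1, 1)   # assumed (height, width, channels)


def conv(shape, f):
    # Assumes "same" padding, so only the channel count changes to f.
    h, w, _ = shape
    return (h, w, f)


def maxpool(shape):
    # Kernel (2x1) with stride 2 along the height axis (assumed axis).
    h, w, c = shape
    return ((h - 2) // 2 + 1, w, c)


def upsample(shape):
    # Upsampling with kernel (2x1) doubles the height axis.
    h, w, c = shape
    return (2 * h, w, c)


LAYERS = [
    ("enc_conv1", lambda s: conv(s, F)),
    ("pool1", maxpool),
    ("enc_conv2", lambda s: conv(s, 2 * F)),
    ("pool2", maxpool),
    ("mid_conv", lambda s: conv(s, 2 * F)),
    ("up1", upsample),
    ("dec_conv2", lambda s: conv(s, 2 * F)),
    ("up2", upsample),
    ("dec_conv1", lambda s: conv(s, F)),
]

# Propagate the input shape through the layers and record each output shape.
shapes = {}
current = INPUT_SHAPE
for name, op in LAYERS:
    current = op(current)
    shapes[name] = current

# A forward skip connection is possible between two convolution outputs
# whose shapes match exactly (earlier layer -> later layer).
conv_names = [name for name, _ in LAYERS if "conv" in name]
skip_pairs = [
    (a, b)
    for i, a in enumerate(conv_names)
    for b in conv_names[i + 1:]
    if shapes[a] == shapes[b]
]

for name, _ in LAYERS:
    print(name, shapes[name])
print("skip connections:", skip_pairs)
```

In this toy configuration each max-pooling step halves the height and each upsampling step doubles it, so the decoder convolutions end up with the same shapes as their encoder counterparts, which is what makes the element-wise skip additions possible.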