Fig. 1

From: End-to-end speech emotion recognition using a novel context-stacking dilated convolution neural network

The proposed end-to-end speech emotion recognition (SER) model, denoted as the dilated-causal-convolution-only speech emotion recognition model with context stacking (DiCCOSER-CS). The convolution filter width, stride, and filter depth are listed in round brackets; the pooling width and stride are listed in square brackets.
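The caption states that the model uses dilated causal convolutions as its only convolution type and annotates each layer with a filter width and stride. The sketch below shows what a single 1-D dilated causal convolution computes, and how the receptive field grows when such layers are stacked. It is an illustrative assumption, not the authors' implementation. The function names `dilated_causal_conv1d` and `receptive_field`, the weights, and the dilation values are hypothetical, because the figure's actual layer configuration is not reproduced here. The context-stacking mechanism is not modeled.

```python
from typing import List, Sequence


def dilated_causal_conv1d(
    x: Sequence[float], w: Sequence[float], dilation: int = 1, stride: int = 1
) -> List[float]:
    """Hypothetical sketch: y[t] = sum_j w[j] * x[t - j * dilation].

    Inputs before t = 0 are treated as zeros (left padding only), so each
    output depends solely on the current and past samples (causality).
    Outputs are kept only at every `stride`-th time step.
    """
    k = len(w)
    pad = (k - 1) * dilation                # zeros needed on the left only
    padded = [0.0] * pad + list(x)
    out: List[float] = []
    for t in range(0, len(x), stride):      # t indexes the original signal
        acc = 0.0
        for j in range(k):
            # original index t - j*dilation maps to padded index t + pad - j*dilation
            acc += w[j] * padded[t + pad - j * dilation]
        out.append(acc)
    return out


def receptive_field(kernel_width: int, dilations: Sequence[int]) -> int:
    """Receptive field (in samples) of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_width - 1) * d for d in dilations)


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5]
    # With w = [1, 1] and dilation 2, each output is x[t] + x[t - 2].
    print(dilated_causal_conv1d(signal, [1, 1], dilation=2))  # [1.0, 2.0, 4.0, 6.0, 8.0]
    # Doubling dilations (1, 2, 4, 8) with width-2 filters cover 16 samples.
    print(receptive_field(2, [1, 2, 4, 8]))                   # 16
```

Doubling the dilation at each layer is a common design choice because it makes the receptive field grow exponentially with depth while the number of weights per layer stays fixed. Whether DiCCOSER-CS uses this schedule is not stated in the caption.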