Fig. 5. From: End-to-end speech emotion recognition using a novel context-stacking dilated convolution neural network

Performance of the four proposed model variations on (a) the RECOLA dataset and (b) the IEMOCAP dataset. The suffix "-CS" indicates the use of context stacking; "max" and "rms" indicate max-pooling aggregation and RMS aggregation, respectively.
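As an illustration only, the two aggregation options named in the caption can be sketched as ways of collapsing a sequence of frame-level feature vectors into one fixed-length vector. This sketch assumes that aggregation runs over the time axis, independently for each channel. The function names and toy data are hypothetical and are not taken from the article, and context stacking is not modeled.

```python
import math
from typing import List

# Hypothetical illustration: `frames` is a list of frame-level feature vectors
# (time x channels). Each aggregator collapses the time axis per channel,
# producing one utterance-level value per channel. Assumed, not the paper's code.

def max_aggregate(frames: List[List[float]]) -> List[float]:
    """Max-pooling over time: keep the largest value seen in each channel."""
    return [max(channel) for channel in zip(*frames)]

def rms_aggregate(frames: List[List[float]]) -> List[float]:
    """RMS over time: square root of the mean squared value in each channel."""
    n = len(frames)
    return [math.sqrt(sum(v * v for v in channel) / n) for channel in zip(*frames)]

# Toy input: channel 0 has one spike, channel 1 alternates in sign.
frames = [[0.0, -3.0], [0.0, 3.0], [0.0, -3.0], [4.0, 3.0]]
print(max_aggregate(frames))  # [4.0, 3.0]
print(rms_aggregate(frames))  # [2.0, 3.0]
```

The toy input shows the difference in behavior: max-pooling responds only to the single strongest frame, so the lone spike in channel 0 dominates. RMS reflects the energy spread across all frames and treats positive and negative values alike.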