Figure 2

Frame-based perceptual feature extraction. To model the characteristics of an emotional mode efficiently, the perceptual features computed in each frame are averaged over Y overlapping frames. Hence, a single M-dimensional average feature vector is extracted from each emotional and reference audio pair and is used as a training or test vector.
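
The averaging step described in the caption can be illustrated with a minimal sketch. It assumes the per-frame perceptual features are already available as Y vectors of length M. The function names, the framing parameters `frame_len` and `hop`, and the two-dimensional toy feature (mean energy and zero-crossing rate) are hypothetical placeholders, not the perceptual features of the emotional and reference audio pair used in this work.

```python
from typing import Callable, List, Sequence


def split_into_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into frames; the frames overlap whenever hop < frame_len."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def toy_features(frame: Sequence[float]) -> List[float]:
    """Placeholder feature with M = 2 (assumption): mean energy and zero-crossing rate."""
    if len(frame) < 2:
        raise ValueError("frame must contain at least two samples")
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [energy, crossings / (len(frame) - 1)]


def average_over_frames(per_frame: Sequence[Sequence[float]]) -> List[float]:
    """Average Y per-frame feature vectors of length M into one M-dimensional vector."""
    if not per_frame:
        raise ValueError("at least one frame is required")
    y = len(per_frame)
    m = len(per_frame[0])
    if any(len(vec) != m for vec in per_frame):
        raise ValueError("all per-frame vectors must have the same length M")
    return [sum(vec[k] for vec in per_frame) / y for k in range(m)]


def extract_average_vector(signal: Sequence[float], frame_len: int, hop: int,
                           feature_fn: Callable[[Sequence[float]], List[float]] = toy_features) -> List[float]:
    """Frame the signal, compute features per frame, and return their average."""
    frames = split_into_frames(signal, frame_len, hop)
    return average_over_frames([feature_fn(f) for f in frames])


# Two frames (Y = 2) with two features each (M = 2).
print(average_over_frames([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Under these assumptions, each signal yields one vector of length M regardless of its duration, which is what allows it to be used directly as a fixed-size training or test vector.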