An evolutionary feature synthesis approach for content-based audio retrieval

Table 2 The extracted low-level audio features

Segment features	Key-frame features
STAT^a (39-D)	MFCC + Δ-MFCC + ΔΔ-MFCC (39-D)
13 Mel-frequency cepstral coefficients (MFCC) (26-D)	12th-order LPC + 14th-order LPCC (26-D)
13 Δ-MFCC (26-D)	K_AUDIO^c (31-D)
13 ΔΔ-MFCC (26-D)
10th-order linear prediction coefficients (LPC) (20-D)
14th-order linear prediction cepstral coefficients (LPCC) (28-D)
S_AUDIO^b (38-D)

^aSTAT includes the mean (μ) and standard deviation (σ) values of signal statistical features in both the time and frequency domains: mean, variance, standard deviation, average deviation, skewness, and kurtosis. It also includes the following segment features (μ,σ): band-energy ratio (BER), spectral centroid, transition rate, FF, irregularity (two versions), flatness (in both linear and decibel scale), and tonality.
^bS_AUDIO includes the following segment features (μ,σ): tristimulus, smoothness, spectral spread, spectral roll-off, RMS amplitude, inharmonicity, spectral crest, loudness, noisiness, power, odd-to-even ratio, and sub-band powers of six frequency bands.
^cK_AUDIO includes the following key-frame features: irregularity (two versions), tristimulus, smoothness, spectral spread, zero-crossing rate, spectral roll-off, loudness, flatness (in both linear and decibel scale), tonality, noisiness, RMS amplitude, inharmonicity, spectral crest, odd-to-even ratio, spectral slope, FF, skewness, kurtosis, spectral skewness, spectral kurtosis, and sub-band powers of seven frequency bands.
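
The dimensionalities in Table 2 can be illustrated with a minimal sketch. It assumes that a segment feature is formed by concatenating the per-coefficient mean and standard deviation over the frames of a segment (so 13 MFCCs give 26-D), and that a key-frame feature concatenates the MFCC, Δ-MFCC, and ΔΔ-MFCC of a single frame (39-D). The function names, the synthetic input, the simple first-difference delta, and the choice of the middle frame as key frame are all illustrative assumptions, not the extraction procedure used in the paper.

```python
import numpy as np

def segment_stats(frames: np.ndarray) -> np.ndarray:
    """Summarize (n_frames, n_dims) frame features as [mean, std] of length 2 * n_dims."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def deltas(frames: np.ndarray) -> np.ndarray:
    """First-order difference along time; the first frame is repeated so the
    output keeps the same number of frames (hypothetical, simplified delta)."""
    return np.diff(frames, axis=0, prepend=frames[:1])

# Synthetic stand-in for 13 MFCCs computed over 50 frames of one segment.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((50, 13))

d_mfcc = deltas(mfcc)      # Δ-MFCC
dd_mfcc = deltas(d_mfcc)   # ΔΔ-MFCC

# Segment feature: mean and standard deviation of each coefficient (13 x 2 = 26-D).
segment_mfcc = segment_stats(mfcc)

# Key-frame feature: MFCC + Δ-MFCC + ΔΔ-MFCC of one frame (13 x 3 = 39-D).
k = mfcc.shape[0] // 2     # middle frame taken as the key frame (assumption)
key_frame = np.concatenate([mfcc[k], d_mfcc[k], dd_mfcc[k]])

print(segment_mfcc.shape, key_frame.shape)
```

Under the same assumption, the 10th-order LPC and 14th-order LPCC segment features come out as 20-D and 28-D, while the key-frame LPC + LPCC vector is a plain concatenation (12 + 14 = 26-D).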