Table 2 The extracted low-level audio features

From: An evolutionary feature synthesis approach for content-based audio retrieval

| Segment features | Key-frame features |
|---|---|
| STAT^a (39-D) | MFCC + Δ-MFCC + ΔΔ-MFCC (39-D) |
| 13 Mel-frequency cepstral coefficients (MFCC) (26-D) | 12th-order LPC + 14th-order LPCC (26-D) |
| 13 Δ-MFCC (26-D) | K_AUDIO^c (31-D) |
| 13 ΔΔ-MFCC (26-D) | |
| 10th-order linear prediction coefficients (LPC) (20-D) | |
| 14th-order linear prediction cepstral coefficients (LPCC) (28-D) | |
| S_AUDIO^b (38-D) | |

  1. ^a STAT includes the mean (μ) and standard deviation (σ) of signal statistics computed in both the time and frequency domains (mean, variance, standard deviation, average deviation, skewness, and kurtosis), together with the following segment features (μ, σ): band-energy ratio (BER), spectral centroid, transition rate, FF, irregularity (two versions), flatness (linear and decibel scales), and tonality.
  2. ^b S_AUDIO includes the following segment features (μ, σ): tristimulus, smoothness, spectral spread, spectral roll-off, RMS amplitude, inharmonicity, spectral crest, loudness, noisiness, power, odd-to-even ratio, and the sub-band powers of six frequency bands.
  3. ^c K_AUDIO includes the following key-frame features: irregularity (two versions), tristimulus, smoothness, spectral spread, zero-crossing rate, spectral roll-off, loudness, flatness (linear and decibel scales), tonality, noisiness, RMS amplitude, inharmonicity, spectral crest, odd-to-even ratio, spectral slope, FF, skewness, kurtosis, spectral skewness, spectral kurtosis, and the sub-band powers of seven frequency bands.
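The table does not spell out how frame-level coefficients become segment or key-frame vectors, but the dimensions are consistent with concatenating a per-dimension mean and standard deviation for segments (13 MFCC → 26-D) and stacking static, Δ and ΔΔ coefficients for key frames (3 × 13 = 39-D). The Python sketch below illustrates that reading under stated assumptions: the regression-style delta window (n = 2), edge padding by frame repetition, the population standard deviation, and the helper names `deltas`, `mean_std_pool` and `keyframe_mfcc_vector` are illustrative choices, not taken from the paper.

```python
import math
from typing import List

Frame = List[float]  # one frame of coefficients (13 MFCCs in Table 2)


def deltas(frames: List[Frame], n: int = 2) -> List[Frame]:
    """Regression-style delta coefficients over a +/- n frame window.

    Assumption: edge frames are padded by repeating the first/last frame.
    """
    num_frames, dim = len(frames), len(frames[0])
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out: List[Frame] = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, n + 1):
                later = frames[min(t + k, num_frames - 1)][d]
                earlier = frames[max(t - k, 0)][d]
                acc += k * (later - earlier)
            row.append(acc / denom)
        out.append(row)
    return out


def mean_std_pool(frames: List[Frame]) -> Frame:
    """Segment vector: per-dimension means followed by population std devs (D -> 2D)."""
    num_frames, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds


def keyframe_mfcc_vector(frames: List[Frame], t: int) -> Frame:
    """Key-frame vector: static + delta + delta-delta coefficients at frame t (13 -> 39)."""
    d1 = deltas(frames)
    d2 = deltas(d1)
    return frames[t] + d1[t] + d2[t]


# Toy MFCC track: 10 frames x 13 coefficients (synthetic values, not real audio).
mfcc = [[float(t + d) for d in range(13)] for t in range(10)]
segment_mfcc = mean_std_pool(mfcc)            # 26-D, cf. "13 MFCC (26-D)"
segment_delta = mean_std_pool(deltas(mfcc))   # 26-D, cf. "13 Δ-MFCC (26-D)"
key_frame = keyframe_mfcc_vector(mfcc, t=5)   # 39-D, cf. "MFCC + Δ-MFCC + ΔΔ-MFCC (39-D)"
print(len(segment_mfcc), len(segment_delta), len(key_frame))  # 26 26 39
```

Pooling with (μ, σ) gives every segment a fixed-length vector regardless of its duration, which is what a retrieval index needs; the key-frame vector instead keeps the local dynamics of one representative frame.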
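The LP-based rows (10th- and 12th-order LPC, 14th-order LPCC) can be sketched the same way. Below, LPC coefficients are estimated with the autocorrelation method and the Levinson-Durbin recursion, and LPCC are derived with the standard all-pole cepstral recursion c_n = a_n + Σ_{k=1}^{n-1} (k/n) c_k a_{n-k}, taking a_j = 0 for j above the model order. This is a hedged sketch: it assumes frames are already windowed, it assumes the 14 LPCCs come from a 14th-order LPC model (the paper does not say), and the function names are hypothetical. The random frame only stands in for real audio.

```python
import random
from typing import List, Tuple


def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of an (assumed pre-windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def lpc(frame: List[float], order: int) -> Tuple[List[float], float]:
    """Levinson-Durbin recursion on the autocorrelation sequence.

    Returns predictor coefficients a_1..a_p for x[n] ~ sum_k a_k x[n-k]
    and the final prediction-error energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused so that a[k] matches a_k
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_lpcc(a: List[float], num_ceps: int) -> List[float]:
    """Cepstrum c_1..c_num_ceps of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)

    def coef(m: int) -> float:
        return a[m - 1] if 1 <= m <= p else 0.0  # a_m is zero beyond the model order

    c: List[float] = []
    for n in range(1, num_ceps + 1):
        acc = coef(n)
        for k in range(1, n):
            acc += (k / n) * c[k - 1] * coef(n - k)
        c.append(acc)
    return c


# Synthetic stand-in for one windowed audio frame (not real data).
rng = random.Random(0)
frame = [rng.uniform(-1.0, 1.0) for _ in range(256)]
lpc12, _ = lpc(frame, 12)
lpc14, err14 = lpc(frame, 14)
key_frame_lp = lpc12 + lpc_to_lpcc(lpc14, 14)  # 12 + 14 = 26 values
print(len(key_frame_lp))  # 26
```

For a first-order model the recursion reduces to c_n = a_1^n / n, which is a quick sanity check on any LPCC implementation.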
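Footnotes a and c list signal statistics and several spectral descriptors. The sketch below computes a subset of them for a single frame: the six moment-style statistics (applied to samples for the time domain, or to a magnitude spectrum for the frequency domain), spectral centroid, spread, flatness in linear and decibel scales, roll-off, crest, and zero-crossing rate. The exact definitions used in the paper are not given, so the following are assumptions: magnitude weighting for centroid and spread, power spectrum for flatness and roll-off, an 85% roll-off threshold, non-excess kurtosis, and a small `EPS` floor. A naive DFT keeps the example dependency-free. Segment-level versions would then take (μ, σ) of each value across frames.

```python
import cmath
import math
from typing import Dict, List

EPS = 1e-12  # small floor (an assumption) so logs and ratios stay finite on silent bins


def moments(values: List[float]) -> Dict[str, float]:
    """Population mean, variance, std, average deviation, skewness and kurtosis."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    sd = math.sqrt(var)
    return {
        "mean": mu,
        "variance": var,
        "std": sd,
        "avg_deviation": sum(abs(v - mu) for v in values) / n,
        "skewness": sum((v - mu) ** 3 for v in values) / (n * sd ** 3) if sd > 0 else 0.0,
        "kurtosis": sum((v - mu) ** 4 for v in values) / (n * sd ** 4) if sd > 0 else 0.0,
    }


def magnitude_spectrum(frame: List[float]) -> List[float]:
    """|DFT| for bins 0..N/2 via a naive O(N^2) DFT (adequate for a short sketch)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_features(frame: List[float], sr: float, rolloff_pct: float = 0.85) -> Dict[str, float]:
    mag = magnitude_spectrum(frame)
    freqs = [k * sr / len(frame) for k in range(len(mag))]
    mag_sum = sum(mag) + EPS
    centroid = sum(f * m for f, m in zip(freqs, mag)) / mag_sum
    spread = math.sqrt(sum((f - centroid) ** 2 * m for f, m in zip(freqs, mag)) / mag_sum)
    power = [m * m + EPS for m in mag]
    geo_mean = math.exp(sum(math.log(p) for p in power) / len(power))
    flatness = geo_mean / (sum(power) / len(power))  # geometric / arithmetic mean
    target, cumulative, rolloff = rolloff_pct * sum(power), 0.0, freqs[-1]
    for f, p in zip(freqs, power):
        cumulative += p
        if cumulative >= target:  # lowest frequency holding rolloff_pct of the energy
            rolloff = f
            break
    return {
        "centroid": centroid,
        "spread": spread,
        "flatness": flatness,
        "flatness_db": 10.0 * math.log10(flatness),
        "rolloff": rolloff,
        "crest": max(mag) / (sum(mag) / len(mag) + EPS),
    }


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    changes = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))
    return changes / (len(frame) - 1)


# Synthetic tone: 64 samples at sr = 64 Hz, exactly 8 cycles, offset by half a sample.
N, SR = 64, 64.0
tone = [math.sin(2 * math.pi * 8 * (n + 0.5) / N) for n in range(N)]
time_stats = moments(tone)
spec = spectral_features(tone, SR)
print(round(spec["centroid"], 6), spec["rolloff"], round(zero_crossing_rate(tone), 4))  # 8.0 8.0 0.2381
```

A pure tone is a convenient check: all energy sits in one bin, so centroid and roll-off coincide with its frequency and flatness is close to zero, while noise-like frames push flatness toward one.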
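Footnotes b and c also include harmonic descriptors (tristimulus, odd-to-even ratio, inharmonicity). These need harmonic amplitudes and frequencies, which this sketch assumes have already been estimated (peak picking and fundamental estimation are not shown). The definitions below are common ones but are assumptions here: tristimulus in the Pollard-Jansson form, an odd-to-even energy ratio that counts the fundamental as odd, and inharmonicity as the energy-weighted deviation of each partial from h·f0, normalized by f0. The function names are hypothetical.

```python
from typing import List


def tristimulus(amps: List[float]) -> List[float]:
    """[T1, T2, T3] from harmonic amplitudes a_1..a_H: a_1, a_2..a_4, a_5.. over the total."""
    total = sum(amps)
    if total <= 0.0:
        return [0.0, 0.0, 0.0]
    return [amps[0] / total, sum(amps[1:4]) / total, sum(amps[4:]) / total]


def odd_to_even_ratio(amps: List[float]) -> float:
    """Energy in odd harmonics (1, 3, 5, ...) over energy in even harmonics (2, 4, ...)."""
    odd = sum(a * a for a in amps[0::2])   # list indices 0, 2, 4 -> harmonics 1, 3, 5
    even = sum(a * a for a in amps[1::2])  # list indices 1, 3, 5 -> harmonics 2, 4, 6
    return odd / even if even > 0.0 else float("inf")


def inharmonicity(freqs: List[float], amps: List[float], f0: float) -> float:
    """Energy-weighted deviation of partial h from h * f0, scaled by 2 / f0."""
    energy = sum(a * a for a in amps)
    if energy <= 0.0 or f0 <= 0.0:
        return 0.0
    dev = sum(abs(f - h * f0) * a * a for h, (f, a) in enumerate(zip(freqs, amps), start=1))
    return 2.0 / f0 * dev / energy


# Six equal-amplitude, perfectly harmonic partials of a 220 Hz tone (synthetic example).
amps = [1.0] * 6
freqs = [220.0 * h for h in range(1, 7)]
print(tristimulus(amps), odd_to_even_ratio(amps), inharmonicity(freqs, amps, 220.0))
```

Because all three descriptors are ratios, they are insensitive to overall level, which is one reason such timbre features are often paired with explicit loudness or power features as in S_AUDIO and K_AUDIO.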