Table 6 Acoustic features used in the A-GTM-UVigo-Three feature+DTW-based fusion system

From: ALBAYZIN Query-by-example Spoken Term Detection 2016 evaluation

| Description | No. of features |
| --- | --- |
| Sum of auditory spectra | 1 |
| Zero-crossing rate | 1 |
| Sum of RASTA style filtering auditory spectra | 1 |
| Frame intensity | 1 |
| Frame loudness | 1 |
| Root mean square energy and log-energy | 2 |
| Energy in frequency bands 250–650 Hz and 1000–4000 Hz | 2 |
| Spectral rolloff points at 25%, 50%, 75%, 90% | 4 |
| Spectral flux | 1 |
| Spectral entropy | 1 |
| Spectral variance | 1 |
| Spectral skewness | 1 |
| Spectral kurtosis | 1 |
| Psychoacoustical sharpness | 1 |
| Spectral harmonicity | 1 |
| Spectral flatness | 1 |
| MFCCs | 16 |
| MFCC filterbank | 26 |
| Line spectral pairs | 8 |
| Cepstral PLP coefficients | 9 |
| RASTA PLP coefficients | 9 |
| Fundamental frequency (F0) | 1 |
| Probability of voicing | 1 |
| Jitter | 2 |
| Shimmer | 1 |
| Log harmonics-to-noise ratio (logHNR) | 1 |
| LPC formant frequencies and bandwidths | 6 |
| Formant frame intensity | 1 |
| First derivative | 102 |
| Total | 204 |
Abbreviations: RASTA, log-RelAtive SpecTrAl; MFCC, Mel-frequency cepstral coefficient; PLP, perceptual linear prediction; LPC, linear prediction coding
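
The arithmetic behind the table can be checked with a short sketch: the 28 static feature groups sum to 102 values per frame, and appending one first-derivative value per static value gives the 204-dimensional total. The names `STATIC_FEATURES`, `total_dimension`, and `append_deltas` are hypothetical, and the frame-to-frame difference used for the derivative is an assumption made for illustration only, since the table does not say how the derivative is computed.

```python
# Static (per-frame) feature counts, copied from Table 6.
STATIC_FEATURES = {
    "Sum of auditory spectra": 1,
    "Zero-crossing rate": 1,
    "Sum of RASTA style filtering auditory spectra": 1,
    "Frame intensity": 1,
    "Frame loudness": 1,
    "Root mean square energy and log-energy": 2,
    "Energy in frequency bands 250–650 Hz and 1000–4000 Hz": 2,
    "Spectral rolloff points at 25%, 50%, 75%, 90%": 4,
    "Spectral flux": 1,
    "Spectral entropy": 1,
    "Spectral variance": 1,
    "Spectral skewness": 1,
    "Spectral kurtosis": 1,
    "Psychoacoustical sharpness": 1,
    "Spectral harmonicity": 1,
    "Spectral flatness": 1,
    "MFCCs": 16,
    "MFCC filterbank": 26,
    "Line spectral pairs": 8,
    "Cepstral PLP coefficients": 9,
    "RASTA PLP coefficients": 9,
    "Fundamental frequency (F0)": 1,
    "Probability of voicing": 1,
    "Jitter": 2,
    "Shimmer": 1,
    "Log harmonics-to-noise ratio (logHNR)": 1,
    "LPC formant frequencies and bandwidths": 6,
    "Formant frame intensity": 1,
}


def total_dimension(static_counts, with_first_derivative=True):
    """Return the per-frame vector size, doubled when first derivatives are appended."""
    static_dim = sum(static_counts.values())
    return 2 * static_dim if with_first_derivative else static_dim


def append_deltas(frames):
    """Append a first difference to every frame.

    Assumption: a plain difference with the previous frame (zero for the
    first frame). The table does not specify the derivative computation.
    """
    out = []
    for t, frame in enumerate(frames):
        prev = frames[t - 1] if t > 0 else frame
        delta = [x - p for x, p in zip(frame, prev)]
        out.append(list(frame) + delta)
    return out


static_dim = total_dimension(STATIC_FEATURES, with_first_derivative=False)
full_dim = total_dimension(STATIC_FEATURES)
print(static_dim, full_dim)
```

Applying `append_deltas` to a sequence of 102-dimensional frames yields 204-dimensional frames, matching the "Total" row.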