Table 6 Acoustic features used in the A-GTM-UVigo-Three feature+DTW-based fusion system

From: ALBAYZIN Query-by-example Spoken Term Detection 2016 evaluation

| Description | No. of features |
| --- | --- |
| Sum of auditory spectra | 1 |
| Zero-crossing rate | 1 |
| Sum of RASTA style filtering auditory spectra | 1 |
| Frame intensity | 1 |
| Frame loudness | 1 |
| Root mean square energy and log-energy | 2 |
| Energy in frequency bands 250–650 Hz and 1000–4000 Hz | 2 |
| Spectral rolloff points at 25%, 50%, 75%, 90% | 4 |
| Spectral flux | 1 |
| Spectral entropy | 1 |
| Spectral variance | 1 |
| Spectral skewness | 1 |
| Spectral kurtosis | 1 |
| Psychoacoustical sharpness | 1 |
| Spectral harmonicity | 1 |
| Spectral flatness | 1 |
| MFCCs | 16 |
| MFCC filterbank | 26 |
| Line spectral pairs | 8 |
| Cepstral PLP coefficients | 9 |
| RASTA PLP coefficients | 9 |
| Fundamental frequency (F0) | 1 |
| Probability of voicing | 1 |
| Jitter | 2 |
| Shimmer | 1 |
| Log harmonics-to-noise ratio (logHNR) | 1 |
| LPC formant frequencies and bandwidths | 6 |
| Formant frame intensity | 1 |
| First derivative | 102 |
| Total | 204 |

  1. RASTA: log-RelAtive SpecTrAl; MFCC: Mel-frequency cepstral coefficient; PLP: perceptual linear prediction; LPC: linear prediction coding
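
The table's arithmetic (102 static features per frame, plus one first derivative for each, giving 204) can be illustrated with a minimal Python sketch. The per-row counts are copied from the table. Everything else is an assumption made for illustration: the table does not say how the derivative is computed, so `np.gradient` (a simple finite difference along the time axis) stands in for it, and the helper name `append_first_derivative` and the random input frames are hypothetical.

```python
import numpy as np

# Per-row counts of static features, copied from the table in order
# (everything above the "First derivative" row).
STATIC_COUNTS = [1, 1, 1, 1, 1, 2, 2, 4, 1, 1, 1, 1, 1, 1, 1, 1,
                 16, 26, 8, 9, 9, 1, 1, 2, 1, 1, 6, 1]


def append_first_derivative(static):
    """Stack static features with a per-frame first derivative.

    Assumption: the derivative is approximated with np.gradient
    (finite differences along the time axis). The source table does
    not state which derivative estimator was actually used.
    """
    static = np.asarray(static, dtype=float)
    delta = np.gradient(static, axis=0)  # same shape as `static`
    return np.hstack([static, delta])


n_static = sum(STATIC_COUNTS)

# Hypothetical input: 50 frames of random values in place of real features.
frames = np.random.default_rng(0).normal(size=(50, n_static))
full = append_first_derivative(frames)

print(n_static, full.shape)
```

The static block is kept unchanged in the first half of each frame vector, so the derivative columns can be dropped or replaced by another estimator without touching the rest.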