Table 1 Acoustic and visual handcrafted features

From: Ensemble of convolutional neural networks to improve animal audio classification

| Features | Descriptors | Reference |
|---|---|---|
| Acoustic | Statistical Spectrum Descriptors (SSD) are a set of statistical measures that describe audio content, taken from the moments of the Sonogram (the Sone) of each of the twenty-four critical bands defined according to the Bark scale. | [49] |
| | Rhythm Histogram (RH) is a feature set where the magnitudes of each modulation frequency bin of the twenty-four critical bands defined according to the Bark scale are summed to form a histogram of “rhythmic energy” per modulation frequency. | [49] |
| | Modulation Frequency Variance Descriptor (MVD) is a 420-dimensional feature vector that measures variation over the critical frequency bands for each modulation frequency. | [49] |
| | Temporal Statistical Spectrum Descriptor (TSSD) is a feature set that incorporates temporal information from the SSD (timbre variations, changes in rhythm, etc.). | [14, 44] |
| | Temporal Rhythm Histograms (TRH) is a feature set that captures rhythmic changes in music over time. | [49] |
| Visual | The multiscale uniform local binary pattern (LBP). | [41] |
| | The multiscale LBP histogram Fourier descriptor (LHF), obtained from the concatenation of LBP-HF. | [63] |
| | The multiscale rotation invariant co-occurrence of adjacent LBPs (LBP-RI). | [40] |
| | The Multiscale Local Phase Quantization (MLPQ). | [42] |
| | Ensemble of LPQ, where different configurations of LPQ are examined. | [35] |
| | The Heterogeneous Auto-Similarities of Characteristics (HASC) descriptor, which is applied to heterogeneous dense feature maps. | [47] |
| | Ensemble of variants of the LHF. | [34] |
| | The Gabor filter feature extraction method, where several different values for scale level and orientation are experimentally evaluated. | [17] |
| | The standard Binarized Statistical Image Features (BSIF), extracted by projecting subwindows of the entire image onto subspaces. | [24] |
| | Adaptive hybrid pattern (AHP), a noise-robust LBP variant that applies equal-probability quantization to maximize partition entropy. | [65] |
| | Locally Encoded Transform feature histogram (LETRIST), which explicitly encodes the joint information within an image across feature and scale spaces. | [54] |
| | CodebookLess Model, a dense sampling approach similar to Bag of Features (BoF). | [60] |
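
To make the SSD, RH and MVD descriptions in Table 1 more concrete, the following is a minimal sketch, not the implementation used in [49]. It assumes the Sonogram is already available as a list of 24 band-wise loudness sequences. The function names, the choice of seven statistics (mean, median, variance, skewness, kurtosis, minimum, maximum) and the use of 60 modulation frequency bins are assumptions made for illustration; they are chosen so that the MVD length comes out as 60 × 7 = 420, matching the dimensionality stated in the table. Psychoacoustic weighting and any smoothing steps are omitted.

```python
import cmath
import math
import statistics


def dft_magnitudes(signal, n_bins):
    """Magnitudes of DFT bins 1..n_bins of a real sequence (the DC bin is skipped)."""
    n = len(signal)
    mags = []
    for k in range(1, n_bins + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def seven_stats(values):
    """Mean, median, variance, skewness, kurtosis, min, max (assumed statistic set)."""
    values = list(values)
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mean)
    std = math.sqrt(var)
    if std == 0.0:
        skew = kurt = 0.0  # constant input: higher moments defined as 0 here
    else:
        n = len(values)
        skew = sum(((v - mean) / std) ** 3 for v in values) / n
        kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    return [mean, statistics.median(values), var, skew, kurt, min(values), max(values)]


def ssd(sonogram):
    """SSD sketch: statistics of each critical band over time, concatenated."""
    return [s for band in sonogram for s in seven_stats(band)]


def modulation_spectrum(sonogram, n_mod_bins):
    """Per-band modulation magnitudes: one row per band, one column per modulation bin."""
    return [dft_magnitudes(band, n_mod_bins) for band in sonogram]


def rhythm_histogram(mod_spec):
    """RH sketch: sum the magnitudes of each modulation bin over all bands."""
    return [sum(column) for column in zip(*mod_spec)]


def mvd(mod_spec):
    """MVD sketch: statistics across the bands for each modulation bin."""
    return [s for column in zip(*mod_spec) for s in seven_stats(column)]


# Toy Sonogram (hypothetical data): 24 bands, 128 frames, loudness modulated
# at modulation bin 4 with a different phase in every band.
N_BANDS, N_FRAMES, N_MOD_BINS = 24, 128, 60
sonogram = [
    [1.0 + 0.5 * math.sin(2 * math.pi * 4 * t / N_FRAMES + b) for t in range(N_FRAMES)]
    for b in range(N_BANDS)
]

mod_spec = modulation_spectrum(sonogram, N_MOD_BINS)
ssd_vec = ssd(sonogram)
rh_vec = rhythm_histogram(mod_spec)
mvd_vec = mvd(mod_spec)
```

With these assumed sizes, `ssd_vec` has 24 × 7 = 168 entries, `rh_vec` has 60 and `mvd_vec` has 420. The largest entry of `rh_vec` sits at index 3, which corresponds to modulation bin 4, the modulation rate built into the toy data. The naive DFT keeps the sketch dependency-free; a real pipeline would use an FFT routine instead.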