From: Ensemble of convolutional neural networks to improve animal audio classification
Features | Descriptors | Reference |
---|---|---|
Acoustic | Statistical Spectrum Descriptors (SSD) is a set of statistical measures that describe audio content taken from the moments on the Sonogram (the Sone) of each of the twenty-four critical bands defined according to the Bark scale. | [49] |
Acoustic | Rhythm Histogram (RH) is a feature set where the magnitudes of each modulation frequency bin of the twenty-four critical bands defined according to the Bark scale are summed up to form a histogram of “rhythmic energy” per modulation frequency. | [49] |
Acoustic | Modulation Frequency Variance Descriptor (MVD) is a 420-dimensional feature vector that measures variation over the critical frequency bands for each modulation frequency. | [49] |
Acoustic | Temporal Statistical Spectrum Descriptor (TSSD) is a feature set that incorporates temporal information from the SSD (timbre variations, changes in rhythm, etc.). | |
Acoustic | Temporal Rhythm Histograms (TRH) is a feature set that captures rhythmic changes in music over time. | [49] |
Visual | The multiscale uniform local binary pattern (LBP). | [41] |
Visual | The multiscale LBP histogram Fourier descriptor (LHF), obtained from the concatenation of LBP-HF. | [63] |
Visual | The multiscale rotation invariant co-occurrence of adjacent LBPs (LBP-RI). | [40] |
Visual | The Multiscale Local Phase Quantization (MLPQ). | [42] |
Visual | Ensemble of LPQ, where different configurations of LPQ are examined. | [35] |
Visual | The Heterogeneous Auto-Similarities of Characteristics (HASC) descriptor, which is applied to heterogeneous dense feature maps. | [47] |
Visual | Ensemble of variants of the LHF. | [34] |
Visual | The Gabor filter feature extraction method, where several different values for scale level and orientation are experimentally evaluated. | [17] |
Visual | The standard Binarized Statistical Image Features (BSIF), extracted by projecting subwindows of the entire image onto subspaces. | [24] |
Visual | Adaptive hybrid pattern (AHP), a noise-robust LBP variant that applies an equal-probability quantization to maximize partition entropy. | [65] |
Visual | Locally Encoded Transform feature histogram (LETRIST), which explicitly encodes the joint information within an image across feature and scale spaces. | [54] |
Visual | CodebookLess Model, a dense sampling approach similar to Bag of Features (BoF). | [60] |
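
As an illustration of the SSD entry above, the following is a minimal sketch of how per-band statistics could be gathered from a Sonogram, not the implementation used in [49]. It assumes the Sonogram is already available as an array with one row per critical band and one column per time frame. The function name `ssd_like_descriptor` is hypothetical, and the seven statistics used here (mean, median, variance, skewness, kurtosis, minimum, maximum) are an assumption made for illustration; the table only states that the descriptor is built from moments of each band.

```python
import numpy as np


def ssd_like_descriptor(sonogram: np.ndarray) -> np.ndarray:
    """Compute per-band statistics of a Sonogram (hypothetical helper).

    sonogram: array of shape (n_bands, n_frames), e.g. 24 Bark bands.
    Returns a flat vector of length n_bands * 7.
    """
    s = np.asarray(sonogram, dtype=float)

    mean = s.mean(axis=1)
    centered = s - mean[:, None]
    var = (centered ** 2).mean(axis=1)  # population variance
    std = np.sqrt(var)

    # Avoid division by zero for bands with constant energy;
    # their skewness and kurtosis are reported as 0.
    safe_std = np.where(std > 0, std, 1.0)
    skew = (centered ** 3).mean(axis=1) / safe_std ** 3
    kurt = (centered ** 4).mean(axis=1) / safe_std ** 4

    # Assumed statistic set, one row of seven values per band.
    stats = np.stack(
        [mean, np.median(s, axis=1), var, skew, kurt,
         s.min(axis=1), s.max(axis=1)],
        axis=1,
    )
    return stats.reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_sonogram = rng.random((24, 100))  # synthetic stand-in data
    print(ssd_like_descriptor(fake_sonogram).shape)
```

With twenty-four bands and seven statistics per band, this sketch yields a 168-dimensional vector for the synthetic input. Computing the same statistics over successive segments of a recording would be one way to approach the temporal variants listed in the table, again as an assumption rather than the referenced method.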