Skip to main content

Table 1 Acoustic and visual handcrafted features

From: Ensemble of convolutional neural networks to improve animal audio classification

AcousticStatistical Spectrum Descriptors (SSD) is a set of statistical measures that describe audio content taken from the moments on the Sonogram (the Sone) of each of the twenty-four critical bands defined according to the Bark scale.[49]
 Rhythm Histogram (RH) is a feature set where the magnitudes of each modulation frequency bin of the twenty-four critical bands defined according to the Bark scale are summed up to form a histogram of “rhythmic energy” per modulation frequency.[49]
 Modulation Frequency Variance Descriptor (MVD) is a 420-dimensional feature vector that measures variation over the critical frequency bands for each modulation frequency.[49]
 Temporal Statistical Spectrum Descriptor (TSSD) is a feature set that incorporates temporal information from the SSD (timbre variations, changes in rhythm, etc.).[14, 44]
 Temporal Rhythm Histograms (TRH) is a feature set that captures rhythmic changes in music over time.[49]
VisualThe multiscale uniform local binary pattern (LBP).[41]
 The multiscale LBP histogram Fourier descriptor (LHF) obtained from the concatenation of LBP-HF.[63]
 The multiscale rotation invariant co-occurrence of adjacent LBPs (LBP-RI).[40]
 The Multiscale Local Phase Quantization (MLPQ).[42]
 Ensemble of LPQ, where different configurations of LPQ are examined.[35]
 The Heterogeneous Auto-Similarities of Characteristics (HASC) descriptor that is applied to heterogeneous dense features maps.[47]
 Ensemble of variants of the LHF.[34]
 The Gabor filter feature extraction method where several different values for scale level and orientation are experimentally evaluated.[17]
 Extracts the standard Binarized Statistical Image Features (BSIF) by projecting subwindows of the entire image onto subspaces.[24]
 Adaptive hybrid pattern (AHP), which is an LBP variant that is noise robust because a quantization algorithm is applied that uses an equal probability quantization to maximize partition entropy.[65]
 Locally Encoded Transform feature histogram (LETRIST) that explicitly encodes the joint information within an image across feature and scale spaces.[54]
 CodebookLess Model, which is a dense sampling approach similar to Bag of Features (BoF).[60]