Table 3 Performance of different approaches on each animal sound dataset

From: Ensemble of convolutional neural networks to improve animal audio classification

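| Approach | Descriptor | BIRD | BIRDZ | WHALE | BAT |
|---|---|---|---|---|---|
| Handcrafted features with SVM | Acoustic features | 80.2 / 82.1 / 85.8 † | | | |
| | LBP | 85.8 | 87.0 | 90.6 | 91.2 |
| | LBP-HF | 85.0 | 86.2 | 89.9 | 92.6 |
| | LBP-RI | 86.1 | 87.5 | 91.0 | 93.0 |
| | MLPQ | 87.5 | 88.8 | 92.1 | 93.5 |
| | HASC | 87.9 | 89.1 | 92.0 | 92.9 |
| | LHF | 86.0 | 86.9 | 90.5 | 91.9 |
| | GABOR | 87.3 | 87.2 | 90.3 | 90.9 |
| | BSIF | 88.8 | 87.5 | 90.4 | 92.4 |
| | AHP | 84.4 | 77.5 | 89.9 | 92.1 |
| | LETRIST | 67.7 | 75.6 | 90.3 | 89.5 |
| | BoF | 89.9 | 60.4 | 87.2 | 94.2 |
| Deep learning using the four types of audio images | CNN (Fig. 3) | 61.8 | 84.4 | 93.5 | 98.6 |
| | AlexNet | 79.8 | 88.9 | 95.5 | 97.8 |
| | GoogleNet | 77.8 | 86.1 | 94.8 | 95.9 |
| | Vgg-16 | 83.6 | 90.4 | 96.6 | 90.1 |
| | Vgg-19 | 86.3 | 89.6 | 96.6 | 88.6 |
| | ResNet50 | 81.9 | 88.9 | 96.1 | 93.7 |
| | InceptionV3 | 82.3 | 88.5 | 96.5 | 85.9 |
| Ensembles of deep learning | Fus_Spec | 87.9 | 91.0 | 96.6 | 97.3 |
| | Fus_HP | 49.8 | 88.1 | 95.2 | |
| | Fus_Scatter | 46.6 | 91.3 | 96.7 | |
| | Fus_Spec + Fus_HP + Fus_Scatter | 87.2 | 93.9 | 97.1 | 97.3 |
| | Fus_Spec + Fus_Scatter | 87.9 | 94.8 | 97.2 | 97.3 |
| | Fus_Spec + Fus_Scatter + CNN | 84.0 | 95.1 | 96.1 | 98.7 |
| Ensembles of DL and handcrafted | Fus_Spec + Fus_Scatter + CNN + Fus_Hand | 94.1 | 99.0 | 95.9 | 99.3 |
| | Fus_Spec + Fus_Scatter + Fus_Hand | 94.7 | 98.9 | 96.5 | 98.9 |
| Related works | Deep learning, acoustic, and visual features [36] | 94.8 / 93.3 † | | | |
| | Acoustic and visual features [39] | 94.5 / 92.2 † | | | |
| | MFCC + SVM [64] | 93.6 † | | | |
| | DFT + SVM [62] | 92.0 † | | | |

† This row reports fewer than four values, listed here in their original order; which dataset column each value belongs to is not indicated.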
  1. Rates are reported as accuracy, except for the WHALE dataset, where they are reported as AUC-ROC
  2. *Fus_Scatter and Fus_HP were not used in this result because they were not available for BAT
  3. The metric used for the WHALE dataset is AUC-ROC
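
The footnotes distinguish two metrics, accuracy and AUC-ROC. As a rough illustration of how they differ (this is not code from the paper; the function names and the toy labels and scores below are hypothetical), accuracy counts correct hard decisions, while AUC-ROC measures how well the classifier's scores rank positive examples above negative ones, independent of any decision threshold. The sketch assumes binary labels encoded as 0 and 1.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that equal the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc_roc(y_true, scores):
    """AUC-ROC computed as the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative one
    (ties count as half). Labels are assumed to be 0 or 1."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): five clips, label 1 = target call present.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hard decisions at threshold 0.5

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.800
print(f"AUC-ROC  = {auc_roc(y_true, scores):.3f}")   # AUC-ROC  = 0.833
```

Because the two metrics answer different questions, the WHALE column should not be compared numerically with the accuracy-based columns.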