Table 3 Performance of different approaches on each animal sound dataset

From: Ensemble of convolutional neural networks to improve animal audio classification

| Approach | Descriptor | Rates |
| --- | --- | --- |
| Handcrafted features with SVM | Acoustic features | 80.2, 82.1, 85.8 |
| Deep learning using the four types of audio images | CNN (Fig. 3) | 61.8, 84.4, 93.5, 98.6 |
| Ensembles of deep learning | Fus_Spec | 87.9, 91.0, 96.6, 97.3 |
| | Fus_Spec + Fus_HP + Fus_Scatter | 87.2, 93.9, 97.1, 97.3 |
| | Fus_Spec + Fus_Scatter | 87.9, 94.8, 97.2, 97.3 |
| | Fus_Spec + Fus_Scatter + CNN | 84. |
| Ensembles of DL and handcrafted | Fus_Spec + Fus_Scatter + CNN + Fus_Hand | 94.1, 99.0, 95.9, 99.3 |
| | Fus_Spec + Fus_Scatter + Fus_Hand | 94.7, 98.9, 96.5, 98.9 |
| Related works | Deep learning, acoustic, and visual features [36] | 94.8, 93.3 |
| | Acoustic and visual features [39] | 94.5, 92.2 |
| | MFCC + SVM [64] | 93.6 |
| | DFT + SVM [62] | 92.0 |
  1. The rates are described using accuracy, except for the WHALE dataset, in which the rates are in AUC-ROC
  2. *Fus_Scatter and Fus_HP were not used in this result because they were not available for BAT
  3. The metric used for the WHALE dataset is AUC-ROC
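
The footnotes distinguish two metrics: accuracy for most datasets and AUC-ROC for WHALE. As a minimal sketch of how the two differ, the Python below computes both on small made-up inputs. The function names `accuracy` and `auc_roc` and the example labels and scores are illustrative assumptions, not taken from the paper, and the pairwise AUC formulation is one standard way to compute it for a binary task, not necessarily the authors' implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc_roc(y_true, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Computed as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one, with
    ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [0, 0, 1, 1]
predicted = [0, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

acc = accuracy(labels, predicted)
auc = auc_roc(labels, scores)
print(f"accuracy = {acc:.2f}, AUC-ROC = {auc:.2f}")
```

Accuracy depends on hard label decisions, while AUC-ROC depends only on how the scores rank positives against negatives, so the two rates in the table are not directly comparable across datasets.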