M. A. Acevedo, C. J. Corrada-Bravo, H. Corrada-Bravo, L. J. Villanueva-Rivera, T. M. Aide, Automated classification of bird and amphibian calls using machine learning: a comparison of methods. Ecol. Inform.4(4), 206–214 (2009).
Article
Google Scholar
J. Andén, S. Mallat, Deep scattering spectrum. IEEE Trans. Signal Process. 62(16), 4114–4128 (2014). https://doi.org/10.1109/TSP.2014.2326991.
Article
MathSciNet
Google Scholar
T. Berg, P. N. Belhumeur, in 2013 IEEE Conference on Computer Vision and Pattern Recognition. Poof: Part-based one-vs.-one features for fine-grained categorization, face verification, and attribute estimation, (2013), pp. 955–962. https://doi.org/10.1109/CVPR.2013.128.
S. Branson, G. Van Horn, S. Belongie, P. Perona, Bird species categorization using pose normalized deep convolutional nets. arXiv preprint (2014). arXiv:1406.2952.
J. Bruna, S. Mallat, Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach. Intell.35(8), 1872–1886 (2013).
Article
Google Scholar
P. Cano, E. Gómez, F. Gouyon, P. Herrera, M. Koppenberger, B. Ong, X. Serra, S. Streich, N. Wack, Ismir 2004 audio description contest (Music Technology Group of the Universitat Pompeu Fabra, Tech. Rep, 2006).
Z. Cao, J. C Principe, B. Ouyang, F. Dalgleish, A. Vuorenkoski, Marine animal classification using combined cnn and hand-designed image features (IEEE, 2015). https://doi.org/10.23919/oceans.2015.7404375.
Y. M. G Costa, L. S Oliveira, A. L Koerich, F. Gouyon, in Systems, Signals and Image Processing (IWSSIP) 2011 18th International Conference on. Music genre recognition using spectrograms (IEEE, 2011), pp. 1–4.
Y. M. G. Costa, L. S. Oliveira, A. L. Koerich, F. Gouyon, in Systems, Signals and Image Processing (IWSSIP) 2013 20th International Conference on. Music genre recognition based on visual features with dynamic ensemble of classifiers selection (IEEE, 2013), pp. 55–58. https://doi.org/10.1109/iwssip.2013.6623448.
Y. M. G. Costa, L. S. Oliveira, A. L. Koerich, F. Gouyon, J. Martins, Music genre classification using LBP textural features. Signal Process.92(11), 2723–2737 (2012).
Article
Google Scholar
Y. M. G. Costa, L. S. Oliveira, C. N. Silla Jr, An evaluation of convolutional neural networks for music classification using spectrograms. Appl. Soft Comput.52:, 28–38 (2017).
Article
Google Scholar
V. I. Cullinan, S. Matzner, C. A. Duberstein, Classification of birds and bats using flight tracks. Ecol. Inform.27:, 55–63 (2015).
Article
Google Scholar
R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification and Scene Analysis 2nd ed (Wiley Interscience, 1995).
S. Fagerlund, Bird species recognition using support vector machines. EURASIP J. Adv. Signal Process. 2007:. https://doi.org/10.1155/2007/38637.
D. Fitzgerald, in 13th International Conference on Digital Audio Effects (DAFx-10). Harmonic/percussive separation using median filtering, (2010).
G. K. Freitas, R. L. Aguiar, Y. M. G. Costa, in Computer Science Society (SCCC) 2016 35th International Conference of the Chilean. Using spectrogram to detect north atlantic right whale calls from audio recordings (IEEE, 2016), pp. 1–6. https://doi.org/10.1109/sccc.2016.7836034.
D. Gabor, Theory of communication. part 1 The analysis of information. J. Inst. Electr. Eng. Part III: Radio Commun. Eng.93(26), 429–441 (1946).
Google Scholar
G. Gwardys, D. Grzywczak, Deep image features in music information retrieval. Int. J. Electron. Telecommun.60(4), 321–326 (2014).
Article
Google Scholar
R. M. Haralick, Statistical and structural approaches to texture. Proc. IEEE. 67(5), 786–804 (1979).
Article
Google Scholar
K. He, X. Zhang, S. Ren, J. Sun, in Proceedings of the IEEE conference on computer vision and pattern recognition. Deep residual learning for image recognition, (2016), pp. 770–778. https://doi.org/10.1109/cvpr.2016.90.
M. Dong, Q. Mao, Y. Zhan, in Proceedings of the 22Nd ACM International Conference on Multimedia, MM ’14. Speech emotion recognition using CNN (ACMNew York, 2014), pp. 801–804. https://doi.org/http://doi.acm.org/10.1145/2647868.2654984.
Google Scholar
E. J. Humphrey, J. P. Bello, in Machine Learning and Applications (ICMLA) 2012 11th International Conference on, vol. 2. Rethinking automatic chord recognition with convolutional neural networks (IEEE, 2012), pp. 357–362. https://doi.org/10.1109/icmla.2012.220.
E. J. Humphrey, J. P. Bello, Y. LeCun, in ISMIR. Moving beyond feature design: Deep architectures and automatic feature learning in music informatics, (2012), pp. 403–408.
J. Kannala, E. Rahtu, in Pattern Recognition (ICPR) 2012 21st International Conference on. Bsif: Binarized statistical image features (IEEE, 2012), pp. 1363–1366.
A. Krizhevsky, I. Sutskever, G. E Hinton, in Advances in Neural Information Processing Systems. Imagenet classification with deep convolutional neural networks, (2012), pp. 1097–1105. https://doi.org/10.1145/3065386.
Article
Google Scholar
Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E Howard, W. Hubbard, L. D. Jackel, Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989).
Article
Google Scholar
W. Lim, T. Lee, in Signal Processing Conference (EUSIPCO) 2017 25th European. Harmonic and percussive source separation using a convolutional auto encoder (IEEE, 2017), pp. 1804–1808. https://doi.org/10.23919/eusipco.2017.8081520.
D. R. Lucio, Y. M. G. Costa, in Computing Conference (CLEI) 2015 Latin American. Bird species classification using spectrograms (IEEE, 2015), pp. 1–11. https://doi.org/10.1109/clei.2015.7359990.
B. McFee, C. Raffel, D. Liang, D. P Ellis, M. McVicar, E. Battenberg, O. Nieto, in Proceedings of the 14th Python in Science Conference. librosa: Audio and music signal analysis in python, (2015), pp. 18–25. https://doi.org/10.25080/majora-7b98e3ed-003.
V. Mitra, W. Wang, H. Franco, Y. Lei, C. Bartels, M. Graciarena, in Fifteenth Annual Conference of the International Speech Communication Association. Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions, (2014).
A. Montalvo, Y. M. G. Costa, J. R. Calvo, in Iberoamerican Congress on Pattern Recognition. Language identification using spectrogram texture (Springer, 2015), pp. 543–550. https://doi.org/10.1007/978-3-319-25751-8_65.
Chapter
Google Scholar
T. Nakashika, C. Garcia, T. Takiguchi, in Thirteenth Annual Conference of the International Speech Communication Association. Local-feature-map integration using convolutional neural networks for music genre classification, (2012).
L. Nanni, R. L. Aguiar, Y. M. G. Costa, S. Brahnam, C. N. Silla Jr, R. L. Brattin, Z. Zhao, Bird and whale species identification using sound images. IET Comput. Vis. (2017). https://doi.org/10.1049/iet-cvi.2017.0075.
Article
Google Scholar
L. Nanni, S. Brahnam, A. Lumini, Combining different local binary pattern variants to boost performance. Expert Syst. Appl. 38(5), 6209–6216 (2011).
Article
Google Scholar
L. Nanni, S. Brahnam, A. Lumini, T. Barrier. Ensemble of Local Phase Quantization Variants with Ternary Encoding (SpringerBerlin Heidelberg, 2014), pp. 177–188. https://doi.org/10.1007/978-3-642-39289-4_8.
Google Scholar
L. Nanni, Y. M. G. Costa, R. L. Aguiar, C. N. Silla Jr, S. Brahnam, Ensemble of deep learning, visual and acoustic features for music genre classification. J. New Music Res., 1–15 (2018). https://doi.org/10.1080/09298215.2018.1438476.
Article
Google Scholar
L. Nanni, Y. M. G. Costa, S. Brahnam, in 22nd International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision. Set of texture descriptors for music genre classification, (2014).
L. Nanni, Y. M. G. Costa, D. R. Lucio, C. N. Silla Jr., S. Brahnam, in Tools with Artificial Intelligence (ICTAI) 2016 IEEE 28th International Conference on. Combining visual and acoustic features for bird species classification (IEEE, 2016), pp. 396–401. https://doi.org/10.1109/ictai.2016.0067.
L. Nanni, Y. M. G. Costa, D. R Lucio, C. N. Silla Jr, S. Brahnam, Combining visual and acoustic features for audio classification tasks. Pattern Recogn. Lett.88:, 49–56 (2017).
Article
Google Scholar
R. Nosaka, C. H. Suryanto, K. Fukui, in Asian Conference on Computer Vision. Rotation invariant co-occurrence among adjacent lbps (Springer, 2012), pp. 15–25. https://doi.org/10.1007/978-3-642-37410-4_2.
Chapter
Google Scholar
T. Ojala, M. Pietikainen, T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. Pattern Anal. Mach. Intell. IEEE Trans.24(7), 971–987 (2002).
Article
Google Scholar
V. Ojansivu, J. Heikkilä, in Image and Signal Processing, ed. by A. Elmoataz, O. Lezoray, F. Nouboud, and D. Mammass. Blur insensitive texture classification using local phase quantization (SpringerBerlin Heidelberg, 2008), pp. 236–243.
S. Oramas, O. Nieto, F. Barbieri, X. Serra, Multi-label music genre classification from audio, text, and images using deep features. arXiv preprint (2017). arXiv:1707.04916.
F. Pachet, A. Zils, in ISMIR. Automatic extraction of music descriptors from acoustic signals, (2004).
J. Pons, X. Serra, in Acoustics, Speech and Signal Processing (ICASSP) 2017 IEEE International Conference on. Designing efficient architectures for modeling temporal features with convolutional neural networks (IEEE, 2017), pp. 2472–2476. https://doi.org/10.1109/icassp.2017.7952601.
J. Salamon, J. P. Bello, A. Farnsworth, S. Kelling, in Acoustics, Speech and Signal Processing (ICASSP) 2017 IEEE International Conference on. Fusing shallow and deep learning for bioacoustic bird species classification (IEEE, 2017), pp. 141–145. https://doi.org/10.1109/icassp.2017.7952134.
M. San Biagio, M. Crocco, M. Cristani, S. Martelli, V. Murino, in Computer Vision (ICCV) 2013 IEEE International Conference on. Heterogeneous auto-similarities of characteristics (hasc): exploiting relational information for classification (IEEE, 2013), pp. 809–816. https://doi.org/10.1109/iccv.2013.105.
J. Schlüter, S. Böck, in 6th International Workshop on Machine Learning and Music (MML). Musical onset detection with convolutional neural networks (Prague, Czech Republic, 2013).
M. R. Schroeder, B. S. Atal, J. Hall, Optimizing digital speech coders by exploiting masking properties of the human ear. J. Acoust. Soc. Am. 66(6), 1647–1652 (1979).
Article
Google Scholar
L. Sifre, S. Mallat, in ESANN, vol. 44. Combined scattering for rotation invariant texture analysis, (2012), pp. 68–81.
S. Sigtia, S. Dixon, in Acoustics, Speech and Signal Processing (ICASSP) 2014 IEEE International Conference on. Improved music feature learning with deep neural networks (IEEE, 2014), pp. 6959–6963. https://doi.org/10.1109/icassp.2014.6854949.
C. N. Silla Jr, A. L. Koerich, C. A. A. Kaestner, in ISMIR. The latin music database, (2008), pp. 451–456.
K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition. arXiv preprint (2014). arXiv:1409.1556.
T. Song, H. Li, F. Meng, Q. Wu, J. Cai, Letrist: locally encoded transform feature histogram for rotation-invariant texture classification. IEEE Trans. Circ. Syst. Video Technol. (2017). https://doi.org/10.1109/tcsvt.2017.2671899.
Article
Google Scholar
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Rethinking the inception architecture for computer vision, (2016), pp. 2818–2826. https://doi.org/10.1109/cvpr.2016.308.
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Going deeper with convolutions, (2015), pp. 1–9. https://doi.org/10.1109/CVPR.2015.7298594.
G. Tzanetakis, P. Cook, Musical genre classification of audio signals. IEEE Trans. Speech Audio Process.10(5), 293–302 (2002).
Article
Google Scholar
C. Wah, S. Branson, P. Welinder, P. Perona, S. Belongie, The Caltech-UCSD Birds-200-2011 Dataset. Tech. Rep. CNS-TR-2011-001 (California Institute of Technology, 2011).
C. Y Wang, A. Santoso, S. Mathulaprangsan, C. C. Chiang, C. H. Wu, J. C. Wang, in Multimedia and Expo (ICME) 2017 IEEE International Conference on. Recognition and retrieval of sound events using sparse coding convolutional neural network (IEEE, 2017), pp. 589–594. https://doi.org/10.1109/icme.2017.8019552.
Q. Wang, P. Li, L. Zhang, W. Zuo, Towards effective codebookless model for image classification. Pattern Recogn.59:, 63–71 (2016).
Article
Google Scholar
J. Xie, M. Zhu, Handcrafted features and late fusion with deep learning for bird sound classification. Ecol. Informa.52:, 74–81 (2019).
Article
Google Scholar
Y. Yovel, M. O. Franz, P. Stilz, H. U. Schnitzler, Plant classification from bat-like echolocation signals. PLoS Comput. Biol.4(3), e1000,032 (2008).
Article
MathSciNet
Google Scholar
G. Zhao, T. Ahonen, J. Matas, M. Pietikainen, Rotation-invariant image and video description with local binary pattern features. IEEE Trans. Image Process.21(4), 1465–1477 (2012).
Article
MathSciNet
Google Scholar
Z. Zhao, S. h. Zhang, Z. y. Xu, K. Bellisario, N. h. Dai, H. Omrani, B. C. Pijanowski, Automated bird acoustic event detection and robust species classification. Ecol. Informa.39:, 99–108 (2017).
Article
Google Scholar
Z. Zhu, X. You, C. P. Chen, D. Tao, W. Ou, X. Jiang, J. Zou, An adaptive hybrid pattern for noise-robust texture analysis. Pattern Recogn.48(8), 2592–2608 (2015).
Article
Google Scholar