TY  - JOUR
AU  - Sohn, J.
AU  - Kim, N. S.
AU  - Sung, W.
PY  - 1999
DA  - 1999//
TI  - A statistical model-based voice activity detection
JO  - IEEE Signal Proc. Lett.
VL  - 6
UR  - https://doi.org/10.1109/97.736233
DO  - 10.1109/97.736233
ID  - Sohn1999
ER  -

TY  - JOUR
AU  - Zhang, X.-L.
AU  - Wang, D.
PY  - 2016
DA  - 2016//
TI  - Boosting contextual information for deep neural network based voice activity detection
JO  - IEEE/ACM Trans. Audio, Speech Lang. Process. (TASLP)
VL  - 24
UR  - https://doi.org/10.1109/TASLP.2015.2505415
DO  - 10.1109/TASLP.2015.2505415
ID  - Zhang2016
ER  -

TY  - STD
TI  - R. Zazo, T. N. Sainath, G. Simko, C. Parada, in Interspeech 2016. Feature learning with raw-waveform CLDNNs for Voice Activity Detection, (2016), pp. 3668–3672. https://doi.org/10.21437/Interspeech.2016-268.
ID  - ref3
ER  -

TY  - JOUR
AU  - Minami, K.
AU  - Akutsu, A.
AU  - Hamada, H.
AU  - Tonomura, Y.
PY  - 1998
DA  - 1998//
TI  - Video handling with music and speech detection
JO  - IEEE MultiMedia
VL  - 5
UR  - https://doi.org/10.1109/93.713301
DO  - 10.1109/93.713301
ID  - Minami1998
ER  -

TY  - CHAP
AU  - Seyerlehner, K.
AU  - Pohle, T.
AU  - Schedl, M.
AU  - Widmer, G.
PY  - 2007
DA  - 2007//
TI  - Automatic music detection in television productions
BT  - Proc. of the 10th International Conference on Digital Audio Effects (DAFx’07)
PB  - SCRIME / LaBRI
CY  - Bordeaux
ID  - Seyerlehner2007
ER  -

TY  - CHAP
AU  - Temko, A.
AU  - Malkin, R.
AU  - Zieger, C.
AU  - Macho, D.
AU  - Nadeu, C.
AU  - Omologo, M.
PY  - 2007
DA  - 2007//
TI  - CLEAR Evaluation of Acoustic Event Detection and Classification Systems
BT  - Proceedings of the 1st International Evaluation Conference on Classification of Events, Activities and Relationships. CLEAR’06
PB  - Springer
CY  - Berlin
ID  - Temko2007
ER  -

TY  - JOUR
AU  - Stowell, D.
AU  - Giannoulis, D.
AU  - Benetos, E.
AU  - Lagrange, M.
AU  - Plumbley, M. D.
PY  - 2015
DA  - 2015//
TI  - Detection and Classification of Acoustic Scenes and Events
JO  - IEEE Trans. Multimedia
VL  - 17
UR  - https://doi.org/10.1109/TMM.2015.2428998
DO  - 10.1109/TMM.2015.2428998
ID  - Stowell2015
ER  -

TY  - JOUR
AU  - Russakovsky, O.
AU  - Deng, J.
AU  - Su, H.
AU  - Krause, J.
AU  - Satheesh, S.
AU  - Ma, S.
AU  - Huang, Z.
AU  - Karpathy, A.
AU  - Khosla, A.
AU  - Bernstein, M.
AU  - Berg, A. C.
AU  - Fei-Fei, L.
PY  - 2015
DA  - 2015//
TI  - ImageNet large scale visual recognition challenge
JO  - Int. J. Comput. Vis. (IJCV)
VL  - 115
UR  - https://doi.org/10.1007/s11263-015-0816-y
DO  - 10.1007/s11263-015-0816-y
ID  - Russakovsky2015
ER  -

TY  - CHAP
AU  - Gemmeke, J. F.
AU  - Ellis, D. P. W.
AU  - Freedman, D.
AU  - Jansen, A.
AU  - Lawrence, W.
AU  - Moore, R. C.
AU  - Plakal, M.
AU  - Ritter, M.
PY  - 2017
DA  - 2017//
TI  - Audio Set: An ontology and human-labeled dataset for audio events
BT  - Proc. IEEE ICASSP 2017
PB  - IEEE
CY  - New Orleans
ID  - Gemmeke2017
ER  -

TY  - CHAP
AU  - Hershey, S.
AU  - Chaudhuri, S.
AU  - Ellis, D. P.
AU  - Gemmeke, J. F.
AU  - Jansen, A.
AU  - Moore, R. C.
AU  - Plakal, M.
AU  - Platt, D.
AU  - Saurous, R. A.
AU  - Seybold, B.
PY  - 2017
DA  - 2017//
TI  - CNN architectures for large-scale audio classification
BT  - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017
PB  - IEEE
CY  - New Orleans
UR  - https://doi.org/10.1109/ICASSP.2017.7952132
DO  - 10.1109/ICASSP.2017.7952132
ID  - Hershey2017
ER  -

TY  - STD
TI  - Y. Xu, Q. Kong, W. Wang, M. D. Plumbley, Large-scale weakly supervised audio classification using gated convolutional neural network. CoRR. abs/1710.00343: (2017). 1710.00343.
UR  - http://arxiv.org/abs/1710.00343
ID  - ref11
ER  -

TY  - STD
TI  - Q. Kong, Y. Xu, W. Wang, M. D. Plumbley, in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Audio set classification with attention model: A probabilistic perspective, (2018), pp. 316–320. https://doi.org/10.1109/ICASSP.2018.8461392.
ID  - ref12
ER  -

TY  - STD
TI  - Q. Kong, C. Yu, T. Iqbal, Y. Xu, W. Wang, M. D. Plumbley, Weakly labelled AudioSet tagging with attention neural networks (2019). 1903.00765.
UR  - http://arxiv.org/abs/1903.00765
ID  - ref13
ER  -

TY  - STD
TI  - A. Graves, in Supervised Sequence Labelling with Recurrent Neural Networks. Supervised sequence labelling (Springer, Berlin/Heidelberg, 2012), pp. 5–13.
ID  - ref14
ER  -

TY  - JOUR
AU  - Elman, J. L.
PY  - 1990
DA  - 1990//
TI  - Finding structure in time
JO  - Cogn. Sci.
VL  - 14
UR  - https://doi.org/10.1207/s15516709cog1402_1
DO  - 10.1207/s15516709cog1402_1
ID  - Elman1990
ER  -

TY  - JOUR
AU  - Bengio, Y.
AU  - Simard, P.
AU  - Frasconi, P.
PY  - 1994
DA  - 1994//
TI  - Learning long-term dependencies with gradient descent is difficult
JO  - IEEE Trans. Neural Netw.
VL  - 5
UR  - https://doi.org/10.1109/72.279181
DO  - 10.1109/72.279181
ID  - Bengio1994
ER  -

TY  - JOUR
AU  - Hochreiter, S.
AU  - Schmidhuber, J.
PY  - 1997
DA  - 1997//
TI  - Long Short-Term Memory
JO  - Neural Comput.
VL  - 9
UR  - https://doi.org/10.1162/neco.1997.9.8.1735
DO  - 10.1162/neco.1997.9.8.1735
ID  - Hochreiter1997
ER  -

TY  - JOUR
AU  - Krizhevsky, A.
AU  - Sutskever, I.
AU  - Hinton, G. E.
PY  - 2017
DA  - 2017//
TI  - ImageNet classification with deep convolutional neural networks
JO  - Commun. ACM
VL  - 60
UR  - https://doi.org/10.1145/3065386
DO  - 10.1145/3065386
ID  - Krizhevsky2017
ER  -

TY  - CHAP
AU  - Sainath, T. N.
AU  - Weiss, R. J.
AU  - Senior, A.
AU  - Wilson, K. W.
AU  - Vinyals, O.
PY  - 2015
DA  - 2015//
TI  - Learning the speech front-end with raw waveform CLDNNs
BT  - Interspeech 2015
PB  - International Speech Communication Association
CY  - Dresden
ID  - Sainath2015
ER  -

TY  - STD
TI  - M. Ravanelli, Y. Bengio, Speaker Recognition from raw waveform with SincNet (2018). arXiv preprint arXiv:1808.00158.
ID  - ref20
ER  -

TY  - CHAP
AU  - van den Oord, A.
AU  - Dieleman, S.
AU  - Zen, H.
AU  - Simonyan, K.
AU  - Vinyals, O.
AU  - Graves, A.
AU  - Kalchbrenner, N.
AU  - Senior, A.
AU  - Kavukcuoglu, K.
PY  - 2016
DA  - 2016//
TI  - WaveNet: A generative model for raw audio
BT  - 9th ISCA Speech Synthesis Workshop
PB  - International Speech Communication Association
CY  - Sunnyvale
ID  - van den Oord2016
ER  -

TY  - JOUR
AU  - Gonzalez-Dominguez, J.
AU  - Lopez-Moreno, I.
AU  - Moreno, P. J.
AU  - Gonzalez-Rodriguez, J.
PY  - 2015
DA  - 2015//
TI  - Frame-by-frame language identification in short utterances using deep neural networks
JO  - Neural Netw.
VL  - 64
UR  - https://doi.org/10.1016/j.neunet.2014.08.006
DO  - 10.1016/j.neunet.2014.08.006
ID  - Gonzalez-Dominguez2015
ER  -

TY  - JOUR
AU  - Zazo, R.
AU  - Lozano-Diez, A.
AU  - Gonzalez-Dominguez, J.
AU  - Toledano, D. T.
AU  - Gonzalez-Rodriguez, J.
PY  - 2016
DA  - 2016//
TI  - Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks
JO  - PLoS ONE
VL  - 11
UR  - https://doi.org/10.1371/journal.pone.0146917
DO  - 10.1371/journal.pone.0146917
ID  - Zazo2016
ER  -

TY  - CHAP
AU  - Lozano-Diez, A.
AU  - Silnova, A.
AU  - Matejka, P.
AU  - Glembek, O.
AU  - Plchot, O.
AU  - Pesán, J.
AU  - Burget, L.
AU  - González-Rodríguez, J.
PY  - 2016
DA  - 2016//
TI  - Analysis and optimization of bottleneck features for speaker recognition
BT  - Odyssey
PB  - International Speech Communication Association
CY  - Bilbao
ID  - Lozano-Diez2016
ER  -

TY  - STD
TI  - Y. Zhang, M. Pezeshki, P. Brakel, S. Zhang, C. Laurent, Y. Bengio, A. Courville, in Interspeech 2016. Towards end-to-end speech recognition with deep convolutional neural networks, (2016), pp. 410–414. https://doi.org/10.21437/Interspeech.2016-1446.
ID  - ref25
ER  -

TY  - CHAP
AU  - Graves, A.
AU  - Jaitly, N.
PY  - 2014
DA  - 2014//
TI  - Towards end-to-end speech recognition with recurrent neural networks
BT  - Proceedings of the 31st International Conference on Machine Learning
PB  - PMLR
CY  - Beijing
ID  - Graves2014
ER  -

TY  - CHAP
AU  - Toledano, D. T.
AU  - Fernández-Gallego, M. P.
AU  - Lozano-Diez, A.
PY  - 2018
DA  - 2018//
TI  - Multi-resolution speech analysis for automatic speech recognition using deep neural networks: Experiments on TIMIT
BT  - PloS One
PB  - Public Library of Science
CY  - San Francisco
ID  - Toledano2018
ER  -

TY  - CHAP
AU  - Jeong, I.-Y.
AU  - Lee, K.
PY  - 2016
DA  - 2016//
TI  - Learning temporal features using a deep neural network and its application to music genre classification
BT  - ISMIR 2016, 17th International Society for Music Information Retrieval Conference
PB  - ISMIR
CY  - New York City
ID  - Jeong2016
ER  -

TY  - CHAP
AU  - Korzeniowski, F.
AU  - Widmer, G.
PY  - 2017
DA  - 2017//
TI  - End-to-end musical key estimation using a convolutional neural network
BT  - 25th European Signal Processing Conference (EUSIPCO-2017)
PB  - EURASIP
CY  - Kos Island
UR  - https://doi.org/10.23919/EUSIPCO.2017.8081351
DO  - 10.23919/EUSIPCO.2017.8081351
ID  - Korzeniowski2017
ER  -

TY  - JOUR
AU  - Stevens, S. S.
AU  - Volkmann, J.
AU  - Newman, E. B.
PY  - 1937
DA  - 1937//
TI  - A scale for the measurement of the psychological magnitude pitch
JO  - J. Acoust. Soc. Am.
VL  - 8
UR  - https://doi.org/10.1121/1.1915893
DO  - 10.1121/1.1915893
ID  - Stevens1937
ER  -

TY  - STD
TI  - D. P. Kingma, J. Ba, in ICLR 2015, 3rd International Conference for Learning Representations, San Diego, vol. abs/1412.6980. Adam: A method for stochastic optimization, (2014). http://arxiv.org/abs/1412.6980.
UR  - http://arxiv.org/abs/1412.6980
ID  - ref31
ER  -

TY  - STD
TI  - F. Chollet, et al., Keras (2015). https://keras.io (accessed on 14 Jan 2019).
UR  - https://keras.io
ID  - ref32
ER  -

TY  - CHAP
AU  - Abadi, M.
AU  - Barham, P.
AU  - Chen, J.
AU  - Chen, Z.
AU  - Davis, A.
AU  - Dean, J.
AU  - Devin, M.
AU  - Ghemawat, S.
AU  - Irving, G.
AU  - Isard, M.
PY  - 2016
DA  - 2016//
TI  - TensorFlow: A System for Large-Scale Machine Learning
BT  - OSDI ’16, 12th USENIX Symposium on Operating Systems Design and Implementation
PB  - USENIX
CY  - Savannah
ID  - Abadi2016
ER  -

TY  - JOUR
AU  - Srivastava, N.
AU  - Hinton, G.
AU  - Krizhevsky, A.
AU  - Sutskever, I.
AU  - Salakhutdinov, R.
PY  - 2014
DA  - 2014//
TI  - Dropout: a simple way to prevent neural networks from overfitting
JO  - J. Mach. Learn. Res.
VL  - 15
ID  - Srivastava2014
ER  -

TY  - JOUR
AU  - Zhai, C.
AU  - Lafferty, J.
PY  - 2004
DA  - 2004//
TI  - A study of smoothing methods for language models applied to information retrieval
JO  - ACM Trans. Inf. Syst. (TOIS)
VL  - 22
UR  - https://doi.org/10.1145/984321.984322
DO  - 10.1145/984321.984322
ID  - Zhai2004
ER  -