TY - STD
TI - W. Xiong, L. Wu, F. Alleva, J. Droppo, X. Huang, A. Stolcke, The Microsoft 2017 Conversational Speech Recognition System (2017). https://arxiv.org/abs/1708.06073.
UR - https://arxiv.org/abs/1708.06073
ID - ref1
ER -

TY - STD
TI - T. T. Ping, Automatic speech recognition for non-native speakers. PhD thesis, Université Joseph-Fourier - Grenoble (2008).
ID - ref2
ER -

TY - STD
TI - A. Metallinou, J. Cheng, in Fifteenth Annual Conference of the International Speech Communication Association. Using deep neural networks to improve proficiency assessment for children English language learners, (2014).
ID - ref3
ER -

TY - STD
TI - T. Drugman, T. Dutoit, in INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, Brighton, United Kingdom, September 6-10, 2009. Glottal closure and opening instant detection from speech signals (ISCA, 2009), pp. 2891–2894. http://www.isca-speech.org/archive/interspeech_2009/i09_2891.html.
UR - http://www.isca-speech.org/archive/interspeech_2009/i09_2891.html
ID - ref4
ER -

TY - STD
TI - K. Livescu, J. Glass, in 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), vol. 3. Lexical modeling of non-native speech for automatic speech recognition, (2000), pp. 1683–1686. https://doi.org/10.1109/ICASSP.2000.862074.
ID - ref5
ER -

TY - STD
TI - T. Tan, L. Besacier, in 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP ’07, vol. 4. Acoustic Model Interpolation for Non-Native Speech Recognition, (2007), pp. IV-1009–IV-1012. https://doi.org/10.1109/ICASSP.2007.367243.
ID - ref6
ER -

TY - STD
TI - L. M. Tomokiyo, Recognizing non-native speech: characterizing and adapting to non-native usage in LVCSR. PhD thesis, Carnegie Mellon University (2001).
ID - ref7
ER -

TY - JOUR
AU - Abdel-Hamid, O.
AU - Mohamed, A.-R.
AU - Jiang, H.
AU - Deng, L.
AU - Penn, G.
AU - Yu, D.
PY - 2014
DA - 2014//
TI - Convolutional neural networks for speech recognition
JO - IEEE/ACM Trans. Audio Speech Lang. Proc.
VL - 22
UR - https://doi.org/10.1109/TASLP.2014.2339736
DO - 10.1109/TASLP.2014.2339736
ID - Abdel-Hamid2014
ER -

TY - JOUR
AU - Dave, N.
PY - 2013
DA - 2013//
TI - Feature extraction methods LPC, PLP and MFCC in speech recognition
JO - Int. J. Adv. Res. Eng. Technol.
VL - 1
ID - Dave2013
ER -

TY - JOUR
AU - Dehak, N.
AU - Kenny, P. J.
AU - Dehak, R.
AU - Dumouchel, P.
AU - Ouellet, P.
PY - 2011
DA - 2011//
TI - Front-end factor analysis for speaker verification
JO - Trans. Audio Speech Lang. Proc.
VL - 19
UR - https://doi.org/10.1109/TASL.2010.2064307
DO - 10.1109/TASL.2010.2064307
ID - Dehak2011
ER -

TY - JOUR
AU - Li, M.
PY - 2013
DA - 2013//
TI - Automatic speaker age and gender recognition using acoustic and prosodic level information fusion
JO - Comput. Speech Lang.
VL - 27
UR - https://doi.org/10.1016/j.csl.2012.01.008
DO - 10.1016/j.csl.2012.01.008
ID - Li2013
ER -

TY - STD
TI - A. Graves, A. Mohamed, G. Hinton, in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 38. Speech recognition with deep recurrent neural networks, (2013), pp. 6645–6649. https://doi.org/10.1109/ICASSP.2013.6638947.
ID - ref12
ER -

TY - STD
TI - D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, E. Elsen, J. Engel, L. Fan, C. Fougner, T. Han, A. Hannun, B. Jun, P. LeGresley, L. Lin, S. Narang, A. Ng, S. Ozair, R. Prenger, J. Raiman, S. Satheesh, D. Seetapun, S. Sengupta, Y. Wang, Z. Wang, C. Wang, B. Xiao, D. Yogatama, J. Zhan, Z. Zhu, Deep speech 2: end-to-end speech recognition in English and Mandarin (2015). https://arxiv.org/abs/1512.02595.
UR - https://arxiv.org/abs/1512.02595
ID - ref13
ER -

TY - JOUR
AU - Hinton, G. E.
AU - Deng, L.
AU - Yu, D.
AU - Dahl, G. E.
AU - Mohamed, A.
AU - Jaitly, N.
AU - Senior, A.
AU - Vanhoucke, V.
AU - Nguyen, P.
AU - Sainath, T. N.
AU - Kingsbury, B.
PY - 2012
DA - 2012//
TI - Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
JO - IEEE Signal Process. Mag.
VL - 29
UR - https://doi.org/10.1109/MSP.2012.2205597
DO - 10.1109/MSP.2012.2205597
ID - Hinton2012
ER -

TY - STD
TI - K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR. abs/1409.1556: (2015).
ID - ref15
ER -

TY - STD
TI - K. Radzikowski, L. Wang, O. Yoshie, R. Nowak, Dual supervised learning for non-native speech recognition. EURASIP J. Audio Speech Music Process. 2019(3), 1–10 (2019). https://doi.org/10.1186/s13636-018-0146-4. https://rdcu.be/bgUxy.
UR - https://rdcu.be/bgUxy
ID - ref16
ER -

TY - STD
TI - R. Kacper, W. Le, Y. Osamu, in Proceedings of the Conference of Institute of Electrical Engineers of Japan, Electronics and Information Systems Division. Non-native English speaker’s speech correction, based on domain focused document, (2016).
ID - ref17
ER -

TY - CHAP
AU - Kacper, R.
AU - Le, W.
AU - Osamu, Y.
PY - 2016
DA - 2016//
TI - Non-native English speakers’ speech correction, based on domain focused document
BT - Proceedings of the 18th International Conference on Information Integration and Web-based Applications and Services (iiWAS ’16)
PB - ACM
CY - New York
ID - Kacper2016
ER -

TY - STD
TI - R. Kacper, W. Le, Y. Osamu, in Proceedings of the Conference of Institute of Electrical Engineers of Japan, Electronics and Information Systems Division. Non-native speech recognition using characteristic speech features, with respect to nationality, (2017).
ID - ref19
ER -

TY - STD
TI - L. A. Gatys, A. S. Ecker, M. Bethge, A Neural Algorithm of Artistic Style (2015). https://arxiv.org/abs/1508.06576.
UR - https://arxiv.org/abs/1508.06576
ID - ref20
ER -

TY - STD
TI - J. Johnson, A. Alahi, L. Fei-Fei, Perceptual losses for real-time style transfer and super-resolution (2016). https://arxiv.org/abs/1603.08155.
UR - https://arxiv.org/abs/1603.08155
ID - ref21
ER -

TY - STD
TI - E. Grinstein, N. Q. K. Duong, A. Ozerov, P. Pérez, in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Audio Style Transfer (IEEE, 2018). doi:10.1109/icassp.2018.8461711.
ID - ref22
ER -

TY - STD
TI - P. Verma, J. O. Smith, neural style transfer for audio spectograms (2018). https://arxiv.org/abs/1801.01589.
UR - https://arxiv.org/abs/1801.01589
ID - ref23
ER -

TY - STD
TI - K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
ID - ref24
ER -

TY - STD
TI - F. Chollet, et al., Keras. GitHub (2015). https://github.com/fchollet/keras.
UR - https://github.com/fchollet/keras
ID - ref25
ER -

TY - CHAP
AU - Graves, A.
AU - Fernández, S.
AU - Gomez, F.
AU - Schmidhuber, J.
PY - 2006
DA - 2006//
TI - Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
BT - Proceedings of the 23rd International Conference on Machine Learning (ICML ’06)
PB - Association for Computing Machinery
CY - New York
UR - https://doi.org/10.1145/1143844.1143891
DO - 10.1145/1143844.1143891
ID - Graves2006
ER -

TY - STD
TI - B. Hixon, E. Schneider, S. Epstein, in INTERSPEECH. Phonemic Similarity Metrics to Compare Pronunciation Methods, (2011).
ID - ref27
ER -