TY - JOUR AU - Chih, T. AU - Ru, P. AU - Shamma, S. PY - 2005 DA - 2005// TI - Multiresolution spectrotemporal analysis of complex sounds JO - J Acoust Soc Am VL - 118 UR - https://doi.org/10.1121/1.1945807 DO - 10.1121/1.1945807 ID - Chih2005 ER - TY - STD TI - Y LeCun, Y Bengio, in The Handbook of Brain Theory and Neural Networks, ed. by MA Arbib. Convolutional networks for images, speech and time series (MIT Press, Cambridge, 1995), pp. 255–258. ID - ref2 ER - TY - JOUR AU - Mohamed, A. AU - Dahl, G. E. AU - Hinton, G. PY - 2012 DA - 2012// TI - Acoustic modeling using deep belief networks JO - IEEE Trans ASLP VL - 20 ID - Mohamed2012 ER - TY - JOUR AU - Dahl, G. E. AU - Yu, D. AU - Deng, L. AU - Acero, A. PY - 2012 DA - 2012// TI - Context-dependent pre-trained deep neural networks for large vocabulary speech recognition JO - IEEE Trans ASLP VL - 20 ID - Dahl2012 ER - TY - STD TI - F Seide, G Li, L Chen, D Yu, in Proc ASRU. Feature engineering in context-dependent deep neural networks for conversational speech transcription, (2011), pp. 24–29. ID - ref5 ER - TY - STD TI - N Jaitly, P Nguyen, A Senior, V Vanhoucke, in Proc Interspeech. Application of pretrained deep neural networks to large vocabulary speech recognition, (2012). ID - ref6 ER - TY - STD TI - O Abdel-Hamid, A Mohamed, H Jiang, G Penn, in Proc ICASSP. Applying convolutional neural network concepts to hybrid NN-HMM model for speech recognition, (2012), pp. 4277–4280. ID - ref7 ER - TY - STD TI - L Deng, O Abdel-Hamid, D Yu, in Proc ICASSP. A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion, (2013), pp. 6669–6673. ID - ref8 ER - TY - STD TI - TN Sainath, A Mohamed, B Kingsbury, B Ramabhadran, in Proc ICASSP. Deep convolutional neural networks for LVCSR, (2013), pp. 8614–8618. ID - ref9 ER - TY - STD TI - L Tóth, in Proc ICASSP.
Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition, (2014), pp. 190–194. ID - ref10 ER - TY - STD TI - O Abdel-Hamid, L Deng, D Yu, in Proc Interspeech. Exploring convolutional neural network structures and optimization techniques for speech recognition, (2013), pp. 3366–3370. ID - ref11 ER - TY - STD TI - TN Sainath, B Kingsbury, A Mohamed, G Dahl, G Saon, H Soltau, T Beran, A Aravkin, B Ramabhadran, in Proc ASRU. Improvements to deep convolutional neural networks for LVCSR, (2013), pp. 315–320. ID - ref12 ER - TY - STD TI - TN Sainath, A Mohamed, B Kingsbury, B Ramabhadran, in Proc ICASSP. Joint training of convolutional and non-convolutional neural networks, (2014), pp. 5572–5576. ID - ref13 ER - TY - JOUR AU - Sainath, T. N. AU - Kingsbury, B. AU - Saon, G. AU - Soltau, H. AU - Mohamed, A. AU - Dahl, G. AU - Ramabhadran, B. PY - 2015 DA - 2015// TI - Deep convolutional neural networks for large-scale speech tasks JO - Neural Netw VL - 64 UR - https://doi.org/10.1016/j.neunet.2014.08.005 DO - 10.1016/j.neunet.2014.08.005 ID - Sainath2015 ER - TY - STD TI - IJ Goodfellow, D Warde-Farley, M Mirza, A Courville, Y Bengio, in Proc ICML. Maxout networks, (2013), pp. 1319–1327. ID - ref15 ER - TY - STD TI - X Glorot, A Bordes, Y Bengio, in Proc AISTATS. Deep sparse rectifier neural networks, (2011). ID - ref16 ER - TY - STD TI - M Cai, Y Shi, J Liu, in Proc ASRU. Deep maxout neural networks for speech recognition, (2013), pp. 291–296. ID - ref17 ER - TY - STD TI - Y Miao, F Metze, S Rawat, in Proc ASRU. Deep maxout networks for low-resource speech recognition, (2013), pp. 398–403. ID - ref18 ER - TY - STD TI - P Swietojanski, J Li, JT Huang, in Proc ICASSP. Investigation of maxout networks for speech recognition, (2014), pp. 7649–7653. ID - ref19 ER - TY - STD TI - X Zhang, J Trmal, D Povey, S Khudanpur, in Proc ICASSP. Improving deep neural network acoustic models using generalized maxout networks, (2014), pp. 
215–219. ID - ref20 ER - TY - STD TI - K Veselý, M Karafiát, F Grézl, in Proc ASRU. Convolutive bottleneck network features for LVCSR, (2011), pp. 42–47. ID - ref21 ER - TY - JOUR AU - Ketabdar, H. AU - Bourlard, H. PY - 2010 DA - 2010// TI - Enhanced phone posteriors for improving speech recognition systems JO - IEEE Trans ASLP VL - 18 ID - Ketabdar2010 ER - TY - JOUR AU - Pinto, J. AU - Sivaram, G. S. V. S. AU - Magimai-Doss, M. AU - Hermansky, H. AU - Bourlard, H. PY - 2010 DA - 2010// TI - Analysis of MLP based hierarchical phoneme posterior probability estimator JO - IEEE Trans ASLP VL - 19 ID - Pinto2010 ER - TY - STD TI - D Vasquez, R Gruhn, W Minker, Hierarchical neural network structures for phoneme recognition (Springer, Berlin, 2013). ID - ref24 ER - TY - STD TI - L Tóth, in Proc ICASSP. A hierarchical, context-dependent neural network architecture for improved phone recognition, (2011), pp. 5040–5043. ID - ref25 ER - TY - STD TI - Y Zhang, E Chuangsuwanich, J Glass, in Proc Interspeech. Language ID-based training of multilingual stacked bottleneck features, (2014), pp. 1–5. ID - ref26 ER - TY - STD TI - L Tóth, in Proc Interspeech. Convolutional deep rectifier neural nets for phone recognition, (2013), pp. 1722–1726. ID - ref27 ER - TY - STD TI - GE Hinton, N Srivastava, A Krizhevsky, I Sutskever, R Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors. CoRR. abs/1207.0580 (2012). ID - ref28 ER - TY - STD TI - M Cai, Y Shi, J Liu, in Proc ICASSP. Stochastic pooling maxout networks for low-resource speech recognition, (2014), pp. 3266–3270. ID - ref29 ER - TY - JOUR AU - Lee, K.-F. AU - Hon, H.-W. PY - 1989 DA - 1989// TI - Speaker-independent phone recognition using hidden Markov models JO - IEEE Trans ASSP VL - 37 UR - https://doi.org/10.1109/29.46546 DO - 10.1109/29.46546 ID - Lee1989 ER - TY - STD TI - X Glorot, Y Bengio, in Proc AISTATS.
Understanding the difficulty of training deep feedforward neural networks, (2010), pp. 249–256. ID - ref31 ER - TY - STD TI - B Kingsbury, in Proc ICASSP. Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling, (2009), pp. 3761–3764. ID - ref32 ER - TY - STD TI - K Veselý, A Ghoshal, L Burget, D Povey, in Proc Interspeech. Sequence-discriminative training of deep neural networks, (2013), pp. 2345–2349. ID - ref33 ER - TY - STD TI - H Bourlard, N Morgan, Connectionist speech recognition—a hybrid approach (Kluwer, Boston, 1994). ID - ref34 ER - TY - STD TI - L Tóth, in Proc ICASSP. Phone recognition with deep sparse rectifier neural networks, (2013), pp. 6985–6989. ID - ref35 ER - TY - STD TI - GE Dahl, TN Sainath, GE Hinton, in Proc ICASSP. Improving deep neural networks for LVCSR using rectified linear units and dropout, (2013), pp. 8609–8613. ID - ref36 ER - TY - STD TI - MD Zeiler, M Ranzato, R Monga, M Mao, K Yang, QV Le, P Nguyen, A Senior, V Vanhoucke, J Dean, GE Hinton, in Proc ICASSP. On rectified linear units for speech processing, (2013), pp. 3517–3521. ID - ref37 ER - TY - STD TI - AL Maas, AY Hannun, AY Ng, in Proc ICML. Rectifier nonlinearities improve neural network acoustic models, (2013). ID - ref38 ER - TY - STD TI - J-T Huang, J Li, Y Gong, in Proc ICASSP. An analysis of convolutional neural networks for speech recognition, (2015), pp. 4989–4993. ID - ref39 ER - TY - STD TI - Y Miao, F Metze, in Proc Interspeech. Convolutional neural networks for language-universal feature extraction and cross-language hybrid systems, (2014), pp. 800–804. ID - ref40 ER - TY - STD TI - M Cai, Y Shi, J Kang, J Liu, T Su, in Proc ISCSLP. Convolutional maxout neural networks for low-resource speech recognition, (2014), pp. 133–137. ID - ref41 ER - TY - STD TI - S Renals, P Swietojanski, in Proc HSCMA. Neural networks for distant speech recognition, (2014).
ID - ref42 ER - TY - STD TI - MD Zeiler, R Fergus, Stochastic pooling for regularization of deep convolutional neural networks. CoRR. abs/1301.3557 (2013). ID - ref43 ER - TY - STD TI - H Hermansky, D Ellis, S Sharma, in Proc ICASSP. Tandem connectionist feature extraction for conventional HMM systems, (2000), pp. 1635–1638. ID - ref44 ER - TY - STD TI - C Plahl, R Schlüter, H Ney, in Proc Interspeech. Hierarchical bottle neck features for LVCSR, (2010), pp. 1197–1200. ID - ref45 ER - TY - STD TI - D Vásquez, G Aradilla, R Gruhn, W Minker, in Proc ASRU. A hierarchical structure for modeling inter and intra phonetic information for phoneme recognition, (2009), pp. 124–129. ID - ref46 ER - TY - STD TI - T Grósz, L Tóth, in Text, Speech and Dialogue, ed. by I Habernal, V Matousek. A comparison of deep neural network training methods for large vocabulary speech recognition (Springer, Berlin, 2013), pp. 36–43. ID - ref47 ER - TY - STD TI - G Gosztolya, T Grósz, L Tóth, D Imseng, in Proc ICASSP. Building context-dependent DNN acoustic models using Kullback-Leibler divergence-based state tying, (2015), pp. 4570–4574. ID - ref48 ER - TY - STD TI - L Deng, J Chen, in Proc ICASSP. Sequence classification using the high-level features extracted from deep neural networks, (2014), pp. 6844–6848. ID - ref49 ER - TY - STD TI - C Plahl, TN Sainath, B Ramabhadran, D Nahamoo, in Proc ICASSP. Improved pre-training of deep belief networks using sparse encoding symmetric machines, (2012), pp. 4165–4168. ID - ref50 ER - TY - STD TI - O Abdel-Hamid, H Jiang, in Proc Interspeech. Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition, (2013), pp. 1248–1252. ID - ref51 ER - TY - STD TI - A Graves, A Mohamed, GE Hinton, in Proc ICASSP. Speech recognition with deep recurrent neural networks, (2013), pp. 6645–6649. ID - ref52 ER - TY - STD TI - V Peddinti, TN Sainath, S Maymon, B Ramabhadran, D Nahamoo, V Goel, in Proc ICASSP. 
Deep scattering spectrum with deep neural networks, (2014), pp. 210–214. ID - ref53 ER -