An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition

Raj, Bhiksha; Turicchia, Lorenzo; Schmidt-Nielsen, Bent; Sarpeshkar, Rahul

doi:10.1155/2007/65420

Research Article
Open access
Published: 26 June 2007

EURASIP Journal on Audio, Speech, and Music Processing volume 2007, Article number: 065420 (2007)

Abstract

We describe an FFT-based companding algorithm for preprocessing speech before recognition. The algorithm mimics tone-to-tone suppression and masking in the auditory system to improve automatic speech recognition performance in noise. Because it is built on the FFT, it is also computationally efficient and well suited to digital implementation. In an automotive digit recognition task on the CU-Move database, recorded in real environmental noise, the algorithm improves the word error rate by a relative 12.5% at -5 dB signal-to-noise ratio (SNR) and by a relative 6.2% across all SNRs (-5 dB to +5 dB). On the Aurora-2 database, in which noise from several environments is added artificially, the algorithm improves the relative word error rate in almost all conditions.
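The relative improvements quoted above are fractions of a baseline word error rate, not absolute percentage-point differences. A minimal sketch of that arithmetic follows, assuming the usual definition of relative improvement. The function name and the sample error rates are hypothetical and are not taken from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the word error rate reduction as a fraction of the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical error rates in percent, chosen only to illustrate the formula.
baseline = 40.0   # assumed WER without the companding front end
companded = 35.0  # assumed WER with the companding front end

improvement = relative_wer_improvement(baseline, companded)
print(f"{improvement:.1%}")  # prints 12.5%
```

In this made-up case the absolute drop is 5 percentage points, while the relative improvement is 12.5% of the baseline.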

[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32]

References

Lippmann RP: Speech recognition by machines and humans. Speech Communication 1997,22(1):1-15. 10.1016/S0167-6393(97)00021-6
Pickles JO: An Introduction to the Physiology of Hearing. Academic Press, London, UK; 1988.
Seneff S: A joint synchrony/mean-rate model of auditory speech processing. Journal of Phonetics 1988,16(1):55-76.
Ghitza O: Auditory models and human performance in tasks related to speech coding and speech recognition. IEEE Transactions on Speech and Audio Processing 1994,2(1, part 2):115-132. 10.1109/89.260357
Van Schaik A, Meddis R: Analog very large-scale integrated (VLSI) implementation of a model of amplitude-modulation sensitivity in the auditory brainstem. Journal of the Acoustical Society of America 1999,105(2):811-821. 10.1121/1.426270
Goldstein JL: Modeling rapid waveform compression on the basilar membrane as multiple-bandpass-nonlinearity filtering. Hearing Research 1990,49(1–3):39-60.
Meddis R, O'Mard LP, Lopez-Poveda EA: A computational algorithm for computing nonlinear auditory frequency selectivity. Journal of the Acoustical Society of America 2001,109(6):2852-2861. 10.1121/1.1370357
Jankowski CR Jr., Vo H-DH, Lippmann RP: A comparison of signal processing front ends for automatic word recognition. IEEE Transactions on Speech and Audio Processing 1995,3(4):286-293. 10.1109/89.397093
Davis SB, Mermelstein P: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing 1980,28(4):357-366. 10.1109/TASSP.1980.1163420
Hermansky H: Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America 1990,87(4):1738-1752. 10.1121/1.399423
Strope B, Alwan A: A model of dynamic auditory perception and its application to robust word recognition. IEEE Transactions on Speech and Audio Processing 1997,5(5):451-464. 10.1109/89.622569
Holmberg M, Gelbart D, Hemmert W: Automatic speech recognition with an adaptation model motivated by auditory processing. IEEE Transactions on Audio, Speech, and Language Processing 2006,14(1):43-49.
Tchorz J, Kollmeier B: A model of auditory perception as front end for automatic speech recognition. Journal of the Acoustical Society of America 1999,106(4):2040-2050. 10.1121/1.427950
Hermansky H, Morgan N: RASTA processing of speech. IEEE Transactions on Speech and Audio Processing 1994,2(4):578-589. 10.1109/89.326616
Moore BCJ: An Introduction to the Psychology of Hearing. 4th edition. Academic Press, New York, NY, USA; 1997.
Turicchia L, Sarpeshkar R: The silicon cochlea: from biology to bionics. In Biophysics of the Cochlea: From Molecules to Models. Edited by: Gummer AW. World Scientific, Singapore; 2003:417-423.
Turicchia L, Sarpeshkar R: A bio-inspired companding strategy for spectral enhancement. IEEE Transactions on Speech and Audio Processing 2005,13(2):243-253.
Oxenham AJ, Simonson AM, Turicchia L, Sarpeshkar R: Evaluation of companding-based spectral enhancement using simulated cochlear-implant processing. Journal of the Acoustical Society of America 2007,121(3):1709-1716. 10.1121/1.2434757
Bhattacharya A, Zeng F-G: Companding to improve cochlear implants' speech processing in noise. Proceedings of Conference on Implantable Auditory Prostheses, July-August 2005, Pacific Grove, Calif, USA
Lee YW, Kwon SY, Ji YS, et al.: Speech enhancement in noise environment using companding strategy. Proceedings of the 5th Asia Pacific Symposium on Cochlear Implant and Related Sciences (APSCI '05), November 2005, Hong Kong
Loizou PC, Kasturi K, Turicchia L, Sarpeshkar R, Dorman M, Spahr T: Evaluation of the companding and other strategies for noise reduction in cochlear implants. Proceedings of Conference on Implantable Auditory Prostheses, July-August 2005, Pacific Grove, Calif, USA
Turicchia L, Kasturi K, Loizou PC, Sarpeshkar R: Evaluation of the companding algorithm for noise reduction in cochlear implants. submitted for publication
Guinness J, Raj B, Schmidt-Nielsen B, Turicchia L, Sarpeshkar R: A companding front end for noise-robust automatic speech recognition. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), March 2005, Philadelphia, Pa, USA 1: 249-252.
Stone MA, Moore BCJ: Spectral feature enhancement for people with sensorineural hearing impairment: effects on speech intelligibility and quality. Journal of Rehabilitation Research and Development 1992,29(2):39-56. 10.1682/JRRD.1992.04.0039
Baer T, Moore BCJ, Gatehouse S: Spectral contrast enhancement of speech in noise for listeners with sensorineural hearing impairment: effects on intelligibility, quality, and response times. Journal of Rehabilitation Research and Development 1993,30(1):49-72.
Juang B-H, Rabiner LR, Wilpon JG: On the use of bandpass liftering in speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 1987,35(7):947-954. 10.1109/TASSP.1987.1165237
Hunt MJ: Some experience in in-car speech recognition. Proceedings of the Workshop on Robust Methods for Speech Recognition in Adverse Conditions, May 1999, Tampere, Finland 25-31.
University Technology Corporation: CSLR Speech Corpora. http://cslr.colorado.edu/beginweb/speechcorpora/corpus.html
Hirsch H-G, Pearce D: The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions. Proceedings of Automatic Speech Recognition: Challenges for the New Millenium (ISCA ITRW ASR '00), September 2000, Paris, France 181-188.
Kandel ER, Schwarz JH, Jessell TM: Principles of Neural Science. McGraw Hill, New York, NY, USA; 2000.
Singh R, Seltzer ML, Raj B, Stern RM: Speech in noisy environments: robust automatic segmentation, feature extraction, and hypothesis combination. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), May 2001, Salt Lake, Utah, USA 1: 273-276.
The Hidden Markov Model Toolkit (HTK) University of Cambridge, http://htk.eng.cam.ac.uk/

Author information

Authors and Affiliations

Mitsubishi Electric Research Laboratories (MERL), 201 Broadway, Cambridge, MA, 02139-4307, USA
Bhiksha Raj & Bent Schmidt-Nielsen
Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA
Lorenzo Turicchia & Rahul Sarpeshkar

Corresponding author

Correspondence to Bhiksha Raj.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Raj, B., Turicchia, L., Schmidt-Nielsen, B. et al. An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition. J AUDIO SPEECH MUSIC PROC. 2007, 065420 (2007). https://doi.org/10.1155/2007/65420

Received: 29 November 2006
Revised: 14 March 2007
Accepted: 23 April 2007
Published: 26 June 2007
DOI: https://doi.org/10.1155/2007/65420
