Open Access

Denoising in the Domain of Spectrotemporal Modulations

EURASIP Journal on Audio, Speech, and Music Processing 2007, 2007:042357

DOI: 10.1155/2007/42357

Received: 19 December 2006

Accepted: 10 September 2007

Published: 15 November 2007

Abstract

A noise suppression algorithm is proposed based on filtering the spectrotemporal modulations of noisy signals. The modulations are estimated from a multiscale representation of the signal spectrogram generated by a model of sound processing in the auditory system. A significant advantage of this method is its ability to suppress noise that has distinctive modulation patterns, even when the noise overlaps spectrally with the signal. The performance of the algorithm is evaluated using subjective and objective tests with contaminated speech signals and compared with a traditional Wiener filtering method. The results demonstrate the efficacy of the spectrotemporal filtering approach in the conditions examined.
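The idea of filtering in the modulation domain can be illustrated with a minimal sketch. It is not the algorithm proposed in the article: the article estimates modulations from an auditory-model multiscale representation, whereas this sketch assumes a plain 2-D Fourier transform of a toy spectrogram as a simplified stand-in. The function names (`modulation_filter`, `lowpass_rate_mask`), the binary mask, and the synthetic signals are all hypothetical and chosen only for illustration.

```python
import numpy as np


def modulation_filter(spectrogram, keep_mask):
    """Filter a (frequency x time) spectrogram in a 2-D modulation domain.

    Assumption: a 2-D FFT stands in for the auditory-model analysis.
    Axis 0 then indexes spectral modulations, axis 1 temporal modulations.
    """
    modulations = np.fft.fft2(spectrogram)
    filtered = modulations * keep_mask
    # The mask used here is symmetric in the temporal-modulation index,
    # so the inverse transform is real up to rounding error.
    return np.real(np.fft.ifft2(filtered))


def lowpass_rate_mask(shape, max_rate_bins):
    """Keep temporal modulations whose bin index magnitude is <= max_rate_bins."""
    n_freq, n_time = shape
    rate_index = np.fft.fftfreq(n_time) * n_time  # signed bin indices
    keep = np.abs(rate_index) <= max_rate_bins
    return np.tile(keep, (n_freq, 1)).astype(float)


# Toy example: "signal" and "noise" occupy the same frequency channels
# (they overlap spectrally) but differ in temporal modulation rate.
n_freq, n_time = 16, 64
t = np.arange(n_time)
slow = 1.0 + np.cos(2 * np.pi * 2 * t / n_time)   # slow modulation: "signal"
fast = 0.5 * np.cos(2 * np.pi * 20 * t / n_time)  # fast modulation: "noise"

clean = np.tile(slow, (n_freq, 1))
noisy = clean + np.tile(fast, (n_freq, 1))

mask = lowpass_rate_mask(noisy.shape, max_rate_bins=5)
denoised = modulation_filter(noisy, mask)

print("max error before filtering:", np.max(np.abs(noisy - clean)))
print("max error after filtering: ", np.max(np.abs(denoised - clean)))
```

In this toy case the noise is removed because its modulation rate lies outside the retained band, even though it is present in every frequency channel. Real speech and noise are not separated this cleanly, and the mask would have to be estimated rather than fixed by hand.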

[1-24]

Authors’ Affiliations

(1) Electrical Engineering Department, University of Maryland

References

  1. Lim JS, Oppenheim AV: Enhancement and bandwidth compression of noisy speech. Proceedings of the IEEE 1979,67(12):1586-1604.
  2. Ephraim Y, Van Trees HL: Signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing 1995,3(4):251-266. DOI: 10.1109/89.397090
  3. Ephraim Y, Malah D: Speech enhancement using a minimum mean-square error-log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing 1985,33(2):443-445. DOI: 10.1109/TASSP.1985.1164550
  4. Martin R: Statistical methods for the enhancement of noisy speech. Proceedings of the 8th IEEE International Workshop on Acoustic Echo and Noise Control (IWAENC '03), September 2003, Kyoto, Japan 1-6.
  5. Shamma S: Encoding sound timbre in the auditory system. IETE Journal of Research 2003,49(2):193-205.
  6. Elhilali M, Chi T, Shamma S: A spectro-temporal modulation index (STMI) for assessment of speech intelligibility. Speech Communication 2003,41(2-3):331-348. DOI: 10.1016/S0167-6393(02)00134-6
  7. Mesgarani N, Shamma S, Slaney M: Speech discrimination based on multiscale spectro-temporal modulations. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '04), May 2004, Montreal, Canada 1: 601-604.
  8. Carlyon RP, Shamma S: An account of monaural phase sensitivity. Journal of the Acoustical Society of America 2003,114(1):333-348. DOI: 10.1121/1.1577557
  9. Tchorz J, Kollmeier B: SNR estimation based on amplitude modulation analysis with applications to noise suppression. IEEE Transactions on Speech and Audio Processing 2003,11(3):184-192. DOI: 10.1109/TSA.2003.811542
  10. Wang K, Shamma S: Spectral shape analysis in the central auditory system. IEEE Transactions on Speech and Audio Processing 1995,3(5):382-395. DOI: 10.1109/89.466657
  11. Lyon R, Shamma S: Auditory representation of timbre and pitch. In Auditory Computation, Springer Handbook of Auditory Research. Volume 6. Springer, New York, NY, USA; 1996:221-270. DOI: 10.1007/978-1-4612-4070-9_6
  12. Yang X, Wang K, Shamma S: Auditory representations of acoustic signals. IEEE Transactions on Information Theory 1992,38(2, part 2):824-839. Special issue on wavelet transforms and multi-resolution signal analysis. DOI: 10.1109/18.119739
  13. Chi T, Ru P, Shamma S: Multiresolution spectrotemporal analysis of complex sounds. Journal of the Acoustical Society of America 2005,118(2):887-906. DOI: 10.1121/1.1945807
  14. Shamma S: Methods of neuronal modeling. In Spatial and Temporal Processing in the Auditory System. 2nd edition. MIT Press, Cambridge, Mass, USA; 1998:411-460.
  15. Depireux DA, Simon JZ, Klein DJ, Shamma S: Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex. Journal of Neurophysiology 2001,85(3):1220-1234.
  16. Kowalski N, Depireux DA, Shamma S: Analysis of dynamic spectra in ferret primary auditory cortex. I. Characteristics of single-unit responses to moving ripple spectra. Journal of Neurophysiology 1996,76(5):3503-3523.
  17. Elhilali M, Chi T, Shamma S: A spectro-temporal modulation index (STMI) for assessment of speech intelligibility. Speech Communication 2003,41(2-3):331-348. DOI: 10.1016/S0167-6393(02)00134-6
  18. Varga A, Steeneken HJM, Tomlinson M, Jones D: The NOISEX-92 study on the effect of additive noise on automatic speech recognition. 1992.
  19. De Lathauwer L, De Moor B, Vandewalle J: A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications 2000,21(4):1253-1278. DOI: 10.1137/S0895479896305696
  20. Vapnik VN: The Nature of Statistical Learning Theory. Springer, Berlin, Germany; 1995.
  21. Scalart P, Filho JV: Speech enhancement based on a priori signal to noise estimation. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '96), May 1996, Atlanta, Ga, USA 2: 629-632.
  22. Zavarehei E: http://dea.brunel.ac.uk/cmsp/Home_Esfandiar
  23. Seneff S, Zue V: Transcription and alignment of the TIMIT database. In An Acoustic Phonetic Continuous Speech Database, 1988, Gaithersburg, Md, USA. Edited by: Garofolo JS. National Institute of Standards and Technology (NIST).
  24. Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs. 2001.

Copyright

© N. Mesgarani and S. Shamma. 2007

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.