  • Research Article
  • Open access

Denoising in the Domain of Spectrotemporal Modulations

Abstract

A noise suppression algorithm is proposed based on filtering the spectrotemporal modulations of noisy signals. The modulations are estimated from a multiscale representation of the signal spectrogram generated by a model of sound processing in the auditory system. A significant advantage of this method is its ability to suppress noise that has distinctive modulation patterns, even when the noise overlaps spectrally with the signal. The performance of the algorithm is evaluated using subjective and objective tests with contaminated speech signals and compared with that of a traditional Wiener filtering method. The results demonstrate the efficacy of the spectrotemporal filtering approach in the conditions examined.
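As a rough illustration of filtering in a modulation domain, the following minimal sketch separates two components that occupy the same frequency channels but differ in temporal modulation rate. It is not the authors' method: it assumes a plain 2-D Fourier transform of a toy spectrogram in place of the auditory model, and the function names `lowpass_rate_mask` and `modulation_filter`, the frame rate, and the 16 Hz cutoff are all hypothetical choices made for the example.

```python
import numpy as np

def lowpass_rate_mask(shape, frame_rate_hz, max_rate_hz):
    """Binary mask keeping temporal modulations at or below max_rate_hz.

    shape is (n_freq, n_time); the mask is symmetric in +/- rate so the
    filtered spectrogram stays real-valued.
    """
    n_freq, n_time = shape
    rates = np.fft.fftfreq(n_time, d=1.0 / frame_rate_hz)  # rate axis in Hz
    keep = (np.abs(rates) <= max_rate_hz).astype(float)
    return np.tile(keep, (n_freq, 1))

def modulation_filter(spec, mask):
    """Apply a mask to the 2-D Fourier transform of a (freq x time) spectrogram."""
    mod = np.fft.fft2(spec)  # joint spectral/temporal modulation content
    return np.fft.ifft2(mod * mask).real

# Toy data (assumed, not from the paper): 32 channels, 2 s at 100 frames/s.
frame_rate_hz = 100.0
t = np.arange(200) / frame_rate_hz
profile = np.exp(-0.5 * ((np.arange(32) - 16) / 5.0) ** 2)[:, None]

clean = profile * (1.0 + 0.5 * np.cos(2 * np.pi * 4.0 * t))   # slow, 4 Hz
noise = profile * 0.5 * np.cos(2 * np.pi * 40.0 * t)          # fast, 40 Hz
noisy = clean + noise  # same channels, different modulation rates

mask = lowpass_rate_mask(noisy.shape, frame_rate_hz, max_rate_hz=16.0)
denoised = modulation_filter(noisy, mask)
```

In this toy case the two components share an identical spectral profile, so no per-channel gain could tell them apart, while a mask on the modulation rate removes the 40 Hz component. Real noise is rarely this cleanly separated, and a fixed low-pass mask is only a stand-in for a filter estimated from the signal and noise modulation content.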

[1–24]

References

  1. Lim JS, Oppenheim AV: Enhancement and bandwidth compression of noisy speech. Proceedings of the IEEE 1979,67(12):1586-1604.

  2. Ephraim Y, Van Trees HL: Signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing 1995,3(4):251-266. 10.1109/89.397090

  3. Ephraim Y, Malah D: Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing 1985,33(2):443-445. 10.1109/TASSP.1985.1164550

  4. Martin R: Statistical methods for the enhancement of noisy speech. Proceedings of the 8th IEEE International Workshop on Acoustic Echo and Noise Control (IWAENC '03), September 2003, Kyoto, Japan 1-6.

  5. Shamma S: Encoding sound timbre in the auditory system. IETE Journal of Research 2003,49(2):193-205.

  6. Elhilali M, Chi T, Shamma S: A spectro-temporal modulation index (STMI) for assessment of speech intelligibility. Speech Communication 2003,41(2-3):331-348. 10.1016/S0167-6393(02)00134-6

  7. Mesgarani N, Shamma S, Slaney M: Speech discrimination based on multiscale spectro-temporal modulations. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '04), May 2004, Montreal, Canada 1: 601-604.

  8. Carlyon RP, Shamma S: An account of monaural phase sensitivity. Journal of the Acoustical Society of America 2003,114(1):333-348. 10.1121/1.1577557

  9. Tchorz J, Kollmeier B: SNR estimation based on amplitude modulation analysis with applications to noise suppression. IEEE Transactions on Speech and Audio Processing 2003,11(3):184-192. 10.1109/TSA.2003.811542

  10. Wang K, Shamma S: Spectral shape analysis in the central auditory system. IEEE Transactions on Speech and Audio Processing 1995,3(5):382-395. 10.1109/89.466657

  11. Lyon R, Shamma S: Auditory representation of timbre and pitch. In Auditory Computation, Springer Handbook of Auditory Research. Volume 6. Springer, New York, NY, USA; 1996:221-270. 10.1007/978-1-4612-4070-9_6

  12. Yang X, Wang K, Shamma S: Auditory representations of acoustic signals. IEEE Transactions on Information Theory 1992,38(2, part 2):824-839. special issue on wavelet transforms and multi-resolution signal analysis 10.1109/18.119739

  13. Chi T, Ru P, Shamma S: Multiresolution spectrotemporal analysis of complex sounds. Journal of the Acoustical Society of America 2005,118(2):887-906. 10.1121/1.1945807

  14. Shamma S: Methods of neuronal modeling. In Spatial and Temporal Processing in the Auditory System. 2nd edition. MIT Press, Cambridge, Mass, USA; 1998:411-460.

  15. Depireux DA, Simon JZ, Klein DJ, Shamma S: Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex. Journal of Neurophysiology 2001,85(3):1220-1234.

  16. Kowalski N, Depireux DA, Shamma S: Analysis of dynamic spectra in ferret primary auditory cortex. I. Characteristics of single-unit responses to moving ripple spectra. Journal of Neurophysiology 1996,76(5):3503-3523.

  17. Elhilali M, Chi T, Shamma S: A spectro-temporal modulation index (STMI) for assessment of speech intelligibility. Speech Communication 2003,41(2-3):331-348. 10.1016/S0167-6393(02)00134-6

  18. Varga A, Steeneken HJM, Tomlinson M, Jones D: The NOISEX-92 study on the effect of additive noise on automatic speech recognition. 1992.

  19. De Lathauwer L, De Moor B, Vandewalle J: A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications 2000,21(4):1253-1278. 10.1137/S0895479896305696

  20. Vapnik VN: The Nature of Statistical Learning Theory. Springer, Berlin, Germany; 1995.

  21. Scalart P, Filho JV: Speech enhancement based on a priori signal to noise estimation. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '96), May 1996, Atlanta, Ga, USA 2: 629-632.

  22. Zavarehei E http://dea.brunel.ac.uk/cmsp/Home_Esfandiar

  23. Seneff S, Zue V: Transcription and alignment of the TIMIT database. In An Acoustic Phonetic Continuous Speech Database, 1988, Gaithersburg, Md, USA. Edited by: Garofolo JS. National Institute of Standards and Technology (NIST);

  24. Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs 2001.

Author information

Corresponding author

Correspondence to Nima Mesgarani.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Mesgarani, N., Shamma, S. Denoising in the Domain of Spectrotemporal Modulations. J AUDIO SPEECH MUSIC PROC. 2007, 042357 (2007). https://doi.org/10.1155/2007/42357

  • DOI: https://doi.org/10.1155/2007/42357
