Denoising in the Domain of Spectrotemporal Modulations

Abstract

A noise suppression algorithm is proposed based on filtering the spectrotemporal modulations of noisy signals. The modulations are estimated from a multiscale representation of the signal spectrogram generated by a model of sound processing in the auditory system. A significant advantage of this method is its ability to suppress noise that has distinctive modulation patterns, even when the noise overlaps spectrally with the signal. The performance of the algorithm is evaluated using subjective and objective tests with contaminated speech signals and compared with a traditional Wiener filtering method. The results demonstrate the efficacy of the spectrotemporal filtering approach in the conditions examined.
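The idea of separating signal and noise by their modulation patterns rather than by their spectra can be illustrated with a minimal sketch. It is not the algorithm of the article: it assumes a plain 2D Fourier transform of a toy spectrogram as a stand-in for the auditory multiscale model, a Wiener-style gain, and an oracle noise estimate. The function name `modulation_wiener_filter` and all parameters are hypothetical.

```python
import numpy as np


def modulation_wiener_filter(noisy_spec, noise_spec, eps=1e-12):
    """Apply a Wiener-style gain in a 2D modulation domain.

    Assumption: the 2D FFT of a (frequency x time) spectrogram stands in for
    a spectrotemporal modulation analysis; noise_spec is a noise-only
    spectrogram of the same shape (an oracle estimate in this toy example).
    """
    noisy_mod = np.fft.fft2(noisy_spec)            # modulation-domain coefficients
    noise_pow = np.abs(np.fft.fft2(noise_spec)) ** 2
    noisy_pow = np.abs(noisy_mod) ** 2
    # Power-subtraction estimate of the clean modulation power, clipped at zero.
    clean_pow = np.maximum(noisy_pow - noise_pow, 0.0)
    gain = clean_pow / (clean_pow + noise_pow + eps)
    # The gain is real and symmetric, so the inverse transform is real
    # up to rounding error.
    return np.real(np.fft.ifft2(gain * noisy_mod))


# Toy spectrogram: 32 frequency channels x 64 time frames.
f = np.arange(32)[:, None]
t = np.arange(64)[None, :]

# "Signal": slow temporal and broad spectral modulation on top of a constant.
signal = 1.0 + np.cos(2 * np.pi * f / 32) * np.cos(2 * np.pi * 2 * t / 64)
# "Noise": a fast drifting ripple that occupies every frequency channel,
# so it overlaps the signal spectrally but not in the modulation domain.
noise = 0.5 * np.cos(2 * np.pi * (3 * f / 32 + 10 * t / 64))
noisy = signal + noise

denoised = modulation_wiener_filter(noisy, noise)
print("max error before filtering:", np.abs(noisy - signal).max())
print("max error after filtering: ", np.abs(denoised - signal).max())
```

In this toy case the noise is present in every frequency channel, so a gain applied per frequency channel could not remove it without also attenuating the signal, whereas the two components fall in different modulation bins and separate cleanly there. With real speech the noise estimate would be imperfect and the separation only partial.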

[1-24]

References

  1. Lim JS, Oppenheim AV: Enhancement and bandwidth compression of noisy speech. Proceedings of the IEEE 1979,67(12):1586-1604.

  2. Ephraim Y, Van Trees HL: Signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing 1995,3(4):251-266. 10.1109/89.397090

  3. Ephraim Y, Malah D: Speech enhancement using a minimum mean-square error-log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing 1985,33(2):443-445. 10.1109/TASSP.1985.1164550

  4. Martin R: Statistical methods for the enhancement of noisy speech. Proceedings of the 8th IEEE International Workshop on Acoustic Echo and Noise Control (IWAENC '03), September 2003, Kyoto, Japan 1-6.

  5. Shamma S: Encoding sound timbre in the auditory system. IETE Journal of Research 2003,49(2):193-205.

  6. Elhilali M, Chi T, Shamma S: A spectro-temporal modulation index (STMI) for assessment of speech intelligibility. Speech Communication 2003,41(2-3):331-348. 10.1016/S0167-6393(02)00134-6

  7. Mesgarani N, Shamma S, Slaney M: Speech discrimination based on multiscale spectro-temporal modulations. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '04), May 2004, Montreal, Canada 1: 601-604.

  8. Carlyon RP, Shamma S: An account of monaural phase sensitivity. Journal of the Acoustical Society of America 2003,114(1):333-348. 10.1121/1.1577557

  9. Tchorz J, Kollmeier B: SNR estimation based on amplitude modulation analysis with applications to noise suppression. IEEE Transactions on Speech and Audio Processing 2003,11(3):184-192. 10.1109/TSA.2003.811542

  10. Wang K, Shamma S: Spectral shape analysis in the central auditory system. IEEE Transactions on Speech and Audio Processing 1995,3(5):382-395. 10.1109/89.466657

  11. Lyon R, Shamma S: Auditory representation of timbre and pitch. In Auditory Computation, Springer Handbook of Auditory Research. Volume 6. Springer, New York, NY, USA; 1996:221-270. 10.1007/978-1-4612-4070-9_6

  12. Yang X, Wang K, Shamma S: Auditory representations of acoustic signals. IEEE Transactions on Information Theory 1992,38(2, part 2):824-839. Special issue on wavelet transforms and multi-resolution signal analysis. 10.1109/18.119739

  13. Chi T, Ru P, Shamma S: Multiresolution spectrotemporal analysis of complex sounds. Journal of the Acoustical Society of America 2005,118(2):887-906. 10.1121/1.1945807

  14. Shamma S: Methods of neuronal modeling. In Spatial and Temporal Processing in the Auditory System. 2nd edition. MIT Press, Cambridge, Mass, USA; 1998:411-460.

  15. Depireux DA, Simon JZ, Klein DJ, Shamma S: Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex. Journal of Neurophysiology 2001,85(3):1220-1234.

  16. Kowalski N, Depireux DA, Shamma S: Analysis of dynamic spectra in ferret primary auditory cortex. I. Characteristics of single-unit responses to moving ripple spectra. Journal of Neurophysiology 1996,76(5):3503-3523.

  17. Elhilali M, Chi T, Shamma S: A spectro-temporal modulation index (STMI) for assessment of speech intelligibility. Speech Communication 2003,41(2-3):331-348. 10.1016/S0167-6393(02)00134-6

  18. Varga A, Steeneken HJM, Tomlinson M, Jones D: The NOISEX-92 study on the effect of additive noise on automatic speech recognition. 1992.

  19. De Lathauwer L, De Moor B, Vandewalle J: A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications 2000,21(4):1253-1278. 10.1137/S0895479896305696

  20. Vapnik VN: The Nature of Statistical Learning Theory. Springer, Berlin, Germany; 1995.

  21. Scalart P, Filho JV: Speech enhancement based on a priori signal to noise estimation. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '96), May 1996, Atlanta, Ga, USA 2: 629-632.

  22. Zavarehei E http://dea.brunel.ac.uk/cmsp/Home_Esfandiar

  23. Seneff S, Zue V: Transcription and alignment of the TIMIT database. In An Acoustic Phonetic Continuous Speech Database, 1988, Gaithersburg, Md, USA. Edited by: Garofolo JS. National Institute of Standards and Technology (NIST);

  24. Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs 2001.

Author information

Correspondence to Nima Mesgarani.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Keywords

  • Acoustics
  • Speech Signal
  • Objective Test
  • Auditory System
  • Distinctive Modulation