Denoising in the Domain of Spectrotemporal Modulations

Mesgarani, Nima; Shamma, Shihab

doi:10.1155/2007/42357

Research Article
Open access
Published: 15 November 2007

EURASIP Journal on Audio, Speech, and Music Processing, volume 2007, Article number: 042357 (2007)

Abstract

A noise suppression algorithm is proposed based on filtering the spectrotemporal modulations of noisy signals. The modulations are estimated from a multiscale representation of the signal spectrogram generated by a model of sound processing in the auditory system. A significant advantage of this method is its ability to suppress noise that has distinctive modulation patterns, even when the noise overlaps spectrally with the signal. The performance of the algorithm is evaluated using subjective and objective tests with contaminated speech signals and is compared to that of a traditional Wiener filtering method. The results demonstrate the efficacy of the spectrotemporal filtering approach in the conditions examined.
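The idea of separating signal from noise by their modulation patterns, rather than by their spectra, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes modulations estimated from an auditory-model representation, whereas the code below assumes a plain 2D Fourier transform of a toy spectrogram as a stand-in for that analysis. The function names, the 16 Hz cutoff, and the synthetic signals are all hypothetical and chosen only for illustration.

```python
import numpy as np

def modulation_lowpass_mask(n_freq, n_time, frame_rate_hz, max_rate_hz):
    """Binary mask over the 2D modulation plane (assumed design, not from the paper).

    Keeps every spectral modulation but only temporal modulation rates whose
    magnitude is at most max_rate_hz.
    """
    rates = np.fft.fftfreq(n_time, d=1.0 / frame_rate_hz)  # temporal rates in Hz
    keep = np.abs(rates) <= max_rate_hz
    return np.tile(keep, (n_freq, 1)).astype(float)

def filter_modulations(spectrogram, mask):
    """Filter a (frequency x time) spectrogram in the 2D modulation domain."""
    modulations = np.fft.fft2(spectrogram)          # stand-in for the auditory model
    return np.real(np.fft.ifft2(modulations * mask))

def demo():
    frame_rate_hz = 100.0
    n_freq, n_time = 32, 200
    t = np.arange(n_time) / frame_rate_hz
    f = np.arange(n_freq)[:, None]
    # Toy "speech-like" pattern: slow 4 Hz temporal modulation with a spectral ripple.
    target = 1.0 + 0.5 * np.cos(2 * np.pi * 4.0 * t)[None, :] * np.cos(4 * np.pi * f / n_freq)
    # Toy "noise": fast 30 Hz modulation present in every frequency channel,
    # so it overlaps the target spectrally but not in modulation rate.
    noise = 0.5 * np.cos(2 * np.pi * 30.0 * t)[None, :] * np.ones((n_freq, 1))
    noisy = target + noise
    mask = modulation_lowpass_mask(n_freq, n_time, frame_rate_hz, max_rate_hz=16.0)
    cleaned = filter_modulations(noisy, mask)
    rms = lambda x: float(np.sqrt(np.mean(x ** 2)))
    return rms(noisy - target), rms(cleaned - target)

if __name__ == "__main__":
    err_before, err_after = demo()
    print(f"RMS error before: {err_before:.4f}, after: {err_after:.2e}")
```

In this toy case the noise occupies every frequency channel that the target does, so a per-channel spectral gain could not remove it without also attenuating the target, while a mask in the modulation plane separates the two. Real noise and speech overlap partially in modulation content, so a fixed binary mask like this one should be read as a simplification.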

[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]

References

Lim JS, Oppenheim AV: Enhancement and bandwidth compression of noisy speech. Proceedings of the IEEE 1979,67(12):1586-1604.
Ephraim Y, Van Trees HL: Signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing 1995,3(4):251-266. 10.1109/89.397090
Ephraim Y, Malah D: Speech enhancement using a minimum mean-square error-log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing 1985,33(2):443-445. 10.1109/TASSP.1985.1164550
Martin R: Statistical methods for the enhancement of noisy speech. Proceedings of the 8th IEEE International Workshop on Acoustic Echo and Noise Control (IWAENC '03), September 2003, Kyoto, Japan 1-6.
Shamma S: Encoding sound timbre in the auditory system. IETE Journal of Research 2003,49(2):193-205.
Elhilali M, Chi T, Shamma S: A spectro-temporal modulation index (STMI) for assessment of speech intelligibility. Speech Communication 2003,41(2-3):331-348. 10.1016/S0167-6393(02)00134-6
Mesgarani N, Shamma S, Slaney M: Speech discrimination based on multiscale spectro-temporal modulations. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '04), May 2004, Montreal, Canada 1: 601-604.
Carlyon RP, Shamma S: An account of monaural phase sensitivity. Journal of the Acoustical Society of America 2003,114(1):333-348. 10.1121/1.1577557
Tchorz J, Kollmeier B: SNR estimation based on amplitude modulation analysis with applications to noise suppression. IEEE Transactions on Speech and Audio Processing 2003,11(3):184-192. 10.1109/TSA.2003.811542
Wang K, Shamma S: Spectral shape analysis in the central auditory system. IEEE Transactions on Speech and Audio Processing 1995,3(5):382-395. 10.1109/89.466657
Lyon R, Shamma S: Auditory representation of timbre and pitch. In Auditory Computation, Springer Handbook of Auditory Research. Volume 6. Springer, New York, NY, USA; 1996:221-270. 10.1007/978-1-4612-4070-9_6
Yang X, Wang K, Shamma S: Auditory representations of acoustic signals. IEEE Transactions on Information Theory 1992,38(2, part 2):824-839. Special issue on wavelet transforms and multi-resolution signal analysis. 10.1109/18.119739
Chi T, Ru P, Shamma S: Multiresolution spectrotemporal analysis of complex sounds. Journal of the Acoustical Society of America 2005,118(2):887-906. 10.1121/1.1945807
Shamma S: Methods of neuronal modeling. In Spatial and Temporal Processing in the Auditory System. 2nd edition. MIT Press, Cambridge, Mass, USA; 1998:411-460.
Depireux DA, Simon JZ, Klein DJ, Shamma S: Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex. Journal of Neurophysiology 2001,85(3):1220-1234.
Kowalski N, Depireux DA, Shamma S: Analysis of dynamic spectra in ferret primary auditory cortex. I. Characteristics of single-unit responses to moving ripple spectra. Journal of Neurophysiology 1996,76(5):3503-3523.
Elhilali M, Chi T, Shamma S: A spectro-temporal modulation index (STMI) for assessment of speech intelligibility. Speech Communication 2003,41(2-3):331-348. 10.1016/S0167-6393(02)00134-6
Varga A, Steeneken HJM, Tomlinson M, Jones D: The NOISEX-92 study on the effect of additive noise on automatic speech recognition. 1992.
De Lathauwer L, De Moor B, Vandewalle J: A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications 2000,21(4):1253-1278. 10.1137/S0895479896305696
Vapnik VN: The Nature of Statistical Learning Theory. Springer, Berlin, Germany; 1995.
Scalart P, Filho JV: Speech enhancement based on a priori signal to noise estimation. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '96), May 1996, Atlanta, Ga, USA 2: 629-632.
Zavarehei E http://dea.brunel.ac.uk/cmsp/Home_Esfandiar
Seneff S, Zue V: Transcription and alignment of the TIMIT database. In An Acoustic Phonetic Continuous Speech Database, 1988, Gaithersburg, Md, USA. Edited by: Garofolo JS. National Institute of Standards and Technology (NIST).
Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs 2001.

Author information

Authors and Affiliations

Electrical Engineering Department, University of Maryland, 1103 A.V.Williams Building, College Park, MD, 20742, USA
Nima Mesgarani & Shihab Shamma

Corresponding author

Correspondence to Nima Mesgarani.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Mesgarani, N., Shamma, S. Denoising in the Domain of Spectrotemporal Modulations. J AUDIO SPEECH MUSIC PROC. 2007, 042357 (2007). https://doi.org/10.1155/2007/42357

Received: 19 December 2006
Revised: 07 May 2007
Accepted: 10 September 2007
Published: 15 November 2007
DOI: https://doi.org/10.1155/2007/42357
