- Research Article
- Open Access
Denoising in the Domain of Spectrotemporal Modulations
EURASIP Journal on Audio, Speech, and Music Processing volume 2007, Article number: 042357 (2007)
Abstract
A noise suppression algorithm is proposed based on filtering the spectrotemporal modulations of noisy signals. The modulations are estimated from a multiscale representation of the signal spectrogram generated by a model of sound processing in the auditory system. A significant advantage of this method is its ability to suppress noise that has distinctive modulation patterns, even when the noise overlaps spectrally with the signal. The performance of the algorithm is evaluated using subjective and objective tests with contaminated speech signals and compared with that of a traditional Wiener filtering method. The results demonstrate the efficacy of the spectrotemporal filtering approach in the conditions examined.
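The idea of filtering in the modulation domain can be illustrated with a minimal sketch. This is not the authors' algorithm: a plain 2-D FFT of a toy spectrogram stands in for the auditory-model multiscale analysis, and the function name `modulation_filter`, the frame rate, the modulation rates, and the cutoff are all assumptions chosen for illustration. The sketch also assumes that signal and noise add in the spectrogram domain. It shows how noise that shares the signal's spectral profile can still be removed when its temporal modulation rate differs.

```python
import numpy as np


def modulation_filter(spectrogram, frame_rate_hz, max_rate_hz):
    """Keep only temporal modulations at or below max_rate_hz.

    spectrogram: real 2-D array of shape (n_channels, n_frames).
    A 2-D FFT is used as a simplified stand-in for a multiscale
    auditory-model analysis (an assumption of this sketch).
    """
    n_frames = spectrogram.shape[1]
    modulations = np.fft.fft2(spectrogram)
    # Temporal modulation rate (Hz) of each FFT bin along the time axis.
    rates = np.fft.fftfreq(n_frames, d=1.0 / frame_rate_hz)
    # The mask is symmetric in rate, so the filtered output stays real.
    mask = (np.abs(rates) <= max_rate_hz)[np.newaxis, :]
    return np.real(np.fft.ifft2(modulations * mask))


# Toy data (hypothetical values): 32 channels, 2 s at 100 frames per second.
frame_rate_hz = 100.0
t = np.arange(200) / frame_rate_hz
channels = np.arange(32)

# Signal and noise share one spectral profile, so they overlap spectrally.
profile = np.exp(-0.5 * ((channels - 16) / 4.0) ** 2)
clean = np.outer(profile, 1.0 + 0.8 * np.sin(2 * np.pi * 4.0 * t))   # slow, 4 Hz
noise = np.outer(profile, 0.5 * np.sin(2 * np.pi * 30.0 * t))        # fast, 30 Hz
noisy = clean + noise

denoised = modulation_filter(noisy, frame_rate_hz, max_rate_hz=16.0)

# Root-mean-square error against the clean spectrogram, before and after.
err_before = np.sqrt(np.mean((noisy - clean) ** 2))
err_after = np.sqrt(np.mean((denoised - clean) ** 2))
print(err_before, err_after)
```

A filter acting only on the frequency axis could not separate the two components here, because they occupy the same channels; the separation comes entirely from the difference in modulation rate. Real speech and noise are far less cleanly separated than in this toy case.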
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Mesgarani, N., Shamma, S. Denoising in the Domain of Spectrotemporal Modulations. J AUDIO SPEECH MUSIC PROC. 2007, 042357 (2007). https://doi.org/10.1155/2007/42357
DOI: https://doi.org/10.1155/2007/42357
Keywords
- Acoustics
- Speech Signal
- Objective Test
- Auditory System
- Distinctive Modulation