Recognition of Noisy Speech: A Comparative Survey of Robust Model Architecture and Feature Enhancement

Schuller, Björn; Wöllmer, Martin; Moosmayr, Tobias; Rigoll, Gerhard

doi:10.1155/2009/942617

Research Article
Open access
Published: 24 May 2009

EURASIP Journal on Audio, Speech, and Music Processing volume 2009, Article number: 942617 (2009)

Abstract

The performance of speech recognition systems degrades strongly in the presence of background noise, such as the driving noise inside a car. In contrast to existing work, we aim to improve noise robustness at all major levels of speech recognition: feature extraction, feature enhancement, speech modelling, and training. To this end, we give an overview of promising auditory modelling concepts, speech enhancement techniques, training strategies, and model architectures, which are implemented in an in-car digit and spelling recognition task covering noises produced by various car types and driving conditions. We show that joint speech and noise modelling with a Switching Linear Dynamic Model (SLDM) outperforms speech enhancement techniques like Histogram Equalisation (HEQ), with a mean relative error reduction of 52.7% over various noise types and levels. Embedding a Switching Linear Dynamical System (SLDS) into a Switching Autoregressive Hidden Markov Model (SAR-HMM) prevails for speech disturbed by additive white Gaussian noise.

Publisher note

To access the full article, please see PDF.

Author information

Authors and Affiliations

Institute for Human-Machine Communication, Technische Universität München (TUM), 80290, Munich, Germany
Björn Schuller, Martin Wöllmer & Gerhard Rigoll
BMW Group, Forschungs- und Innovationszentrum, Akustik, Komfort und Werterhaltung, 80788, München, Germany
Tobias Moosmayr

Corresponding author

Correspondence to Björn Schuller.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Schuller, B., Wöllmer, M., Moosmayr, T. et al. Recognition of Noisy Speech: A Comparative Survey of Robust Model Architecture and Feature Enhancement. J AUDIO SPEECH MUSIC PROC. 2009, 942617 (2009). https://doi.org/10.1155/2009/942617

Received: 28 October 2008
Revised: 21 January 2009
Accepted: 15 February 2009
Published: 24 May 2009