A Semi-Continuous State-Transition Probability HMM-Based Voice Activity Detector

Othman, H; Aboulnasr, T

doi:10.1155/2007/43218

Research Article
Open access
Published: 07 February 2007

EURASIP Journal on Audio, Speech, and Music Processing, volume 2007, Article number: 043218 (2007)

Abstract

We introduce an efficient hidden Markov model-based voice activity detection (VAD) algorithm with time-variant state-transition probabilities in the underlying Markov chain. The transition probabilities vary in an exponential charge/discharge scheme and are softly merged with state conditional likelihood into a final VAD decision. Working in the domain of ITU-T G.729 parameters, with no additional cost for feature extraction, the proposed algorithm significantly outperforms G.729 Annex B VAD while providing a balanced tradeoff between clipping and false detection errors. The performance compares very favorably with the adaptive multirate VAD, option 2 (AMR2).
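The idea summarized in the abstract can be illustrated with a small two-state sketch. This is not the authors' algorithm: the abstract gives no update equations, so the function name `vad_sketch`, the constants `p_min`, `p_max`, `charge_rate`, and `discharge_rate`, the 0.5 decision threshold, and the particular charge/discharge rule are all assumptions made for illustration. The per-frame state-conditional likelihoods are assumed to come from some external model (in the paper, the features are G.729 parameters).

```python
import math


def vad_sketch(lik_noise, lik_speech, p_min=0.2, p_max=0.9,
               charge_rate=0.5, discharge_rate=0.2):
    """Toy two-state (noise/speech) detector with a time-variant prior.

    lik_noise, lik_speech: per-frame likelihoods p(x | noise), p(x | speech).
    All constants are hypothetical; none are taken from the paper.
    """
    charge = 0.0                 # capacitor-like memory in [0, 1)
    decisions, priors = [], []
    for ln, ls in zip(lik_noise, lik_speech):
        # Time-variant probability of moving to (or staying in) speech.
        prior_speech = p_min + (p_max - p_min) * charge
        # Soft merge: prior times likelihood, normalized to a posterior.
        num = prior_speech * ls
        den = num + (1.0 - prior_speech) * ln
        post_speech = num / den if den > 0.0 else prior_speech
        is_speech = post_speech > 0.5
        if is_speech:
            # Charge: the memory rises exponentially toward 1.
            charge = 1.0 - (1.0 - charge) * math.exp(-charge_rate)
        else:
            # Discharge: the memory decays exponentially toward 0.
            charge *= math.exp(-discharge_rate)
        decisions.append(int(is_speech))
        priors.append(prior_speech)
    return decisions, priors


# Made-up likelihoods: two noise frames, three speech frames,
# one ambiguous frame, then two noise frames.
lik_noise = [0.9, 0.9, 0.1, 0.1, 0.1, 0.5, 0.9, 0.9]
lik_speech = [0.1, 0.1, 0.9, 0.9, 0.9, 0.5, 0.1, 0.1]
decisions, priors = vad_sketch(lik_noise, lik_speech)
print(decisions)
```

With these made-up inputs, the ambiguous sixth frame (equal likelihoods) is labeled speech because the prior has charged up during the preceding speech frames. This hangover-like behavior is one plausible way a time-variant transition probability could reduce clipping at the end of a talk spurt.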

[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]

References

Benyassine A, Shlomot E, Su H-Y, Massaloux D, Lamblin C, Petit J-P: ITU-T recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications. IEEE Communications Magazine 1997,35(9):64-73. 10.1109/35.620527
Cho YD, Kondoz A: Analysis and improvement of a statistical model-based voice activity detector. IEEE Signal Processing Letters 2001,8(10):276-278. 10.1109/97.957270
Sohn J, Kim NS, Sung W: A statistical model-based voice activity detection. IEEE Signal Processing Letters 1999,6(1):1-3. 10.1109/97.736233
Nemer E, Goubran R, Mahmoud S: Robust voice activity detection using higher-order statistics in the LPC residual domain. IEEE Transactions on Speech and Audio Processing 2001,9(3):217-231. 10.1109/89.905996
Marzinzik M, Kollmeier B: Speech pause detection for noise spectrum estimation by tracking power envelope dynamics. IEEE Transactions on Speech and Audio Processing 2002,10(2):109-118. 10.1109/89.985548
Yang S, Li Z-G, Chen Y-Q: A fractal based voice activity detector for internet telephone. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '03), April 2003, Hong Kong 1: 808-811.
ITU-T G.729 Annex B: A silence compression scheme for G.729 optimized for terminals conforming to recommendation V.70. 1996.
Beritelli F, Casale S, Ruggeri G, Serrano S: Performance evaluation and comparison of G.729/AMR/fuzzy voice activity detectors. IEEE Signal Processing Letters 2002,9(3):85-88. 10.1109/97.995824
Beritelli F, Casale S, Cavallaro A: A robust voice activity detector for wireless communications using soft computing. IEEE Journal on Selected Areas in Communications 1998,16(9):1818-1829. 10.1109/49.737650
Gazor S, Zhang W: A soft voice activity detector based on a Laplacian-Gaussian model. IEEE Transactions on Speech and Audio Processing 2003,11(5):498-505. 10.1109/TSA.2003.815518
ETSI EN 301 708 v7.1.1 (1999-12): European Standard (Telecommunications series), Digital cellular telecommunications system (Phase 2+); Voice Activity Detector (VAD) for Adaptive Multi-Rate (AMR) speech traffic channels; General description. (GSM 06.94 version 7.1.1 Release 1998)
Kelly GE, Lindsey JK: Models for estimating the change-point in gas exchange data. Proceedings of the 22nd Conference on Applied Statistics in Ireland (CASI '02), May 2002, Antrim, Ireland
ITU-T Series P Supplement 23, "ITU-T coded-speech database," February 1998, http://www.itu.int
Othman H, Aboulnasr T: A Gaussian/Laplacian hybrid statistical voice activity detector for line spectral frequency-based speech coders. Proceedings of the 46th IEEE International Midwest Symposium on Circuits and Systems (MWSCAS '03), December 2003, Cairo, Egypt 2: 693-696.
Othman H, Aboulnasr T: A semi-continuous state transition probability HMM-based voice activity detection. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '04), May 2004, Montreal, Quebec, Canada 5: 821-824.
Tian Y, Wu J, Wang Z, Lu D: Fuzzy clustering and Bayesian information criterion based threshold estimation for robust voice activity detection. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '03), April 2003, Hong Kong 1: 444-447.

Author information

Authors and Affiliations

School of Information Technology and Engineering, Faculty of Engineering, University of Ottawa, Ontario, K1N 6N5, Canada
H Othman & T Aboulnasr

Corresponding author

Correspondence to H Othman.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Othman, H., Aboulnasr, T. A Semi-Continuous State-Transition Probability HMM-Based Voice Activity Detector. J AUDIO SPEECH MUSIC PROC. 2007, 043218 (2007). https://doi.org/10.1155/2007/43218

Received: 15 December 2005
Revised: 13 November 2006
Accepted: 28 November 2006
Published: 07 February 2007
DOI: https://doi.org/10.1155/2007/43218
