Detection and Separation of Speech Events in Meeting Recordings Using a Microphone Array

Asano, Futoshi; Yamamoto, Kiyoshi; Ogata, Jun; Yamada, Miichi; Nakamura, Masami

doi:10.1155/2007/27616

Research Article
Open access
Published: 02 July 2007

Futoshi Asano¹, Kiyoshi Yamamoto¹, Jun Ogata¹, Miichi Yamada² & Masami Nakamura²

EURASIP Journal on Audio, Speech, and Music Processing volume 2007, Article number: 027616 (2007)

Abstract

When applying automatic speech recognition (ASR) to meeting recordings including spontaneous speech, the performance of ASR is greatly reduced by the overlap of speech events. In this paper, a method of separating the overlapping speech events by using an adaptive beamforming (ABF) framework is proposed. The main feature of this method is that all the information necessary for the adaptation of ABF, including microphone calibration, is obtained from meeting recordings based on the results of speech-event detection. The performance of the separation is evaluated via ASR using real meeting recordings.
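As a rough illustration of the adaptive beamforming idea mentioned in the abstract, the sketch below implements a generic narrowband minimum-variance (MVDR) beamformer for two microphones. It is not the authors' method: the MVDR formulation, the two-microphone geometry, the 1 kHz frequency, the delays, and all function names are assumptions chosen for illustration. The interference covariance is estimated only from snapshots in which the target is silent, which is the kind of information a speech-event detector could supply.

```python
import cmath

def steering_vector(delay_s, freq_hz):
    """Steering vector for a source that reaches mic 2 delay_s seconds after mic 1."""
    return [1 + 0j, cmath.exp(-2j * cmath.pi * freq_hz * delay_s)]

def covariance(snapshots, loading=1e-3):
    """Sample spatial covariance of 2-channel snapshots, with diagonal loading."""
    n = len(snapshots)
    r = [[0j, 0j], [0j, 0j]]
    for x in snapshots:
        for i in range(2):
            for j in range(2):
                r[i][j] += x[i] * x[j].conjugate() / n
    r[0][0] += loading  # loading keeps the matrix invertible
    r[1][1] += loading
    return r

def mvdr_weights(r, a):
    """Weights w = R^{-1} a / (a^H R^{-1} a), using the closed-form 2x2 inverse."""
    det = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    inv = [[r[1][1] / det, -r[0][1] / det],
           [-r[1][0] / det, r[0][0] / det]]
    ra = [inv[0][0] * a[0] + inv[0][1] * a[1],
          inv[1][0] * a[0] + inv[1][1] * a[1]]
    denom = a[0].conjugate() * ra[0] + a[1].conjugate() * ra[1]
    return [ra[0] / denom, ra[1] / denom]

def response(w, a):
    """Beamformer output w^H a for a unit-amplitude source with steering vector a."""
    return w[0].conjugate() * a[0] + w[1].conjugate() * a[1]

# Hypothetical scenario: target at broadside, interfering talker arriving 0.2 ms later at mic 2.
FREQ_HZ = 1000.0
a_target = steering_vector(0.0, FREQ_HZ)
a_interf = steering_vector(2e-4, FREQ_HZ)

# Snapshots taken while only the interferer is active (target silent).
interf_only = [[cmath.exp(1j * k) * a for a in a_interf] for k in range(50)]

w = mvdr_weights(covariance(interf_only), a_target)
print("gain toward target:    ", abs(response(w, a_target)))
print("gain toward interferer:", abs(response(w, a_interf)))
```

The weights pass the target direction with unit gain while placing a deep null toward the interferer. A practical system would repeat this per frequency bin and would also need calibrated steering vectors, which the abstract says are obtained from the recordings themselves.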

[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18]

References

Moore DC, McCowan IA: Microphone array speech recognition: experiments on overlapping speech in meetings. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), April 2003, Hong Kong 5: 497-500.
Dielmann A, Renals S: Dynamic Bayesian networks for meeting structuring. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), May 2004, Montreal, Que, Canada 5: 629-632.
Ajmera J, Lathoud G, McCowan I: Clustering and segmenting speakers and their locations in meetings. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), May 2004, Montreal, Que, Canada 1: 605-608.
Katoh M, Yamamoto K, Ogata J, et al.: State estimation of meetings by information fusion using Bayesian network. Proceedings of the 9th European Conference on Speech Communication and Technology, September 2005, Lisbon, Portugal 113-116.
Hain T, Dines J, Garau G, et al.: Transcription of conference room meetings: an investigation. Proceedings of the 9th European Conference on Speech Communication and Technology (EUROSPEECH '05), September 2005, Lisbon, Portugal 1661-1664.
Haykin S (Ed): Unsupervised Adaptive Filtering, Vol. 1. John Wiley & Sons, New York, NY, USA; 2000.
Johnson DH, Dudgeon DE: Array Signal Processing. Prentice-Hall, Englewood Cliffs, NJ, USA; 1993.
Hoshuyama O, Sugiyama A, Hirano A: A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters. IEEE Transactions on Signal Processing 1999,47(10):2677-2684. 10.1109/78.790650
Oak P, Kellermann W: A calibration method for robust generalized sidelobe cancelling beamformers. Proceedings of International Workshop on Acoustic Echo and Noise Control (IWAENC '05), September 2005, Eindhoven, The Netherlands 97-100.
Gannot S, Cohen I: Speech enhancement based on the general transfer function GSC and postfiltering. IEEE Transactions on Speech and Audio Processing 2004,12(6):561-571. 10.1109/TSA.2004.834599
Asano F, Hayamizu S, Yamada T, Nakamura S: Speech enhancement based on the subspace method. IEEE Transactions on Speech and Audio Processing 2000,8(5):497-507. 10.1109/89.861364
Asano F, Yamamoto K, Hara I, et al.: Detection and separation of speech event using audio and video information fusion and its application to robust speech interface. EURASIP Journal on Applied Signal Processing 2004,2004(11):1727-1738. 10.1155/S1110865704402303
Asano F, Ogata J: Detection and separation of speech events in meeting recordings. Proceedings of the 9th International Conference on Spoken Language Processing (ICSLP '06), September 2006, Pittsburgh, Pa, USA 2586-2589.
Asano F, Ikeda S, Ogawa M, Asoh H, Kitawaki N: Combined approach of array processing and independent component analysis for blind separation of acoustic signals. IEEE Transactions on Speech and Audio Processing 2003,11(3):204-215. 10.1109/TSA.2003.809191
Schmidt RO: Multiple emitter location and signal parameter estimation. IEEE Transactions on Antennas and Propagation 1986,34(3):276-280. 10.1109/TAP.1986.1143830
Suzuki Y, Asano F, Kim H-Y, Sone T: An optimum computer-generated pulse signal suitable for the measurement of very long impulse responses. Journal of the Acoustical Society of America 1995,97(2):1119-1123. 10.1121/1.412224
Leggetter CJ, Woodland PC: Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language 1995,9(2):171-185. 10.1006/csla.1995.0010
Gauvain J-L, Lee C-H: Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing 1994,2(2):291-298. 10.1109/89.279278

Author information

Authors and Affiliations

Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology, Tsukuba Central 2, 1-1-1 Umezono, Tsukuba, 305-8568, Japan
Futoshi Asano, Kiyoshi Yamamoto & Jun Ogata
Advanced Media, Inc., 48F Sunshine 60 Building, 3-1-1 Higashi-Ikebukuro, Toshima-Ku, Tokyo, 170-6048, Japan
Miichi Yamada & Masami Nakamura

Corresponding author

Correspondence to Futoshi Asano.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Asano, F., Yamamoto, K., Ogata, J. et al. Detection and Separation of Speech Events in Meeting Recordings Using a Microphone Array. J AUDIO SPEECH MUSIC PROC. 2007, 027616 (2007). https://doi.org/10.1155/2007/27616

Received: 02 November 2006
Revised: 14 February 2007
Accepted: 19 April 2007
Published: 02 July 2007
DOI: https://doi.org/10.1155/2007/27616
