Distant-talking speaker identification by generalized spectral subtraction-based dereverberation and its efficient computation
© Zhang et al.; licensee Springer. 2014
Received: 4 July 2013
Accepted: 21 December 2013
Published: 15 April 2014
Previously, a dereverberation method based on generalized spectral subtraction (GSS) using multi-channel least mean-squares (MCLMS) has been proposed. The results of speech recognition experiments showed that this method achieved a significant improvement over conventional methods. In this paper, we apply this method to distant-talking (far-field) speaker recognition. However, for far-field speech, the GSS-based dereverberation method using clean speech models degrades the speaker recognition performance. This may be because GSS-based dereverberation causes some distortion between clean speech and dereverberant speech. In this paper, we address this problem by training speaker models using dereverberant speech obtained by suppressing reverberation from arbitrary artificial reverberant speech. Furthermore, we propose an efficient computational method for a combination of the likelihood of dereverberant speech using multiple compensation parameter sets. This addresses the problem of determining optimal compensation parameters for GSS. We report the results of a speaker recognition experiment performed on large-scale far-field speech in reverberant environments that differ from the training environments. The proposed GSS-based dereverberation method achieves a recognition rate of 92.2%, which compares well with conventional cepstral mean normalization with delay-and-sum beamforming using a clean speech model (49.0%) and a reverberant speech model (88.4%). We also compare the proposed method with another dereverberation technique, multi-step linear prediction-based spectral subtraction (MSLP-GSS). The proposed method achieves a better recognition rate than the 90.6% of MSLP-GSS. The use of multiple compensation parameters further improves the speaker recognition performance, giving our approach a recognition rate of 93.6%. We implement this method in a real environment using the optimal compensation parameters estimated from an artificial environment. The results show a recognition rate of 87.8% compared with 72.5% for delay-and-sum beamforming using a reverberant speech model.
1 Introduction
Because of the existence of reverberation in far-field environments, the recognition performance for distant-talking speech/speakers is drastically degraded. The current approaches to automatic speech recognition (ASR)/speaker recognition that are robust to reverberation can be classified as speech signal processing (pre-processing), robust feature extraction, or model adaptation [1–4].
In this paper, we focus on speech signal processing for speaker identification. Beamforming is one of the simplest and most robust means of spatial filtering to suppress reverberation and background noise; it discriminates between signals based on the physical location of their source [5]. Another general approach is cepstral mean normalization (CMN) [6, 7], which has been extensively examined as a simple and effective way of reducing reverberation by normalizing the cepstral features. Because of multiple reflections and diffusions of the sound waves, the energy of previous speech is smeared over time and overlaps with subsequent speech. The smeared energy lasts much longer than the window size of short-term spectral analysis, a problem known as late reverberation [8]. Therefore, CMN is not completely effective for dereverberation in environments with late reverberation. Several studies have focused on mitigating this problem [9–18]. In [9, 10], a method based on mean subtraction using a long-term spectral analysis window was proposed. The results showed that subtracting the mean of the log magnitude spectrum improved ASR performance. A blind deconvolution-based approach for restoring speech that has been degraded by the acoustic environment was proposed in [19]. This scheme processed the phase-only output from two microphones using cepstrum operations and signal reconstruction theory. In [12], a multi-channel speech dereverberation method based on spectral subtraction using a statistical model to estimate the power spectrum was proposed. In [13], a new set of feature parameters based on the Hilbert envelope of Gammatone filterbank outputs was proposed to reduce the effect of room reverberation in speaker recognition. A novel approach for multi-microphone speech dereverberation was proposed in [14]. The method was based on the construction of a null subspace of the data matrix in the presence of colored noise, employing generalized singular-value decomposition or generalized eigenvalue decomposition of the respective correlation matrices. A method based on multi-step linear prediction (MSLP) was proposed in [15, 20]. The method first estimates late reverberations using long-term multi-step linear prediction, and then suppresses them with subsequent spectral subtraction. A reverberation compensation method for speaker recognition using spectral subtraction [16], in which late reverberation is treated as additive noise, was proposed in [18, 21]. However, the drawback of this approach is that the optimum parameters for spectral subtraction are empirically estimated from a development dataset, meaning that the late reverberation cannot be subtracted correctly as it is not precisely modeled.
Previously, Wang et al. presented a distant-talking speech recognition method based on generalized spectral subtraction (GSS) employing the multi-channel least mean-squares (MCLMS) algorithm [22]. They treated late reverberation as additive noise, and proposed a noise reduction technique based on GSS [23, 24] to estimate the spectrum of the clean speech using an approximated spectrum of the impulse response. To estimate the spectra of the impulse responses, a variable step-size unconstrained MCLMS algorithm for identifying the impulse responses in the time domain was extended to the frequency domain. In theory, early reverberation can also be removed by the GSS method. In practice, however, estimation errors in the channel impulse response are inevitable in the MCLMS step, which makes the estimated power spectrum of clean speech unreliable. In contrast, CMN robustly reduces the channel distortion within the spectral analysis window, so early reverberation was suppressed by CMN. A speech recognition experiment showed that the GSS-based dereverberation method achieved an average relative word error reduction rate of 32.6% compared with conventional CMN with beamforming [22].
GSS-based dereverberation was applied to the field of speech recognition in a previous study [22]. However, the effect of GSS-based dereverberation on distant-talking speaker recognition is still unknown. A preliminary experiment on speaker recognition with a GSS-based method showed that dereverberation using clean speech models degraded the speaker recognition performance, although it was very effective for speech recognition. This may be because the GSS-based dereverberation method causes some distortion between the speaker characteristics of clean speech and dereverberant speech. We address this problem by training speaker models using dereverberant speech obtained by suppressing early and late reverberation from arbitrary artificial reverberant speech. We assume that the distortion of speaker characteristics is similar in the training and test data, so the GSS-based dereverberation method should then be effective for speaker recognition.
It is difficult to obtain optimal compensation parameter values (that is, the noise overestimation factor α and exponent parameter n defined in Equation 5) for GSS under different conditions. We assume that the optimal compensation parameters for GSS are dependent on the acoustic environment and utterance content. A fixed compensation parameter cannot robustly suppress reverberation for all conditions. Therefore, we propose a combination of the likelihood of dereverberant speech using multiple compensation parameters for GSS. However, the computational time of this combination method is proportional to the number of compensation parameter sets. To reduce the computational cost, N speaker models with the highest likelihood are obtained using a GSS without tuning (that is, α=n=1). Only these N-best speaker models are used to calculate the likelihood using GSS with other compensation parameters.
With regard to speaker recognition, various models have been studied. The Gaussian mixture model (GMM) has been widely used as a speaker model [26–28]. Its use is motivated by the fact that the Gaussian components represent some general speaker-dependent spectral shapes, and by the capability of Gaussian mixtures to model arbitrary densities. Artificial neural networks [29] and support vector machines [30] have been proposed as discriminative models for the boundary between speakers. Recently, joint factor analysis and total factors [31, 32] have been demonstrated as very effective mechanisms for speaker verification by compensating channel variability. The consideration of state-of-the-art speaker models is beyond the scope of the current study. Thus, in this paper, we use GMMs for speaker identification.
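GMM-based identification can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs scored by average per-frame log-likelihood, with the highest-scoring model chosen. The function names, the toy two-component models, and the two-dimensional features are hypothetical and only serve the illustration.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of frames (T x D) under a diagonal GMM.

    weights: (M,), means: (M, D), variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm[None, :] - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_weighted = log_comp + np.log(weights)[None, :]                 # (T, M)
    # Numerically stable log-sum-exp over the mixture components
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())

def identify_speaker(frames, speaker_models):
    """Return the name of the best-scoring speaker model and all scores."""
    scores = {name: gmm_log_likelihood(frames, *model)
              for name, model in speaker_models.items()}
    return max(scores, key=scores.get), scores

# Toy models (hypothetical): two components each, unit variances
models = {
    "spk_a": (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [1.0, 1.0]]), np.ones((2, 2))),
    "spk_b": (np.array([0.5, 0.5]), np.array([[5.0, 5.0], [6.0, 6.0]]), np.ones((2, 2))),
}
rng = np.random.default_rng(0)
test_frames = rng.normal(loc=5.5, scale=1.0, size=(200, 2))
best, scores = identify_speaker(test_frames, models)
```

In this toy setting the test frames are drawn near the means of `spk_b`, so that model receives the higher score.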
The remainder of this paper is organized as follows: Section 2 describes our distant-talking speaker identification system employing a dereverberation method. The outline of blind dereverberation based on SS is described in section 3. The combination of likelihoods with various compensation parameters and its efficient computation is proposed in section 4, and section 5 describes the experimental results of distant-talking speaker recognition in a reverberant environment. Finally, section 6 summarizes the paper.
2 Distant-talking speaker recognition system employing a dereverberation method
The performance of distant-talking speech/speaker recognition is degraded remarkably by reverberation. By removing reverberation, we can expect to improve the speech/speaker recognition performance. However, very little research has studied the difference between speech recognition and speaker recognition in a distant-talking environment. For speech recognition, it is necessary to maximize the inter-phoneme variation while minimizing the intra-phoneme variation in the feature space, whereas for speaker recognition, the focus is on speaker variation instead of phoneme variation. These characteristics mean that some methods that are effective in speech recognition may not be effective for speaker recognition, especially in a hands-free environment. For example, a simple and popular channel normalization method, CMN, removes both the transmission characteristics and speaker characteristics, leading to differences between the speaker recognition and speech recognition performance. A previous study on distant-talking speaker recognition showed that conventional CMN gave much worse results than those without CMN, although it was very effective for speech recognition in a reverberant environment with a short reverberation time. In environments with little reverberation, CMN gives worse speaker recognition performance than no CMN, while the opposite is true in environments with strong reverberation. This is because CMN removes the speaker characteristics, and this loss outweighs the benefit of normalization when the channel distortion (reverberation) is not very large. In the speech recognition field, GSS-based dereverberation using clean speech models showed a significant improvement [22]. However, in terms of speaker recognition, the experiment we describe in section 5 shows that it degrades the speaker recognition performance. This could be due to the GSS-based dereverberation method introducing a mismatch between the speaker characteristics of clean speech and those of dereverberant speech.
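The observation that CMN removes speaker characteristics together with the transmission characteristics can be illustrated with a small sketch. It assumes per-utterance mean subtraction over cepstral frames and models both the channel and the long-term speaker difference as constant offsets in the cepstral domain. The synthetic data and all names are hypothetical.

```python
import numpy as np

def cmn(cepstra):
    """Subtract the per-utterance mean from every cepstral coefficient (rows = frames)."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
channel = rng.normal(size=12)                  # fixed channel offset (assumption)
spk_a = rng.normal(loc=0.5, size=(200, 12))    # speaker A: cepstra with mean near +0.5
spk_b = rng.normal(loc=-0.5, size=(200, 12))   # speaker B: cepstra with mean near -0.5
obs_a, obs_b = spk_a + channel, spk_b + channel

# Distance between the long-term cepstral means of the two speakers
gap_before = float(np.abs(obs_a.mean(axis=0) - obs_b.mean(axis=0)).mean())
gap_after = float(np.abs(cmn(obs_a).mean(axis=0) - cmn(obs_b).mean(axis=0)).mean())
```

After normalization the channel offset is gone, but so is the difference between the speakers' long-term means: `gap_after` is numerically zero while `gap_before` is close to 1.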
To mitigate the distortion of speaker characteristics caused by dereverberation in the test stage, we obtain dereverberant speech by suppressing early and late reverberation from arbitrary artificial reverberant speech, and use this to train the speaker models. We assume that the speaker characteristics suffer similar distortion in the training data and test data. By employing dereverberation in both the training and test stages, the transmission characteristics can be removed and the relative speaker characteristics can be maximized. Compared with speaker models trained with reverberant speech, our method is expected to exhibit a better speaker recognition performance. In previous research, GMMs trained with reverberant speech have been used for distant-talking speaker recognition. However, the mismatch of distant-talking environments between the training condition and the test condition has still not been addressed. Furthermore, when late reverberations have a large amount of energy, the performance of speech/speaker recognition cannot be improved sufficiently, even with GMMs or hidden Markov models trained with a matched reverberant condition [4, 33]. This means that GMMs and hidden Markov models cannot handle severe late reverberations precisely. We can see the effect of the dereverberation step in speaker recognition in papers such as [18, 21, 34].
3 Outline of blind dereverberation
3.1 Dereverberation based on GSS
where f is the frame index, H(ω) is the STFT of the impulse response, S(f,ω) is the STFT of clean speech s, and H(d,ω) denotes the part of H(ω) corresponding to the frame delay d. That is, with a long impulse response, the channel distortion is no longer of a multiplicative nature in a linear spectral domain, but is instead convolutional.
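A form of the reverberant-speech model that is consistent with these definitions can be written as follows. This is a sketch, not a quotation: the symbol X(f,ω) for the STFT of the observed reverberant speech, the upper limit D−1 (with D the number of reverberation windows), and the split into an early term (d = 0) and a late term (d ≥ 1) are assumptions drawn from the surrounding text.

```latex
X(f,\omega) \approx \sum_{d=0}^{D-1} S(f-d,\omega)\,H(d,\omega)
            = \underbrace{S(f,\omega)\,H(0,\omega)}_{\text{early reverberation}}
            + \underbrace{\sum_{d=1}^{D-1} S(f-d,\omega)\,H(d,\omega)}_{\text{late reverberation}}
```

Under this reading, the late term depends only on earlier frames, which is why it can be treated as additive noise and subtracted frame by frame.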
where α is the noise overestimation factor, β is the spectral floor parameter for avoiding negative or underflow values, |Ŝ(f,ω)|^2 is the power spectrum of estimated clean speech, and Ĥ(d,ω) is the estimated STFT of the impulse response, which can be blindly estimated by the MCLMS algorithm described in section 3.2. D is the number of reverberation windows.
Previous studies have shown that GSS with an arbitrary exponent parameter is more effective than power SS for noise reduction [23, 24]. In this paper, GSS is used to suppress late reverberation, and early reverberation is compensated by subtracting the cepstral mean of the utterance at the feature extraction stage.
where |Ŝ(f,ω)| is the spectrum of estimated clean speech and n is the exponent parameter. When n=1, Equation 5 reduces to a power spectral subtraction-based method.
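The role of α, β, n, and D can be sketched in code. The recursion below is an assumed form, not necessarily the exact Equation 5: the late-reverberation term built from previously estimated frames is subtracted in the 2n-th power domain and floored at β times the observation, while the early term Ĥ(0,ω) is left to CMN as described above. The function name, the default β, and the synthetic data are hypothetical.

```python
import numpy as np

def gss_dereverb(X_mag, H_mag, alpha=1.0, n=1.0, beta=0.15):
    """Sketch of GSS-style late-reverberation suppression.

    X_mag: (F, W) magnitude spectra of reverberant speech (frames x bins).
    H_mag: (D, W) magnitude spectra of the estimated impulse response per frame delay.
    H_mag[0] (early reverberation) is not used here; it is left to CMN.
    """
    num_frames, num_bins = X_mag.shape
    D = H_mag.shape[0]
    X_pow = X_mag ** (2.0 * n)
    H_pow = H_mag ** (2.0 * n)
    S_pow = np.zeros_like(X_pow)
    for f in range(num_frames):
        late = np.zeros(num_bins)
        # Accumulate the late term from already-estimated earlier frames
        for d in range(1, min(D, f + 1)):
            late += S_pow[f - d] * H_pow[d]
        # Over-subtraction by alpha, spectral floor by beta
        S_pow[f] = np.maximum(X_pow[f] - alpha * late, beta * X_pow[f])
    return S_pow ** (1.0 / (2.0 * n))

# Synthetic check: build X so that the power-domain model holds exactly (n = 1)
rng = np.random.default_rng(0)
S_true = rng.uniform(0.5, 1.5, size=(20, 4))
H_mag = np.array([[1.0] * 4, [0.3] * 4, [0.2] * 4])
X_pow = S_true ** 2
for f in range(20):
    for d in range(1, 3):
        if f - d >= 0:
            X_pow[f] += (S_true[f - d] ** 2) * (H_mag[d] ** 2)
S_hat = gss_dereverb(np.sqrt(X_pow), H_mag, alpha=1.0, n=1.0, beta=0.01)
```

In this idealized case the recursion recovers `S_true`. With real speech the power-domain additivity only holds approximately, which is one reason the choice of α and n matters.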
3.2 Compensation parameter estimation for GSS by MCLMS
A variable step-size unconstrained MCLMS (VSS-UMCLMS) algorithm was proposed to minimize the cost function J in the time domain. Wang et al. extended the time-domain VSS-UMCLMS algorithm to the frequency domain to estimate the compensation parameters for GSS-based dereverberation.
where H_n(d,l) is the l-th frame of the n-th impulse response at the corresponding frame d. If the SIMO system is blindly identifiable, the matrix R_{X+} is rank deficient by 1 (in the absence of noise) and the channel impulse responses can be uniquely determined.
where Ĥ(d) is the estimated model filter at frame d. Here, the tilde in R̃_{X+} distinguishes this instantaneous value from its mathematical expectation R_{X+}.
By minimizing the cost function J in Equation 16, the impulse response can be blindly derived.
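The principle behind blind identification with MCLMS is the cross relation between channels of a single-input multiple-output system: in the absence of noise, x_i convolved with h_j equals x_j convolved with h_i. The sketch below checks this in the time domain for two channels. It is not the frequency-domain VSS-UMCLMS update itself, the cost is left unnormalized, and the impulse responses and names are hypothetical.

```python
import numpy as np

def cross_relation_error(x1, x2, h1_est, h2_est):
    """Squared cross-relation error for a two-channel SIMO system (no normalization)."""
    e = np.convolve(x1, h2_est) - np.convolve(x2, h1_est)
    return float(np.sum(e ** 2))

rng = np.random.default_rng(2)
s = rng.normal(size=500)                    # unknown source signal
h1 = np.array([1.0, 0.5, 0.25, 0.1])        # hypothetical channel impulse responses
h2 = np.array([0.8, -0.3, 0.2, 0.05])
x1 = np.convolve(s, h1)                     # microphone observations
x2 = np.convolve(s, h2)

err_true = cross_relation_error(x1, x2, h1, h2)         # true filters: error vanishes
err_wrong = cross_relation_error(x1, x2, h1 + 0.1, h2)  # perturbed filter: error grows
```

An adaptive algorithm of the MCLMS family searches for the filters that drive this error toward zero using only the observations, which is why the source signal never has to be known.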
3.3 Dereverberation method based on multiple-step linear prediction
where L is the linear prediction order and w_{m,i} is the prediction coefficient. When D=1, we have multi-channel linear prediction. To calculate the appropriate w_{m,i}, the present signal of the m-th microphone x_m(t) is represented as the sum of the weighted signals of the previous D samples (first term of Equation 19) and the signal d_m(t) without late reverberation (second term of Equation 19).
After the optimization of w_{m,i}, the dereverberant speech can be calculated by the SS method. In the MSLP method [15, 20], the w_{m,i} are calculated by minimizing the mean square energy of the prediction residual.
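Minimizing the mean square energy of the prediction residual is a least-squares problem. The following is a single-channel simplification, assuming a predictor that uses `order` past samples starting `delay` samples back. It omits the multi-channel structure and any pre-whitening used in the cited MSLP work. The function name and the synthetic echo signal are hypothetical.

```python
import numpy as np

def delayed_linear_prediction(x, order, delay):
    """Least-squares w such that x[t] ~ sum_i w[i] * x[t - delay - i], i = 0..order-1.

    Returns the coefficients, the predicted (late) component, and the residual.
    """
    start = delay + order - 1
    # Each row holds x[t - delay], x[t - delay - 1], ..., x[t - delay - order + 1]
    rows = [x[t - delay - order + 1: t - delay + 1][::-1] for t in range(start, len(x))]
    A = np.asarray(rows)
    target = x[start:]
    w, *_ = np.linalg.lstsq(A, target, rcond=None)
    late = A @ w
    return w, late, target - late

# Synthetic signal: white source plus one echo 8 samples later
rng = np.random.default_rng(4)
s = rng.normal(size=4000)
x = s.copy()
x[8:] += 0.6 * s[:-8]

w, late, resid = delayed_linear_prediction(x, order=20, delay=8)
energy_ratio = float(np.sum(resid ** 2) / np.sum((late + resid) ** 2))
```

The predicted component plays the role of the late-reverberation estimate. In the MSLP-based approach its power spectrum is then removed by spectral subtraction instead of being subtracted directly in the time domain.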
4 Combination method and its efficient computation
It is difficult to determine the optimum exponent parameter n and the noise overestimation factor α for GSS. In this study, we use a combination of the various speaker model likelihoods with different compensation parameter sets.
where L_k^i is the likelihood produced by the k-th speaker model with the i-th compensation parameter set, K is the number of registered speakers, and I denotes the number of compensation parameter sets. The speaker with the maximum combined likelihood is determined as the target speaker. As a result of this procedure, special tuning is not necessary for GSS.
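The combination step can be sketched as follows. The combination rule is an assumption: the log-likelihoods obtained with the I compensation parameter sets are summed per speaker and the speaker with the largest total is selected. The function name and the toy scores are hypothetical.

```python
def combine_likelihoods(log_likelihoods):
    """log_likelihoods[i][k]: log-likelihood of speaker k under parameter set i.

    Returns the index of the best speaker and the combined score per speaker.
    """
    num_speakers = len(log_likelihoods[0])
    combined = [sum(per_set[k] for per_set in log_likelihoods)
                for k in range(num_speakers)]
    best = max(range(num_speakers), key=combined.__getitem__)
    return best, combined

# Two parameter sets (I = 2), three registered speakers (K = 3)
scores = [
    [-10.0, -12.0, -11.0],   # parameter set 1
    [-13.0, -9.5, -12.0],    # parameter set 2
]
best, combined = combine_likelihoods(scores)
```

Here speaker 0 wins under the first parameter set alone, but the combined score favors speaker 1, which illustrates how combining sets can change the decision made with a single fixed setting.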
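where T_L, the computational time for the likelihood calculation of K speaker models, equals γT_F, with T_F the time for feature extraction (see endnote e). The computational cost has therefore been decreased compared with the conventional combination method.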
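The N-best pruning and its effect on cost can be sketched as below. The cost accounting is an assumption rather than the exact cost equation: every parameter set needs one feature extraction (T_F), the first set is scored against all K models (T_L = γT_F), and each remaining set is scored against only the N shortlisted models. The values I = 10, N = 5, K = 260, and γ = 92 are those given in endnote e. Function names and the toy score table are hypothetical.

```python
def n_best_identify(score_fn, num_speakers, parameter_sets, n_best):
    """Score all speakers with the first parameter set, then only the N best with the rest.

    score_fn(k, params) -> log-likelihood of speaker k for that parameter set.
    """
    first, rest = parameter_sets[0], parameter_sets[1:]
    totals = {k: score_fn(k, first) for k in range(num_speakers)}
    shortlist = sorted(totals, key=totals.get, reverse=True)[:n_best]
    combined = {k: totals[k] + sum(score_fn(k, p) for p in rest) for k in shortlist}
    return max(combined, key=combined.get)

def relative_cost(I, N, K, gamma):
    """Costs in units of T_F under the accounting assumed above."""
    conventional = I * (1 + gamma)
    efficient = (1 + gamma) + (I - 1) * (1 + gamma * N / K)
    return conventional, efficient

# Toy score table: table[p][k] for parameter set p and speaker k
table = [[-10.0, -12.0, -11.0, -20.0], [-13.0, -9.0, -11.5, -8.0]]
best = n_best_identify(lambda k, p: table[p][k], num_speakers=4,
                       parameter_sets=[0, 1], n_best=3)

conventional, efficient = relative_cost(I=10, N=5, K=260, gamma=92)
```

Under these assumptions the conventional combination costs 930 T_F, while the N-best variant costs roughly 118 T_F. The pruning is an approximation: a speaker outside the shortlist can no longer be selected, so N trades accuracy against cost.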
5.1 Experimental setup
First, the proposed method for hands-free speaker identification was evaluated using artificial reverberant speech to determine the most suitable parameters. We then applied the method to real reverberant speech with these parameters (see endnote c).
To compare our work with another dereverberation method, we evaluated both the proposed method and multi-step linear prediction (MSLP) [15, 20] in artificial and real reverberant environments.
Eight multi-channel impulse responses were selected from the Real World Computing Partnership (RWCP) sound scene database [39] and the CENSREC-4 database [40]. These were convolved with clean speech to create artificial reverberant speech. A large-scale database, the Japanese Newspaper Article Sentence (JNAS) [41] corpus, was used as clean speech. The training data comprised utterances from 130 male and female speakers, with 10 utterances taken from each. Each speaker gave 20 utterances for the test data. The average duration of all utterances was about 5.8 s.
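Creating artificial reverberant speech by convolution can be sketched as follows. The measured RWCP and CENSREC-4 impulse responses are replaced here by a synthetic, exponentially decaying noise tail, and white noise stands in for clean speech. The sampling rate, decay constant, and the choice to trim the output to the clean length are all assumptions.

```python
import numpy as np

def make_reverberant(clean, impulse_response):
    """Convolve clean speech with a room impulse response and trim to the clean length."""
    return np.convolve(clean, impulse_response)[: len(clean)]

rng = np.random.default_rng(3)
fs = 16000                                   # hypothetical sampling rate (Hz)
clean = rng.normal(size=fs)                  # stand-in for 1 s of clean speech

# Synthetic impulse response: direct path plus a decaying noise tail of 0.3 s
t = np.arange(int(0.3 * fs)) / fs
rir = rng.normal(size=t.size) * np.exp(-t / 0.05)
rir[0] = 1.0

reverberant = make_reverberant(clean, rir)
```

For a multi-channel setup the same clean utterance would be convolved with one impulse response per microphone channel.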
Details of recording conditions for impulse response measurement
- CENSREC-4 database for training: Japanese style room; Japanese style bath
- RWCP database for test: Echo room (cylinder); Tatami-floored room (S); Tatami-floored room (L); Echo room (panel)
- Channel numbers corresponding to Figure 5 used for dereverberation: 1, 3, 5, 7; 17, 21, 25, 29; 1, 5, 9, 13
Conditions for speaker recognition
- 25 dimensions with CMN (12 MFCCs + Δ + Δ power)
- GMMs with 128 diagonal
Conditions for GSS-based dereverberation
- Number of reverberant windows D
- Spectral floor parameter β
Description of each speaker recognition method
- Clean speech models
- 2 (Method in [22]): Clean speech models
- 4 (MSLP-based method): based on MSLP-GSS
- 5 (Proposed method): based on MCLMS-GSS
5.2 Experimental results
5.2.1 Experimental results of artificial reverberant speech
Distant-talking speaker recognition rates of artificial data (%), listed by the number of the impulse response condition used for test
Comparison of results of artificial data with different compensation parameter sets (n, α) and combination methods for speaker identification (%), listed by the number of the impulse response condition used for test
Comparison of results of artificial data with different values of the parameter β and combination methods for speaker identification (%), listed by the number of the impulse response condition used for test
5.2.2 Experimental results of real reverberant speech
Speaker recognition rates in real environment (the compared methods include one based on LTLSS)
6 Conclusions
Previously, Wang et al. proposed a blind dereverberation method based on GSS that employed MCLMS for hands-free speech recognition [22]. In this study, we applied this method to hands-free speaker identification. However, in the speaker identification field, the method proposed in [22] performed worse than the baseline method. This is the opposite result to that for speech recognition. We addressed this problem by training speaker models using dereverberant speech, which was obtained by suppressing reverberation from arbitrary artificial reverberant speech. The reverberant speech for test data was also compensated using MCLMS-GSS-based dereverberation. By combining various compensation parameter sets for GSS and efficiently calculating the speaker likelihoods, a more robust result was obtained without parameter tuning. With dereverberant speech models, the proposed method achieved a recognition rate of 93.6%, which compares well with conventional CMN with beamforming using clean speech models (49.0%) and reverberant speech models (88.4%). In addition, the method introduced in this paper does not increase the computational cost over that of previous methods. Furthermore, we implemented this method in a real environment with optimal compensation parameters estimated from an artificial environment. The proposed technique achieves a recognition rate of 87.8%, compared with 72.5% using a reverberant speech model. We also compared our proposed method with another dereverberation method based on MSLP-GSS, in both artificial and real environments, under the same SS conditions. The proposed method achieved a recognition rate of 91.7%, compared with 90.6% using MSLP-GSS, in an artificial environment, and 87.8% compared with 83.8% in a real environment.
a Delay-and-sum beamforming reduces the directivity of each microphone channel, especially when using many microphones that are far away from each other (as in the test condition). In our previous work, beamforming was shown to produce better results. The time delay information was calculated according to each speech recording.
b Details of the experimental setup are described in section 5.
c For real reverberant speech, the processing step is the same as for artificial reverberant speech.
d For example, to estimate the clean spectrum of the 2i-th window W_{2i}, the estimated clean spectra of the 2(i−1)-th window W_{2(i−1)} and the 2(i−2)-th window W_{2(i−2)} were used.
e In this study, the values of I, N, and K in Equation 21 were set to 10, 5, and 260, respectively. γ was 92, i.e., the computational time for the likelihood calculation of K speaker models was 92 times that for feature extraction, measured on a 2.0-GHz Intel(R) Xeon(R) server running Linux with 12-GB main memory.
This work was partially supported by a research grant from the Tateisi Science and Technology Foundation.
References
1. Huang Y, Benesty J, Chen J: Acoustic MIMO Signal Processing. Berlin: Springer-Verlag; 2006.
2. Maganti H, Matassoni M: An auditory based modulation spectral feature for reverberant speech recognition. In Proceedings of INTERSPEECH-2010. Makuhari, Chiba, 26-30 September, Curran Associates, Inc., Red Hook, NY; 2010:570-573.
3. Raut C, Nishimoto T, Sagayama S: Adaptation for long convolutional distortion by maximum likelihood based state filtering approach. In Proceedings of the 2006 ICASSP. Toulouse, France, 14-19 May 2006, vol. 1. IEEE, Piscataway; 2006:1133-1136.
4. Yoshioka T, Sehr A, Delcroix M, Kinoshita K, Maas R, Nakatani T, Kellermann W: Making machines understand us in reverberant rooms: robustness against reverberation for automatic speech recognition. IEEE Signal Process. Mag 2012, 29(6):114-126.
5. Hughes TB, Kim HS, DiBiase JH, Silverman HF: Performance of an HMM speech recognizer using a real-time tracking microphone array as input. IEEE Trans. Speech Audio Process 1999, 7(3):346-349. 10.1109/89.759045
6. Furui S: Cepstral analysis technique for automatic speaker verification. IEEE Trans. Acoust. Speech Signal Process 1981, 29(2):254-272. 10.1109/TASSP.1981.1163530
7. Liu F, Stern R, Huang X, Acero A: Efficient cepstral normalization for robust speech recognition. In Proceedings of the Workshop on Human Language Technology. Princeton; Association for Computational Linguistics, Stroudsburg; 1993:69-74.
8. Lebart K, Boucher J, Denbigh P: A new method based on spectral subtraction for speech dereverberation. Acta Acustica 2001, 87: 359-366.
9. Gelbart D, Morgan N: Double the trouble: handling noise and reverberation in far-field automatic speech recognition. In INTERSPEECH 2002. Denver, 16-20 September 2002; 968-971.
10. Gelbart D, Morgan N: Evaluating long-term spectral subtraction for reverberant ASR. In ASRU 2001. Madonna di Campiglio, Italy, 9-13 December 2001.
11. Wu M, Wang D: A two-stage algorithm for one-microphone reverberant speech enhancement. IEEE Trans. ASLP 2006, 14(3):774-784.
12. Habets EA: Multi-channel speech dereverberation based on a statistical model of late reverberation. In Proceedings of IEEE ICASSP. Philadelphia, 18-23 March, vol. 4. IEEE, Piscataway; 2005:173-176.
13. Sadjadi SO, Hansen JHL: Hilbert envelope based features for robust speaker identification under reverberant mismatched conditions. In Proceedings of IEEE ICASSP. Prague, Czech Republic, 22-27 May 2011; 5448-5451.
14. Gannot S, Moonen M: Subspace methods for multimicrophone speech dereverberation. EURASIP J. Appl. Signal Process 2003, 2003(1):1074-1090.
15. Kinoshita K, Delcroix M, Nakatani T, Miyoshi M: Spectral subtraction steered by multi-step forward linear prediction for single channel speech dereverberation. In Proceedings of IEEE ICASSP 2006. Toulouse, France, 14-19 May 2006; 817-820.
16. Boll S: Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoustics Speech Signal Process 1979, 27(2):113-120. 10.1109/TASSP.1979.1163209
17. Delcroix M, Hikichi T, Miyoshi M: Precise dereverberation using multi-channel linear prediction. IEEE Trans. ASLP 2007, 15(2):430-440.
18. Jin Q, Schultz T, Waibel A: Far-field speaker recognition. IEEE Trans. ASLP 2007, 15(7):2023-2032.
19. Subramaniam S, Petropulu AP, Wendt C: Cepstrum-based deconvolution for speech dereverberation. IEEE Trans. Speech Audio Process 1996, 4(5):392-396. 10.1109/89.536934
20. Kinoshita K, Delcroix M, Nakatani T, Miyoshi M: Suppression of late reverberation effect on speech signal using long-term multiple-step linear prediction. IEEE Trans. Audio Speech Lang. Process 2009, 17(4):534-545.
21. Jin Q, Pan Y, Schultz T: Far-field speaker recognition. In Proceedings ICASSP 2006. Toulouse, France, 14-19 May, vol. 1. IEEE, Piscataway; 2006:937-940.
22. Wang L, Odani K, Kai A: Dereverberation and denoising based on generalized spectral subtraction by multi-channel LMS algorithm using a small-scale microphone array. EURASIP J. Adv. Signal Process 2012, 2012(12).
23. Sim BL, Tong YC, Chang JS, Tan CT: A parametric formulation of the generalized spectral subtraction method. IEEE Trans. Speech Audio Process 1998, 6(4):328-337. 10.1109/89.701361
24. Inoue T, Saruwatari H, Takahashi Y, Shikano K, Kondo K: Theoretical analysis of musical noise in generalized spectral subtraction based on higher-order statistics. IEEE Trans. Audio Speech Lang. Process 2011, 19(6):1770-1779.
25. Wang L, Nakagawa S, Kitaoka N: Blind dereverberation based on CMN and spectral subtraction by multi-channel LMS algorithm. In Proceedings of InterSpeech 2008. Brisbane, 22-26 September 2008; 1032-1035.
26. Reynolds DA: Speaker identification and verification using Gaussian mixture speaker models. Speech Commun 1995, 17: 91-108. 10.1016/0167-6393(95)00009-D
27. Reynolds DA, Quatieri TF, Dunn R: Speaker verification using adapted Gaussian mixture models. Dig. Signal Process 2000, 10(1-3):19-41. 10.1006/dspr.1999.0361
28. Wang L, Kitaoka N, Nakagawa S: Robust distant speaker recognition based on position-dependent CMN by combining speaker-specific GMM with speaker-adapted HMM. Speech Commun 2007, 49(6):501-513. 10.1016/j.specom.2007.04.004
29. Farrell K, Mammone R, Assaleh K: Speaker recognition using neural networks and conventional classifiers. IEEE Trans. Speech Audio Process 1994, 2(1):194-205. 10.1109/89.260362
30. Campbell W, Campbell J, Reynolds D, Singer E, Torres-Carrasquillo P: Support vector machines for speaker and language recognition. Comput. Speech Lang 2006, 20(2–3):210-229.
31. Kenny P, Ouellet P, Dehak N, Gupta V, Dumouchel P: A study of inter-speaker variability in speaker verification. IEEE Trans. Audio Speech Lang. Process 2008, 15(7):980-988.
32. Dehak N, Kenny P, Dehak R, Dumouchel P, Ouellet P: Front-end factor analysis for speaker verification. IEEE Trans. Audio Speech Lang. Process 2011, 19(4):788-798.
33. Kingsbury B, Morgan N: Recognizing reverberant speech with RASTA-PLP. In Proceedings of IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP). Munich, 21-24 April, vol. 2. IEEE, Piscataway; 1997:1259-1262.
34. Surendran AC, Flanagan JL: Stable dereverberation using microphone arrays for speaker verification. J. Acoust. Soc. Am 1994, 96(5):3261-3262.
35. Huang Y, Benesty J: Adaptive blind channel identification: multi-channel least mean square and Newton algorithms. In ICASSP. Orlando, 13-17 May, vol. 2. IEEE, Piscataway; 2002:1637–1640.
36. Huang Y, Benesty J: Adaptive multichannel least mean square and Newton algorithms for blind channel identification. Signal Process 2002, 82: 1127-1138. 10.1016/S0165-1684(02)00247-5
37. Huang Y, Benesty J, Chen J: Optimal step size of the adaptive multi-channel LMS algorithm for blind SIMO identification. IEEE Signal Process. Lett 2005, 12(3):173-175.
38. Wang L, Kitaoka N, Nakagawa S: Distant-talking speech recognition based on spectral subtraction by multi-channel LMS algorithm. IEICE Trans. Inf. Syst. 2011, E94-D(3):659-667. 10.1587/transinf.E94.D.659
39. Nakamura S, Hiyane K, Asano F, Nishiura T, Yamada T: Acoustical sound database in real environments for sound scene understanding and hands-free speech recognition. In Proceedings of LREC 2000. Athens, Greece, 31 May - 2 June 2000; 965-968.
40. Nishiura T, Nakayama M, Denda Y, Kitaoka N, Yamamoto K, Yamada T, Tsuge S, Miyajima C, Fujimoto M, Takiguchi T, Tamura S, Kuroiwa S, Takeda K, Nakamura S: Evaluation framework for distant-talking speech recognition under reverberant environments. In Proceedings of INTERSPEECH 2008. Brisbane, Australia, 22-26 September 2008; 968-971.
41. Itou K, Takeda K, Kakezawa T, Matsuoka T, Kobayashi T, Shikano K, Itahashi S, Yamamoto M: Japanese speech corpus for large vocabulary continuous speech recognition research. J. Acoust. Soc. Jpn. (E) 1999, 20(3):199-206. 10.1250/ast.20.199
42. Naylor PA: Signal-based performance evaluation of dereverberation algorithms. J. Electrical Comput. Eng 2010, 2010(5): Article ID 127513. doi:10.1155/2010/127513
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.