Single-channel acoustic echo cancellation in noise based on gradient-based adaptive filtering
© Mahbub et al.; licensee Springer. 2014
Received: 12 November 2013
Accepted: 25 March 2014
Published: 3 May 2014
In this paper, a two-stage scheme is proposed to deal with the difficult problem of acoustic echo cancellation (AEC) in a single-channel scenario in the presence of noise. In order to overcome the major challenge of obtaining a separate reference signal in adaptive filter-based AEC, the delayed version of the echo- and noise-suppressed signal is proposed for use as the reference. A modified objective function is thereby derived for a gradient-based adaptive filter algorithm, and proof of its convergence to the optimum Wiener-Hopf solution is established. The output of the AEC block is fed to an acoustic noise cancellation (ANC) block, where a spectral subtraction-based algorithm with adaptive spectral floor estimation is employed. In order to obtain fast but smooth convergence with the maximum possible echo and noise suppression, a set of updating constraints is proposed based on various speech characteristics (e.g., energy and correlation) of the reference and current frames, considering whether they are voiced, unvoiced, or pause. Extensive experimentation is carried out on several echo- and noise-corrupted natural utterances taken from the TIMIT database, and it is found that the proposed scheme can significantly reduce the effect of both echo and noise in terms of objective and subjective quality measures.
Keywords: Adaptive filter; Convergence analysis; Echo cancellation; Least mean squares algorithm; Noise reduction; Spectral subtraction; Single-channel communication
The phenomenon of acoustic echo occurs when the output speech signal from a loudspeaker is reflected from different surfaces, such as ceilings, walls, and floors, and is then fed back to the microphone. In the worst case, acoustic echo can cause howling of a significant portion of the sound energy [1, 2]. In real-life applications, such as a lecture in a large conference hall or the public address system of a trade fair, acoustic echo together with environmental noise is very common; it degrades the speech quality and can even lead to complete loss of intelligibility.
In order to deal with the problem of acoustic echo cancellation (AEC), echo suppressors, earphones, and directional microphones have conventionally been used, which generally place restrictions on the talkers’ movement. As an alternative to such hardware-based solutions, adaptive filter algorithms are widely applied, where, apart from the input channel, a separate echo-free reference channel is required [3–13]. Among the different adaptive filter algorithms, the least mean squares (LMS) algorithm and its variants are very popular for their satisfactory performance and low computational burden [4, 10, 12–14]. Besides these algorithms, the recursive least squares (RLS) algorithm is well known for its fast convergence at the expense of computational complexity. Adaptive filter algorithms have also been used for acoustic noise cancellation (ANC) [15].
Some methods deal with both acoustic echo and noise cancellation (AENC) [16–18]. The echo canceller used in [16] utilizes a sub-band noise cancellation scheme. In [17], echo cancellation is done by an adaptive LMS filter while a linear prediction error filter removes the residual echo and noise. In [18], a single Wiener filter is employed to simultaneously suppress the echo and noise. It is to be mentioned that all these AENC methods employ more than one microphone, whereas solutions using a single microphone are preferable in most real-life applications.
In this paper, an AENC scheme is proposed which can efficiently deal with the single-channel scenario. First, unlike the conventional LMS algorithm, a gradient-based adaptive LMS algorithm is developed for single-channel AEC by considering the delayed version of the previously echo- and noise-suppressed signal as the reference. Preliminary results obtained by using this idea are reported in [19]. In the current paper, however, an analytical proof of convergence towards the optimum Wiener-Hopf solution is presented. Next, a single-channel ANC algorithm based on spectral subtraction with adaptive spectral floor estimation is developed, which reduces not only the effect of noise but also some residual echo. Finally, by analyzing different speech characteristics of the reference and current frames, multiconditional updating constraints are proposed in order to obtain precise control over the convergence characteristics. For performance evaluation, extensive experimentation is conducted on several real-life echo- and noise-corrupted speech signals in different acoustic environments.
2 Problem formulation
where s(n−k0) = [s(n−k0−1), s(n−k0−2), …, s(n−k0−p)]^T and v(n−k0) = [v(n−k0−1), v(n−k0−2), …, v(n−k0−p)]^T, with k0 being a predefined flat delay, and a_n = [a_n(1), a_n(2), …, a_n(p)]^T consists of the coefficients corresponding to the acoustic room transfer function A(z). The order p and the coefficient values of A(z) depend on the room characteristics. It is to be noted that in this case, there is no scope for obtaining a separate echo-free reference or a separate noise-only reference, which makes the single-channel AENC problem extremely difficult to handle.
3 Proposed single-channel AENC scheme
3.1 Proposed two-stage setup
where the two residual terms are the residual echoes of the speech and noise portions of the input signal, respectively, and it is assumed that these signals exhibit the properties of white Gaussian noise. Next, e(n) is passed through a spectral subtraction-based single-channel ANC block, which produces an output that closely resembles s(n) provided that the residual echo-noise portion Ψ(n) becomes very small.
It is to be noted that, unlike in the proposed AENC scheme, the task of noise reduction may be carried out prior to the AEC block. However, because of possible nonlinearities introduced by a prior noise reduction block, no proper reference would be available for the single-channel AEC block. Hence, the arrangement shown in Figure 3a is adopted, in which the noise reduction block also serves as a post-processor for attenuating the residual echo.
3.2 Development of proposed gradient-based single-channel LMS AEC scheme
where the cross-correlation vector consists of different lags of the cross-correlation between the echo signal x_s(n)+x_v(n) and the noisy input signal s(n)+v(n), while R_(s+v)(s+v) is the auto-correlation matrix of s(n)+v(n). This is the optimum solution. Hence, it is shown that even for a single-channel noise-corrupted AEC problem, the optimum solution can be achieved under the assumptions stated earlier.
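As a numerical illustration of a solution of this form (inverse auto-correlation matrix applied to a cross-correlation vector), the following minimal sketch estimates both quantities by sample averages and solves the resulting linear system. The function names `lagged_matrix` and `wiener_hopf`, the sample-average estimators, and the direct availability of the echo as the desired signal are assumptions made only for illustration; they are not taken from the derivation in this section.

```python
import numpy as np

def lagged_matrix(u, p, k0):
    """Row n holds [u(n-k0-1), ..., u(n-k0-p)]; entries before the start are zero."""
    n_samples = len(u)
    U = np.zeros((n_samples, p))
    for j in range(p):
        lag = k0 + 1 + j
        U[lag:, j] = u[: n_samples - lag]
    return U

def wiener_hopf(u, d, p, k0):
    """Solve R w = r, with R and r estimated by sample averages (illustrative)."""
    U = lagged_matrix(u, p, k0)
    R = U.T @ U / len(u)   # estimate of the auto-correlation matrix of the input
    r = U.T @ d / len(u)   # estimate of the cross-correlation with the desired signal
    return np.linalg.solve(R, r)

# Hypothetical example: white input, three-tap echo path after a flat delay of 4.
rng = np.random.default_rng(1)
u = rng.standard_normal(5000)
a_true = np.array([0.5, -0.3, 0.2])
d = lagged_matrix(u, 3, 4) @ a_true
w_opt = wiener_hopf(u, d, 3, 4)
```

In this noise-free toy case, `w_opt` coincides with `a_true` up to numerical precision, because the desired signal is an exact linear combination of the lagged inputs.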
3.3 Convergence analysis of the proposed AEC scheme
Thus, it is found that, as the number of iterations increases, the average value of the weight vector converges to the Wiener-Hopf solution, which is the optimum solution.
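The convergence behaviour can be observed empirically with a small simulation. The sketch below is a simplified illustration, not the exact recursion derived in this section: it applies a plain LMS update, uses the delayed output of the AEC block itself as the reference (no noise reduction stage in the loop), and uses white Gaussian noise in place of speech so that the signal is uncorrelated with its delayed versions. The names `add_echo` and `single_channel_lms`, the echo path, and the step size are hypothetical.

```python
import numpy as np

def add_echo(near_end, path, k0):
    """Add an echo built from lags k0+1 ... k0+p of the near-end signal."""
    echo = np.zeros(len(near_end))
    for j, coeff in enumerate(path):
        lag = k0 + 1 + j
        echo[lag:] += coeff * near_end[:-lag]
    return near_end + echo

def single_channel_lms(y, p, k0, mu):
    """Plain LMS whose reference vector is its own delayed output (a sketch)."""
    w = np.zeros(p)
    e = np.zeros(len(y))
    w_hist = np.zeros((len(y), p))
    for n in range(len(y)):
        # Reference [e(n-k0-1), ..., e(n-k0-p)], taken as zero before the start.
        ref = np.array([e[n - k0 - 1 - j] if n - k0 - 1 - j >= 0 else 0.0
                        for j in range(p)])
        e[n] = y[n] - w @ ref          # echo-suppressed output sample
        w = w + mu * e[n] * ref        # stochastic-gradient weight update
        w_hist[n] = w
    return e, w_hist

# Hypothetical test signal: white noise standing in for the near-end source.
rng = np.random.default_rng(0)
s = rng.standard_normal(40000)
true_path = np.array([0.4, -0.2])
y = add_echo(s, true_path, k0=2)
e, w_hist = single_channel_lms(y, p=2, k0=2, mu=0.002)
w_avg = w_hist[-5000:].mean(axis=0)    # time-averaged weights near the end
```

Under these assumptions, the time-averaged weights `w_avg` settle close to `true_path`, and the output `e` is closer to `s` than the echo-corrupted input `y` is. With real speech, which is correlated across frames, this simple loop would need the update constraints discussed in Section 4.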
3.4 Noise reduction in spectral domain
where the phase (arg[E_i(ω)]) is generally taken to be the phase of the noise-corrupted signal, an approximation that does not cause significant loss of intelligibility of the speech signal. It can be seen that an estimate of the magnitude spectrum of the signal can be obtained provided an estimate of the noise spectrum is available, which is generally computed during periods when speech is known a priori to be absent.
Here, α_ss is the subtraction factor and β_ss is the spectral floor parameter, with α_ss ≥ 1 and 0 ≤ β_ss ≤ 1. The noise power spectral density is estimated using the minimum statistics noise estimator proposed in [23], which can handle the time-varying nature of the noise.
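The roles of the subtraction factor and the spectral floor can be illustrated with a short sketch of power spectral subtraction on a single frame. This is a minimal sketch under stated assumptions: it uses the widely used over-subtraction rule with a fixed floor (subtract α_ss times the noise power, and fall back to β_ss times the noise power where the result would drop below that floor), it reuses the noisy phase, and it takes the noise power spectrum as a given input rather than estimating it by minimum statistics. The function name `spectral_subtract` and the default parameter values are hypothetical, and the adaptive floor estimation of the proposed method is not reproduced.

```python
import numpy as np

def spectral_subtract(frame, noise_psd, alpha_ss=2.0, beta_ss=0.01):
    """Power spectral subtraction on one frame with a fixed spectral floor."""
    spectrum = np.fft.rfft(frame)
    power = np.abs(spectrum) ** 2
    # Over-subtract the noise power, then clamp to the spectral floor.
    subtracted = power - alpha_ss * noise_psd
    floor = beta_ss * noise_psd
    cleaned_power = np.where(subtracted > floor, subtracted, floor)
    # Recombine the cleaned magnitude with the phase of the noisy frame.
    phase = np.angle(spectrum)
    cleaned_spectrum = np.sqrt(cleaned_power) * np.exp(1j * phase)
    return np.fft.irfft(cleaned_spectrum, n=len(frame))

# Hypothetical usage on a 256-sample frame with a flat noise power estimate.
rng = np.random.default_rng(2)
frame = rng.standard_normal(256)
noise_psd = np.full(129, 5.0)          # one value per rfft bin (256 // 2 + 1)
enhanced = spectral_subtract(frame, noise_psd)
```

A larger `alpha_ss` removes more noise at the cost of more speech distortion, while a larger `beta_ss` masks residual artifacts by leaving more of the noise floor in place.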
4 Development of adaptive update constraints
The level of cross-correlation
The amount of signal power
The mean square error (MSE) between consecutive estimates of the unknown filter coefficients.
Through extensive experimentation on different speech frames, it is found that the negligibility of the cross-correlation terms (r_ss(n) and the related terms described after (12)) strongly depends on the voicing characteristics of the speech frames and on the input noise. Because of the inherent periodicity of voiced speech, the degree of cross-correlation between two voiced speech frames of a person is higher than that between two unvoiced speech frames, which are random in nature. Regarding signal power, the ratio of the power of a voiced speech frame to that of an unvoiced speech frame is found to be higher than the power ratio of two voiced speech frames. As white Gaussian noise is considered, the degree of cross-correlation between the speech and the noise is found to be negligible, and the noise powers in two different frames may not differ significantly. As a result, the effect of the input noise on the power ratio is found to be negligible.
The ratio of Pref(n) and Psup(n) is denoted as the power ratio Prs(n) and considered as one of the control characteristics.
where −M/2≤i≤M/2−1 and 0≤j≤(M−1).
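The two frame-level control characteristics can be sketched as follows. This is an illustrative sketch only: frame power is taken as the mean-square value, the power ratio is the reference-frame power divided by the other frame's power, and the correlation coefficient is taken as the peak magnitude of the normalized cross-correlation over all lags. These definitions and the function names are assumptions; the exact expressions for Pref(n), Psup(n), and Crs(n) may differ.

```python
import numpy as np

def frame_power(frame):
    """Mean-square power of a frame (assumed definition)."""
    return float(np.mean(np.square(frame)))

def power_ratio(ref_frame, sup_frame, eps=1e-12):
    """Ratio of reference-frame power to the other frame's power."""
    return frame_power(ref_frame) / (frame_power(sup_frame) + eps)

def correlation_coefficient(ref_frame, cur_frame, eps=1e-12):
    """Peak magnitude of the normalized cross-correlation over all lags."""
    ref = ref_frame - np.mean(ref_frame)
    cur = cur_frame - np.mean(cur_frame)
    xcorr = np.correlate(ref, cur, mode="full")
    norm = np.sqrt(np.sum(ref ** 2) * np.sum(cur ** 2)) + eps
    return float(np.max(np.abs(xcorr)) / norm)

# Hypothetical frames: a periodic (voiced-like) frame and two noise-like frames.
t = np.arange(256)
voiced_like = np.sin(2 * np.pi * t / 32)
rng = np.random.default_rng(3)
noise_a = rng.standard_normal(256)
noise_b = rng.standard_normal(256)

c_same = correlation_coefficient(voiced_like, voiced_like)
c_noise = correlation_coefficient(noise_a, noise_b)
p_ratio = power_ratio(2.0 * noise_a, noise_a)
```

With these definitions, a frame compared with itself gives a coefficient of about 1, whereas two independent noise-like frames give a much smaller value, which matches the qualitative behaviour described for voiced and unvoiced frames.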
Table: Variation of LMS updating performance due to various characteristics of the reference and current speech frames (columns: Reference speech sample; Current noise- and echo-corrupted speech sample; LMS update performance)
In some cases, it is observed that quite satisfactory updating is obtained even though the power ratio is very small, such as the U-V case shown in Figure 7. Another characteristic observed here is a lower value of the correlation coefficient Crs(n) together with a higher value of Pref(n). It is to be mentioned that the proposed AEC algorithm is developed on the assumption that the cross-correlation between the current frame and the reference frame is negligible. However, since both the reference and current frames may belong to the same person, in the case of a high degree of correlation, the adaptive algorithm would try to suppress a portion of the echo-corrupted signal itself, resulting in unusual degradation in convergence performance. Hence, introducing an upper bound on Crs(n), the second condition is proposed as Condition II: Crs(n) ≤ Υ1 and Pref(n) ≥ β.
The presence of a certain level of noise can be used to advantage in pause instances, where updating is generally not performed. Since white noise is uncorrelated with its own delayed version, updating at frames where only noise is present would be quite satisfactory. In this case, the value of Crs(n) must be very small, and thus another condition on updating is proposed as Condition III: Crs(n) ≤ Υ2 ≤ Υ1.
In order to continue the updating, an upper bound on the variation of successive estimates is set as the following condition: Condition IV: ecoeff(n) ≤ ℵ.
Restricting the update to smaller values of ecoeff(n) avoids updating at those instances where abrupt and significant changes occur in the estimated coefficients. In the proposed method, at least one of the above four conditions must be fulfilled in order to carry out the LMS update.
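The update decision described above can be sketched as a simple gating function. Conditions II to IV follow the inequalities stated in the text; Condition I is not spelled out in the surviving text, so it is passed in as a precomputed boolean. The function name `should_update`, the argument names, and the threshold values in the usage lines are hypothetical.

```python
def should_update(cond_i, crs, pref, ecoeff, upsilon1, upsilon2, beta, aleph):
    """Return True when at least one of the four update conditions holds.

    cond_i   : result of Condition I, evaluated by the caller (placeholder)
    crs      : correlation coefficient Crs(n) of reference and current frames
    pref     : reference-frame power Pref(n)
    ecoeff   : MSE between consecutive coefficient estimates, ecoeff(n)
    upsilon1 : upper bound on Crs(n) for Condition II
    upsilon2 : tighter bound on Crs(n) for Condition III (upsilon2 <= upsilon1)
    beta     : lower bound on Pref(n) for Condition II
    aleph    : upper bound on ecoeff(n) for Condition IV
    """
    if upsilon2 > upsilon1:
        raise ValueError("expected upsilon2 <= upsilon1")
    cond_ii = crs <= upsilon1 and pref >= beta
    cond_iii = crs <= upsilon2
    cond_iv = ecoeff <= aleph
    return bool(cond_i or cond_ii or cond_iii or cond_iv)

# Hypothetical thresholds, chosen only to exercise the logic.
thresholds = dict(upsilon1=0.5, upsilon2=0.1, beta=1e-3, aleph=1e-4)
update_pause = should_update(False, crs=0.05, pref=1e-5, ecoeff=1.0, **thresholds)
skip_frame = should_update(False, crs=0.9, pref=1e-5, ecoeff=1.0, **thresholds)
```

In the usage lines, `update_pause` is True because the low correlation satisfies Condition III, while `skip_frame` is False because none of the conditions holds.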
5 Simulation results and comments
The performance of the proposed algorithm is investigated in different echo-generating environments at various input noise levels, considering several male and female utterances available in the TIMIT database [24]. An acoustic room environment is simulated using an FIR filter of length Nf, where, as per conventional approaches, the filter coefficients during the flat delay portion are assumed to be zero. The flat delay time (k0) can be pre-calculated based on the distance between the microphone and the speaker [25]. Because of the implicit zeros corresponding to the flat delay, only Nf − k0 unknown coefficients have to be determined. In the proposed method, a small step size is used to obtain smooth convergence.
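The simulated room described above can be sketched as an FIR filter whose first k0 taps are zero. This is only an illustration: the exponentially decaying random taps, the decay rate, and the function names `make_room_filter` and `add_room_echo` are assumptions, and the echo is formed from the noisy near-end signal s(n)+v(n) as in the problem formulation.

```python
import numpy as np

def make_room_filter(n_f, k0, decay=0.05, seed=0):
    """FIR room response of length n_f with k0 leading zeros (flat delay)."""
    rng = np.random.default_rng(seed)
    h = np.zeros(n_f)
    n_active = n_f - k0                       # number of unknown coefficients
    envelope = np.exp(-decay * np.arange(n_active))
    h[k0:] = rng.standard_normal(n_active) * envelope
    return h

def add_room_echo(clean, noise, h):
    """Microphone signal: noisy near-end signal plus its filtered echo."""
    near_end = clean + noise
    echo = np.convolve(near_end, h)[: len(near_end)]
    return near_end + echo

# Hypothetical sizes: 64-tap room response with a flat delay of 16 samples.
h = make_room_filter(n_f=64, k0=16)
impulse = np.zeros(100)
impulse[0] = 1.0
mic = add_room_echo(impulse, np.zeros(100), h)
```

Feeding a unit impulse with no noise returns the impulse followed by the room response, which makes the flat delay and the Nf − k0 active taps directly visible in `mic`.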
First, a subjective evaluation is carried out based on feedback about the quality of the echo- and noise-suppressed signal provided by five individual listeners in different noisy echo-generating environments. From the overall response of the listeners in terms of mean opinion score (MOS), a very satisfactory performance of the proposed method is obtained even under severe echo-generating conditions in noise.
which indicates the overall distortion removal.
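For orientation, the two objective measures referred to in this section can be computed as in the following sketch. It assumes commonly used definitions, which may differ in detail from the ones intended here: ERLE as the ratio of echo power to residual echo power, and SDR as the ratio of clean signal power to the power of the difference between the estimate and the clean signal, both in dB. The function names are hypothetical.

```python
import numpy as np

def erle_db(echo, residual_echo, eps=1e-12):
    """Echo return loss enhancement in dB (common definition, assumed)."""
    echo = np.asarray(echo, dtype=float)
    residual_echo = np.asarray(residual_echo, dtype=float)
    return 10.0 * np.log10((np.mean(echo ** 2) + eps)
                           / (np.mean(residual_echo ** 2) + eps))

def sdr_db(clean, estimate, eps=1e-12):
    """Signal-to-distortion ratio in dB (common definition, assumed)."""
    clean = np.asarray(clean, dtype=float)
    distortion = np.asarray(estimate, dtype=float) - clean
    return 10.0 * np.log10((np.mean(clean ** 2) + eps)
                           / (np.mean(distortion ** 2) + eps))

# Hypothetical check: a residual at one tenth of the amplitude gives 20 dB.
echo = np.ones(100)
erle_example = erle_db(echo, 0.1 * echo)
sdr_example = sdr_db(echo, 1.1 * echo)
```

Higher values of either measure indicate better suppression; the small constant `eps` only guards against division by zero for silent frames.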
Table: Performance comparison with varying room acoustics (columns: Nf − k0; Avg. ERLE (dB); Avg. ERLE (dB))
Table: Performance comparison with noise level variation
Echo cancellation in the presence of noise, especially in a single-channel environment, is a very challenging task, which has been efficiently tackled in this paper. First, the single-channel AEC block is designed based on a gradient-based adaptive LMS filter, where, to overcome the problem of obtaining a separate reference signal, we propose to use the delayed version of the echo-suppressed signal. This choice of reference signal is justified by a detailed mathematical proof that the estimated filter coefficients reach the optimum Wiener-Hopf solution, and a convergence analysis is carried out. Moreover, in order to achieve fast and smooth convergence, a set of updating constraints is proposed by analyzing the speech characteristics of different types of speech frames, such as voiced, unvoiced, and pause. In the ANC block, a modified single-channel spectral subtraction method is used for its robust performance. It is shown that the proposed AENC scheme with updating constraints provides very satisfactory performance, in terms of SDR and ERLE, under different echo-generating conditions and at various SNR levels.
Derivation of the solution of the LMS update
References
1. Vaseghi SV: Advanced Digital Signal Processing and Noise Reduction. Wiley, Chichester; 2000.
2. Kuo SM, Lee BH: Real-Time Digital Signal Processing. Wiley; 2001.
3. Breining C, Dreiseitel P, Hänsler E, Mader A, Nitsch B, Puder H, Schertler T, Schmidt G, Tilp J: Acoustic echo control - an application of very-high-order adaptive filters. IEEE Signal Process. Mag 1999, 16(4):42-69. 10.1109/79.774933
4. Hänsler E: The hands-free telephone problem: an annotated bibliography. Signal Process 1992, 27(3):259-271. 10.1016/0165-1684(92)90074-7
5. Khong AWH, Naylor PA: Stereophonic acoustic echo cancellation employing selective-tap adaptive algorithms. IEEE Trans. Audio, Speech, Lang. Process 2006, 14(3):785-796.
6. Lindstrom F, Schuldt C, Claesson I: An improvement of the two-path algorithm transfer logic for acoustic echo cancellation. IEEE Trans. Audio, Speech, Lang. Process 2007, 15(4):1320-1326.
7. Wu S, Qiu X, Wu M: Stereo acoustic echo cancellation employing frequency-domain preprocessing and adaptive filter. IEEE Trans. Audio, Speech, Lang. Process 2011, 19(3):614-623.
8. Nath R: Adaptive echo cancellation based on a multipath model of acoustic channel. Circuits, Syst. Signal Process., Springer US 2013, 32(4):1673-1698. 10.1007/s00034-012-9529-4
9. Yukawa M, de Lamare RC, Sampaio-Neto R: Efficient acoustic echo cancellation with reduced-rank adaptive filtering based on selective decimation and adaptive interpolation. IEEE Trans. Audio, Speech, Lang. Process 2008, 16(4):696-710.
10. Hänsler E, Schmidt G: Acoustic Echo and Noise Control: a Practical Approach. Wiley, New York; 2004.
11. Myllylä V: Residual echo filter for enhanced acoustic echo control. Signal Process 2006, 86(6):1193-1205. 10.1016/j.sigpro.2005.07.036
12. Topa R, Muresan I, Kirei BS, Homana I: A digital adaptive echo-canceller for room acoustics improvement. Adv. Electrical Comput. Eng 2004, 10: 450-453.
13. Haykin S: Adaptive Filter Theory. Prentice-Hall, Inc., Upper Saddle River, NJ; 1996.
14. Schmidt G: Applications of acoustic echo control: an overview. In Proc. Eur. Signal Process. Conf. EUSIPCO, Vienna; 2004:9-16.
15. Widrow B, Glover JRJ, McCool JM, Kaunitz J, Williams CS, Hearn RH, Zeidler JR, Dong JE, Goodlin RC: Adaptive noise cancelling: principles and applications. Proc. IEEE 1975, 63(12):1692-1716.
16. Yasukawa H: An acoustic echo canceller with sub-band noise cancelling. IEICE Trans. Fundamentals Electron. Commun. Comput. Sci 1992, E75–A(11):1516-1523.
17. Park SJ, Cho CG, Lee C, Youn DH: Integrated echo and noise canceller for hands-free applications. IEEE Trans. Circuits Syst.-II: Analog Digital Signal Process 2002, 49(3).
18. Beaugeant C, Turbin V, Scalart P, Gilloire A: New optimal filtering approaches for hands-free telecommunication terminals. Signal Process 1998, 64(1):33-47. 10.1016/S0165-1684(97)00174-6
19. Mahbub U, Fattah SA: Gradient based adaptive filter algorithm for single channel acoustic echo cancellation in noise. In Proc. 7th Int. Conf. Electrical and Computer Engineering (ICECE). Dhaka, Bangladesh; 2012:880-883.
20. Boll S: A spectral subtraction algorithm for suppression of acoustic noise in speech. Proc. IEEE Int. Conf. Acoust. Speech, Signal Process. (ICASSP) ’79 1979, 200-203.
21. Berouti M, Schwartz R, Makhoul J: Enhancement of speech corrupted by acoustic noise. IEEE Conf. Acoust. Speech Signal Process. (ICASSP) 1979, 208-211.
22. Lim JS: Evaluation of a correlation subtraction method for enhancing speech degraded by additive white noise. IEEE Trans. Acoust. Speech Signal Process 1978, 26(5):471-472. 10.1109/TASSP.1978.1163129
23. Martin R: Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. Speech Audio Process 2001, 9(5):504-512. 10.1109/89.928915
24. Garofolo JS, Lamel LF, Fisher WM, Fiscus JG, Pallett DS, Dahlgren NL, Zue V: TIMIT acoustic-phonetic continuous speech corpus. Linguistic Data Consortium, Philadelphia; 1993.
25. Guangzeng F, Feng L: A new echo canceller with the estimation of flat delay. In IEEE Region Ten Conf. TENCON 92. Melbourne, Australia; 1992, vol. 1, pp. 1–5. Print ISBN 0-7803-0849-2. 10.1109/TENCON.1992.271995
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.