We propose a new low complexity, low delay, and fast converging frequency-domain adaptive algorithm for network echo cancellation in VoIP exploiting MMax and sparse partial (SP) tap-selection criteria in the frequency domain. We incorporate these tap-selection techniques into the multidelay filtering (MDF) algorithm in order to mitigate the delay inherent in frequency-domain algorithms. We illustrate two such approaches and discuss their tradeoff between convergence performance and computational complexity. Simulation results show an improvement in convergence rate for the proposed algorithm over MDF and significantly reduced complexity. The proposed algorithm achieves a convergence performance close to that of the recently proposed, but substantially more complex improved proportionate MDF (IPMDF) algorithm.

1. Introduction

The popularity of voice over internet protocol (VoIP), coupled with an increasing expectation for natural communication over packet-switched networks, has called for improvement in VoIP technologies in recent years. As network systems migrate from traditional voice telephony over the public switched telephone network (PSTN) to packet-switched networks for VoIP, improving the quality of service (QoS) for VoIP has been and will remain a challenge [1, 2]. As described in [1], several factors can affect the QoS for VoIP, including the choice of speech coder-decoders (codecs) [3], algorithmic processing delay [4], and packet loss [5]. Of these, the algorithmic delay is one of the significant factors in determining the delay budget for network echo cancellers. Network echo is introduced by the impedance mismatch between the 2- and 4-wire circuits of a network hybrid [6]. It occurs in VoIP systems where analog phones are involved in PC-to-phone or phone-to-phone connections [7], where "PC" represents all-digital terminals. Acoustic echo, on the other hand, occurs when hands-free conversations are conducted [8]. Transmission and algorithmic processing cause the echo to be transmitted back to the originator with a delay, hence impeding effective communication. As a result, network echo cancellation for IP networks has received increased attention in recent years. For effective network echo cancellation (NEC), adaptive filters such as that shown in Figure 1 have been employed for the estimation of the network impulse response. Using the estimated impulse response, a replica of the echo is generated and subtracted from the far-end transmitted signal. The main aim of this work is therefore to address the problem of NEC with reduced complexity and low algorithmic delay through the use of adaptive algorithms.

In VoIP systems, where traditional telephony equipment is connected to the packet-switched network, the resulting network impulse response such as that shown in Figure 2 is typically of length 64–128 milliseconds. This impulse response exhibits an "active" region of only 8–12 milliseconds duration and, consequently, it is dominated by "inactive" regions, where magnitudes are close to zero, making the impulse response sparse. The "inactive" region is principally due to the presence of bulk delay caused by unknown network propagation, encoding, and jitter buffer delays [7]. One of the first algorithms that exploits this sparse nature for the identification of network impulse responses is the proportionate normalized least-mean-square (PNLMS) algorithm [9], where each filter coefficient is updated with a step-size proportional to the magnitude of that coefficient. The PNLMS algorithm was shown to outperform classical adaptive algorithms that use a uniform step-size across all filter coefficients, such as the normalized least-mean-square (NLMS) algorithm, for NEC applications [9]. Although the PNLMS algorithm achieves fast initial convergence, its rate of convergence subsequently reduces significantly. This is due to the slow convergence of filter coefficients having small magnitudes. To mitigate this problem, subsequent improved versions such as the improved PNLMS (IPNLMS) [10] and the improved IPNLMS [11] algorithms were proposed. These algorithms share the same characteristic of introducing a controlled mixture of proportionate (PNLMS) and nonproportionate (NLMS) adaptation. Consequently, these algorithms perform better than PNLMS for sparse impulse responses.

The increase in VoIP traffic in recent years has resulted in a high demand for high density NEC in which it is desirable to run several hundred echo cancellers in one processor core. Defining as the length of the impulse response, the PNLMS and IPNLMS algorithms require approximately and number of multiplications per sample iteration respectively compared to for the substantially slower converging NLMS algorithm. Hence, in order to reduce the computational complexity of PNLMS and IPNLMS, the sparse partial update NLMS (SPNLMS) algorithm was recently proposed [12], which combines two adaptation strategies: sparse adaptation for improving the rate of convergence and partial-updating for complexity reduction. For the majority of adapting iterations, under the sparse partial (SP) adaptation, only those taps corresponding to tap-inputs and filter coefficients both having large magnitudes are updated. However, from time to time the algorithm gives equal opportunity for the coefficients with smaller magnitude to be updated by employing MMax tap-selection [13]. This only updates those filter taps corresponding to the largest magnitude tap-inputs. It is noted that partial update strategies have also been applied to the filtered-X LMS (FxLMS) algorithms as described in [14, 15]. Other ways to reduce the complexity of adaptive filtering algorithms include the use of a shorter adaptive filter to model only the active region of the sparse impulse responses as described in [16].

It is well known that frequency-domain adaptive filtering such as the fast-LMS (FLMS) algorithm [17] offers an attractive means of achieving efficient implementation. In contrast to time-domain adaptive filtering algorithms, frequency-domain adaptive algorithms incorporate block updating strategies, whereby the fast-Fourier transform (FFT) algorithm [18] is used together with the overlap-save method [19, 20]. However, one of the main drawbacks of these frequency-domain approaches is the delay introduced between the input and output, which is generally equal to the length of the adaptive filter. Since reducing the algorithmic processing delay for VoIP applications is crucial, frequency-domain adaptive algorithms with low delay are desirable, especially for the identification of long network impulse responses. The multidelay filtering (MDF) algorithm [21] has been proposed in the context of acoustic echo cancellation for mitigating the problem of delay. This algorithm partitions an adaptive filter of length into blocks each of length . As a result, the delay of the MDF algorithm is reduced by a factor of compared to FLMS. The benefit of low delay for MDF over FLMS in the context of NEC has been shown in [22].

The aim of this work is to develop a low complexity, low delay, and fast converging adaptive algorithm for identifying the sparse impulse responses that arise in NEC for VoIP applications. We achieve this by incorporating the MMax and SP tap-selection into the frequency-domain MDF structure. As will be shown in this work, applying the MMax and SP tap-selection to frequency-domain adaptive filtering presents significant challenges since the time-domain sparse impulse response is not necessarily sparse in the frequency domain. We first review in Section 2 the SPNLMS and MDF algorithms. We then propose, in Section 3.1, to incorporate MMax tap-selection into the MDF structure for complexity reduction. We show how this can be achieved using two approaches and we compare their tradeoffs in terms of complexity and performance. We next illustrate, in Section 3.2, how the sparseness of the Fourier transformed impulse response varies with the number of blocks in the MDF structure. Utilizing these results, we show how the SP tap-selection can be incorporated into the MDF structure for fast convergence and low delay. The computational complexity of the proposed algorithm is discussed in Section 3.3. In Section 4, we present the simulation results and discussions using both colored Gaussian noise (CGN) and speech inputs for NEC. Finally, conclusions are drawn in Section 5.

2. Review of the SPNLMS and MDF Algorithms

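We first review the problem of sparse system identification. With reference to Figure 1, we define the tap-input vector $\mathbf{x}(n)$, the network impulse response $\mathbf{h}$, and the coefficients of the adaptive filter $\hat{\mathbf{h}}(n)$ as

$$\mathbf{x}(n) = [x(n) \ \cdots \ x(n-L+1)]^T, \quad \mathbf{h} = [h_0 \ \cdots \ h_{L-1}]^T, \quad \hat{\mathbf{h}}(n) = [\hat{h}_0(n) \ \cdots \ \hat{h}_{L-1}(n)]^T, \qquad (1)$$

where $L$ is the length of $\mathbf{h}$ and $[\cdot]^T$ is defined as vector/matrix transposition. The adaptive filter $\hat{\mathbf{h}}(n)$, which is chosen to be of the same length as $\mathbf{h}$, will model the unknown impulse response $\mathbf{h}$ using the near-end signal

$$y(n) = \mathbf{x}^T(n)\mathbf{h} + w(n), \qquad (2)$$

where $w(n)$ is the additive noise.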

2.1. The SPNLMS Algorithm

The sparse partial (SP) update NLMS (SPNLMS) algorithm [12] utilizes the sparse nature of network impulse response. This algorithm incorporates two updating strategies: MMax tap-selection [13] for complexity reduction and SP adaptation for fast convergence. Although it is normal to expect that adapting filter coefficients using partial-updating strategies suffers from degradation in convergence performance, it was shown in [12] that such degradation can be offset by the SP tap-selection.

The updating equation for SPNLMS is given by

(3)

where is the step-size, is the regularization parameter and is defined as the -norm. As shown in Figure 1, the a priori error is given by

$$e(n) = y(n) - \mathbf{x}^T(n)\hat{\mathbf{h}}(n-1). \qquad (4)$$

The tap-selection matrix

(5)

in (3) determines the step-size gain for each filter coefficient and is dependent on the MMax and SP updating strategies for SPNLMS. The relative significance of these strategies is controlled by the variable such that for , elements for are given by

(6)

and for ,

(7)

The variables and define the number of selected taps for MMax and SP, respectively, and the MMax tap-selection criteria given by (6) for the time-domain is achieved by sorting using, for example, the SORTLINE [23] and short sort [24] routines. It has been shown in [12] that, including the modest overhead for such sorting operations, the SPNLMS algorithm achieves lower complexity than NLMS. To summarize, SPNLMS incorporates MMax tap-selection given by (6) and SP tap-selection given by (7) for complexity reduction and fast convergence, respectively.
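The two selection rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of [12]: the names `select_largest`, `spnlms_step`, `T`, `M`, and `N_sp` are hypothetical, and the normalization by the energy of the selected tap-inputs is an assumption made for the sketch.

```python
def select_largest(values, count):
    """Return a 0/1 mask marking the `count` largest entries of `values`."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    mask = [0] * len(values)
    for i in order[:count]:
        mask[i] = 1
    return mask


def spnlms_step(h_hat, x_vec, y, n, mu=0.5, delta=1e-6, T=8, M=None, N_sp=None):
    """One SPNLMS-style iteration (sketch).

    Returns the updated coefficients and the a priori error.
    """
    L = len(h_hat)
    M = L // 2 if M is None else M            # taps updated on MMax iterations (assumed default)
    N_sp = L // 4 if N_sp is None else N_sp   # taps updated on SP iterations (assumed default)

    # A priori error: near-end sample minus the echo replica.
    e = y - sum(x * h for x, h in zip(x_vec, h_hat))

    if n % T == 0:
        # MMax iteration: taps with the largest-magnitude tap-inputs.
        q = select_largest([abs(x) for x in x_vec], M)
    else:
        # SP iteration: taps where both tap-input and coefficient are large.
        q = select_largest([abs(x * h) for x, h in zip(x_vec, h_hat)], N_sp)

    # Assumed normalization: energy of the selected tap-inputs plus regularization.
    norm = sum(qi * x * x for qi, x in zip(q, x_vec)) + delta
    new_h = [h + mu * qi * x * e / norm for h, qi, x in zip(h_hat, q, x_vec)]
    return new_h, e


# Example: a single MMax iteration (n = 0) on a 4-tap filter.
h1, err = spnlms_step([0.0] * 4, [1.0, -2.0, 0.5, 3.0], y=2.0, n=0, M=2, N_sp=1)
print(h1, err)
```

Only the selected taps change in each call, which is where the saving in multiplications comes from; the sorting overhead mentioned in the text corresponds to `select_largest`, implemented here with a full sort for brevity.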

2.2. The MDF Algorithm

The MDF algorithm [21] mitigates the problem of delay inherent in FLMS [17] by partitioning the adaptive filter into subfilters each of length , with and . As a consequence of this partitioning, the delay for the MDF is reduced by a factor of compared to FLMS. To describe the MDF algorithm, we define as the frame index and the following time-domain quantities given by

(8)

(9)

(10)

(11)

(12)

We also define a tap-input vector

(13)

where is defined as the block index and the subfilters in (10) are given as

(14)

We next define as the Fourier matrix and a matrix

(15)

with diagonal elements containing the Fourier transform of for the block. We also define the following frequency-domain quantities [8]

(16)

where is the null matrix and is the identity matrix. The MDF algorithm is then given by [21]

(17)

(18)

(19)

(20)

where denotes complex conjugate, is the forgetting factor and is the step-size with [21]. Letting be the input signal variance, the initial regularization parameters [8] are and . For and , MDF is equivalent to FLMS [17].
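The delay saving from partitioning can be illustrated with simple arithmetic. The sketch below assumes, following the statements above, that the input-output delay equals one block length, so that an unpartitioned filter incurs a delay equal to the full filter length. The function name `block_delay_ms` is hypothetical; the 512-tap filter and the 8 kHz sampling frequency are the values used in Section 4.

```python
def block_delay_ms(filter_length, num_blocks, fs_hz=8000):
    """Delay of one block in milliseconds for a filter split into equal blocks."""
    if filter_length % num_blocks != 0:
        raise ValueError("filter_length must be a multiple of num_blocks")
    block_length = filter_length // num_blocks
    return 1000.0 * block_length / fs_hz


# A single block corresponds to the FLMS case.
for num_blocks in (1, 2, 4, 8, 16):
    print(num_blocks, block_delay_ms(512, num_blocks))
```

For a 512-tap filter at 8 kHz, the unpartitioned delay is 64 ms, whereas eight blocks give 8 ms.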

3. The Sparse Partial Update Multidelay Filtering Algorithm

Our aim is to utilize the low delay inherent in MDF as well as the fast convergence and reduced complexity brought about by combining SP and MMax tap-selection for NEC. We achieve this aim by first describing how MMax tap-selection given in (6) can be incorporated into MDF. We next show, using an illustrative example, how the sparse nature of the impulse response is exploited in the frequency domain which then allows us to integrate the SP tap-selection given by (7). The proposed MMax-MDF and SPMMax-MDF algorithms are described by (17), (18), (19), and

(21)

The difference between (20) and (21) is that the latter employs , and we will describe in the following how this diagonal matrix can be obtained for the cases of MMax and SP tap-selection criterion.

3.1. The MMax-MDF Algorithm

As described in Section 2.1, the MMax tap-selection given in (6) is achieved by sorting . In the frequency-domain MDF implementation, however, elements in are normalized by elements in the vector defined in (19). Hence, for the frequency-domain MMax tap-selection, we select taps corresponding to the maxima of the Fourier transformed tap-inputs normalized by with . For this tap-selection strategy, the concatenated Fourier transformed tap-input across all blocks is given as

(22)

where is defined in (15) and denotes the element of . Elements of the diagonal MMax tap-selection matrix are given by

(23)

for with . Due to the normalization by in (23), we denote this algorithm as MMax-MDF_{N} and define a vector containing the subselected Fourier transformed tap-inputs as

(24)

The diagonal matrix for MMax-MDF_{N} is then given by

(25)

Hence, it can be seen that elements in the vector are obtained from the block of the selected Fourier transformed tap-inputs contained in with indices from to . The adaptation of MMax-MDF_{N} algorithm is described by (23)–(25) and (21).

It is noted that the MMax-MDF_{N} algorithm requires additional divisions for tap-selection due to the normalization by in (23). Hence, to reduce the complexity even further, we consider an alternative approach where such normalization is removed so that elements of the diagonal tap-selection matrix are expressed as

(26)

for and . As opposed to MMax-MDF_{N}, we denote this scheme as the MMax-MDF algorithm since normalization by is removed. Accordingly, elements in for MMax-MDF are computed using (24) and (25), where is obtained from (26). Hence, the adaptation of MMax-MDF algorithm is described by (24)–(26) and (21).

As will be shown in Section 4, the degradation in convergence performance due to tap-selection is less in MMax-MDF_{N} than in MMax-MDF. However, since reducing complexity is our main concern, we choose to use MMax-MDF as our basis for reducing the computational complexity of the proposed algorithm. As will be described in Section 3.2, the proposed algorithm incorporates the SP tap-selection to achieve, in addition, a fast rate of convergence.
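The difference between the two selection schemes can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `mmax_select` is a hypothetical helper, the Fourier transformed tap-inputs are passed in as plain lists of complex bins (one list per block), and the normalized variant is assumed to divide each bin magnitude by a per-bin power estimate.

```python
def mmax_select(blocks, num_selected, power=None):
    """Keep the `num_selected` largest bins across all blocks, zeroing the rest.

    blocks: list of equal-length lists of complex bins, one list per block.
    power:  optional per-bin power estimates (length of one block); when given,
            bins are ranked by normalized magnitude (MMax-MDF_N style),
            otherwise by plain magnitude (MMax-MDF style).
    """
    bins_per_block = len(blocks[0])
    flat = [b for block in blocks for b in block]   # concatenate across blocks
    if power is None:
        score = [abs(b) for b in flat]
    else:
        # One extra division per bin is the price of the normalized variant.
        score = [abs(b) / power[i % bins_per_block] for i, b in enumerate(flat)]
    order = sorted(range(len(flat)), key=lambda i: score[i], reverse=True)
    keep = set(order[:num_selected])
    masked = [b if i in keep else 0j for i, b in enumerate(flat)]
    # Split the concatenated vector back into its blocks.
    return [masked[k * bins_per_block:(k + 1) * bins_per_block]
            for k in range(len(blocks))]


# Example: two blocks of two bins each, keeping two bins in total.
example = [[1 + 0j, 0.1j], [3 + 4j, 0.5 + 0j]]
print(mmax_select(example, 2))
print(mmax_select(example, 2, power=[10.0, 0.1]))
```

The two calls generally retain different bins, which is consistent with the different convergence behaviour of the two schemes reported in Section 4.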

3.2. The SPMMax-MDF Algorithm

We show in this section how the SP tap-selection can be incorporated into the frequency domain. The SP tap-selection defined by (7) was proposed to achieve fast convergence for the identification of sparse impulse responses. We note that the direct implementation of SP tap-selection into frequency-domain adaptive filtering such as FLMS is inappropriate since impulse response in the transformed domain is not necessarily sparse. To illustrate this, we study the effect of on the concatenated impulse response of the MDF structure defined by

(27)

where

(28)

for is the subfilter to be identified and

(29)

is a matrix constructed by Fourier matrices each of size . As indicated in (28), the impulse response is partitioned into smaller blocks in the time domain as increases. Figure 3 shows the variation of the magnitude of for and , where MDF is equivalent to FLMS for . As can be seen from the figure, the magnitude of is not sparse for . Hence SP tap-selection in the MDF structure will not improve the convergence performance for . For the cases where , the number of taps with small magnitudes in increases with , that is, the number of subfilters. In Figure 4, we show how the sparseness of the magnitude of varies with using the sparseness measure given by [25, 26]

(30)

where denotes -norm and it was shown in [26, 27] that increases with the sparseness of , where . As can be seen from Figure 4, the magnitude of becomes more sparse as increases. As a consequence, we would expect SP tap-selection to improve the convergence rate of MDF for sparse system identification.
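The effect of partitioning on sparseness can be explored numerically. The sketch below assumes the commonly used form of the sparseness measure, which equals 0 for a vector with equal-magnitude entries and 1 for a vector with a single nonzero entry, and applies it to the magnitudes of the concatenated Fourier transforms of zero-padded subfilters. The toy impulse response, the naive DFT, and all function names are illustrative assumptions.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform, adequate for short vectors."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def sparseness(v):
    """Sparseness measure in [0, 1] based on the ratio of l1- and l2-norms."""
    n = len(v)
    l1 = sum(abs(a) for a in v)
    l2 = math.sqrt(sum(abs(a) ** 2 for a in v))
    return n / (n - math.sqrt(n)) * (1.0 - l1 / (math.sqrt(n) * l2))


def partitioned_spectrum(h, num_blocks):
    """Magnitudes of the concatenated DFTs of the zero-padded subfilters."""
    block_len = len(h) // num_blocks
    mags = []
    for k in range(num_blocks):
        block = list(h[k * block_len:(k + 1) * block_len]) + [0.0] * block_len
        mags.extend(abs(c) for c in dft(block))
    return mags


# Toy sparse response: 64 taps with an 8-tap active region after a bulk delay.
h = [0.0] * 64
h[16:24] = [1.0, -0.8, 0.6, -0.4, 0.3, -0.2, 0.1, -0.05]

for num_blocks in (1, 2, 4, 8):
    print(num_blocks, round(sparseness(partitioned_spectrum(h, num_blocks)), 3))
```

With a single block the transform of the whole response has energy in every bin, whereas with more blocks the subfilters that cover only the inactive region transform to all-zero segments, so the concatenated magnitude vector contains more near-zero entries.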

Although integrating SP tap-selection can be beneficial in the frequency domain, it requires careful consideration since as can be seen from (13), the length of the input frame is compared to for the adaptive filter. This causes a length mismatch between and . We overcome this problem by concatenating all frequency-domain subfilters, to obtain , which is of length , that is,

(31)

Since SPMMax-MDF aims to obtain fast convergence with low complexity, our approach of achieving SP tap-selection is then to select elements from for , where elements can be obtained from defined in (22). Elements of the diagonal tap-selection matrix are therefore given by

(32)

for . Employing (32), the diagonal matrix in (21) for the SP tap-selection can be described by (24) and (25).

It should be noted that additional simulations performed using selection criteria by sorting showed no significant improvement for SPMMax-MDF as it was found that the sparseness effect of dominates the selection process compared to the term , which results in selecting the same filter coefficients for adaptation as would be selected using (32). In addition, normalization by incurs an extra divisions, which is not desirable for our VoIP application. As a final comment, since the number of the "active" coefficients of reduces with increasing , we choose to be

(33)

This enables to reduce with increasing hence allowing adaptation to be more concentrated on the "active" region. A good choice of has been found experimentally to be given by . The proposed SPMMax-MDF algorithm is described in Algorithm 1.
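The alternation between the two selection criteria in the frequency domain can be sketched as follows. The helper names `top_indices` and `spmmax_mask`, the parameter names `T`, `num_mmax`, and `num_sp`, and the representation of the concatenated vectors as plain lists of complex bins are assumptions for illustration. On every `T`-th frame the bins are ranked by tap-input magnitude only, and on all other frames by the magnitude of the product of tap-input and filter coefficient.

```python
def top_indices(scores, count):
    """Indices of the `count` largest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:count])


def spmmax_mask(x_bins, h_bins, frame, T, num_mmax, num_sp):
    """Return a 0/1 selection mask over the concatenated frequency bins.

    x_bins: concatenated Fourier transformed tap-inputs (complex).
    h_bins: concatenated frequency-domain filter coefficients (complex).
    """
    if frame % T == 0:
        # MMax frame: rank by tap-input magnitude alone.
        keep = top_indices([abs(x) for x in x_bins], num_mmax)
    else:
        # SP frame: rank by the product of tap-input and coefficient magnitudes.
        keep = top_indices([abs(x * h) for x, h in zip(x_bins, h_bins)], num_sp)
    return [1 if i in keep else 0 for i in range(len(x_bins))]


# Example with four bins: frame 0 is an MMax frame, frame 1 is an SP frame.
x_bins = [1 + 0j, 2 + 0j, 0.5 + 0j, 3 + 0j]
h_bins = [10 + 0j, 0j, 1j, 0.1 + 0j]
print(spmmax_mask(x_bins, h_bins, frame=0, T=4, num_mmax=2, num_sp=2))
print(spmmax_mask(x_bins, h_bins, frame=1, T=4, num_mmax=2, num_sp=2))
```

Because both quantities are already available from the filtering and update steps, the mask itself requires only comparisons, in line with the complexity discussion in Section 3.3.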

3.3. Computational Complexity

Although it is well known, from the computational complexity point of view, that is the optimal choice for the MDF algorithm, it nevertheless is more efficient than time-domain implementations even for [8]. As shown in Algorithm 1, the proposed SPMMax-MDF computes using tap-selection matrix , which is defined by (26) and (32) for and , respectively. We show in Table 1 the number of multiplications and divisions required for MDF, MMax-MDF, MMax-MDF_{N}, and SPMMax-MDF to compute the term . We have also included the recently proposed IPMDF algorithm [22] for comparison. It should be noted that for MMax and SP tap-selection in (26) and (32), no additional computational complexity is introduced since and can be obtained from (18) and (17), respectively. For MMax-MDF_{N}, however, computing the selected filter coefficients for adaptation using (23) incurs additional number of divisions. The complexity for each algorithm for an example case of , , and is shown in Table 2. It can be seen that the complexity of the proposed SPMMax-MDF is approximately of that for the MDF. Compared to MMax-MDF, SPMMax-MDF requires only an additional of multiplications and divisions. However, as will be shown in Section 4, the performance of SPMMax-MDF is better than MMax-MDF. Finally, the complexity of SPMMax-MDF is and of that for the IPMDF algorithm in terms of multiplications and divisions, respectively.

4. Results and Discussions

We present simulation results to illustrate the performance of the proposed SPMMax-MDF algorithm for NEC using a recorded network impulse response with 512 taps [12], as shown in Figure 2. The performance is measured using normalized misalignment defined as

$$\eta(n) = \frac{\|\mathbf{h} - \hat{\mathbf{h}}(n)\|_2^2}{\|\mathbf{h}\|_2^2}. \qquad (34)$$

We used a sampling frequency of 8 kHz and white Gaussian noise (WGN) was added to achieve a signal-to-noise ratio (SNR) of 20 dB. The following parameters for the algorithms are chosen for all simulations [22]: . Step-size control variable has been adjusted for each algorithm so as to achieve the same steady-state performance.
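As a concrete illustration of the performance measure, the normalized misalignment can be computed as the squared error between the true and estimated impulse responses relative to the energy of the true response. The sketch below expresses it in dB, consistent with how improvements are reported later in this section; the function name `normalized_misalignment_db` is hypothetical.

```python
import math


def normalized_misalignment_db(h, h_hat):
    """Normalized misalignment between true and estimated responses, in dB."""
    err = sum((a - b) ** 2 for a, b in zip(h, h_hat))
    ref = sum(a * a for a in h)
    if err == 0.0:
        return float("-inf")   # perfect identification
    return 10.0 * math.log10(err / ref)


# An all-zero estimate gives 0 dB; a 10% error on the only nonzero tap gives about -20 dB.
print(normalized_misalignment_db([1.0, 0.0], [0.0, 0.0]))
print(normalized_misalignment_db([1.0, 0.0], [0.9, 0.0]))
```

Lower values indicate a better estimate, so a faster converging algorithm reaches a given misalignment level in fewer iterations.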

We first compare the variation in convergence of MMax-MDF_{N} and MMax-MDF with using step-size control variables and for MMax-MDF_{N} and MMax-MDF, respectively. We used a CGN input generated by filtering zero-mean WGN through a lowpass filter with a single pole [12]. It can be seen from Figure 5 that for each case of , the degradation in convergence performance due to tap-selection is less for the MMax-MDF_{N} than the MMax-MDF. However, as shown in Tables 1 and 2, MMax-MDF_{N} incurs additional divisions compared to the MMax-MDF algorithm.

We next compare the convergence performance of SPMMax-MDF with MDF and IPMDF using CGN input for in Figure 6. We have used and for all algorithms. We have also used since it was shown in [28] that by such setting, a good balance between complexity reduction and performance degradation due to MMax tap-selection can be reached. As can be seen from the figure, the performance of SPMMax-MDF is close to that of the MDF since for which results in according to (33). Consequently, under the condition of , all the filter coefficients are updated, while under the condition of coefficients are updated. As a result of this, and consistent with the partial update algorithms presented in [28], the performance of SPMMax-MDF approaches that of the MDF. Compared to IPMDF, SPMMax-MDF only requires approximately and of the number of multiplications and divisions, as indicated in Table 1.

We show in Figure 7 the convergence performance of SPMMax-MDF, MDF, and IPMDF for using CGN input. As before, we have used the same step-size control variable of for all algorithms except for SPMMax-MDF, where is used to achieve the same steady-state performance. It can be seen that for , the proposed SPMMax-MDF algorithm achieves a faster rate of convergence in terms of normalized misalignment compared to the more complex MDF during adaptation. Since, as shown in Figure 4, increases with , it can therefore be expected that such improvement can be increased when larger is employed. In addition, as the delay for MDF is reduced by a factor of compared to FLMS, the proposed SPMMax-MDF can achieve further delay reduction for larger and thus is desirable for NEC. For the case of and , the number of multiplications and divisions required for each algorithm is shown in Table 2.

Figure 8 shows the performance of the algorithms obtained using a male speech input. Parameters used for each algorithm are the same as those for the previous simulations except for SPMMax-MDF, where we have used to achieve the same steady-state performance. The computational complexity required for each algorithm is also shown in the figure between square brackets, where the first and the second integers represent the number of multiplications and divisions, respectively. It can be seen that SPMMax-MDF achieves approximately dB improvement in terms of normalized misalignment with lower complexity in comparison to MDF. In addition, the performance of our low cost SPMMax-MDF algorithm approaches that of IPMDF.

5. Conclusions

We have proposed SPMMax-MDF for network echo cancellation in VoIP. This algorithm achieves a faster rate of convergence, low complexity, and low delay through a novel use of both the MMax and SP tap-selection in the frequency domain using the MDF implementation. We discussed two approaches of incorporating MMax tap-selection into MDF and showed their tradeoff between rate of convergence and complexity. Simulation results using both colored Gaussian noise and speech inputs show that the proposed SPMMax-MDF achieves up to dB improvement in convergence performance with significantly lower complexity compared to MDF. In addition, the performance of our low cost SPMMax-MDF algorithm approaches that of IPMDF. Since the MDF structure has been applied for acoustic echo cancellation (AEC) [21] and blind acoustic channel identification [29], where the impulse responses are nonsparse, the proposed SPMMax-MDF algorithm can also potentially be applied to these applications for reducing computational complexity and algorithmic delay.

Algorithm 1: The SPMMax-MDF algorithm.


References

Goode B: Voice over internet protocol (VoIP). Proceedings of the IEEE 2002, 90(9):1495-1517. 10.1109/JPROC.2002.802005

Chong HM, Matthews HS: Comparative analysis of traditional telephone and voice-over-internet protocol (VoIP) systems. Proceedings of the IEEE International Symposium on Electronics and the Environment (ISEE '04), May 2004, Phoenix, Ariz, USA 106-111.

Kang H-G, Kim HK, Cox RV: Improving the transcoding capability of speech coders. IEEE Transactions on Multimedia 2003, 5(1):24-33. 10.1109/TMM.2003.808823

Raake A: Short- and long-term packet loss behavior: towards speech quality prediction for arbitrary loss distributions. IEEE Transactions on Audio, Speech, and Language Processing 2006, 14(6):1957-1968.

Radecki J, Zilic Z, Radecka K: Echo cancellation in IP networks. Proceedings of the 45th International Midwest Symposium on Circuits and Systems (MWSCAS '02), August 2002, Tulsa, Okla, USA 2: 219-222.

Benesty J, Gay SL: An improved PNLMS algorithm. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 2: 1881-1884.

Cui J, Naylor PA, Brown DT: An improved IPNLMS algorithm for echo cancellation in packet-switched networks. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '04), May 2004, Montreal, Quebec, Canada 4: 141-144.

Deng H, Doroslovački M: New sparse adaptive algorithms using partial update. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '04), May 2004, Montreal, Quebec, Canada 2: 845-848.

Aboulnasr T, Mayyas K: Complexity reduction of the NLMS algorithm via selective coefficient update. IEEE Transactions on Signal Processing 1999, 47(5):1421-1424. 10.1109/78.757235

Carini A, Sicuranza GL: Analysis of transient and steady-state behavior of a multichannel filtered-partial-error affine projection algorithm. EURASIP Journal on Audio, Speech, and Music Processing 2007, 2007:-15.

Ferrara ER: Fast implementations of LMS adaptive filters. IEEE Transactions on Acoustics, Speech, and Signal Processing 1980, 28(4):474-475. 10.1109/TASSP.1980.1163432

Cooley JW, Tukey JW: An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation 1965, 19(90):297-301. 10.1090/S0025-5718-1965-0178586-1

Soo J-S, Pang KK: Multidelay block frequency domain adaptive filter. IEEE Transactions on Acoustics, Speech, and Signal Processing 1990, 38(2):373-376. 10.1109/29.103078

Khong AWH, Naylor PA, Benesty J: A low delay and fast converging improved proportionate algorithm for sparse system identification. EURASIP Journal on Audio, Speech, and Music Processing 2007, 2007:-8.

Naylor PA, Sherliker W: A short-sort M-Max NLMS partial-update adaptive filter with applications to echo cancellation. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '03), April 2003, Hong Kong 5: 373-376.

Benesty J, Huang YA, Chen J, Naylor PA: Adaptive algorithms for the identification of sparse impulse responses. In Selected Methods for Acoustic Echo and Noise Control. Edited by: Hänsler E, Schmidt G. Springer, Berlin, Germany; 2006:125-153.

Khong AWH, Naylor PA: Efficient use of sparse adaptive filters. Proceedings of the 40th Asilomar Conference on Signals, Systems and Computers (ACSSC '06), October-November 2006, Pacific Grove, Calif, USA 1375-1379.

Khong AWH, Naylor PA: Selective-tap adaptive filtering with performance analysis for identification of time-varying systems. IEEE Transactions on Audio, Speech, and Language Processing 2007, 15(5):1681-1695.

Ahmad R, Khong AWH, Naylor PA: Proportionate frequency domain adaptive algorithms for blind channel identification. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), May 2006, Toulouse, France 5: V29-V32.

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Lin, X., Khong, A.W., Doroslovăcki, M. et al. Frequency-Domain Adaptive Algorithm for Network Echo Cancellation in VoIP.
J AUDIO SPEECH MUSIC PROC.2008, 156960 (2008). https://doi.org/10.1155/2008/156960