Methodology
Open access
Cascade algorithms for combined acoustic feedback cancelation and noise reduction
EURASIP Journal on Audio, Speech, and Music Processing, volume 2023, Article number: 37 (2023)
Abstract
This paper presents three cascade algorithms for combined acoustic feedback cancelation (AFC) and noise reduction (NR) in speech applications. A prediction error method (PEM)-based adaptive feedback cancelation (PEM-based AFC) algorithm is used for the AFC stage, while a multichannel Wiener filter (MWF) is applied for the NR stage. A scenario with M microphones and 1 loudspeaker is considered, without loss of generality. The first algorithm is the baseline algorithm, namely the cascade M-channel rank-1 MWF and PEM-AFC, where a NR stage is performed first using a rank-1 MWF followed by a single-channel AFC stage using a PEM-based AFC algorithm. The second algorithm is the cascade \((M+1)\)-channel rank-2 MWF and PEM-AFC, where again a NR stage is applied first followed by a single-channel AFC stage. The novelty of this algorithm is to consider an \((M+1)\)-channel data model in the MWF formulation with two different desired signals, i.e., the speech component in the reference microphone signal and in the loudspeaker signal, both defined by the speech source signal but not equal to each other. The two desired signal estimates are later used in a single-channel PEM-based AFC stage. The third algorithm is the cascade M-channel PEM-AFC and rank-1 MWF, where an M-channel AFC stage is performed first followed by an M-channel NR stage. Although in cascade algorithms where NR is performed first and then AFC the estimation of the feedback path is usually affected by the NR stage, it is shown here that by performing a rank-2 approximation of the speech correlation matrix this issue can be avoided and the feedback path can be correctly estimated.
The performance of the algorithms is assessed by means of closed-loop simulations, where it is shown that for the considered input signal-to-noise ratios (iSNRs) the cascade \((M+1)\)-channel rank-2 MWF and PEM-AFC and the cascade M-channel PEM-AFC and rank-1 MWF algorithms outperform the cascade M-channel rank-1 MWF and PEM-AFC algorithm in terms of the added stable gain (ASG) and misadjustment (Mis), as well as in terms of perceptual metrics such as the short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and signal distortion (SD).
1 Introduction
Acoustic feedback and noise are common problems that corrupt microphone signals and affect the performance of speech and audio signal processing applications and devices, such as hearing aids, public address (PA) systems, in-car communication, and teleconferencing systems. Acoustic feedback occurs whenever a signal is captured by a microphone, amplified, and played back by a loudspeaker within the same acoustic environment. This acoustic coupling between the microphone (array) and loudspeaker may give rise to instabilities in the system, which translate into signal degradation and, in the worst case, acoustic howling. Different approaches can be found to tackle this problem, with the two most popular being howling suppression and acoustic feedback cancelation (AFC) [1]. AFC solutions rely on a decorrelation of the microphone and loudspeaker signals to obtain an unbiased feedback path estimate [1, 2]. In the literature, many different solutions for AFC can be found using different decorrelation procedures such as probe-noise injection [3], time-varying or nonlinear processes in the forward path [4], null-steering (array) [5], subband implementations [6], and prewhitening [7]. The latter approach has been shown to provide limited perceptual distortion [8, 9]. Similarly, for multi-microphone noise reduction (NR), a wide range of solutions can be found in the literature, where one of the popular algorithms is the multichannel Wiener filter (MWF) [10,11,12], and more recently deep learning-based methods have appeared [13].
Few solutions for combined multi-microphone AFC and NR have been reported in the literature [14, 15]. Similarly to combined acoustic echo cancelation (AEC) and NR, combined AFC and NR can be tackled with integrated and cascade approaches. An integrated approach combines the AFC and NR tasks in a single optimization criterion [14, 15]. A cascade approach consists of an AFC stage and a NR stage, which can be combined in two ways, i.e., a multichannel AFC stage followed by a multichannel NR stage, or a single-channel AFC stage preceded by a multichannel NR stage. The order of the stages has performance implications on the combined system [14, 15].
Existing solutions to combined AFC and NR mainly cover single-microphone scenarios [16] and hearing aid applications [5, 14]. In [16], the prediction error method based adaptive filtering with row operations (PEM-AFROW) algorithm [17] is used in combination with a NR stage based on minimum mean squared error short-time log-spectral amplitude (MMSE-LSA) estimation, for a single-microphone scenario. In [14] and [15], multiple schemes are presented for combined AFC and NR using a generalized sidelobe canceler (GSC) for the NR stage and a PEM-based AFC stage. In [18], active feedback suppression for one microphone in a hearing aid is proposed using multiple loudspeakers, without considering the presence of noise in the microphone signal. A real-time implementation of a combined NR and feedback suppression method using spectral subtraction in a smartphone-based hearing aid is presented in [19]. In [20], the authors presented integrated and cascade approaches for combined AEC and NR in the context of wireless acoustic sensor and actuator networks. The algorithms in [20] did not consider the presence of a closed-loop system; therefore, they are not appropriate solutions for combined multi-microphone AFC and NR.
In [21], the authors presented two cascade algorithms for combined multi-microphone AFC and NR for speech applications using a PEM-based AFC algorithm and an MWF. The aim of these cascade algorithms is to estimate a desired speech signal without the feedback and noise components, as observed at a chosen reference microphone. A scenario with M microphones and one loudspeaker is considered, without loss of generality. The first algorithm in [21] is the baseline algorithm, namely the cascade M-channel rank-1 MWF and PEM-AFC, where a NR stage is performed first using a rank-1 MWF followed by a single-channel AFC stage using the PEM-based AFC algorithm. It is shown by means of simulations that this algorithm does not improve the added stable gain (ASG) in the closed-loop system. The second algorithm is the cascade \((M+1)\)-channel rank-2 MWF and PEM-AFC, where again a NR stage is applied first followed by a single-channel AFC stage. The novelty of this algorithm is to consider an \((M+1)\)-channel data model in the MWF formulation (i.e., by including the loudspeaker signal) with two different desired signals, i.e., the speech component in the reference microphone signal and in the loudspeaker signal, both defined by the speech source signal but not equal to each other [12]. The two desired signal estimates are later used in a single-channel PEM-based AFC stage [7, 22]. Although in cascade algorithms where NR is performed first and then AFC the estimation of the feedback path is usually affected by the NR stage, it is shown in [21] that by performing a rank-2 approximation of the speech correlation matrix this issue can be avoided and the feedback path can be correctly estimated.
The contributions of this paper in comparison to [21] are as follows. A third cascade algorithm for AFC and NR using the PEM-based AFC algorithm and MWF is presented, and then the three algorithms are further analyzed and compared. The third algorithm is the cascade M-channel PEM-AFC and rank-1 MWF, where an M-channel AFC stage is performed first followed by an M-channel rank-1 NR stage. A comparison of the performance of the three algorithms is provided based on closed-loop simulations using three different scenarios under three signal-to-noise ratios (SNRs). It is shown that for the considered input SNRs (iSNRs) both the cascade \((M+1)\)-channel rank-2 MWF and PEM-AFC and the cascade M-channel PEM-AFC and rank-1 MWF algorithms outperform the cascade M-channel rank-1 MWF and PEM-AFC algorithm in terms of ASG and misadjustment (Mis), as well as in terms of perceptual metrics such as the short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and signal distortion (SD). Additionally, the ASG definition is modified to account for the presence of the NR filters in the closed-loop system.
The algorithms in [14] and [15] are similar to the ones presented in this paper. However, there are several differences. The algorithms in this paper rely on a voice activity detector (VAD) to estimate statistics of the signals during noise-only and speech-plus-noise periods, while the GSC requires prior knowledge of the desired speech source and loudspeaker location to design the fixed beamformer and blocking matrix. The GSC in [14] and [15] is defined in the time domain, while the NR stage in this paper is performed in the frequency domain. In [15], the combined AFC and NR problem is tackled by using adaptive filters with prefiltering on the output signals of the blocking matrix (noise references), while in [14], one of the proposed schemes uses the loudspeaker signal as an extra input to the adaptive filters. In [14] and [15], the GSC schemes were tested in scenarios where the forward path gain does not increase over time, i.e., with a fixed gain, whereas in this paper a gain profile is used to gradually increase the gain in the closed-loop system.
The paper is organized as follows. The signal model is presented in Section 2. The formulation of the cascade M-channel rank-1 MWF and PEM-AFC algorithm is provided in Section 3. The cascade \((M+1)\)-channel rank-2 MWF and PEM-AFC algorithm is described in Section 4. The cascade M-channel PEM-AFC and rank-1 MWF is described in Section 5. The computational complexity of the three presented algorithms is analyzed in Section 6. Simulation results are given in Section 7, and finally Section 8 concludes the paper.
2 Signal model
Consider a room with M microphones and L loudspeakers where the aim is to record a desired speech signal, amplify it, and play it back through the loudspeakers. The case where \(L=1\) will be considered, without loss of generality, with the speech source signal denoted by s(t), the loudspeaker signal denoted by u(t), and the \(m^{\textrm{th}}\) microphone signal, with \(m =1,\dots , M\), modeled as
where \(H^{(m)}(q,t)\) and \(F^{(m)}(q,t)\) are the transfer functions from the speech source position and from the loudspeaker to the \(m^{\textrm{th}}\) microphone, respectively. The latter is also known as the feedback path transfer function. The direct noise signal in the \(m^{\textrm{th}}\) microphone is denoted by \(n^{(m)}(t)\)^{Footnote 1}. The discrete time index is represented by t and \(q^{-1}\) is the delay operator, i.e., \(q^{-k}u(t) = u(t-k)\). The loudspeaker signal can be expressed as
where \(G^{(m)}(q,t)\) is the forward path transfer function for the \(m^{\textrm{th}}\) microphone signal, \(u_{s}(t)\) is the desired speech component, and \(u_n(t)\) is the noise component in the loudspeaker signal. The presence of the forward path creates a closedloop system which introduces signal correlation between the loudspeaker and microphone signals. FigureÂ 1 depicts a block diagram of the closedloop system. It is assumed that the speech source signal can be modeled as
where \(\frac{1}{A(q,t)}\) is an autoregressive (AR) model excited by the white noise signal e(t); such a model, which is highly time-varying, is a common assumption in PEM-based AFC [1, 7, 22]. A combined NR and AFC algorithm aims to estimate the desired speech signal without the feedback and noise components, as observed at a chosen reference microphone \((m=r)\), i.e.,
where \(H^{(r)}(q,t)\) is the transfer function from the speech source position to the reference microphone. Additionally, the speech component including the feedback contribution in the reference microphone signal is expressed as
The STFT-domain representation of the time-domain signals will be used here, which is obtained by means of an R-samples-long analysis window in a WOLA filterbank with \(50\%\) overlap [23]. Therefore, the STFT \(x^{(m)}(\kappa ,l)\) of the \(m^{\textrm{th}}\) microphone signal, \(x^{(m)}(t)\), at frame l can be defined as
with \(\kappa \in \{0,1,\dots , R-1 \}\) the frequency bin index, \(l \in \{0,1,\dots , L_f-1 \}\) with \(L_f\) being the number of frames, \(\boldsymbol{\mathcal {F}}_{R}\) being the discrete Fourier transform (DFT) matrix of size R, and \(g_a(t)\) being an analysis window. Using the STFT representation of each microphone signal, the following \(M \times 1\) STFT-domain microphone vector is defined
Furthermore, an \((M+1) \times 1\) signal vector, consisting of loudspeaker and microphone signals, can be expressed as
where \(s(\kappa ,l)\), \(u_s(\kappa ,l)\), \(u(\kappa ,l)\), and \(\textbf{y}_n(\kappa ,l)\) are the STFT-domain speech source signal, speech component in the loudspeaker signal, loudspeaker signal, and noise component in the microphone and loudspeaker signals, respectively^{Footnote 2}. It is noted that \(\textbf{y}_n(\kappa ,l)\) includes the noise component in the loudspeaker signal (first vector component) as well as its coupling into the microphones, added to the direct noise components in the microphones (all other vector components). The STFT-domain transfer functions from the speech source position to the microphones and from the loudspeaker to the microphones are respectively denoted by \(\textbf{h}(\kappa ,l)\) and \(\textbf{f}(\kappa ,l)\). The time-frame and frequency-bin indices l and \(\kappa\) will mostly be omitted in the following for brevity.
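As a concrete illustration of this analysis stage, the following sketch frames a signal with \(50\%\) overlap, applies an analysis window, and takes an R-point DFT per frame, i.e., a minimal WOLA analysis bank. The helper name and the square-root-Hann choice for \(g_a\) are assumptions for illustration; the paper does not specify the window shape.

```python
import numpy as np

def stft_analysis(x, R):
    """STFT analysis with an R-sample window and 50% overlap (hop R/2),
    as in a WOLA filterbank: window each frame with g_a and take an
    R-point DFT."""
    hop = R // 2
    g_a = np.sqrt(np.hanning(R))       # assumed analysis window (sqrt-Hann)
    n_frames = (len(x) - R) // hop + 1
    X = np.empty((R, n_frames), dtype=complex)
    for l in range(n_frames):
        X[:, l] = np.fft.fft(g_a * x[l * hop: l * hop + R])
    return X

# A pure tone at frequency bin 4 for R = 64 concentrates in that bin.
x = np.cos(2 * np.pi * 4 * np.arange(512) / 64)
X = stft_analysis(x, R=64)
print(X.shape)   # (64, 15): R frequency bins x L_f frames
```

The columns of `X` correspond to the frames l and the rows to the frequency bins \(\kappa\) used throughout the signal model.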
The speech correlation matrix is defined as
where \(\Phi _{ss}=E\{ss^{*}\}\), \(\Phi _{su}=E\{s u_s^{*}\}\), \(\Phi _{us}=E\{u_s s^{*}\}=\Phi _{su}^{*}\), \(\Phi _{uu}=E\{u_s u_s^{*}\}\), \(E\{ \cdot \}\) denotes statistical expectation, and \((\cdot )^{*}\) and \((\cdot )^H\) are the conjugate and conjugate transpose operators, respectively. Performing an LDL factorisation on the matrix with the \(\Phi\)'s in (11), \(\bar{\textbf{R}}_{\mathbf {yyss}}\) can alternatively be expressed as
where \(\epsilon = \dfrac{\Phi _{su}}{\Phi _{uu}}\) and \(\Gamma = \Phi _{ss} - \dfrac{ \Phi _{su} \Phi _{us}}{\Phi _{uu}}\). It is clear that from the knowledge of \(\bar{\textbf{R}}_{\mathbf {yyss}}\) in (12) alone, \(\textbf{f}\) and \(\textbf{h}\) cannot be uniquely defined whenever there is a nonzero correlation between s and \(u_s\). In Section 4.1, \(\bar{\textbf{R}}_{\mathbf {yyss}}\) is modeled using a rank-2 approximation by assuming that the forward path delay is at least one STFT frame. This delay makes it possible to view the loudspeaker signal as a second source and hence use a rank-2 approximation for \(\bar{\textbf{R}}_{\mathbf {yyss}}\). An experimental validation of this assumption is presented in Section 7.4.
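As a sanity check of the factorisation above, the following sketch verifies numerically that the unit-triangular and diagonal factors built from \(\epsilon\) and \(\Gamma\) reproduce the \(2 \times 2\) matrix containing the \(\Phi\)'s. The numeric values are arbitrary illustrations, not taken from the paper.

```python
import numpy as np

# Hypothetical correlation values, for illustration only.
phi_ss, phi_uu = 2.0, 1.5
phi_su = 0.6 + 0.3j              # cross-correlation between s and u_s
phi_us = np.conj(phi_su)         # Phi_us = Phi_su^*

Phi = np.array([[phi_ss, phi_su],
                [phi_us, phi_uu]])

eps = phi_su / phi_uu                          # epsilon
gamma = phi_ss - phi_su * phi_us / phi_uu      # Schur complement Gamma

# Unit-triangular and diagonal factors of the LDL factorisation.
U = np.array([[1.0, eps],
              [0.0, 1.0]])
D = np.diag([gamma, phi_uu])

# The factors reproduce the original matrix of Phi's exactly.
print(np.allclose(U @ D @ U.conj().T, Phi))    # True
```

Note that \(\Gamma\) comes out real and nonnegative, consistent with its role as the residual speech power after removing the component correlated with \(u_s\).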
Three different cascade algorithms for AFC and NR are presented in the following sections. The first algorithm performs an M-channel rank-1 MWF-based NR to estimate the contribution of \(s(\kappa ,l)\) and \(u_s(\kappa ,l)\) in the reference microphone, and then a single-channel AFC is performed on the resulting signals. The second algorithm performs an \((M+1)\)-channel rank-2 MWF-based NR stage first followed by a single-channel AFC stage, where the rank-2 MWF-based NR is used to estimate the contribution of \(s(\kappa ,l)\) and \(u_s(\kappa ,l)\) in the reference microphone as well as in the loudspeaker, and then a single-channel AFC is performed on the resulting signals. The third algorithm performs an M-channel AFC stage first followed by an M-channel rank-1 MWF-based NR stage. In this case, after the M-channel AFC stage removes the feedback component in each microphone, a rank-1 MWF-based NR is used to estimate the contribution of \(s(\kappa ,l)\) in the reference microphone.
3 Cascade M-channel rank-1 MWF and PEM-AFC
3.1 NR stage
The objective of the NR stage is to provide an estimate of the speech component in the reference microphone signal. The feedback component will still be present in the output of the NR stage; hence, a single-channel AFC stage is required to remove it.
In the STFT domain, the correlation matrix of the microphone signal vector \(\textbf{x}\) can be expressed as
where
are the \(M \times M\) microphone-only speech and noise correlation matrices, respectively. The expressions in (13)–(15) are obtained based on the assumption that s and \(\textbf{x}_n\) are uncorrelated. The minimization of the mean squared error (MSE) between the desired signal and the filtered microphone signals defines an optimal filter
with \(d_{\text {NR}} = x_s^{(r)}\) representing the speech component (total contribution of s together with \(u_s\)) in the reference microphone signal. The desired signal estimate \(\hat{d}_{\text {NR}}\) is obtained as
The solution to (16) is the MWF [10, 12], given by
where \(\textbf{e}_r\) is the \(r^{\textrm{th}}\) column of the \(M \times M\) identity matrix, which selects the \(r^{\textrm{th}}\) column of the matrix it multiplies.
In practice, by using a VAD, \(\bar{\textbf{R}}_{\textbf{xx}}\) and \(\bar{\textbf{R}}_{\mathbf {xxnn}}\) are first estimated during speech-plus-noise periods, where the speech source signal and noise are active, and noise-only periods, where only the noise is active, i.e.,
where \(\hat{\textbf{R}}_{\textbf{xx}}(\kappa ,l)\) and \(\hat{\textbf{R}}_{\mathbf {xxnn}}(\kappa ,l)\) represent estimates of \(\bar{\textbf{R}}_{\textbf{xx}}\) and \(\bar{\textbf{R}}_{\mathbf {xxnn}}\) at frame l and frequency bin \(\kappa\), respectively. The forgetting factor \(0<\beta <1\) can be chosen depending on the variation of the statistics of the signals, i.e., if the statistics change slowly then \(\beta\) should be chosen close to 1 to obtain longterm estimates that mainly capture the spatial coherence between the microphone signals. The following criterion will then be used to estimate \(\bar{\textbf{R}}_{\mathbf {xxss}}\) [12],
where \(\Vert \cdot \Vert _F\) denotes the Frobenius norm. Spatial prewhitening is applied by pre- and post-multiplying by \(\hat{\textbf{R}}^{-1/2}_{\mathbf{xxnn}}\) and \(\hat{\textbf{R}}^{-H/2}_{\mathbf {xxnn}}\), respectively. The solution to (20)–(21) is based on a generalized eigenvalue decomposition (GEVD) of the \(M \times M\) matrix pencil \(\{ \hat{\textbf{R}}_{\textbf{xx}}, \hat{\textbf{R}}_{\mathbf {xxnn}} \}\) [12, 25]
where \(\hat{\boldsymbol{\Sigma }}_{\textbf{xx}}\) and \(\hat{\boldsymbol{\Sigma }}_{\mathbf {xxnn}}\) are diagonal matrices and \(\hat{\textbf{Q}}\) is an invertible matrix. The rank-1 speech correlation matrix estimate \(\hat{\textbf{R}}_{\mathbf{xxss}}\) is then [12]
where \(\hat{\sigma }_{xx,i}\) and \(\hat{\sigma }_{xxnn,i}\) are the ith diagonal elements of \(\hat{\boldsymbol{\Sigma }}_{\textbf{xx}}\) and \(\hat{\boldsymbol{\Sigma }}_{\mathbf {xxnn}}\), respectively, corresponding to the ith largest ratio \(\hat{\sigma }_{xx,i}/\hat{\sigma }_{xxnn,i}\). Using (24) and \(\hat{\textbf{R}}_{\textbf{xx}}\) (cfr. (22)) in (18), the rank-1 MWF estimate \(\hat{\textbf{w}}\) can be expressed as
The estimate, \(\hat{x}_s^{(r)}\), is obtained as in (17) with \(\hat{\textbf{w}}\) replacing \(\bar{\textbf{w}}\)
The corresponding time-domain signals are obtained by adding the \(L_f\) overlapping windowed frames as
where \(g_s\) is a synthesis window with nonzero values in the interval \(0 \le t \le R-1\) and \(\delta _{\textrm{NR}}\) is the delay from the NR stage.
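The practical estimation chain of this NR stage, i.e., the VAD-gated recursive correlation updates followed by the GEVD-based rank-1 approximation and the MWF of (18), can be sketched per frequency bin as follows. This is a simplified illustration on synthetic real-valued data; the helper names are ours, and the eigensolver normalization \(\textbf{V}^H \hat{\textbf{R}}_{\mathbf {xxnn}} \textbf{V} = \textbf{I}\) (so that \(\hat{\boldsymbol{\Sigma}}_{\mathbf {xxnn}} = \textbf{I}\)) is assumed.

```python
import numpy as np
from scipy.linalg import eigh

def update_corr(R, x, beta=0.98):
    """One exponential-averaging update with forgetting factor beta."""
    return beta * R + (1 - beta) * np.outer(x, x.conj())

def rank1_mwf(R_xx, R_nn, r=0):
    """Rank-1 MWF from the GEVD of the pencil {R_xx, R_nn}.

    scipy's eigh returns ascending generalized eigenvalues with
    V^H R_nn V = I, so in this normalization Sigma_nn = I and the
    dominant whitened speech power is sigma_xx - 1."""
    sigma, V = eigh(R_xx, R_nn)
    Q = np.linalg.inv(V.conj().T)              # R_xx = Q diag(sigma) Q^H
    sigma_ss = max(sigma[-1] - 1.0, 0.0)
    R_ss = sigma_ss * np.outer(Q[:, -1], Q[:, -1].conj())
    return np.linalg.solve(R_xx, R_ss[:, r])   # w = R_xx^{-1} R_ss e_r

# Synthetic single-bin data: x = h s + n, rank-1 speech plus white noise.
rng = np.random.default_rng(0)
M = 4
h = rng.standard_normal(M)
R_xx, R_nn = np.eye(M), np.eye(M)
for l in range(4000):
    n = 0.3 * rng.standard_normal(M)
    if l % 2 == 0:                             # speech-plus-noise frame
        R_xx = update_corr(R_xx, h * rng.standard_normal() + n)
    else:                                      # noise-only frame
        R_nn = update_corr(R_nn, n)

w_mwf = rank1_mwf(R_xx, R_nn, r=0)
print(w_mwf.shape)   # (4,)
```

With \(\beta = 0.98\), the estimates track roughly the last 50 frames, mirroring the long-term averaging discussed above.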
3.2 AFC stage
The NR stage provides an estimate for \(x_s^{(r)}(t)\) (cfr. (6)) from which the AFC stage will now estimate \(H^{(r)}(q,t) s(t)\). A single-channel PEM-based AFC algorithm is used. Algorithms of this kind were initially developed in [7, 17] and provide estimates of both the feedback path and the speech source signal model. The PEM-based AFC algorithm used here is the frequency-domain version presented in [22] (the reader is referred to [22] for a detailed explanation of the AFC algorithm). The algorithm uses an overlap-save (OLS) filterbank to compute convolutions in the frequency domain, which requires a rectangular window. The input signals to the AFC algorithm are the (noisy) loudspeaker signal u and the estimate in (29). A short description of the single-channel PEM-based AFC algorithm is provided in Algorithm 1.
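The full frequency-domain algorithm of [22] is beyond the scope of this summary, but its core idea can be sketched with a time-domain analogue: prewhiten both input signals with the AR prefilter \(A(q,t)\) and adapt an FIR feedback path model on the prewhitened signals with NLMS. The function name, filter length, step size, and the white test signals below are illustrative assumptions, not the settings of [22].

```python
import numpy as np

def pem_afc_nlms(u, d, a, L=16, mu=0.5, eps=1e-6):
    """Simplified time-domain analogue of PEM-based AFC: prewhiten the
    loudspeaker signal u and the microphone-side signal d with the AR
    prefilter A(q), then adapt an L-tap feedback path model with NLMS
    on the prewhitened signals."""
    u_w = np.convolve(u, a, mode="full")[:len(u)]   # prewhitened loudspeaker
    d_w = np.convolve(d, a, mode="full")[:len(d)]   # prewhitened microphone side
    f_hat = np.zeros(L)
    for t in range(L, len(u)):
        u_vec = u_w[t - L + 1: t + 1][::-1]          # most recent sample first
        e = d_w[t] - f_hat @ u_vec                   # prediction error
        f_hat += mu * e * u_vec / (u_vec @ u_vec + eps)
    return f_hat

# Toy check: d is u filtered by a known 4-tap feedback path. With a white
# loudspeaker signal the prefilter reduces to a = [1].
rng = np.random.default_rng(3)
u = rng.standard_normal(20000)
f_true = np.array([0.5, -0.3, 0.2, -0.1])
d = np.convolve(u, f_true, mode="full")[:len(u)]
f_hat = pem_afc_nlms(u, d, a=np.array([1.0]), L=4)
print(np.round(f_hat, 2))   # close to [0.5, -0.3, 0.2, -0.1]
```

The decorrelation role of the prefilter becomes essential in the closed loop, where u is correlated with the speech source; the sketch only shows where the prewhitened signals enter the adaptation.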
A complete description of the cascade M-channel rank-1 MWF and PEM-AFC algorithm is provided in Algorithm 2, with a block diagram provided in Fig. 2(a).
4 Cascade \((M+1)\)-channel rank-2 MWF and PEM-AFC
4.1 NR stage
The objective of the NR stage is to provide an estimate of the speech component in the reference microphone signal and in the loudspeaker signal. The feedback component will still be present in the former; hence, a single-channel AFC stage is required to remove it.
In the STFT domain, the correlation matrix of the signal vector \(\textbf{y}\) in (9) can be expressed as
with \(\bar{\textbf{R}}_{\mathbf {yynn}} =E \{ \textbf{y}_n \textbf{y}_n^H \}\) the \((M+1) \times (M+1)\) noise correlation matrix. The final expression in (30) is obtained based on the assumption that s and \(\textbf{y}_n\) are uncorrelated. The minimization of the mean squared error (MSE) between the desired signals and the filtered microphone and loudspeaker signals defines an optimal filter
with \(\textbf{d}_{\textrm{NR}} = \left[ {u_s\quad x_s^{(r)}}\right] ^T\). The desired signal estimates \(\hat{u}_s\) and \(\hat{x}_s^{(r)}\) are obtained as
The solution to (31) is the MWF [10, 12], given by
In practice, by using a VAD, \(\bar{\textbf{R}}_{\textbf{yy}}\) and \(\bar{\textbf{R}}_{\mathbf {yynn}}\) are first estimated during speech-plus-noise periods, where the desired speech signal and noise are active, and noise-only periods, where only the noise is active, i.e.,
where \(\hat{\textbf{R}}_{\textbf{yy}}(\kappa ,l)\) and \(\hat{\textbf{R}}_{\mathbf {yynn}}(\kappa ,l)\) represent estimates of \(\bar{\textbf{R}}_{\textbf{yy}}\) and \(\bar{\textbf{R}}_{\mathbf {yynn}}\) at frame l and frequency bin \(\kappa\), respectively. The following criterion will then be used to estimate \(\bar{\textbf{R}}_{\mathbf {yyss}}\) [12],
Spatial prewhitening is applied by pre- and post-multiplying by \(\hat{\textbf{R}}^{-1/2}_{\mathbf{yynn}}\) and \(\hat{\textbf{R}}^{-H/2}_{\mathbf {yynn}}\), respectively. The solution to (36)–(37) is based on a GEVD of the \((M+1) \times (M+1)\) matrix pencil \(\{ \hat{\textbf{R}}_{\textbf{yy}}, \hat{\textbf{R}}_{\mathbf {yynn}} \}\) [12, 25]
where \(\hat{\boldsymbol{\Sigma }}_{\textbf{yy}}\) and \(\hat{\boldsymbol{\Sigma }}_{\mathbf {yynn}}\) are diagonal matrices and \(\hat{\textbf{Q}}\) is an invertible matrix. The rank-2 speech correlation matrix estimate \(\hat{\textbf{R}}_{\mathbf{yyss}}\) is then [12]
where \(\hat{\sigma }_{yy,i}\) and \(\hat{\sigma }_{yynn,i}\) are the ith diagonal elements of \(\hat{\boldsymbol{\Sigma }}_{\textbf{yy}}\) and \(\hat{\boldsymbol{\Sigma }}_{\mathbf {yynn}}\), respectively, corresponding to the ith largest ratio \(\hat{\sigma }_{yy, i}/\hat{\sigma }_{yynn,i}\). Using (40) and \(\hat{\textbf{R}}_{\textbf{yy}}\) (cfr. (38)) in (34), the rank-2 MWF estimate \(\hat{\textbf{W}}\) can be expressed as
The estimates \(\hat{u}_{s}\) and \(\hat{x}_s^{(r)}\) are now obtained as in (32)–(33) with \(\hat{\textbf{W}}\) replacing \(\bar{\textbf{W}}\)
The corresponding time-domain signals are obtained by adding the \(L_f\) overlapping windowed frames as
4.2 AFC stage
In the AFC stage, a single-channel PEM-based AFC algorithm is used. The PEM-based AFC algorithm used here is the frequency-domain version presented in [22]. The input signals to the AFC algorithm are \(\hat{u}_s\) and \(\hat{x}_s^{(r)}\). A short description of the PEM-based AFC algorithm is provided in Algorithm 1. Note that in this AFC stage, the estimates of the speech component in the loudspeaker signal (cfr. (49)) and in the reference microphone signal (cfr. (48)) are used to estimate the feedback path, unlike in Section 3.2 where the estimate of the speech component in the reference microphone signal (cfr. (29)) and the noisy loudspeaker signal are used.
A complete description of the cascade \((M+1)\)-channel rank-2 MWF and PEM-AFC algorithm is provided in Algorithm 3, with a block diagram provided in Fig. 2(b).
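Structurally, the only difference with the rank-1 NR stage of Section 3 is that two GEVD modes are retained when approximating the speech correlation matrix. A minimal sketch on synthetic data, again assuming the eigensolver normalization \(\hat{\boldsymbol{\Sigma}}_{\mathbf {yynn}} = \textbf{I}\) (the helper name is ours):

```python
import numpy as np
from scipy.linalg import eigh

def rank2_speech_correlation(R_yy, R_nn):
    """Rank-2 estimate of the speech correlation matrix from the GEVD of
    the (M+1)x(M+1) pencil {R_yy, R_nn}: retain the two modes with the
    largest generalized eigenvalues (speech source plus the loudspeaker
    signal acting as a second 'source')."""
    sigma, V = eigh(R_yy, R_nn)        # ascending; V^H R_nn V = I
    Q = np.linalg.inv(V.conj().T)
    R_ss = np.zeros_like(R_yy)
    for i in (-1, -2):                 # two dominant GEVD modes
        R_ss += max(sigma[i] - 1.0, 0.0) * np.outer(Q[:, i], Q[:, i].conj())
    return R_ss

# Two uncorrelated sources in white noise give a rank-2 speech matrix.
rng = np.random.default_rng(4)
Mp1, N = 5, 20000                      # M + 1 = 5 channels
A = rng.standard_normal((Mp1, 2))      # mixing vectors of the two sources
Y = A @ rng.standard_normal((2, N))
noise = 0.2 * rng.standard_normal((Mp1, N))
Y = Y + noise
R_yy = Y @ Y.T / N
R_nn = noise @ noise.T / N
R_ss = rank2_speech_correlation(R_yy, R_nn)
print(np.linalg.matrix_rank(R_ss))     # 2
```

Keeping two modes instead of one is precisely what allows the \((M+1)\)-channel formulation to separate the contributions of s and \(u_s\) despite their correlation.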
5 Cascade M-channel PEM-AFC and rank-1 MWF
Assuming an exact speech signal model \(A^{-1}(q,t)\) is available (see (4)), a prefilter A(q, t) can be applied, such that the time-domain prefiltered loudspeaker and \(m^{\textrm{th}}\) microphone signals can be expressed as
Similarly, the prefiltered version of the signal vector \(\textbf{y}\) in (9) can be expressed as
where \(\tilde{u}(\kappa ,l)\) and \(\tilde{\textbf{x}}(\kappa ,l)\) represent the STFT-domain prefiltered loudspeaker and microphone signals. Similarly, \(\tilde{u}_s(\kappa ,l)\) is the STFT-domain prefiltered desired speech component in the loudspeaker signal and \(\tilde{\textbf{y}}_n(\kappa ,l)\) is the STFT-domain prefiltered noise component in the loudspeaker and microphone signals. The speech correlation matrix can be rewritten as
where \(\Phi _{\tilde{u}\tilde{u}}=E \{\tilde{u} \tilde{u}^{*}\}\), \(\Phi _{ee} = E\{ ee^{*}\}\), \(\Phi _{e \tilde{u}}=E \{e\tilde{u}^{*}\}=0\), and \(\Phi _{\tilde{u}e}=E \{\tilde{u}e^{*} \}=0\). Since (54) is computed in the STFT domain, the cross-correlation terms are only zero if there is a delay of at least one STFT frame in the forward path. It can be observed that, after prefiltering, \(\textbf{h}\) and \(\textbf{f}\) can be readily computed from \(\bar{\textbf{R}}_{\tilde{\textbf{y}} \tilde{\textbf{y}}ss}\). In this case, the order of the AFC and NR stages can be inverted so that an M-channel AFC stage is performed first, which will estimate the speech component (without its feedback contribution) together with the noise component, and then a multichannel NR stage can follow.
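The prefiltering step above can be illustrated as follows: the AR coefficients are estimated from a frame by linear prediction (autocorrelation method), and the resulting FIR prefilter \(A(q)\) is applied to the signals. This is a simplified sketch; the AR order and the synthetic AR(2) test process are illustrative assumptions.

```python
import numpy as np

def estimate_ar_coeffs(frame, order):
    """Estimate the AR prefilter A(q) of a frame with the autocorrelation
    method: solve the Yule-Walker normal equations for the linear
    prediction coefficients."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a_pred = np.linalg.solve(R + 1e-8 * np.eye(order), r[1:order + 1])
    return np.concatenate(([1.0], -a_pred))    # A(q) = 1 - sum_k a_k q^{-k}

def prewhiten(x, a):
    """Apply the FIR prefilter A(q): e(t) = sum_k a_k x(t - k)."""
    return np.convolve(x, a, mode="full")[:len(x)]

# Synthetic AR(2) process: x(t) = 1.5 x(t-1) - 0.7 x(t-2) + e(t).
rng = np.random.default_rng(1)
e = rng.standard_normal(8000)
x = np.zeros_like(e)
for t in range(2, len(e)):
    x[t] = 1.5 * x[t - 1] - 0.7 * x[t - 2] + e[t]

a_hat = estimate_ar_coeffs(x, order=2)
residual = prewhiten(x, a_hat)
print(np.round(a_hat, 2))   # close to [1, -1.5, 0.7]
```

After prefiltering, the residual is approximately the white excitation e(t), which is what makes the cross-correlation terms above vanish (given the one-frame forward path delay).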
5.1 AFC stage
In the AFC stage, a single-channel PEM-based AFC algorithm is used for each microphone, i.e., M times. The AR model is estimated for each single-channel PEM-based AFC algorithm. The same step-size tuning is used for all adaptive algorithms. The PEM-based AFC algorithm used here is the frequency-domain version presented in [22]. The input signals to the AFC algorithm are u and \(x^{(m)}, \forall m\). A short description of the PEM-based AFC algorithm is provided in Algorithm 1.
5.2 NR stage
A rank-1 MWF is used for the NR stage, which operates on the microphone signals after the AFC stage.
As in Section 3.1, an STFT-domain representation of the time-domain signals is used, obtained by means of an R-samples-long analysis window in a WOLA filterbank with \(50\%\) overlap [23]. Therefore, the STFT \(x_f^{(m)}(\kappa ,l)\) of the \(m^{\textrm{th}}\) microphone signal after the AFC stage, \(x_f^{(m)}(t)\), at frame l can be defined as
The STFT-domain multichannel microphone signal after the AFC stage, assuming perfect feedback cancelation, is modeled as
where \(\textbf{x}_{fn}(\kappa ,l)\) is the STFT-domain noise component in the microphone signal after feedback cancelation. The minimization of the mean squared error (MSE) between the desired signal and the filtered feedback-compensated microphone signals, \(\textbf{x}_{f}\), defines an optimal filter
with \(d_{\text {NR}}= x_{fs}^{(r)}\). The desired signal estimate is then obtained as \(\hat{d}_{\text {NR}} = \bar{\textbf{w}}^H \textbf{x}_{f}\). The solution to (58) is the wellknown MWF [10, 12], given by
where \(\bar{\textbf{R}}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}} = E\{ \textbf{x}_{f} \textbf{x}_{f}^H \}\), \(\bar{\textbf{R}}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}\mathbf {ss}} = E \{ \textbf{h} s s^H \textbf{h}^H \}\), and, similarly, \(\bar{\textbf{R}}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}\mathbf {nn}} = E\{ \textbf{x}_{fn} \textbf{x}_{fn}^H\}\). The final expression in (59) is obtained based on the assumption that s and \(\textbf{x}_{fn}\) are uncorrelated.
In practice, by using a VAD, \(\bar{\textbf{R}}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}}\) and \(\bar{\textbf{R}}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}\mathbf {nn}}\) are first estimated during speech-plus-noise periods, where the desired speech signal and background noise are active, and noise-only periods, where only the noise is active [26], i.e.,
where \(\hat{\textbf{R}}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}}(\kappa ,l)\) and \(\hat{\textbf{R}}_{\textbf{x}_{\textbf{f}} \textbf{x}_{{\textbf {f}}}\mathbf {nn}}(\kappa ,l)\) represent estimates of \(\bar{\textbf{R}}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}}\) and \(\bar{\textbf{R}}_{\textbf{x}_{{\textbf {f}}} \textbf{x}_{{\textbf {f}}}\mathbf {nn}}\) at frame l and frequency bin index \(\kappa\), respectively. The following criterion will then be used to estimate \(\bar{\textbf{R}}_{\textbf{x}_{\mathbf{f}} \textbf{x}_{\textbf{f}}\mathbf {ss}}\) [12],
Spatial prewhitening is applied by pre- and post-multiplying by \(\hat{\textbf{R}}^{-1/2}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}\mathbf{nn}}\) and \(\hat{\textbf{R}}^{-H/2}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}\mathbf {nn}}\), respectively. The solution to (63)–(64) is based on a GEVD of the \(M \times M\) matrix pencil \(\{ \hat{\textbf{R}}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}}, \hat{\textbf{R}}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}\mathbf {nn}} \}\) [12, 25]
where \(\hat{\boldsymbol{\Sigma }}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}}\) and \(\hat{\boldsymbol{\Sigma }}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}\mathbf {nn}}\) are diagonal matrices and \(\hat{\textbf{Q}}\) is an invertible matrix. The speech correlation matrix estimate \(\hat{\textbf{R}}_{\textbf{x}_{f} \textbf{x}_{f}\mathbf{ss}}\) is then [12]
where \(\hat{\sigma }_{x_f x_f,1}\) and \(\hat{\sigma }_{x_f x_fnn,1}\) are the first diagonal elements of \(\hat{\boldsymbol{\Sigma }}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}}\) and \(\hat{\boldsymbol{\Sigma }}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}\mathbf {nn}}\), respectively, corresponding to the largest ratio \(\hat{\sigma }_{x_f x_f,i}/\hat{\sigma }_{x_f x_fnn,i}\). Using (67) and \(\hat{\textbf{R}}_{\textbf{x}_{\textbf{f}} \textbf{x}_{\textbf{f}}}\) (cfr. (65)) in (59), the rank-1 MWF estimate \(\hat{\textbf{w}}\) can be expressed as
The desired signal estimate is then obtained as \(\hat{d} = \hat{\textbf{w}}^H \textbf{x}_{f}\). The timedomain desired signal is obtained by adding the \(L_f\) overlapping windowed frames as
where \(g_s(t)\) is a synthesis window, \(\delta _t = \delta _{\textrm{AFC}}+\delta _{\textrm{NR}}\) is the total delay from both stages, and \(\delta _{\textrm{AFC}}\) is the delay from the AFC stage. A complete description of the cascade M-channel PEM-AFC and rank-1 MWF algorithm is provided in Algorithm 4 and a block diagram is provided in Fig. 2(c).
6 Computational complexity
In [22], the computational complexity of the single-channel PEM-based AFC algorithm has been provided as \(O \left( \frac{6R \log _2(R) + 22R + n_A^2+n_A(5+R)}{R/2-n_A}\right)\) in terms of real multiplications. To obtain this expression, equal complexity for a real multiplication and a real division is assumed, as well as a complexity of \(R\log _2{R}\) for the fast Fourier transform (FFT) and inverse FFT operations. The NR stage of the rank-1 MWF in Sections 3.1 and 5.2 has a computational complexity in terms of real multiplications of \(O((4M)^3)\) per frequency bin; hence, by considering \(B=\frac{R}{2}+1\) frequency bins, the total computational complexity of the NR stage is \(O(64BM^3)\). The NR stage of the rank-2 MWF in Section 4.1 has a total computational complexity in terms of real multiplications of \(O(64B(M+1)^3)\). Table 1 shows the computational complexity of the AFC stage and NR stage in terms of real multiplications for each of the presented algorithms. The algorithms are abbreviated as follows in the table. The cascade M-channel rank-1 MWF and PEM-AFC algorithm is abbreviated as Rank-1 NR-AFC, the cascade \((M+1)\)-channel rank-2 MWF and PEM-AFC algorithm as Rank-2 NR-AFC, and the cascade M-channel PEM-AFC and rank-1 MWF as AFC-NR.
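The counts above can be evaluated numerically for concrete settings. The sketch below assumes the AFC expression is normalized by \(R/2 - n_A\) output samples per block (as in the reconstructed denominator), and the parameter values R, M, and \(n_A\) are illustrative, not the paper's settings.

```python
import math

def afc_mults(R, n_A):
    """Real multiplications per output sample of the single-channel
    PEM-based AFC stage (assumed form of the expression from [22])."""
    return (6 * R * math.log2(R) + 22 * R + n_A ** 2 + n_A * (5 + R)) / (R / 2 - n_A)

def nr_rank1_mults(R, M):
    """Real multiplications of a rank-1 MWF NR stage: O((4M)^3) per bin
    over B = R/2 + 1 frequency bins."""
    return 64 * (R // 2 + 1) * M ** 3

# Illustrative values only.
R, M, n_A = 512, 4, 20
print(round(afc_mults(R, n_A)), nr_rank1_mults(R, M), nr_rank1_mults(R, M + 1))
```

The third printed value corresponds to the \((M+1)\)-channel rank-2 NR stage, which is roughly a factor \(((M+1)/M)^3\) more expensive than the M-channel stage.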
7 Simulation results
7.1 Scenario description
In order to assess the performance of the presented cascade algorithms, closedloop simulations were performed using the following three scenarios.

Scenario 1 consists of a 4-microphone linear array with an inter-microphone distance of 10 cm and a loudspeaker which reproduces an amplified version of the desired speech source signal. The desired source is 25 cm away from the microphone array at an angle of \(0^{\circ }\). The loudspeaker is 1.4 m away from the microphone array at an angle of \(45^{\circ }\). Artificial impulse responses from the loudspeaker and the desired source to the microphones were generated using the randomized image method in [27], and the speech source signal was generated using a cascade of AR models. The signal generation using a cascade of AR models was performed by first designing a 1024th-order low-pass filter with a cutoff frequency of 0.9\(\pi\) rad/sample. Then, linear prediction of order 30 was applied to the low-pass filter coefficients to obtain the first stable AR model. The second model was designed by first choosing a central frequency \(f_{\text {cen}} = 689.1\,Hz\) and then obtaining the coefficients \(\textbf{a}_c\) as
$$\begin{aligned} a_{\textrm{order}}&= \textrm{round}\left( \dfrac{F_s}{f_{\textrm{cen}}}\right) ,\end{aligned}$$(72)$$\begin{aligned} \textbf{a}_c&= \left[ {1\quad \textbf{0}_{(a_{\textrm{order}}-2) \times 1}\quad 0.1\quad 0.5\quad 0.1}\right] ^T \end{aligned}$$(73)where \(F_s\) is the sampling frequency and \(a_{\textrm{order}}\) is the order of the AR model. Results for different SNRs are shown.
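The source-signal generation described above can be sketched as follows (Python with NumPy/SciPy). The sampling frequency of 16 kHz is an assumption, as \(F_s\) is not stated here; the Yule-Walker (autocorrelation) method stands in for the unspecified linear prediction routine, and the signs of the 0.1 entries in \(\textbf{a}_c\) are taken as printed in Eq. (73).

```python
import numpy as np
from scipy.signal import firwin, lfilter
from scipy.linalg import solve_toeplitz

fs = 16000                 # assumed sampling frequency (not stated in the text)
f_cen = 689.1              # central frequency from the text, in Hz

# First AR model: order-30 linear prediction on a 1024th-order low-pass
# filter with cutoff 0.9*pi rad/sample (autocorrelation / Yule-Walker method,
# which yields a stable, minimum-phase AR polynomial).
h = firwin(1025, 0.9)                               # 1024th-order FIR low-pass
r = np.correlate(h, h, mode="full")[h.size - 1:]    # autocorrelation of h
w = solve_toeplitz(r[:30], r[1:31])                 # predictor coefficients
a1 = np.concatenate(([1.0], -w))                    # stable AR(30) polynomial

# Second AR model, Eqs. (72)-(73).
a_order = round(fs / f_cen)
a2 = np.concatenate(([1.0], np.zeros(a_order - 2), [0.1, 0.5, 0.1]))

# Speech-like source: white noise filtered through the cascade 1/(A1(z)A2(z)).
rng = np.random.default_rng(0)
e = rng.standard_normal(fs)                         # 1 s of excitation
s = lfilter([1.0], a2, lfilter([1.0], a1, e))
```

The second AR polynomial is stable by construction: on the unit circle the trailing terms are bounded by 0.7 in magnitude, so all roots lie strictly inside the unit circle.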

Scenario 2 has the same setup as Scenario 1; however, the source signal is replaced by a speech signal [28] and the reverberation time is set to 0.14 s. Results for different SNRs are shown.

Scenario 3 consists of a 4-microphone array with an inter-microphone distance of 10 cm and a loudspeaker located diagonally (at an angle of approximately \(135^{\circ }\)) from it, which reproduces an amplified version of the desired signal. The desired source is in front of the microphone array, at an angle of approximately \(0^{\circ }\). Measured impulse responses [29] from the loudspeaker and the desired source to the microphones were used, and the source signal was a speech signal [28]. The labels from [29] that represent the microphone positions are CMA-20_90, CMA-10_90, CMA10_90, and CMA20_90; similarly, the labels for the loudspeaker position and desired source position are SL5 and SL2, respectively. For exact coordinates and a room description, the reader is referred to [29]. Results for different SNRs are shown. Although the reverberation time of these impulse responses is 0.5 s, they were truncated to 0.31 s, which keeps most of the reverberant tail.
The loudspeaker signal in all scenarios was obtained by using the desired signal estimate \(\hat{d}(t)\), multiplied by the forward path gain and delayed by the forward path delay. The noise added to the microphones in all scenarios was uncorrelated white noise. An oracle frequency-domain VAD was used, computed from the desired source signal. This oracle VAD was obtained using the STFT representation of the desired speech signal: the average energy for each frequency bin was computed and used as a threshold for determining the speech activity in that frequency bin. For comparison, simulation results using the speech presence probability (SPP) function from [30] on the microphone signals are shown for scenario 2 and scenario 3. The original SPP function in [30] requires complete knowledge of the signal, which is not feasible in an AFC scenario due to the closed-loop system. Therefore, the SPP function was adapted to online processing by using as input signal the current frame and the previous 10 frames of the unprocessed microphone signal. The threshold for determining the presence of speech was set to 0.8. The window and impulse response lengths for each scenario are shown in Table 2. The forward path gain profile used for scenario 1 is shown in Fig. 3, with \(K_{\textrm{MSG}}\) defined in Section 7.2.2. Similar forward path gain profiles were used for scenario 2 and scenario 3; however, the duration of the signals is different. The gain profile was chosen such that the noise-only and speech-plus-noise correlation matrices in the three algorithms could be updated while the system is stable, after which the gain is gradually increased to test the proposed algorithms. The forward path delay in the simulations depends on the window size used for both the WOLA and OLS procedures. In all simulations, the forward path delay was set to \(\frac{3R}{2}\).
An R-samples-long root-squared Hann window was used in the WOLA filterbank for the NR stage, and an R-samples-long rectangular window was used in the OLS filterbank for the AFC stage.
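The oracle frequency-domain VAD described above can be sketched as follows. This is a minimal interpretation of the per-bin average-energy thresholding rule, not the authors' exact implementation; frame length and STFT parameters are illustrative.

```python
import numpy as np
from scipy.signal import stft

def oracle_vad(s, fs, R=512):
    """Oracle frequency-domain VAD: a time-frequency bin is marked as
    speech-active when its energy exceeds the average energy of that
    frequency bin, following the thresholding rule described in the text."""
    _, _, S = stft(s, fs, nperseg=R)        # STFT of the desired speech signal
    E = np.abs(S) ** 2                      # per-bin energies
    thr = E.mean(axis=1, keepdims=True)     # per-frequency-bin average energy
    return E > thr                          # boolean mask: bins x frames
```

The resulting mask can then gate the per-bin updates of the noise-only and speech-plus-noise correlation matrices.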
7.2 Feedback cancelation performance measures
7.2.1 Misadjustment (Mis)
The Mis measure is defined as the normalized distance in dB between the true and estimated feedback path in the time domain. Alternatively, due to Parseval's energy theorem, the Mis can be expressed in the frequency domain as [9]
where \(f^{(r)}(\kappa )\) is the true STFT-domain transfer function from the loudspeaker to the reference microphone. To compute this metric, the impulse response was first truncated to the STFT length.
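Since the equation itself is not reproduced in this excerpt, the following is a sketch of the Mis computation based on its verbal definition: the normalized distance in dB between the true and estimated feedback paths, with the true impulse response truncated to the STFT length.

```python
import numpy as np

def misadjustment_db(f_true, f_est, R=512):
    """Misadjustment: normalized distance in dB between the true and
    estimated feedback paths. The true impulse response is first truncated
    to the STFT length R, as described in the text; by Parseval's theorem
    the frequency-domain form used here equals the time-domain form."""
    F = np.fft.rfft(f_true[:R], R)          # true feedback path spectrum
    F_hat = np.fft.rfft(f_est[:R], R)       # estimated feedback path spectrum
    return 20 * np.log10(np.linalg.norm(F - F_hat) / np.linalg.norm(F))
```

A perfect estimate drives the Mis toward \(-\infty\) dB, while a zero estimate yields 0 dB.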
7.2.2 Added stable gain (ASG)
The ASG measure is based on the so-called maximum stable gain (MSG), which is the maximum gain achievable in the system without it becoming unstable. In a single-channel scenario with a spectrally flat forward path, the MSG is given by [1]
where \(\mathcal {P}^{(r)}(l)\) is the set of frequencies that satisfy the phase condition of the Nyquist stability criterion [1] at the reference microphone. The ASG is then obtained as
where \(K_{\textrm{MSG}}\) is the MSG of the system when no feedback canceler is included, i.e., \(\hat{f}^{(r)}(\kappa ,l)=0 \; \forall \kappa ,l,\) in (75).
When a NR stage is included in the closedloop system, the expression in (75) can be modified to account for the NR filters. For this, the MSG is defined at a reference microphone as
where for an M-channel NR stage \(\hat{f}^{\star (r)}(\kappa ,l)=\hat{f}^{(r)}(\kappa ,l)\) and \(f^{\star (r)}(\kappa ,l)\) is defined as
and for an \((M+1)\)-channel NR stage \(f^{\star (r)}(\kappa ,l)\) and \(\hat{f}^{\star (r)}(\kappa ,l)\) are
Then, the ASG can be computed as in (76), noting that \(K_{\textrm{MSG}}\) should be computed similarly to (77) with the initial value of \(\hat{\textbf{W}}\). For the simulations presented here, \(\hat{\textbf{W}}\) was initialized as \(\left[ \begin{array}{cc}1 & 0 \\ 0 & 1 \\ \varvec{0}_{(M-1)\times 1} & \varvec{0}_{(M-1)\times 1}\end{array}\right]\). It should be noted that a random initialization is also possible.
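A minimal sketch of the MSG/ASG computation follows. Since Eq. (75) is not reproduced in this excerpt, the standard spectrally-flat-forward-path form from the AFC literature [1] is assumed; the frame index and the NR-filter modification of the preceding equations are omitted for brevity.

```python
import numpy as np

def msg_db(f_resp, f_hat, phase_set):
    """Maximum stable gain in dB for a spectrally flat forward path:
    -20*log10 of the largest feedback-path residual magnitude over the
    frequencies satisfying the Nyquist phase condition (phase_set)."""
    residual = np.abs(f_resp[phase_set] - f_hat[phase_set])
    return -20 * np.log10(residual.max())

def asg_db(f_resp, f_hat, phase_set):
    """ASG: MSG with the feedback canceller minus the MSG of the
    uncancelled system (f_hat = 0), cfr. (76)."""
    k_msg = msg_db(f_resp, np.zeros_like(f_hat), phase_set)
    return msg_db(f_resp, f_hat, phase_set) - k_msg
```

For example, an estimate that removes 90% of the feedback path at every frequency adds 20 dB of stable gain.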
7.2.3 Signal distortion (SD)
The SD gives an indication of the distortion of the processed signal. Unweighted and weighted SD measures have been used in the literature [8, 9, 31, 32] for different speech enhancement algorithms. The frequency-weighted SD is defined as in [8]
where \(\Phi _e(f,l)\) is the PSD of the estimated signal, \(\Phi _r(f,l)\) is the PSD of the reference signal, f is the frequency index in Hz, which can be related to \(\kappa\) as \(f=\frac{f_s\kappa }{R}\), with \(f_s\) being the sampling rate, and \(w_{ \textrm{ERB}}(f)\) is a weighting function which gives equal weight to each auditory critical band between \(f_l= 300\,Hz\) and \(f_h= 6400\,Hz\). For this metric, the estimated signal is \(\hat{d}(t)\) and the reference signal is \(H^{(r)}(q,t) s(t)\) (cfr. (5)). The measure is computed only during "speech-plus-noise" periods and the average over all frames is presented.
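As an illustration of the per-frame SD computation, the sketch below implements a simplified, unweighted variant; the ERB weighting \(w_{\textrm{ERB}}(f)\) of [8] is omitted and would multiply each term so that every auditory critical band contributes equally.

```python
import numpy as np

def sd_db(phi_e, phi_r, freqs, f_l=300.0, f_h=6400.0):
    """Simplified, unweighted per-frame signal distortion: RMS of the
    log-spectral distance 10*log10(phi_e/phi_r) over the band [f_l, f_h].
    phi_e and phi_r are the PSDs of the estimated and reference signals
    on the frequency grid freqs (in Hz)."""
    band = (freqs >= f_l) & (freqs <= f_h)          # 300-6400 Hz band
    d = 10 * np.log10(phi_e[band] / phi_r[band])    # log-spectral distance
    return np.sqrt(np.mean(d ** 2))
```

A uniform 10 dB level mismatch between the two PSDs, for instance, yields an SD of 10 dB.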
7.3 Perceptual performance measures
For the perceptual assessment of the cascade algorithms presented in this paper, two metrics have been selected, namely the PESQ and the STOI [9, 33, 34]. The PESQ measure is part of an International Telecommunication Union (ITU) standard and is widely used to objectively assess the perceptual quality of a speech signal. The STOI measure is a correlation-based speech intelligibility measure that operates on the temporal envelopes of short speech frames. We used a MATLAB implementation of the STOI measure from [34] and the PESQ implementation from [35]. These metrics were chosen based on the results presented in [9], where objective metrics were compared to subjective evaluation results for AFC algorithms.
7.4 Closedloop simulations
Closed-loop simulation results are presented in this section. For comparison, simulation results using the GSC algorithm from [14] are shown for all scenarios. Two noise references were used, and the loudspeaker signal was included as an extra noise reference. A recursive least squares (RLS) algorithm was used with a forgetting factor of 0.9999. The fixed beamformer and blocking matrix were selected as in [15], where the source is assumed to be in front. The algorithms are abbreviated in the legends and table descriptors as mentioned in Section 6. The GSC from [14] using an RLS adaptive filter for the noise references is abbreviated as GSC-RLS. The three proposed algorithms and the data for scenario 1 are available in [36].
First, the assumption that \(\hat{\textbf{R}}_{\mathbf {yyss}}\) in (12) can be modeled as a rank-2 matrix is validated experimentally. A closed-loop simulation was performed without the NR and AFC stages using Scenario 2. A fixed, random beamformer was used to combine the microphone signals. No noise was included in the microphone signals, \(\beta =0.9\), the forward path gain was set as in Fig. 3 with \(K_1=K_{\textrm{MSG}} - 15\,dB\) and \(K_2=K_{\textrm{MSG}} - 10\,dB\), the forward path delay was set to \(\frac{3R}{2}\), and \(R=\{512,1024,2048\}\) samples. The speech correlation matrix \(\hat{\textbf{R}}_{\mathbf {yyss}}\) was computed in the closed-loop system, and its eigenvalues (cfr. (40)) are plotted in Fig. 4 over time. It can be seen that for all R, there are two distinct eigenvalues, which validates the assumption of modeling \(\hat{\textbf{R}}_{\mathbf {yyss}}\) as a rank-2 matrix. It is noted that as the forward path gain increases (after 6 s) these two distinct eigenvalues get closer to each other. Similarly, as R decreases, the difference between these two eigenvalues and the others decreases. The reason for this is that the forward path delay, which is defined based on R, also decreases.
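The rank-2 structure can also be illustrated on synthetic data: a correlation matrix built from two outer products (mimicking the desired-source and loudspeaker contributions, both driven by the same speech source) plus a small estimation-noise term exhibits exactly two dominant eigenvalues, in line with Fig. 4. The vectors and dimensions below are illustrative, not taken from the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 5                                        # e.g., 4 microphones + loudspeaker
# Two steering-like vectors: desired-source and loudspeaker contributions,
# with different transfer functions but a common driving source.
h1 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
h2 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
# Speech correlation matrix: rank-2 part plus a small estimation-noise term.
R_ss = np.outer(h1, h1.conj()) + np.outer(h2, h2.conj()) + 1e-6 * np.eye(M)
eig = np.linalg.eigvalsh(R_ss)[::-1]         # eigenvalues, descending order
print(eig)                                   # two dominant values, rest near zero
```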
Figure 5 shows the ASG and Mis for three iSNRs for all algorithms using scenario 1. The iSNR was computed in the reference microphone before any processing of the microphone signals. In addition, the STOI and SD scores for each algorithm are shown in Table 3. The forward path gain was set as in Fig. 3 with \(K_1=K_{\textrm{MSG}} - 5\,dB\) and \(K_2=K_{\textrm{MSG}}+ 10\,dB\). For the GSC-RLS, the gain was fixed at \(K_1=K_{\textrm{MSG}} - 5\,dB\) to avoid instability in the closed-loop system. It is observed that both the Rank-2 NR-AFC and AFC-NR increase the ASG and reduce the Mis. Furthermore, their STOI and SD scores outperform those of the Rank-1 NR-AFC and GSC-RLS for all iSNRs.
Figure 6 shows the ASG and Mis for all algorithms using scenario 2. The STOI, PESQ-MOS, and SD scores are shown in Table 4. The forward path gain was set as in Fig. 3 with \(K_1=K_{\textrm{MSG}} - 5\,dB\) and \(K_2=K_{\textrm{MSG}}+ 10\,dB\) for all algorithms. Results for the Rank-2 NR-AFC and the AFC-NR algorithm using the SPP function are also included. It can be seen that the Rank-2 NR-AFC and the AFC-NR outperform the Rank-1 NR-AFC in terms of ASG and Mis. The GSC-RLS also increases the ASG of the system, although when the forward path gain approaches its maximum value, \(K_2\), the Mis starts to diverge, which makes the closed-loop system unstable. Similarly to the results using scenario 1, the STOI, PESQ-MOS, and SD scores of both the Rank-2 NR-AFC and AFC-NR algorithms when using an oracle VAD outperform those of the Rank-1 NR-AFC and the GSC-RLS algorithms for all iSNRs. As expected, the inclusion of the SPP function decreases the performance of the Rank-2 NR-AFC and AFC-NR algorithms due to poorer estimates of the correlation matrices.
Figure 7 shows the ASG and Mis for all algorithms using scenario 3. The forward path gain was set as in Fig. 3 with \(K_1=K_{\textrm{MSG}} - 5\,dB\) and \(K_2=K_{\textrm{MSG}}+ 10\,dB\). For the GSC-RLS, \(K_1=K_{\textrm{MSG}} - 10\,dB\) and \(K_2=K_{\textrm{MSG}} - 5\,dB\). It can be seen that the ASG is increased by the Rank-2 NR-AFC and the AFC-NR algorithms for all iSNRs. It is also observed that the Rank-2 NR-AFC and AFC-NR decrease the Mis, although not as much as in scenario 1 and scenario 2. It is also noted that the GSC-RLS increases the ASG until the forward path gain starts to increase, after which the system becomes unstable. The STOI and SD scores are presented in Table 5. Both the Rank-2 NR-AFC and AFC-NR outperform the Rank-1 NR-AFC and the GSC-RLS algorithm for all iSNRs.
The observed high ASG values for the Rank-2 NR-AFC and AFC-NR algorithms in scenario 1 and scenario 2 can be explained by the inclusion of the NR filters in the ASG computation (cfr. (78)–(80)), which means that the MWF also influences the stability of the system. The fluctuating ASG values for the Rank-1 NR-AFC algorithm mean that the system stability is not guaranteed. This has been confirmed both by the perceptual performance measure scores in Tables 3 and 4 and by the presence of howling in the resulting audio signals. Additionally, it should be noted that the SD scores in Tables 3, 4 and 5 for all algorithms are considerably higher than those reported in the literature [9]. The reason for this is the sensitivity of this metric to the presence of noise in the microphone signals, which distorts the signal. In the literature, most of the considered SNRs are around 30 dB, which is considerably higher than the ones in this paper. Similarly, the STOI and PESQ metrics are low in all scenarios. This is because the metrics are computed using the estimate of the desired speech component in the closed-loop system, which means that all changes in the NR and AFC filters are reflected in the desired signal estimate. In scenario 3, the feedback path estimate is undermodeled (cfr. Table 2), which explains the low ASG values for all the algorithms. The estimated feedback path has a smoother frequency response than the true feedback path, which can cause a magnitude difference in the ASG computation, resulting in a slowly increasing ASG. Similarly to scenarios 1 and 2, the system is not stable when using the Rank-1 NR-AFC algorithm. The GSC-RLS algorithm performs well whenever the forward path gain is not too close to the MSG; however, it should be noted that changes in the acoustic environment cannot be tracked using this algorithm due to the prior knowledge that is required.
8 Conclusions
Three cascade multichannel NR and AFC algorithms have been presented. Three different scenarios have been used to compare the performance of these algorithms in simulations. It is shown that both the cascade \((M+1)\)-channel rank-2 MWF and PEM-AFC and the cascade M-channel PEM-AFC and rank-1 MWF algorithms outperform the cascade M-channel rank-1 MWF and PEM-AFC in terms of ASG and Mis. It is then shown in Section 7 that both the cascade \((M+1)\)-channel rank-2 MWF and PEM-AFC and the cascade M-channel PEM-AFC and rank-1 MWF are suitable to solve the combined AFC and NR problem in speech applications. It is also shown that by performing a rank-2 approximation of the speech correlation matrix the feedback path can be correctly estimated when an NR stage precedes the AFC stage.
Availability of data and materials
The algorithms are publicly available at [36].
Notes
It is noted that \(u(t)\) may also add an additional noise component to \(x^{(m)}(t)\), cfr. (3).
References
T. van Waterschoot, M. Moonen, Fifty years of acoustic feedback control: state of the art and future challenges. Proc. IEEE 99(2), 288–327 (2011)
M. Guo, S.H. Jensen, J. Jensen, Evaluation of state-of-the-art acoustic feedback cancellation systems for hearing aids. J. Audio Eng. Soc. 61(3), 125–137 (2013)
M. Guo, S.H. Jensen, J. Jensen, Novel acoustic feedback cancellation approaches in hearing aid applications using probe noise and probe noise enhancement. IEEE Trans. Audio Speech Lang. Process. 20(9), 2549–2563 (2012). https://doi.org/10.1109/TASL.2012.2206025
M. Guo, S.H. Jensen, J. Jensen, S.L. Grant, in Proc. 20th European Signal Process. Conf. (EUSIPCO '12). On the use of a phase modulation method for decorrelation in acoustic feedback cancellation (2012). https://ieeexplore.ieee.org/abstract/document/6333787
H. Schepker, S.E. Nordholm, L.T.T. Tran, S. Doclo, Null-steering beamformer-based feedback cancellation for multi-microphone hearing aids with incoming signal preservation. IEEE/ACM Trans. Audio Speech Lang. Process. 27(4), 679–691 (2019). https://doi.org/10.1109/TASLP.2019.2892234
F. Strasser, H. Puder, Adaptive feedback cancellation for realistic hearing aid applications. IEEE/ACM Trans. Audio Speech Lang. Process. 23(12), 2322–2333 (2015). https://doi.org/10.1109/TASLP.2015.2479038
A. Spriet, M. Moonen, I. Proudler, in Proc. 11th European Signal Process. Conf. Feedback cancellation in hearing aids: an unbiased modelling approach (2002), pp. 1–4
A. Spriet, M. Moonen, J. Wouters, Evaluation of feedback reduction techniques in hearing aids based on physical performance measures. J. Acoust. Soc. Amer. 128(3), 1245–1261 (2010)
G. Bernardi, T. van Waterschoot, J. Wouters, M. Moonen, Subjective and objective sound-quality evaluation of adaptive feedback cancellation algorithms. IEEE/ACM Trans. Audio Speech Lang. Process. 26(5), 1010–1024 (2018)
J. Benesty, J. Chen, Y.A. Huang, S. Doclo, in Speech Enhancement. Study of the Wiener filter for noise reduction (Springer, Berlin Heidelberg, 2005), pp. 9–41
J. Benesty, J.R. Jensen, M.G. Christensen, J. Chen, Speech enhancement: a signal subspace perspective (Elsevier, Oxford, 2014)
R. Serizel, M. Moonen, B. Van Dijk, J. Wouters, Low-rank approximation based multichannel Wiener filter algorithms for noise reduction with application in cochlear implants. IEEE/ACM Trans. Audio Speech Lang. Process. 22(4), 785–799 (2014)
D. Wang, J. Chen, Supervised speech separation based on deep learning: an overview. IEEE/ACM Trans. Audio Speech Lang. Process. 26(10), 1702–1726 (2018). https://doi.org/10.1109/TASLP.2018.2842159
A. Spriet, G. Rombouts, M. Moonen, J. Wouters, Combined feedback and noise suppression in hearing aids. IEEE Trans. Audio Speech Lang. Process. 15(6), 1777–1790 (2007). https://doi.org/10.1109/TASL.2007.896670
G. Rombouts, A. Spriet, M. Moonen, Generalized sidelobe canceller based combined acoustic feedback and noise cancellation. Signal Process. 88(3), 571–581 (2008). https://doi.org/10.1016/j.sigpro.2007.08.018
A. Bastari, S. Squartini, F. Piazza, in 2008 Hands-Free Speech Communication and Microphone Arrays. Joint acoustic feedback cancellation and noise reduction within the prediction error method framework (2008), pp. 228–231. https://doi.org/10.1109/HSCMA.2008.4538728
G. Rombouts, T. van Waterschoot, K. Struyve, M. Moonen, Acoustic feedback cancellation for long acoustic paths using a nonstationary source model. IEEE Trans. Signal Process. 54(9), 3426–3434 (2006)
H. Schepker, S. Doclo, in Proc. 2019 IEEE Workshop Appls. Signal Process. Audio Acoust. (WASPAA '19). Active feedback suppression for hearing devices exploiting multiple loudspeakers (2019), pp. 60–64. https://doi.org/10.1109/WASPAA.2019.8937187
M. Vashkevich, E. Azarov, N. Petrovsky, D. Likhachov, A. Petrovsky, in Proc. 2017 IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP '17). Real-time implementation of hearing aid with combined noise and acoustic feedback reduction based on smartphone (2017), pp. 6570–6571. https://doi.org/10.1109/ICASSP.2017.8005301
S. Ruiz, T. van Waterschoot, M. Moonen, Distributed combined acoustic echo cancellation and noise reduction in wireless acoustic sensor and actuator networks. IEEE/ACM Trans. Audio Speech Lang. Process. 30, 534–547 (2022)
S. Ruiz, T. van Waterschoot, M. Moonen, in Proc. 2022 IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP '22). Cascade multichannel noise reduction and acoustic feedback cancellation (2022), pp. 676–680. https://doi.org/10.1109/ICASSP43922.2022.9747291
G. Bernardi, T. van Waterschoot, J. Wouters, M. Moonen, in Proc. 2015 IEEE Workshop Appls. Signal Process. Audio Acoust. (WASPAA '15). An all-frequency-domain adaptive filter with PEM-based decorrelation for acoustic feedback control (2015), pp. 1–5. https://doi.org/10.1109/WASPAA.2015.7336931
R. Crochiere, A weighted overlap-add method of short-time Fourier analysis/synthesis. IEEE Trans. Acoust. Speech Signal Process. 28(1), 99–102 (1980)
Y. Avargel, I. Cohen, On multiplicative transfer function approximation in the short-time Fourier transform domain. IEEE Signal Process. Lett. 14(5), 337–340 (2007)
F. Jabloun, B. Champagne, in Speech Enhancement. Signal subspace techniques for speech enhancement (Springer, Berlin Heidelberg, 2005), pp. 135–159
A. Bertrand, M. Moonen, Robust distributed noise reduction in hearing aids with external acoustic sensor nodes. EURASIP J. Adv. Signal Process. 2009, 1–14 (2009)
E. De Sena, N. Antonello, M. Moonen, T. van Waterschoot, On the modeling of rectangular geometries in room acoustic simulations. IEEE/ACM Trans. Audio Speech Lang. Process. 23(4), 774–786 (2015)
Bang & Olufsen, Music for Archimedes. Compact Disc, B&O (1992)
T. Dietzen, R. Ali, M. Taseska, T. van Waterschoot, MYRiAD: a multi-array room acoustic database. ESAT-STADIUS Tech. Rep. TR 22118, KU Leuven, Belgium (submitted for publication) (2022)
T. Gerkmann, R.C. Hendriks, Unbiased MMSE-based noise power estimation with low complexity and low tracking delay. IEEE Trans. Audio Speech Lang. Process. 20(4), 1383–1393 (2012). https://doi.org/10.1109/TASL.2011.2180896
S. Gannot, I. Cohen, Speech enhancement based on the general transfer function GSC and post-filtering. IEEE Trans. Speech Audio Process. 12(6), 561–571 (2004)
R. Aichner, Acoustic blind source separation in reverberant and noisy environments. Ph.D. thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg (2007)
ITU-T Rec. P.862, Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs (International Telecommunication Union, Geneva, 2001)
C.H. Taal, R.C. Hendriks, R. Heusdens, J. Jensen, in Proc. 2010 IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP '10). A short-time objective intelligibility measure for time-frequency weighted noisy speech (2010), pp. 4214–4217. https://doi.org/10.1109/ICASSP.2010.5495701
J. Donley, pesqmex (2017). https://github.com/ludlows/pesqmex.git. Accessed 12 Apr 2021
S. Ruiz, AFCNR (2022). https://github.com/rogaits/AFCNR
Acknowledgements
Not applicable.
Funding
This research work was carried out at the ESAT Laboratory of KU Leuven, in the frame of Research Council KU Leuven Project C3-19-00221 "Cooperative Signal Processing Solutions for IoT-based Multi-User Speech Communication Systems," Fonds de la Recherche Scientifique – FNRS and the Fonds Wetenschappelijk Onderzoek – Vlaanderen under EOS Project no 30452698 "(MUSE-WINET) MUlti-SErvice WIreless NETwork," and the European Research Council under the European Union's Horizon 2020 Research and Innovation Program/ERC Consolidator Grant: SONORA (no. 773268). This paper reflects only the authors' views and the Union is not liable for any use that may be made of the contained information. The scientific responsibility is assumed by its authors.
Author information
Authors and Affiliations
Contributions
SR, TvW, and MM jointly developed the idea of using an \((M+1)\)-channel data model in the multichannel Wiener filter formulation for combined acoustic feedback cancelation and noise reduction. SR, TvW, and MM jointly developed the research methodology to turn this concept into a usable and effective algorithm. SR, TvW, and MM jointly designed and interpreted the computer simulations. SR implemented the computer simulations. All authors contributed to writing the manuscript and read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ruiz, S., van Waterschoot, T. & Moonen, M. Cascade algorithms for combined acoustic feedback cancelation and noise reduction. J AUDIO SPEECH MUSIC PROC. 2023, 37 (2023). https://doi.org/10.1186/s13636-023-00296-5
DOI: https://doi.org/10.1186/s13636-023-00296-5