A multichannel diffuse power estimator for dereverberation in the presence of multiple sources
 Sebastian Braun^{1} and
 Emanuël A. P. Habets^{1}
https://doi.org/10.1186/s13636-015-0077-2
© Braun and Habets. 2015
Received: 17 August 2015
Accepted: 19 November 2015
Published: 4 December 2015
Abstract
Using a recently proposed informed spatial filter, it is possible to effectively and robustly reduce reverberation from speech signals captured in noisy environments using multiple microphones. Late reverberation can be modeled by a diffuse sound field with a time-varying power spectral density (PSD). To attain reverberation reduction using this spatial filter, an accurate estimate of the diffuse sound PSD is required. In this work, a method is proposed to estimate the diffuse sound PSD from a set of reference signals by blocking the direct signal components. By considering multiple plane waves in the signal model to describe the direct sound, the method is suitable in the presence of multiple simultaneously active speakers. The proposed diffuse sound PSD estimator is analyzed and compared to existing estimators. In addition, the performance of the spatial filter computed with the diffuse sound PSD estimate is analyzed using simulated and measured room impulse responses in noisy environments with stationary noise and nonstationary babble noise.
1 Introduction
In speech communication scenarios, reverberation can degrade the speech quality and, in severe cases, the speech intelligibility [1]. State-of-the-art devices such as mobile phones, laptops, tablets, or smart TVs already feature multiple microphones to reduce reverberation and noise. Multichannel approaches are generally superior to single-channel approaches, since they are able to exploit the spatial diversity of the sound scene.
In general, there exist several very different classes of dereverberation algorithms. Algorithms of the first class identify the acoustic system and then equalize it (cf. [1] and the references therein). Given a perfect estimate of the acoustic system described by a finite impulse response, perfect dereverberation can be achieved by applying the multiple input/output inverse theorem [2] (i.e., by applying a multichannel equalizer). However, this approach is not robust against estimation errors of the acoustic impulse responses. As a consequence, this approach is also sensitive to changes in the room and to position changes of the microphones and sources. For a single source, more robust equalizers were recently developed in [3, 4]. Additive noise is usually not taken into account. It should be noted that many multisource dereverberation algorithms also separate the speech signals of multiple speakers [5], which might not be necessary in some applications.
Algorithms of the second class are proposed, e.g., in [6–9], where the acoustic system is described using an autoregressive model. The approach proposed in [6] estimates the clean speech for a single source based on multichannel linear prediction by enhancing the linear prediction residual of the clean speech. In [7–9], the received signal is expressed using an autoregressive model, and the regression coefficients are estimated from the observations. The clean speech is then estimated using the regression coefficients. While multi-source models were employed in [8, 9], the algorithm in [8] is evaluated only for a single-talk scenario. Linear-prediction-based dereverberation algorithms are typically computationally complex and sensitive to noise. It is, for example, shown in [9] that the complexity and convergence time greatly increase with the number of sources.
Algorithms of the third class are used to compute spectral and spatial filters that can also be combined. Purely spectral filters are typically single-channel approaches. While early reflections add spectral coloration and can even improve the speech intelligibility, late reverberation mainly deteriorates the speech intelligibility due to overlap-masking [10]. The majority of single-channel dereverberation approaches aim at suppressing only late reverberation using spectral enhancement techniques as proposed in [11, 12] or more recently in [13, 14]. The late reverberant power spectral density (PSD) can be estimated using a statistical model of the room impulse response [15, 16]. The model parameters, which consist of the reverberation time and in some cases also the direct-to-reverberation ratio (DRR), need to be known or estimated.
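As a concrete illustration of this class of estimators, the sketch below implements the commonly used Polack-model relation, in which the late reverberant PSD is a delayed, exponentially decayed copy of the observed PSD, with decay constant δ = 3 ln(10)/T60. The function name and the frame-based interface are illustrative assumptions, not the exact estimator of [15, 16].

```python
import numpy as np

def late_reverb_psd(phi_y, t60, t_late, t_hop):
    """Polack-model sketch: the late reverberant PSD is the observed PSD
    phi_y delayed by t_late and attenuated by the power decay over t_late.

    phi_y  : observed PSD per frame for one frequency bin, shape (num_frames,)
    t60    : reverberation time in seconds
    t_late : start time of the late reverberation in seconds
    t_hop  : STFT hop size in seconds
    """
    delta = 3.0 * np.log(10.0) / t60        # decay constant of Polack's model
    n_late = int(round(t_late / t_hop))     # delay expressed in frames
    decay = np.exp(-2.0 * delta * t_late)   # power attenuation over t_late
    phi_late = np.zeros_like(phi_y)
    phi_late[n_late:] = decay * phi_y[:-n_late]
    return phi_late
```

Note that the decay equals 10^(−6 t_late / T60), i.e., a 60 dB power drop after T60 seconds, which is the defining property of the model.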
In the multichannel case, spatial or spectro-spatial filters can achieve joint noise reduction and dereverberation, typically with higher quality than single-channel filters. Recently, an informed spatial minimum mean square error (MMSE) filter based on a multi-source sound field model was proposed in [17]. The reverberation is modeled by a diffuse sound field with a highly time-varying PSD and known spatial coherence. The filter is expressed in terms of the model parameters, which include time- and frequency-dependent directions of arrival (DOAs) and the diffuse sound PSD. As these parameters can be estimated online almost instantaneously, the filter can quickly adapt to changes in the sound field. This spatial filter provides an optimal tradeoff between dereverberation and noise reduction and provides a predefined spatial response for multiple simultaneously active sources. The dereverberation performance is determined by the estimation accuracy of the diffuse sound PSD, which is challenging to estimate because the direct sound and reverberation cannot be observed separately.
Several techniques already exist to estimate the late reverberant or diffuse sound PSD or the signal-to-diffuse ratio (SDR), such as the single-channel method based on Polack's model, which requires prior knowledge of the reverberation time [11] or additionally the DRR [16]. Further suitable methods are the coherence-based SDR estimator proposed in [18] and a linearly constrained minimum variance (LCMV) beamformer placing nulls in the directions of the direct sound sources while extracting the ambient sound [19]. In [20], we proposed a method to estimate the diffuse sound PSD using multiple reference signals, while assuming at most one active source at a known position. In [21], a direct maximum likelihood estimate of the diffuse sound PSD given the observed signals was derived by assuming a noise-free signal model and using prior knowledge of the source position and the diffuse coherence. As the estimator presented in [21] considers only one sound source and no additive noise, we do not consider it in the present work.
In this paper, the aim is to dereverberate multiple simultaneously active sources in the presence of noise without prior knowledge of the position of the sources. The processing is done in the short-time Fourier transform (STFT) domain using the informed spatial filter presented in [17]. In this work, we derive a diffuse sound PSD estimator similar to the one presented in [20] but extended for multiple simultaneously active sources and analyze it in detail. In addition, the influence of the blocking matrix used to create the reference signals is investigated. The PSD estimator depends only on the narrowband DOAs and the noise PSD matrix that can be estimated in advance using existing techniques [22–25]. While we investigate the influence of estimation errors of the DOAs and the noise PSD, these estimators are beyond the scope of this paper. The proposed dereverberation and noise reduction solution is suitable for online processing as the estimators and filters use only current and past observations and the introduced latency depends only on the STFT parameters.
The paper is structured as follows. In Section 2, the signal model is introduced, the spatial filter is derived, and the problem is formulated. Section 3 reviews some existing estimators for the diffuse sound PSD for comparison and derives the proposed estimator. The diffuse sound PSD estimators and the dereverberation system are evaluated in Section 4, and conclusions are drawn in Section 5.
2 Problem formulation
2.1 Signal model
where θ _{ l }(k,n) is the DOA of the lth plane wave, r _{ m }=∥r _{ m }∥_{2}−∥r _{ref}∥_{2} is the signed distance between the microphone at position r _{ m } and the reference microphone at position r _{ref}, both given in Cartesian coordinates, and \(\lambda (k) = 2\pi \frac {k f_{\mathrm {s}}}{N c}\) is the spatial frequency with N, f _{s}, and c being the STFT length, the sampling frequency, and the speed of sound, respectively.
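The propagation vector of one plane wave can be sketched as follows for a linear array. The phase convention exp(j λ(k) r_m sin θ) is an assumption made for illustration (sign and angle reference may differ); the paper's definition of a_l(k,n) is authoritative.

```python
import numpy as np

def propagation_vector(k, theta, mic_pos, ref_idx, fs=16000, N=1024, c=343.0):
    """Sketch of the plane-wave propagation vector a_l(k,n): one relative
    phase term per microphone of a linear array.

    k       : frequency bin index
    theta   : DOA of the plane wave in radians
    mic_pos : (M,) microphone positions along the array axis in metres
    ref_idx : index of the reference microphone
    """
    lam = 2.0 * np.pi * k * fs / (N * c)        # spatial frequency lambda(k)
    r = mic_pos - mic_pos[ref_idx]              # signed distances r_m
    return np.exp(1j * lam * r * np.sin(theta))  # unit-magnitude phase terms
```

By construction, the entry at the reference microphone equals 1 and all entries have unit magnitude.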
Each of the L plane waves models a directional sound component; the components are mutually uncorrelated. Due to the spectral sparsity of speech signals and the modeling of the plane waves independently per time-frequency instant, the number of modeled plane waves L does not have to match the number of physical broadband sound sources exactly. The reverberation is modeled by the diffuse sound component d(k,n). In principle, d(k,n) can also contain other nonstationary diffuse noise components such as babble speech, as observed, for example, in a cafeteria. The signal component v(k,n) models stationary or slowly time-varying additive components such as sensor noise and ambient noise.
where Φ _{ x }(k,n) is the PSD matrix of the plane wave signals, Φ _{ d }(k,n) is the PSD matrix of the diffuse sound, and Φ _{ v }(k,n) denotes the noise PSD matrix. Since the L plane waves are mutually uncorrelated, Φ _{ x }(k,n) is a diagonal matrix with the PSDs ϕ _{ l }(k,n)=E{|X _{ l }(k,n)|^{2}} on its main diagonal. Note that ϕ _{ l }(k,n) is the PSD, at the reference microphone, of the lth plane wave arriving from θ _{ l }(k,n).
where \(\text {sinc}(x) = \frac {\sin (x)}{x}\) for x≠0 and sinc(x)=1 for x=0.
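The ideal diffuse coherence matrix for a linear array can be computed directly from this sinc relation. The sketch below uses illustrative STFT parameters; note that NumPy's `sinc` is the normalized sinc, so its argument is divided by π to obtain sin(x)/x as defined above.

```python
import numpy as np

def diffuse_coherence(k, mic_pos, fs=16000, N=1024, c=343.0):
    """Spatial coherence matrix Gamma_diff(k) of an ideal (spherically
    isotropic) diffuse field between omnidirectional microphones:
    Gamma_ij = sinc(lambda(k) * d_ij) with the unnormalized sinc.

    mic_pos : (M,) microphone positions along the array axis in metres
    """
    lam = 2.0 * np.pi * k * fs / (N * c)       # spatial frequency lambda(k)
    d = mic_pos[:, None] - mic_pos[None, :]    # pairwise signed distances d_ij
    return np.sinc(lam * d / np.pi)            # np.sinc(x) = sin(pi x)/(pi x)
```

The result is real and symmetric with a unit diagonal, and the off-diagonal coherence decays with both frequency and microphone distance.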
where 1= [ 1,1,…1]^{T} is a vector of ones with size L×1. In the following section, we derive a spatial filter that is applied to y(k,n) to obtain an estimate of Z(k,n).
2.2 Spatial filter design
where vec{·} are the columns of a matrix stacked into a column vector and the L ^{2}×L matrix \(\mathbf {C} = \left [\operatorname {vec}\left \{\mathbf {a}_{1} \mathbf {a}_{1}^{\mathrm {H}}\right \}, \hdots, \operatorname {vec}\left \{\mathbf {a}_{L} \mathbf {a}_{L}^{\mathrm {H}}\right \}\right ]\). The L×1 vector obtained by (12) contains the estimated plane wave PSDs that are on the main diagonal of the matrix Φ _{ x }(k,n), and all offdiagonal elements are zero since we assume uncorrelated plane waves.
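A minimal sketch of the PSD recovery implied by (12): build C from the propagation vectors and solve vec{Φ_x} = C φ in the least-squares sense. Using `np.linalg.lstsq` is one possible realization under the assumption of uncorrelated plane waves; the paper's closed-form expression (12) may differ in weighting.

```python
import numpy as np

def plane_wave_psds(Phi_x, A):
    """Recover the L plane-wave PSDs from the direct-sound PSD matrix via
    the stacking in Eq. (12).

    Phi_x : (M, M) direct-sound PSD matrix
    A     : (M, L) matrix with the propagation vectors a_l as columns
    """
    M, L = A.shape
    # Build C column by column; vec{.} stacks matrix columns (Fortran order)
    C = np.stack([np.outer(A[:, l], A[:, l].conj()).ravel(order='F')
                  for l in range(L)], axis=1)                  # (M^2, L)
    phi, *_ = np.linalg.lstsq(C, Phi_x.ravel(order='F'), rcond=None)
    return np.real(phi)                                        # PSDs are real
```

For distinct DOAs, C has full column rank and the fit recovers the PSDs exactly when Φ_x follows the model.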
The remaining challenge is to estimate the interference PSD matrix Φ _{ u }(k,n). The stationary or slowly timevarying noise PSD matrix Φ _{ v }(k,n) is observable when the speakers are inactive and can be estimated using, e. g., [22–25]. In contrast, the diffuse sound PSD matrix Φ _{ d }(k,n) that originates from reverberation cannot be observed separately from the desired speech. Assuming that we know the spatial coherence of the diffuse sound field, our aim is to estimate the diffuse sound PSD ϕ _{d}(k,n). Given ϕ _{d}(k,n) and Γ _{diff}(k), we can then calculate Φ _{ d }(k,n) using (4).
3 Estimation of the diffuse sound PSD
In this section, we first review some estimators that can be used to obtain an estimate of the PSD of diffuse or reverberant sound and then derive a novel estimator that takes the presence of multiple plane waves as given by the signal model (1) into account.
3.1 Existing estimators
3.1.1 Based on a statistical reverberation model
3.1.2 Based on the spatial coherence
3.1.3 Based on an ambient beamformer
3.2 Discussion of the existing estimators
- The estimator presented in Section 3.1.1 requires prior information about the frequency-dependent reverberation time and DRR. In [34], it is shown that existing T _{60} estimators are strongly biased at low signal-to-noise ratios (SNRs). Furthermore, T _{60} estimators typically require a few seconds of data and therefore cannot adapt quickly to changes in the reverberation time.
- The single-source model assumed in the approach presented in Section 3.1.2 has been shown to be inaccurate in multi-talk scenarios [35].
- The single- and dual-channel approaches presented in Sections 3.1.1 and 3.1.2 do not directly take all microphones into account.
- The estimator presented in Section 3.1.3 is suboptimal, as it does not directly aim to estimate the diffuse sound PSD. Furthermore, it requires a specific look direction.
In contrast, the proposed estimator

 1. is able to respond immediately to changes in the sound field and is independent of the reverberation time and DRR,
 2. is based on the multi-wave signal model (1), and
 3. directly estimates the diffuse sound PSD using all microphones.
3.3 Maximum likelihood estimator using reference signals
In this section, we derive an estimator for the diffuse sound PSD ϕ _{d}(k,n) based on multiple reference signals. In Section 3.3.1, the computation of the reference signals is described. In Section 3.3.2, a maximum likelihood estimator (MLE) for the diffuse sound PSD is derived based on the computed reference signals.
3.3.1 Generating the reference signals
where the matrices \(\widetilde {\boldsymbol {\Gamma }}_{\text {diff}}(k,n)\) and \(\widetilde {\boldsymbol {\Phi }}_{\mathbf {v}}(k,n)\) denote the diffuse coherence matrix and the noise PSD matrix at the output of the blocking matrix, respectively. The direct sound PSD is zero due to (20).
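One generic way to obtain reference signals that cancel the L estimated plane waves is a null-space projection: choose the rows of the blocking matrix B to span the left null space of the propagation matrix A, so that B A = 0 and the K = M − L outputs contain only diffuse sound and noise. The paper compares an eigenspace and a sparse BM; the SVD-based construction below is one generic realization, not necessarily the exact one used.

```python
import numpy as np

def blocking_matrix(A):
    """Null-space blocking matrix for the L plane waves in A.

    A : (M, L) matrix with the propagation vectors a_l as columns
    Returns B of shape (M - L, M) with orthonormal rows and B @ A = 0.
    """
    M, L = A.shape
    # The last M - L left singular vectors of A span the left null space
    U, _, _ = np.linalg.svd(A, full_matrices=True)
    return U[:, L:].conj().T
```

Applying B to the microphone signals yields the K reference signals ũ(k,n) in which the direct components are (ideally) blocked.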
3.3.2 Derivation of the maximum likelihood estimator
where the max{·} operation is included to ensure that the estimated PSD is positive also in the presence of estimation errors. Although we excluded the imaginary diagonal elements, it can be shown that the result is mathematically equivalent to the solution obtained in [20].
3.4 Dereverberation system overview
4 Performance evaluation
For all simulations, the following parameters were used: a sampling frequency of f _{s}=16 kHz, a Hamming window of length N _{win}=32 ms, an FFT length of N=2N _{win}, a hop size of N _{hop}=0.25 N _{win}, and recursive averaging for the online estimated PSD matrices with a time constant of 70 ms. The stationary noise PSD matrix was calculated in advance during periods of speech absence.
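The recursive averaging used for the online PSD-matrix estimates can be sketched as follows for one frequency bin: Φ(n) = α Φ(n−1) + (1−α) y(n) y(n)^H. Mapping the 70 ms time constant and the 8 ms hop to α = exp(−T_hop/τ) is a common convention assumed here, not stated in the paper.

```python
import numpy as np

def recursive_psd(frames, tau=0.070, t_hop=0.008):
    """Recursively averaged spectral covariance matrix of one frequency bin.

    frames : (num_frames, M) complex STFT coefficients of one bin
    tau    : averaging time constant in seconds (70 ms in the paper)
    t_hop  : STFT hop size in seconds (0.25 * 32 ms = 8 ms in the paper)
    """
    alpha = np.exp(-t_hop / tau)                 # forgetting factor
    M = frames.shape[1]
    Phi = np.zeros((M, M), dtype=complex)
    for y in frames:
        Phi = alpha * Phi + (1.0 - alpha) * np.outer(y, y.conj())
    return Phi
```

For a stationary input, the estimate converges to the true covariance; shorter time constants track nonstationary PSDs faster at the cost of higher variance.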
4.1 Analysis of the blocking matrices
A detailed evaluation of the eigenspace and sparse BM is given in [38]. There it is shown that for accurately estimated propagation vectors, the blocking ability of both BMs is in theory equal, but if the estimation accuracy is low, the blocking ability of the sparse BM is slightly lower compared to the eigenspace BM.
4.2 Estimation considering multiple waves
We now analyze the performance of the proposed diffuse PSD estimator while varying the number of estimated simultaneously arriving plane waves \(\hat {L}\), which in practice might differ from the actual number of directional sources L. For this experiment, four directional sound components are simulated. All source signals consist of independent white Gaussian noise, and the sources are randomly distributed around the array on the horizontal half plane at random distances in the far field of the array. The diffuse sound signals d(k,n) are generated from independent and identically distributed (i. i. d.) noise signals using the method proposed in [40]. The spatial coherence between the signals d(k,n) is chosen as that of an ideal diffuse field (5), and the diffuse signals are added with an SDR of 10 dB. The additive noise signals v(k,n) are likewise simulated as i. i. d. processes, with an SNR of 50 dB.
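The core idea behind generating diffuse noise under a spatial coherence constraint (cf. [40]) can be sketched per frequency bin: mix i.i.d. complex noise with a square root (here a Cholesky factor) of the target coherence matrix, so that the output coherence approaches Γ. This is a single-bin illustration under the assumption of a positive definite Γ, not the full method of [40].

```python
import numpy as np

def coherent_noise(Gamma, num_frames, rng=None):
    """Generate M noise signals whose spatial coherence approaches Gamma.

    Gamma      : (M, M) target coherence matrix (positive definite)
    num_frames : number of STFT frames to generate for this bin
    """
    rng = np.random.default_rng(rng)
    M = Gamma.shape[0]
    C = np.linalg.cholesky(Gamma)                    # Gamma = C @ C^H
    # i.i.d. circular complex Gaussian noise with unit power per channel
    W = (rng.standard_normal((M, num_frames))
         + 1j * rng.standard_normal((M, num_frames))) / np.sqrt(2)
    return C @ W                                     # (M, num_frames)
```

The empirical covariance of the output converges to Γ as the number of frames grows.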
The sound field is captured by a ULA of M=8 microphones with an inter-microphone spacing of 2 cm. In this experiment, the DOAs of the L directional sound sources are known and are successively taken into account plus one extra DOA to investigate the effect of overestimation of L, i.e., \(\hat L \in \{1,\hdots,L+1\}\). At the position of the extra DOA, no source is active. Note that the number of reference signals K, i.e., the length of vector \(\widetilde {\mathbf {u}}(k,n)\), decreases with an increasing number of plane waves \(\hat {L}\) taken into account.
where the ideal diffuse PSD is obtained as the spatial average of the instantaneous diffuse sound power over all microphones, i.e., ϕ _{d}(k,n)=d ^{H}(k,n)d(k,n)/M, and \((k,n) \in \mathcal {T}\) is the set of time-frequency points where the ideal diffuse PSD is above a certain threshold. The errors \(\text {LE}_{\mathrm {o}}(\hat {\phi }_{\mathrm {d}})\) and \(\text {LE}_{\mathrm {u}}(\hat {\phi }_{\mathrm {d}})\) are plotted on top of each other, such that the total bar height shows the total error \(\text {LE}(\hat {\phi }_{\mathrm {d}})\).
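A sketch of this logarithmic error measure: the dB error 10 log10(φ̂_d/φ_d) is evaluated over the set T and split into an overestimation part LE_o and an underestimation part LE_u. The exact way the split is averaged is an assumption here; the signs are separated so that the two parts stack to the total error.

```python
import numpy as np

def log_error(phi_hat, phi, threshold=1e-12):
    """Logarithmic PSD error split into over- and underestimation.

    phi_hat   : estimated diffuse PSD values
    phi       : ideal diffuse PSD values (same shape)
    threshold : ideal-PSD threshold defining the evaluation set T
    """
    mask = phi > threshold
    err = 10.0 * np.log10(phi_hat[mask] / phi[mask])   # dB error per point
    le_o = np.mean(np.maximum(err, 0.0))               # overestimation part
    le_u = abs(np.mean(np.minimum(err, 0.0)))          # underestimation part
    return le_o, le_u
```

With this split, le_o + le_u equals the mean absolute dB error over T.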
The estimation accuracy increases by increasing the number of directional constraints \(\hat L\) for the BM. When the number of DOAs exceeds the actual number of plane waves (\(\hat {L}>4\)), we observe no significant performance degradation. The eigenspace BM is slightly more suited for L=1, whereas the sparse BM performs slightly better for L>1. However, for unknown L, there is no significant performance difference between both tested BMs. In the remainder of this work, we use the eigenspace BM which has been found to be more robust against DOA estimation errors [38].
4.3 Robustness against estimation errors
The accuracy of the proposed estimator depends essentially on two quantities: the estimated DOAs and the estimated noise PSD matrix. The performance of the DOA estimation is mainly degraded by strong reverberation and noise. The robustness in the presence of estimation errors is analyzed using two experiments.
4.4 Performance in timevarying diffuse noise fields
4.5 Comparison to existing diffuse PSD estimators
In this section, we evaluate the performance of the proposed diffuse PSD estimator and the three estimators described in Sections 3.1.1–3.1.3, denoted by LRSV, CSDRE, and ABF, respectively. A ULA of M=8 microphones with 2 cm spacing was simulated in a reverberant room of size 6 × 5 × 4 m with T _{60}=500 ms using the well-known image method [42]. Two speech sources are located at 20° and −45° from the broadside direction of the array at distances of 2.7 and 1.9 m, respectively. White noise was added at different levels, specified by the input SNR (iSNR).
In addition to the noise PSD, the LRSV requires an estimate of the typically frequency-dependent reverberation time (here almost frequency independent due to the simulated impulse responses), the DRR, and the start time of the late reverberation, all of which are assumed to be known here. Especially at low iSNRs, online estimates of these parameters are strongly biased and hard to obtain [34], which is not reflected in the evaluation in Fig. 8. Note that the DOA-dependent approaches in this scenario use estimated DOAs without prior information and therefore contain estimation errors.
4.6 Performance of the overall system
In this section, we evaluate the performance of the complete dereverberation system described by (10) for different acoustic scenarios.
In the first experiment, one, two, or three speakers were active simultaneously. The first speech signal was obtained by concatenating 6 speech signals of about 20 s (3 male, 3 female) from the EBU SQAM database [43], and the second and third signals were obtained by permutation of the speakers. The sources were positioned at θ= {5°, −68°, 54°} relative to the broadside direction of a ULA with M=8 microphones and a microphone spacing of 1.75 cm, at distances of {2.7 m, 1.9 m, 2.3 m}. The room was again simulated by the image method with T _{60}=500 ms. Uncorrelated white Gaussian noise was added with iSNR=40 dB. Either \(\hat L=1\) or \(\hat L=2\) DOAs were estimated per time-frequency instant using TLS-ESPRIT.
The performance is evaluated using four objective measures, namely, the perceptual evaluation of speech quality (PESQ) [44], the cepstral distance (CD) [45], the speech-to-reverberation modulation ratio (SRMR) [46, 47], and the segmental signal-to-interference ratio enhancement (ΔsegSIR) given in decibels. The desired reference signal for the objective measures is the sum of the direct signal components (7) plus early reflections up to 40 ms after the direct sound; the interference is calculated as the sum of stationary noise and the late reverberation after 40 ms.
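A hedged sketch of the segmental SIR underlying ΔsegSIR: the per-segment SIR in dB is averaged over segments, and ΔsegSIR is the difference between the processed and unprocessed values. The segment length and the handling of silent segments are assumptions made for illustration.

```python
import numpy as np

def seg_sir(desired, interference, fs=16000, seg_len_ms=32):
    """Segmental signal-to-interference ratio in dB.

    desired      : desired reference signal (direct + early reflections)
    interference : stationary noise plus late reverberation
    """
    L = int(fs * seg_len_ms / 1000)       # segment length in samples
    n_seg = len(desired) // L
    sirs = []
    for i in range(n_seg):
        s = desired[i * L:(i + 1) * L]
        u = interference[i * L:(i + 1) * L]
        ps, pu = np.sum(s ** 2), np.sum(u ** 2)
        if ps > 0 and pu > 0:             # skip silent segments
            sirs.append(10.0 * np.log10(ps / pu))
    return float(np.mean(sirs))
```

ΔsegSIR would then be `seg_sir(desired, residual_interference) - seg_sir(desired, interference)` with the residual interference taken at the filter output.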
Objective measures for simulated rooms, 1 active source
Method       \(\hat L\)   PESQ   CD     SRMR   ΔsegSIR [dB]
Unprocessed  –            2.08   4.54   2.26   –
MWF MLE      1            2.27   4.00   2.91   2.04
MWF MLE      2            2.22   4.06   2.96   1.88
MWF LRSV     1            2.38   3.85   2.90   2.10
MWF LRSV     2            2.35   3.85   2.95   1.99
Objective measures for simulated rooms, 2 active sources
Method       \(\hat L\)   PESQ   CD     SRMR   ΔsegSIR [dB]
Unprocessed  –            2.06   3.72   1.88   –
MWF MLE      1            2.28   3.54   2.41   2.34
MWF MLE      2            2.25   3.39   2.46   2.20
MWF LRSV     1            2.37   3.46   2.36   2.20
MWF LRSV     2            2.34   3.22   2.43   2.16
Objective measures for simulated rooms, 3 active sources
Method       \(\hat L\)   PESQ   CD     SRMR   ΔsegSIR [dB]
Unprocessed  –            2.05   3.47   1.73   –
MWF MLE      1            2.17   3.40   2.22   2.18
MWF MLE      2            2.15   3.26   2.26   1.98
MWF LRSV     1            2.33   3.36   2.13   2.12
MWF LRSV     2            2.31   3.07   2.19   2.05
In terms of most performance measures, the LRSV slightly outperforms the MLE in Tables 1, 2, and 3. It should however be noted that the LRSV was computed using prior knowledge of the reverberation time and DRR.
Objective measures for measured rooms with 2 active sources and babble noise using \(\hat L=2\)
Room   Method       PESQ   CD     SRMR   ΔsegSIR [dB]
M      Unprocessed  2.08   4.05   1.55   –
M      MWF MLE      2.09   3.69   2.27   3.24
M      MWF LRSV     2.27   3.51   2.27   2.98
P      Unprocessed  2.22   3.54   1.59   –
P      MWF MLE      2.28   3.13   2.11   4.57
P      MWF LRSV     2.41   3.13   2.11   3.51
5 Conclusions
We proposed a system for joint dereverberation and noise reduction for multiple simultaneously active desired direct sound plane waves. The system consists of an informed spatial filter that is computed using multiple DOAs per time-frequency bin and the PSD matrices of the diffuse sound and the noise. An estimator for the diffuse PSD was developed that uses a set of reference signals created by simultaneously blocking multiple active plane waves. The proposed estimator was compared to three existing estimators. It shows comparable or slightly more robust performance than all estimators under test except the well-established single-channel LRSV estimator. However, the LRSV estimator was computed with prior knowledge of the reverberation time and DRR, which might be difficult to estimate in noisy environments and in scenarios where the source positions and the room characteristics change over time. The objective measures of the dereverberation system show comparable performance when using either the proposed estimator or the LRSV estimator.
Declarations
Acknowledgements
This research was partly funded by the German-Israeli Foundation for Scientific Research and Development (GIF).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 1. PA Naylor, ND Gaubitch (eds.), Speech Dereverberation (Springer, London, UK, 2010).
 2. M Miyoshi, Y Kaneda, Inverse filtering of room acoustics. IEEE Trans. Acoust. Speech Signal Process. 36(2), 145–152 (1988).
 3. I Kodrasi, S Doclo, in ICASSP. Robust partial multichannel equalization techniques for speech dereverberation (Kyoto, Japan, 2012).
 4. F Lim, PA Naylor, in ICASSP. Robust low-complexity multichannel equalization for dereverberation (2013), pp. 689–693.
 5. Y Huang, J Benesty, J Chen, A blind channel identification-based two-stage approach to separation and dereverberation of speech signals in a reverberant environment. IEEE Trans. Speech Audio Process. 13(5), 882–895 (2005).
 6. M Delcroix, T Hikichi, M Miyoshi, Dereverberation and denoising using multichannel linear prediction. IEEE Trans. Audio Speech Lang. Process. 15(6), 1791–1801 (2007).
 7. T Nakatani, T Yoshioka, K Kinoshita, M Miyoshi, B-H Juang, Speech dereverberation based on variance-normalized delayed linear prediction. IEEE Trans. Audio Speech Lang. Process. 18(7), 1717–1731 (2010).
 8. T Yoshioka, T Nakatani, Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening. IEEE Trans. Audio Speech Lang. Process. 20(10), 2707–2720 (2012).
 9. M Togami, Y Kawaguchi, R Takeda, Y Obuchi, N Nukaga, Optimized speech dereverberation from probabilistic perspective for time varying acoustic transfer function. IEEE Trans. Audio Speech Lang. Process. 21(7), 1369–1380 (2013).
10. K Kokkinakis, PC Loizou, The impact of reverberant self-masking and overlap-masking effects on speech intelligibility by cochlear implant listeners (L). J. Acoust. Soc. Am. 130(3), 1099–1102 (2011).
11. K Lebart, JM Boucher, PN Denbigh, A new method based on spectral subtraction for speech dereverberation. Acta Acustica united with Acustica 87, 359–366 (2001).
12. EAP Habets, Single- and multi-microphone speech dereverberation using spectral enhancement (PhD thesis, Technische Universiteit Eindhoven, 2007). http://alexandria.tue.nl/extra2/200710970.pdf.
13. X Bao, J Zhu, An improved method for late-reverberant suppression based on statistical models. Speech Commun. 55(9), 932–940 (2013).
14. S Mosayyebpour, M Esmaeili, TA Gulliver, Single-microphone early and late reverberation suppression in noisy speech. IEEE Trans. Audio Speech Lang. Process. 21(2), 322–335 (2013).
15. JD Polack, La transmission de l'énergie sonore dans les salles [The transmission of sound energy in rooms] (PhD thesis, Université du Maine, Le Mans, France, 1988).
16. EAP Habets, S Gannot, I Cohen, Late reverberant spectral variance estimation based on a statistical model. IEEE Signal Process. Lett. 16(9), 770–774 (2009).
17. O Thiergart, M Taseska, EAP Habets, An informed MMSE filter based on multiple instantaneous direction-of-arrival estimates (Marrakesh, Morocco, 2013).
18. O Thiergart, G Del Galdo, EAP Habets, in ICASSP. Signal-to-reverberant ratio estimation based on the complex spatial coherence between omnidirectional microphones (2012).
19. O Thiergart, EAP Habets, in ICASSP. An informed LCMV filter based on multiple instantaneous direction-of-arrival estimates (2013).
20. S Braun, EAP Habets, in EUSIPCO. Dereverberation in noisy environments using reference signals and a maximum likelihood estimator (IEEE, 2013).
21. A Kuklasinski, S Doclo, SH Jensen, J Jensen, in EUSIPCO. Maximum likelihood based multichannel isotropic reverberation reduction for hearing aids (Lisbon, Portugal, 2014), pp. 61–65.
22. R Martin, Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. Speech Audio Process. 9(5), 504–512 (2001).
23. T Gerkmann, RC Hendriks, Unbiased MMSE-based noise power estimation with low complexity and low tracking delay. IEEE Trans. Audio Speech Lang. Process. 20(4), 1383–1393 (2012).
24. M Souden, J Chen, J Benesty, S Affes, An integrated solution for online multichannel noise tracking and reduction. IEEE Trans. Audio Speech Lang. Process. 19(7), 2159–2169 (2011).
25. M Taseska, EAP Habets, in IWAENC. MMSE-based blind source extraction in diffuse noise fields using a complex coherence-based a priori SAP estimator (2012).
26. F Jacobsen, T Roisin, The coherence of reverberant sound fields. J. Acoust. Soc. Am. 108, 204–210 (2000).
27. S Gergen, C Borss, N Madhu, R Martin, in Proc. IEEE Intl. Conf. on Signal Processing, Communication and Computing (ICSPCC). An optimized parametric model for the simulation of reverberant microphone signals (IEEE, Hong Kong, 2012), pp. 154–157.
28. MS Brandstein, DB Ward (eds.), Microphone Arrays: Signal Processing Techniques and Applications (Springer, Berlin, Germany, 2001).
29. Z Chen, GK Gokeda, Y Yu, Introduction to Direction-of-Arrival Estimation (Artech House, London, UK, 2010).
30. TE Tuncer, B Friedlander (eds.), Classical and Modern Direction-of-Arrival Estimation (Academic Press, Burlington, USA, 2009).
31. EAP Habets, Single- and multi-microphone speech dereverberation using spectral enhancement (PhD thesis, Technische Universiteit Eindhoven, 2007).
32. O Thiergart, G Del Galdo, EAP Habets, On the spatial coherence in mixed sound fields and its application to signal-to-diffuse ratio estimation. J. Acoust. Soc. Am. 132(4), 2337–2346 (2012).
33. M Jeub, CM Nelke, C Beaugeant, P Vary, in EUSIPCO. Blind estimation of the coherent-to-diffuse energy ratio from noisy speech signals (Barcelona, Spain, 2011).
34. ND Gaubitch, HW Löllmann, M Jeub, TH Falk, PA Naylor, P Vary, M Brookes, in IWAENC. Performance comparison of algorithms for blind reverberation time estimation from speech (Aachen, Germany, 2012).
35. O Thiergart, EAP Habets, in IWAENC. Sound field model violations in parametric spatial sound processing (2012).
36. S Markovich, S Gannot, I Cohen, Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals. IEEE Trans. Audio Speech Lang. Process. 17(6), 1071–1086 (2009).
37. S Markovich-Golan, S Gannot, I Cohen, in IEEEI. A weighted multichannel Wiener filter for multiple sources scenario (2012).
38. S Markovich-Golan, S Gannot, I Cohen, A sparse blocking matrix for multiple constraints GSC beamformer (IEEE, Kyoto, Japan, 2012).
39. HQ Dam, S Nordholm, HH Dam, SY Low, in Asia-Pacific Conference on Communications. Maximum likelihood estimation and Cramer-Rao lower bounds for the multichannel spectral evaluation in hands-free communication (IEEE, Perth, Australia, 2005).
40. EAP Habets, I Cohen, S Gannot, Generating nonstationary multisensor signals under a spatial coherence constraint. J. Acoust. Soc. Am. 124(5), 2911–2917 (2008).
41. R Roy, T Kailath, ESPRIT: estimation of signal parameters via rotational invariance techniques. IEEE Trans. Acoust. Speech Signal Process. 37, 984–995 (1989).
42. JB Allen, DA Berkley, Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65(4), 943–950 (1979).
43. European Broadcasting Union, Sound Quality Assessment Material Recordings for Subjective Tests. http://tech.ebu.ch/publications/sqamcd.
44. ITU-T, Perceptual Evaluation of Speech Quality (PESQ), an Objective Method for End-to-end Speech Quality Assessment of Narrow-band Telephone Networks and Speech Codecs. International Telecommunications Union (ITU-T), 2001.
45. N Kitawaki, H Nagabuchi, K Itoh, Objective quality evaluation for low bit-rate speech coding systems. IEEE J. Sel. Areas Commun. 6(2), 262–273 (1988).
46. T Falk, C Zheng, W-Y Chan, A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech. IEEE Trans. Audio Speech Lang. Process. 18(7), 1766–1774 (2010).
47. JF Santos, M Senoussaoui, TH Falk, in IWAENC. An updated objective intelligibility estimation metric for normal hearing listeners under noise and reverberation (Antibes, France, 2014).