Wind noise reduction for a closely spaced microphone array in a car environment
EURASIP Journal on Audio, Speech, and Music Processing volume 2018, Article number: 7 (2018)
Abstract
This work studies a wind noise reduction approach for communication applications in a car environment. An endfire array consisting of two microphones is considered as a substitute for an ordinary cardioid microphone capsule of the same size. Using the decomposition of the multichannel Wiener filter (MWF), a suitable beamformer and a single-channel post filter are derived. Due to the known array geometry and the location of the speech source, assumptions about the signal properties can be made to simplify the MWF beamformer and to estimate the speech and noise power spectral densities required for the post filter. Even for closely spaced microphones, the different signal properties at the microphones can be exploited to achieve a significant reduction of wind noise. The proposed beamformer approach improves the signal-to-noise ratio of the speech signal while keeping the linear speech distortion low. The derived post filter performs on par with known approaches but reduces the effort for noise estimation.
Introduction
Hands-free communication applications in a car environment always face the problem of unwanted noise components in the microphone signals. Commonly, single-channel algorithms like the Wiener filter and spectral subtraction are used for noise suppression [1, 2]. Multichannel approaches are able to improve the speech quality further [3–6]. When more than one microphone is available, closely spaced microphones are often used in communication systems for signal enhancement by forming a differential microphone array [7–11]. This makes it possible to create a directivity-dependent beam pattern that enhances a desired signal direction while suppressing noise arriving from other incident angles.
The use of microelectromechanical system (MEMS) microphones as a replacement for ordinary microphone capsules has gained interest [12–14], especially for the application of directive beamforming [15, 16], due to their reduced size and cost compared with an ordinary microphone capsule. However, differential microphone arrays are not ideal in the presence of wind noise. The directional beam pattern may lead to a significant amplification of the wind noise due to the correlation properties of the noise terms [17]. The first-order low-pass filter required to equalize the speech signal makes this behavior even worse. One proposed solution for a differential microphone array is to switch to a single microphone with an omnidirectional response if wind noise is detected [17].
Besides car noise, wind noise components often occur in hands-free communication applications in a car environment. They are caused by open windows, fans, or open convertible hoods, which create airflow turbulence over the microphone membranes and result in low-frequency signal components of high amplitude [18].
Noise reduction algorithms in car environments are typically based on the assumption that the noise is stationary or varies only slowly in time. In [19], Wilson et al. demonstrated that wind noise consists of local short-time disturbances which are highly non-stationary. This makes the reduction of wind noise a challenging task. In the literature, the suppression of wind noise is mostly covered in the context of digital hearing aids or mobile devices [17, 20, 21]. Single-channel wind noise reduction often exploits the different power spectral density (PSD) properties of speech and wind noise [17, 20, 22]. Several other methods exist that aim to reduce wind noise for a single microphone [23–27].
Using more than one microphone makes it possible to take the diversity of the sound field into account to detect wind noise and reduce it successfully. In [20], a spectral weighting filter based on the coherence between two microphones is proposed. The coherence is also used in [28], where in addition to the magnitude squared coherence (MSC) the information carried by the phase component is used to synthesize a spectral filter function.
In [29], the decomposition of the multichannel Wiener filter into a minimum variance distortionless response (MVDR) beamformer and a single-channel Wiener post filter for an arbitrary microphone arrangement is presented. The approach is based on the assumption that the wind noise is uncorrelated at the microphones, with equal noise power spectral densities but arbitrary acoustic transfer functions (ATFs). For closely spaced microphones, it follows from these assumptions that a simple delay-and-sum (DS) beamformer achieves maximum signal-to-noise ratio (SNR) beamforming, because equal ATFs from the speech source to the microphones can be assumed for low frequencies.
In this work, we propose a wind noise reduction approach for a closely spaced microphone array consisting of two MEMS microphones, which is considered as a substitute for an ordinary cardioid microphone capsule. As in [29], we use the decomposition of the MWF into a beamformer and a single-channel post filter as well as the assumption that the wind noise is uncorrelated at the microphones. In contrast to [29], however, we assume that the noise powers at the microphones may differ. Since the geometry of the microphone array and the location of the desired speech source are known, additional assumptions about the speech and noise signal properties can be made to design a low-complexity wind noise reduction algorithm. Even for distances of only a few centimeters, the variation in the microphone signals can be used to reduce wind noise significantly. The coherence properties of speech and wind noise signals are exploited to form a beamformer, as well as to obtain estimates of the speech and noise PSDs for the post filter. Simulations with recorded wind noise show that the proposed approach improves the signal-to-noise ratio while keeping the linear distortion of the speech signal low.
The remainder of this paper is structured as follows. The signal model and the notation are briefly introduced in Section 2. In Section 3, the proposed wind noise reduction approach is presented. Simulation results are discussed in Section 4, followed by a conclusion in Section 5.
Signal model and notation
In the following, the signal model and the notation are briefly explained. We consider a linear MEMS microphone array, which is mounted in a car in front of the speaker’s seat in an endfire configuration. The acoustics in the car environment are considered as linear and time invariant. Using the subsampled time index κ and the frequency bin index ν, the spectrum Y_{i}(κ,ν) of the ith microphone can be written in the short-time frequency domain as
\[Y_{i}(\kappa,\nu) = H_{i}(\nu)X(\kappa,\nu) + N_{i}(\kappa,\nu), \tag{1}\]
where X(κ,ν) corresponds to the short-time spectrum of the speech signal. H_{i}(ν) denotes the acoustic transfer function, S_{i}(κ,ν)=H_{i}(ν)X(κ,ν) is the spectrum of the speech component, and N_{i}(κ,ν) is the spectrum of the noise at the ith microphone. For two microphones, the signals can be written as vectors
\[\mathbf{S} = [S_{1}, S_{2}]^{T}, \tag{2}\]
\[\mathbf{N} = [N_{1}, N_{2}]^{T}, \tag{3}\]
\[\mathbf{H} = [H_{1}, H_{2}]^{T}, \tag{4}\]
\[\mathbf{Y} = [Y_{1}, Y_{2}]^{T} = \mathbf{S} + \mathbf{N} = \mathbf{H}X + \mathbf{N}. \tag{5}\]
Vectors and matrices are written in bold, and scalars are normal letters. ^{T} denotes the transpose of a vector, ^{∗} denotes the complex conjugate, and ^{†} denotes the conjugate transpose.
We assume that the speech and noise signals are zero-mean random processes with the short-time power spectral densities \({\Phi _{S_{i}}^{2}}(\kappa,\nu)\) and \({\Phi _{N_{i}}^{2}}(\kappa,\nu)\), respectively, at the ith microphone. It is assumed that the speech and noise terms are uncorrelated. The noise correlation matrix can be expressed as
\[\mathbf{R}_{N} = \mathbb{E}\left\{\mathbf{N}\mathbf{N}^{\dagger}\right\} \tag{6}\]
and similarly the speech correlation matrix as
\[\mathbf{R}_{S} = \mathbb{E}\left\{\mathbf{S}\mathbf{S}^{\dagger}\right\} = {\Phi _{X}^{2}}\, \mathbf{H}\mathbf{H}^{\dagger}, \tag{7}\]
where \(\mathbb {E}\) denotes the mathematical expectation and \({\Phi _{X}^{2}}(\kappa,\nu)\) the PSD of the clean speech signal. Due to the short-time PSD fluctuations, the PSDs are time and frequency dependent. However, for brevity, the indices (κ,ν) are often omitted in the following.
Wind noise reduction algorithm
In this section, the proposed noise reduction algorithm is derived. The filtering is only applied in the low-frequency range which is affected by wind noise. It should be noted that the noise signal consists of wind as well as car noise components. However, in the presence of wind noise, the wind noise components are dominant at low frequencies. In the following, we consider only the non-stationary wind noise components at low frequencies and neglect the slowly varying driving noise. Such stationary noise components can be estimated and reduced by state-of-the-art noise reduction approaches.
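The proposed wind noise reduction approach is derived from the commonly used speech distortion weighted multichannel Wiener filter [3], which is defined as
\[\mathbf{G}^{\text{MWF}} = \left(\mathbf{R}_{S} + \mu \mathbf{R}_{N}\right)^{-1} {\Phi _{X}^{2}}\, \mathbf{H} \tilde{H}^{*}, \tag{8}\]
where \(\mathbf{R}_{S}\) and \(\mathbf{R}_{N}\) are the speech and noise correlation matrices and \({\tilde {H}}\) is the acoustic transfer function of an arbitrarily chosen microphone channel. μ is a noise overestimation parameter which allows a tradeoff between noise reduction and speech distortion. The output signal Z_{MWF} of the Wiener filter is obtained by
\[Z_{\text{MWF}} = \left(\mathbf{G}^{\text{MWF}}\right)^{\dagger} \mathbf{Y}. \tag{9}\]
In [30, 31], it is shown that G^{MWF} can be decomposed into an MVDR beamformer
\[\mathbf{G}^{\text{MVDR}} = \frac{\mathbf{R}_{N}^{-1}\mathbf{H}}{\mathbf{H}^{\dagger}\mathbf{R}_{N}^{-1}\mathbf{H}} \tag{10}\]
and a single-channel Wiener post filter
\[G^{\text{WF}} = \frac{\gamma^{\text{out}}}{\gamma^{\text{out}} + \mu} \tag{11}\]
as
\[\mathbf{G}^{\text{MWF}} = \tilde{H}^{*}\, \mathbf{G}^{\text{MVDR}}\, G^{\text{WF}}. \tag{12}\]
The term γ^{out} is the narrowband SNR at the beamformer output which is defined as
\[\gamma^{\text{out}} = \operatorname{tr}\left(\mathbf{R}_{S}\mathbf{R}_{N}^{-1}\right) = {\Phi _{X}^{2}}\, \mathbf{H}^{\dagger}\mathbf{R}_{N}^{-1}\mathbf{H}, \tag{13}\]
where tr(·) denotes the trace operator. We exploit this decomposition for the proposed wind noise reduction. First, we derive a beamformer for the considered microphone setup.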
Beamformer
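In the following, we consider time-aligned signals where the alignment compensates the different times of arrival for the speech signal. This is achieved by delaying the front microphone signal by a suitable sample delay τ so that it is in phase with the rear microphone signal,
\[\tilde{Y}_{\text{front}}(\kappa,\nu) = Y_{\text{front}}(\kappa,\nu)\, e^{-j 2\pi \nu \tau / L}, \tag{14}\]
where L denotes the block length of the short-time Fourier transform. After this alignment, we assume that the ATFs in H are identical, because the low-frequency speech components have a large wavelength compared with the microphone distance,
\[H_{1} = H_{2} = H, \tag{15}\]
which leads to the speech correlation matrix depending only on the PSD of the speech signal at one of the microphones
\[\mathbf{R}_{S} = {\Phi _{S}^{2}} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad {\Phi _{S}^{2}} = |H|^{2}\, {\Phi _{X}^{2}}. \tag{16}\]
Furthermore, it can be assumed that the wind noise terms for both microphone signals are uncorrelated even for small distances of the microphones [28, 32]. This simplifies the noise correlation matrix as well as its inverse since the cross-terms can be neglected
\[\mathbf{R}_{N} = \begin{pmatrix} {\Phi _{N_{1}}^{2}} & 0 \\ 0 & {\Phi _{N_{2}}^{2}} \end{pmatrix}, \tag{17}\]
\[\mathbf{R}_{N}^{-1} = \begin{pmatrix} 1/{\Phi _{N_{1}}^{2}} & 0 \\ 0 & 1/{\Phi _{N_{2}}^{2}} \end{pmatrix}. \tag{18}\]
The numerator term of the G^{MVDR} in (10) can be written as
\[\mathbf{R}_{N}^{-1}\mathbf{H} = H \left[\frac{1}{{\Phi _{N_{1}}^{2}}}, \frac{1}{{\Phi _{N_{2}}^{2}}}\right]^{T} \tag{19}\]
and the denominator as
\[\mathbf{H}^{\dagger}\mathbf{R}_{N}^{-1}\mathbf{H} = |H|^{2}\, \frac{{\Phi _{N_{1}}^{2}} + {\Phi _{N_{2}}^{2}}}{{\Phi _{N_{1}}^{2}}\, {\Phi _{N_{2}}^{2}}}. \tag{20}\]
Since H is not known, it is set to H=1. This results in the minimum variance (MV) beamformer coefficients
\[G^{\text{MV}}_{1} = \frac{{\Phi _{N_{2}}^{2}}}{{\Phi _{N_{1}}^{2}} + {\Phi _{N_{2}}^{2}}}, \qquad G^{\text{MV}}_{2} = \frac{{\Phi _{N_{1}}^{2}}}{{\Phi _{N_{1}}^{2}} + {\Phi _{N_{2}}^{2}}}, \tag{21}\]
which can be interpreted as a noise-dependent weighting of the input signals: the channel with the lower noise PSD receives the larger weight. Note that the MV beamformer achieves the same narrowband output SNR as the MVDR beamformer but no distortion-free response [5]. Finally, the output of the beamformer can be written as
\[Y_{\text{MV}} = G^{\text{MV}}_{1} Y_{1} + G^{\text{MV}}_{2} Y_{2} = \frac{{\Phi _{N_{2}}^{2}}\, Y_{1} + {\Phi _{N_{1}}^{2}}\, Y_{2}}{{\Phi _{N_{1}}^{2}} + {\Phi _{N_{2}}^{2}}}. \tag{22}\]
Using (17) and (18), we are able to calculate the narrowband output SNR of the beamformer as
\[\gamma^{\text{out}} = \frac{{\Phi _{S}^{2}}}{{\Phi _{N_{\text{beam}}}^{2}}}, \tag{23}\]
where \({\Phi _{N_{\text {beam}}}^{2}}\) denotes the noise PSD at the beamformer output. This PSD can be calculated as
\[{\Phi _{N_{\text{beam}}}^{2}} = \frac{{\Phi _{N_{1}}^{2}}\, {\Phi _{N_{2}}^{2}}}{{\Phi _{N_{1}}^{2}} + {\Phi _{N_{2}}^{2}}}. \tag{24}\]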
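The alignment and the noise-dependent weighting described in this subsection can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `align_front` and `mv_beamformer` are hypothetical, the weights are the MVDR solution for uncorrelated noise with the ATF set to 1, and the fallback to equal weights when both noise PSDs are zero is an added assumption.

```python
import numpy as np

def align_front(Y_front, tau, L):
    """Delay the front-microphone spectrum by tau samples (linear phase term).

    Y_front: complex STFT, last axis = frequency bins nu = 0, 1, ...
    L: block length of the short-time Fourier transform.
    """
    nu = np.arange(Y_front.shape[-1])
    return Y_front * np.exp(-2j * np.pi * nu * tau / L)

def mv_beamformer(Y1, Y2, phi_n1, phi_n2):
    """Noise-dependent weighting of two time-aligned spectra (ATF set to 1).

    Each channel is weighted by the noise PSD of the *other* channel, so the
    less disturbed microphone dominates the output.
    """
    total = phi_n1 + phi_n2
    safe = np.where(total > 0, total, 1.0)      # avoid division by zero
    # Assumption: fall back to equal weights if no noise is present at all.
    g1 = np.where(total > 0, phi_n2 / safe, 0.5)
    g2 = 1.0 - g1
    y_mv = g1 * Y1 + g2 * Y2
    phi_n_beam = phi_n1 * phi_n2 / safe         # noise PSD at the output
    return y_mv, phi_n_beam
```

Because the two weights sum to one, a speech component that is identical in both aligned channels passes through unchanged, while the output noise PSD is never larger than the smaller of the two input noise PSDs.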
Special cases
In the following, we consider some special cases for the beamformer derived in (22). Assuming \({\Phi _{N_{1}}^{2}} = {\Phi _{N_{2}}^{2}}\) and uncorrelated noise terms as in [29], \({G^{MV}_{i}}\) reduces to the simple weighting of a delay-and-sum beamformer (a simple summing of the aligned signals)
\[G^{\text{MV}}_{1} = G^{\text{MV}}_{2} = \frac{1}{2}, \tag{25}\]
which results in the output signal
\[Y_{\text{DS}} = \frac{1}{2}\left(Y_{1} + Y_{2}\right). \tag{26}\]
A delay-and-sum beamformer is also proposed in [17] for closely spaced microphones with wind noise.
We keep the condition of uncorrelated noise terms and assume a special case where the short-time noise PSDs vary over time and frequency. This is motivated by the highly non-stationary local short-time wind noise disturbances [19] and implies that only one microphone is affected by wind noise at any given time and frequency index κ and ν
\[{\Phi _{N_{1}}^{2}} > 0, \quad {\Phi _{N_{2}}^{2}} = 0 \tag{27}\]
or
\[{\Phi _{N_{1}}^{2}} = 0, \quad {\Phi _{N_{2}}^{2}} > 0. \tag{28}\]
Then, the noise PSD-dependent weighting in (21) reduces to a selection of the respective frequency bins by comparing the short-time PSDs of the microphone signals \({\Phi _{Y_{i}}^{2}}\), because the speech signal PSDs \({\Phi _{S_{i}}^{2}}\) are assumed to be identical for both microphones. Therefore, the resulting output signal Y_{FBS} can be written as
\[Y_{\text{FBS}} = \begin{cases} Y_{1}, & {\Phi _{Y_{1}}^{2}} \leq {\Phi _{Y_{2}}^{2}} \\ Y_{2}, & \text{otherwise.} \end{cases} \tag{29}\]
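The two special cases can be illustrated with a short NumPy sketch. The function names are hypothetical, the delay-and-sum output is scaled by 1/2 (equal weights that sum to one), and ties in the bin selection are resolved in favor of the first microphone; both choices are assumptions made for this example.

```python
import numpy as np

def delay_and_sum(Y1, Y2):
    """Equal-weight combination of two time-aligned spectra."""
    return 0.5 * (Y1 + Y2)

def frequency_bin_selection(Y1, Y2, phi_y1, phi_y2):
    """Per time-frequency bin, keep the microphone with the lower short-time PSD.

    With identical speech PSDs in both channels, the lower total PSD marks the
    channel that is less affected by wind noise in that bin.
    """
    return np.where(phi_y1 <= phi_y2, Y1, Y2)

# Usage with instantaneous PSD estimates |Y_i|^2 (no recursive smoothing).
Y1 = np.array([1.0 + 0j, 5.0])
Y2 = np.array([3.0 + 0j, 2.0])
selected = frequency_bin_selection(Y1, Y2, np.abs(Y1) ** 2, np.abs(Y2) ** 2)
```

In this toy input, the first bin is taken from microphone 1 and the second bin from microphone 2, whereas the delay-and-sum output averages both channels regardless of how strongly each is disturbed.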
PSD estimation
Next, we derive estimates for the speech and noise PSDs which are required for the beamformer and post filter. As mentioned in [29], most single-channel noise estimation procedures (e.g., [33–35]) rely on the assumption that the noise signal PSDs vary more slowly in time than the speech signal PSD. This is not the case for wind noise. The fast-varying short-time PSDs make noise estimation a challenging task for a single microphone. However, with more than one microphone, the different correlation properties of speech and wind noise can be used for the estimation.
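A reference for the wind noise can be obtained by exploiting the fact that the wind noise components in the two microphones are incoherent while the speech components are coherent. To block the speech signal, a delay-and-subtract approach is used to obtain a noise reference
\[N = \frac{1}{2}\left(Y_{1} - Y_{2}\right) = \frac{1}{2}\left(N_{1} - N_{2}\right), \tag{30}\]
which depends only on incoherent wind noise terms. The PSD of this noise reference is
\[{\Phi _{N}^{2}} = \mathbb{E}\left\{N N^{*}\right\} = \frac{1}{4}\left({\Phi _{N_{1}}^{2}} + {\Phi _{N_{2}}^{2}} - \mathbb{E}\left\{N_{1}N_{2}^{*}\right\} - \mathbb{E}\left\{N_{2}N_{1}^{*}\right\}\right).\]
The cross-terms vanish, because the wind noise terms are uncorrelated. Hence, we obtain
\[{\Phi _{N}^{2}} = \frac{1}{4}\left({\Phi _{N_{1}}^{2}} + {\Phi _{N_{2}}^{2}}\right). \tag{35}\]
Note that the delay-and-subtract signal in (30) is used in other applications as the output of a differential microphone array [17]. Obviously, this is not suitable for microphone positions that are sensitive to wind noise, because the noise terms are heavily amplified.
By summing the aligned signals according to (26), we reinforce coherent signal components. The combined signal Y_{DS} has the PSD
\[{\Phi _{Y_{\text{DS}}}^{2}} = \mathbb{E}\left\{Y_{\text{DS}} Y_{\text{DS}}^{*}\right\} = {\Phi _{S}^{2}} + \frac{1}{4}\left({\Phi _{N_{1}}^{2}} + {\Phi _{N_{2}}^{2}} + \mathbb{E}\left\{N_{1}N_{2}^{*}\right\} + \mathbb{E}\left\{N_{2}N_{1}^{*}\right\}\right).\]
Again, the noise cross-terms vanish and we obtain
\[{\Phi _{Y_{\text{DS}}}^{2}} = {\Phi _{S}^{2}} + \frac{1}{4}\left({\Phi _{N_{1}}^{2}} + {\Phi _{N_{2}}^{2}}\right). \tag{40}\]
Combining (35) and (40) yields the PSD of the clean speech signal
\[{\Phi _{S}^{2}} = {\Phi _{Y_{\text{DS}}}^{2}} - {\Phi _{N}^{2}} \tag{41}\]
and the noise PSD at the ith microphone
\[{\Phi _{N_{i}}^{2}} = {\Phi _{Y_{i}}^{2}} - {\Phi _{S}^{2}}. \tag{42}\]
Note that this derivation only holds for uncorrelated noise terms. \({\Phi _{S}^{2}}\) may still contain correlated noise. However, we neglect the correlated driving noise as stated at the beginning of this section. In contrast to Zelinski's post filter [36], which also assumes zero correlation between the noise terms at the microphones, we assume the short-time noise PSDs to be different \(\left ({\Phi _{N_{1}}^{2}} \neq {\Phi _{N_{2}}^{2}}\right)\).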
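The estimation chain of this subsection can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the names `smooth_psd` and `estimate_psds` are hypothetical, the delay-and-subtract and delay-and-sum signals are both scaled by 1/2, and the clamping of negative differences to zero is an addition for numerical robustness that the text does not mention.

```python
import numpy as np

def smooth_psd(spectrum, alpha=0.0):
    """Recursive short-time PSD estimate along the frame axis (axis 0).

    alpha = 0 disables smoothing and returns the instantaneous power |.|^2.
    """
    power = np.abs(spectrum) ** 2
    psd = np.empty_like(power)
    state = power[0]
    for k in range(power.shape[0]):
        state = alpha * state + (1.0 - alpha) * power[k]
        psd[k] = state
    return psd

def estimate_psds(Y1, Y2, alpha=0.0):
    """Estimate speech and noise PSDs from two time-aligned STFTs (frames x bins)."""
    n_ref = 0.5 * (Y1 - Y2)   # delay-and-subtract: coherent speech cancels
    y_ds = 0.5 * (Y1 + Y2)    # delay-and-sum: coherent speech is kept
    phi_n = smooth_psd(n_ref, alpha)
    # Assumption: clamp at zero, since PSD differences of estimates can go negative.
    phi_s = np.maximum(smooth_psd(y_ds, alpha) - phi_n, 0.0)
    phi_n1 = np.maximum(smooth_psd(Y1, alpha) - phi_s, 0.0)
    phi_n2 = np.maximum(smooth_psd(Y2, alpha) - phi_s, 0.0)
    return phi_s, phi_n, phi_n1, phi_n2
```

For two identical inputs the noise reference is zero and the whole input power is attributed to speech, which matches the coherence argument above.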
Post filter
As described in (12), the beamformer is followed by a single-channel Wiener post filter to achieve additional noise suppression. We use the post filter
\[G^{\text{WF}} = \frac{\hat{\gamma}}{\hat{\gamma} + \mu} \tag{43}\]
with the SNR estimate
\[\hat{\gamma} = \frac{{\Phi _{S}^{2}}}{{\Phi _{N}^{2}}}. \tag{44}\]
That is, the noise PSD is estimated according to (35) instead of (23), because this estimate showed a better performance in the simulations regarding SNR and speech distortion. Note that \({\Phi _{N}^{2}}\geq {\Phi _{N_{\text {beam}}}^{2}}\) holds, with equality if \({\Phi _{N_{1}}^{2}}={\Phi _{N_{2}}^{2}}\). Hence, the noise estimation in (44) results in an overestimation of the noise power if the short-time PSDs at the microphones differ. This is similar to using an overestimation parameter μ>1.
Finally, the output of the complete wind noise reduction algorithm is
\[Z = G^{\text{WF}}\, Y_{\text{MV}}.\]
This wind noise reduction algorithm is only applied for frequencies below a cutoff frequency f_{c}, because wind noise mostly contains low-frequency components and the assumptions about the signal properties are only valid for low frequencies. Figure 1 shows the block diagram of the signal processing structure.
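A possible realization of the band-limited post filter is sketched below. The function name `wiener_post_filter` and the small regularization constant `eps` are assumptions of this example; the default values for the sampling rate, FFT size, and cutoff frequency follow the simulation setup reported later in the paper.

```python
import numpy as np

def wiener_post_filter(y_mv, phi_s, phi_n, mu=1.0, fs=16000, n_fft=512,
                       f_c=1000.0, eps=1e-12):
    """Apply the Wiener gain snr / (snr + mu) to bins below f_c only.

    y_mv:  beamformer output spectrum, last axis = bins 0 .. n_fft/2
    phi_s: speech PSD estimate, phi_n: noise PSD estimate (same shape)
    mu:    noise overestimation parameter
    """
    snr = phi_s / (phi_n + eps)          # eps guards against division by zero
    gain = snr / (snr + mu)
    freqs = np.arange(y_mv.shape[-1]) * fs / n_fft
    gain = np.where(freqs < f_c, gain, 1.0)   # leave bins at/above f_c untouched
    return gain * y_mv
```

With equal speech and noise PSDs and μ=1, the gain is 0.5 below the cutoff and 1 elsewhere. A larger μ lowers the gain for a given SNR estimate, which trades additional noise reduction for more speech distortion.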
Simulation results
In the following, simulation results for the algorithm proposed in Section 3 are presented for wind noise in a car. For the signal measurements, a linear MEMS microphone array in an endfire configuration was mounted above the sun visor at the driver seat position. To investigate varying microphone distances, an array with four sensors was used. The microphone distances were 7.1, 14.3, and 21.4 mm.
The noise recordings and the speech recordings were made separately and mixed in the simulation. For the noise recordings, the driving speed was 100 km/h and both front windows, on the driver side as well as the co-driver side, were completely open to allow a turbulent airflow over the MEMS array. The speech signals for testing were four ITU speech signals convolved with the impulse responses, which were measured from the mouth reference point of an artificial head (HMS II.5 from HEAD acoustics) at the driver’s position to the MEMS array microphones.
For the simulations, a sampling rate fs=16 kHz and a fast Fourier transform (FFT) size of 512 samples were used. The FFT shift was 128 samples, and each block was windowed before it was transformed into the frequency domain. The cutoff frequency f_{c} was set to 1 kHz.
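For orientation, the analysis stage with these parameters could look like the following NumPy sketch. The window type is not stated in the text, so the Hann window used here is an assumption, as is the function name `stft`.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Short-time spectra of a real signal: frames x (n_fft/2 + 1) bins."""
    window = np.hanning(n_fft)   # assumed window; the paper does not name one
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[k * hop:k * hop + n_fft] for k in range(n_frames)])
    return np.fft.rfft(frames * window, axis=-1)

# At fs = 16 kHz the bin spacing is 16000 / 512 = 31.25 Hz,
# so the cutoff frequency f_c = 1 kHz corresponds to bin 32.
fs = 16000
t = np.arange(2048) / fs
X = stft(np.cos(2 * np.pi * 1000.0 * t))
```

One second of signal at 16 kHz yields 122 frames with this block length and shift.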
As quality measures, we consider the segmental signal-to-noise ratio (SSNR), the log spectral distance (LSD), as well as the short-time objective intelligibility measure (STOI) as described in [37]. The STOI is a metric for speech intelligibility.
It should be noted that the SSNR and LSD measures are calculated for the frequency region below the cutoff frequency f_{c} since the frequency region above f_{c} is not affected by the proposed wind noise reduction approach. Therefore, the signals are transformed back into the time domain and are low-pass filtered to calculate the SSNR and LSD values. The STOI is calculated over the complete frequency range with 15 third-octave bands.
The LSD measures the linear speech distortion and is calculated as the average logarithmic spectral distance of two PSDs. These are the signals under test, i.e., the speech component of the filtered output signal and the clean speech reference X. The PSDs are calculated over all speech active blocks using an ideal voice activity detector. For further details regarding the LSD calculation, we refer to [38].
The SSNR is calculated based on [39]. However, we calculate the SSNR by the ratio of the signal energy of the speech and the noise components in speech active frames as
\[\text{SSNR} = \frac{1}{K}\sum_{l=0}^{K-1} 10\log_{10}\left(\frac{\sum_{k=lR}^{lR+M-1}\tilde{s}^{2}(k)}{\sum_{k=lR}^{lR+M-1}\tilde{n}^{2}(k)}\right),\]
where l runs over the considered speech active frames. \(\tilde {s}(k)\) and \(\tilde {n}(k)\) are the speech and noise components at the output of the respective noise reduction approach in the time domain. k is the time index, M is the frame length, R is the frame shift, and K is the total number of considered frames. The frame length was 512 samples, and the frame shift was 256 samples. The SSNR values are limited between −10 and 35 dB.
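A frame-wise SSNR computation along these lines might be sketched as follows. The function name, the optional `active` mask (standing in for the ideal voice activity detector), the per-frame limiting, and the small constant `eps` are assumptions of this example rather than details taken from the paper.

```python
import numpy as np

def segmental_snr(s, n, frame_len=512, frame_shift=256, active=None,
                  lo=-10.0, hi=35.0, eps=1e-12):
    """Mean of per-frame SNR values in dB, each limited to [lo, hi].

    s, n:   time-domain speech and noise components at the filter output
    active: optional boolean sequence, one entry per frame (True = speech active)
    """
    n_frames = 1 + (len(s) - frame_len) // frame_shift
    values = []
    for k in range(n_frames):
        if active is not None and not active[k]:
            continue                          # skip frames without speech
        seg = slice(k * frame_shift, k * frame_shift + frame_len)
        e_s = np.sum(s[seg] ** 2)             # speech energy in the frame
        e_n = np.sum(n[seg] ** 2)             # noise energy in the frame
        snr_db = 10.0 * np.log10((e_s + eps) / (e_n + eps))
        values.append(np.clip(snr_db, lo, hi))
    return float(np.mean(values))
```

Limiting each frame before averaging prevents a few nearly noise-free or nearly silent frames from dominating the mean.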
Car noise, which is also present in the microphone signals, is not considered in our algorithm. Thus, the SSNR improvements in absolute value can be lower compared with measured noise signals which contain wind noise only.
Coherence properties
Figure 2 shows the results of the magnitude squared coherence calculation of speech and noise for varying microphone distances. The magnitude squared coherence for two signals u_{1}(k) and u_{2}(k) is calculated as
\[\text{MSC}(\nu) = \frac{\left|\mathbb{E}\left\{U_{1}(\kappa,\nu)\,U_{2}^{*}(\kappa,\nu)\right\}\right|^{2}}{\mathbb{E}\left\{|U_{1}(\kappa,\nu)|^{2}\right\}\,\mathbb{E}\left\{|U_{2}(\kappa,\nu)|^{2}\right\}},\]
where U_{1} and U_{2} denote the corresponding short-time spectra. The expectation values are estimated by the Welch periodogram using recursive smoothing. A very high smoothing factor of 0.9995 was chosen to average over many signal frames. An MSC value close to one means the signals are highly correlated, whereas a value close to zero indicates that the signals are uncorrelated.
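The MSC estimate with recursive smoothing can be sketched as below. The function name `msc`, the initialization with the first frame, and the constant `eps` are assumptions; the code uses the standard definition of the MSC, i.e., the squared magnitude of the smoothed cross-PSD divided by the product of the smoothed auto-PSDs.

```python
import numpy as np

def msc(U1, U2, alpha=0.9995, eps=1e-12):
    """Magnitude squared coherence per bin from two STFTs (frames x bins).

    Auto- and cross-PSDs are recursively smoothed over frames with factor alpha.
    """
    p11 = np.abs(U1[0]) ** 2
    p22 = np.abs(U2[0]) ** 2
    p12 = U1[0] * np.conj(U2[0])
    for k in range(1, U1.shape[0]):
        p11 = alpha * p11 + (1.0 - alpha) * np.abs(U1[k]) ** 2
        p22 = alpha * p22 + (1.0 - alpha) * np.abs(U2[k]) ** 2
        p12 = alpha * p12 + (1.0 - alpha) * U1[k] * np.conj(U2[k])
    return np.abs(p12) ** 2 / (p11 * p22 + eps)

# Toy check with synthetic spectra: identical signals vs. independent noise.
rng = np.random.default_rng(0)
U = rng.standard_normal((2000, 64)) + 1j * rng.standard_normal((2000, 64))
V = rng.standard_normal((2000, 64)) + 1j * rng.standard_normal((2000, 64))
coherent = msc(U, U, alpha=0.9)
incoherent = msc(U, V, alpha=0.9)
```

Note that an estimate based on a single frame is always one, so with a smoothing factor as high as 0.9995 many thousands of frames are needed before the first-frame initialization fades; the toy check therefore uses a smaller factor.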
As can be observed, the assumption that noise is uncorrelated while speech is highly correlated is fulfilled for frequencies below 600 Hz for all microphone distances, which justifies the assumptions made in Section 3.
Beamformer output
In Table 1, the SSNR gain of the beamformer output is compared with a single microphone. This comparison is considered because the approach in [17] suggests switching from a differential microphone array to a single omnidirectional microphone if wind noise is detected. The SSNR of the single microphone is 2.14 dB. For further comparison, the results of the delay-and-sum beamformer Y_{DS} are shown, which is the summing of the aligned signals as described in (26) (and also proposed in [17] for combining wind-noise-affected signals). Also, the output of a frequency bin selection (Y_{FBS}) approach as stated in (29) is examined. The noise estimates in (42), as derived in Section 3.3, are used for the beamformer. Moreover, the ideal noise PSDs are used to get a benchmark. Since the noise signals were recorded separately for the simulations, the ideal noise PSDs are obtained by using the noise-only signals for the PSD calculation. The PSDs are calculated by the Welch periodogram using recursive smoothing. However, the short-time recursive PSD smoothing was omitted, because this achieved the best results due to the high non-stationarity of the wind noise.
As can be observed, all beamformer approaches are able to improve the SSNR in the considered frequency region compared with a single microphone, and all SNR gains increase as the distance between the microphones is increased. It is interesting to see that the delay-and-sum approach Y_{DS} has the worst performance for all microphone distances, whereas the frequency bin selection approach shows results similar to the MV beamformer. This indicates that the short-time PSDs at the microphones vary heavily. Comparing the performance with estimated noise PSDs with that of the beamformer with the actual noise PSDs, we observe that the results regarding the SSNR are similar, i.e., the PSD estimates are sufficiently accurate.
Post filter output
Now, the SSNR as well as the LSD for the complete MWF including the post filter (as derived in (46)) are examined. To compare the post filter of (43) with other approaches, a wind noise reduction filter by Franz et al. [20] that defines a filter function based on the magnitude squared coherence is used as a reference. The proposed post filter in (43) as well as the post filter derived in [20] are applied to the beamformer output Y_{MV} which uses the noise estimates. As can be seen in Table 2, the SSNR can be further improved while keeping the speech distortion below 1 dB compared with the single microphone signal Y_{1}.
For the post filter comparison, the noise overestimation parameter μ was set to achieve a similar LSD value as the post filter in [20]. The short-time PSDs used for the post filter, as well as the calculated MSC needed for the filter design in [20], were recursively smoothed by the same factor of 0.85 to make a fair comparison. As can be seen, both post filters are able to achieve the same noise reduction.
Table 2 also contains values for the STOI. The STOI is closely related to the percentage of correctly understood words averaged across a group of users. The maximum STOI value is one and larger values indicate better speech intelligibility. The noisy speech signals are compared with the time domain signal of the clean speech X. It can be seen in Table 2 that the STOI is increased for the beamformer output Y_{MV} compared with the single microphone Y_{1}. The results indicate that additional post filtering improves the STOI, where the post filters obtain similar STOI values.
Figure 3 shows the spectrogram for the omnidirectional reference microphone, as well as the output Z of our proposed wind noise reduction algorithm with a microphone distance of 21.4 mm. It can be observed that the high energetic noise terms in the low frequencies are successfully suppressed. Above 600 Hz the noise reduction is not as strong, i.e., the assumptions for the wind noise signal properties with this noise recording are only valid for frequencies below 600 Hz (cf. Fig. 2).
Wind noise only scenario
Finally, the wind noise reduction is considered in a scenario containing only wind noise and no driving noise. The SSNR of the single microphone Y_{1} is 4.86 dB in this scenario. The results can be seen in Table 3. Again, the beamformer output Y_{MV} with noise estimation is used with both post filter approaches as in Section 4.3. All parameters except for the overestimation parameter are the same. The table contains results for two different values of the overestimation parameter for the Wiener post filter in order to demonstrate the tradeoff between speech distortion and noise reduction. With μ=8, the Wiener filter and the filter from [20] obtain similar performance values. Reducing the overestimation parameter to μ=1 also reduces the SNR gain, but results in better LSD and STOI values. Comparing the results with the gains in Table 2, the achieved SSNR values are higher due to the absence of the driving noise.
Figure 4 shows the spectrogram of the output Z for the wind noise only scenario. The noise is significantly reduced over a wide frequency range. Since the coherent driving noise terms are not present in this scenario, noise reduction can also be observed for frequencies above 600 Hz.
Conclusions
In this paper, a wind noise reduction approach for a compact endfire array was examined. Based on the decomposition of the MWF, a beamformer and a post filter were derived. Due to the known geometry of the MEMS microphone array in endfire configuration and knowledge about the position of the speech source, assumptions about the signal properties of the speech and wind noise components were made. The acquired estimates of the PSDs for the wind noise as well as the speech signals are used to design a beamformer as well as a post filter for wind noise reduction. The simulations based on noise recordings in a car environment show that a significant wind noise reduction is possible while keeping the speech distortion low.
Further investigations should be made to combine the proposed wind noise reduction approach with the reduction of car noise. The driving noise is neglected in our study. The compact microphone array can be part of an array of more widely spaced microphones, where the spatial diversity of the sound field can be exploited for further noise reduction. Since the non-stationary noise terms are mostly reduced with the proposed approach, state-of-the-art noise estimation procedures can be chosen that rely on the assumption that the driving noise is only slowly varying.
Wind-noise-induced disruptions are a commonly known problem with differential beamforming, e.g., with the closely spaced microphone arrangements in hearing aids [17]. Hence, the proposed noise reduction approach may also be applicable for hearing aids.
Abbreviations
ATF: Acoustic transfer function
DS: Delay-and-sum
FFT: Fast Fourier transform
LSD: Log spectral distance
MEMS: Microelectromechanical system
MSC: Magnitude squared coherence
MV: Minimum variance
MVDR: Minimum variance distortionless response
MWF: Multichannel Wiener filter
PSD: Power spectral density
SNR: Signal-to-noise ratio
SSNR: Segmental signal-to-noise ratio
STOI: Short-time objective intelligibility measure
References
P Vary, R Martin, Digital Speech Transmission: Enhancement, Coding and Error Concealment (Wiley, Chichester, 2006).
E Hänsler, G Schmidt, Acoustic Echo and Noise Control: A Practical Approach (Wiley, New Jersey, 2004).
S Doclo, A Spriet, M Moonen, J Wouters, in Speech Enhancement. Speech distortion weighted multichannel Wiener filtering techniques for noise reduction (SpringerBerlin, 2005). Chap. 9. https://doi.org/10.1007/3540274898_9.
S Doclo, A Spriet, J Wouters, M Moonen, Frequencydomain criterion for the speech distortion weighted multichannel Wiener filter for robust noise reduction. Speech Comm.49(78), 636–656 (2007). https://doi.org/10.1016/j.specom.2007.02.001.
S Stenzel, J Freudenberger, Blind matched filtering for speech enhancement with distributed microphones. J. Electr. Comput. Eng.2012:, 636 (2012). Article ID 169853.
T Matheja, M Buck, T Fingscheidt, A dynamic multichannel speech enhancement system for distributed microphones in a car environment. EURASIP J. Adv. Signal Proc.2013: (2013).
J Benesty, C Jingdong, Study and Design of Differential Microphone Arrays (Springer, Berlin, 2013).
GW Elko, Differential microphone arrays. In: Y Huang, J Benesty. (eds) Audio Signal Processing for NextGeneration Multimedia Communication System (Springer, Boston, 2004).
H Teutsch, GW Elko, in International Workshop on Acoustic Signal Enhancement. First and secondorder adaptive differential microphone arrays, (2001), pp. 35–38.
J Benesty, M Souden, Y Huang, A perspective on differential microphone arrays in the context of noise reduction. IEEE Trans. Audio, Speech, Lang. Process.20(2), 699–704 (2012). https://doi.org/10.1109/TASL.2011.2163396.
GW Elko, Microphone array systems for handsfree telecommunication. Speech Commun.20(3), 229–240 (1996). https://doi.org/10.1016/S01676393(96)00057X. Acoustic Echo Control and Speech Enhancement Techniques.
M Turqueti, J Saniie, E Oruklu, in 2010 53rd IEEE International Midwest Symposium on Circuits and Systems. MEMS acoustic array embedded in an FPGA based data acquisition and signal processing system, (2010), pp. 1161–1164. https://doi.org/10.1109/MWSCAS.2010.5548866.
I Hafizovic, CIC Nilsen, M Kjølerbakken, V Jahr, Design and implementation of a MEMS microphone array system for realtime speech acquisition. Appl. Acoust.73(2), 132–143 (2012). https://doi.org/10.1016/j.apacoust.2011.07.009.
J Tiete, F Domínguez, Bd Silva, L Segers, K Steenhaut, A Touhafi, Soundcompass: a distributed mems microphone arraybased sensor for sound source localization. Sensors. 14(2), 1918–1949 (2014). https://doi.org/10.3390/s140201918.
G Elko, Small directional microelectromechanical systems (MEMS) microphone arrays. Proc. Meet. Acoust.19(1), 030033 (2013). https://doi.org/10.1121/1.4799608. http://asa.scitation.org/doi/pdf/10.1121/1.4799608.
A Palla, L Fanucci, R Sannino, M Settin, in 2015 10th International Conference on Design Technology of Integrated Systems in Nanoscale Era (DTIS). Wearable speech enhancement system based on MEMS microphone array for disabled people, (2015), pp. 1–5. https://doi.org/10.1109/DTIS.2015.7127384.
JW Kates, Digital Hearing Aids (Plural Publishing, San Diego, 2008).
S Bradley, T Wu, S von Hünerbein, J Backman, in Audio Engineering Society Convention 114. The mechanisms creating wind noise in microphones, (2003).
DK Wilson, MJ White, Discrimination of wind noise and sound waves by their contrasting spatial and temporal properties. Acta Acustica United Acustica. 96(96), 991–1002 (2010).
S Franz, J Blitzer, in International Workshop on Acoustic Signal Enhancement (IWAENC). Multichannel algorithms for wind noise reduction and signal compensation in binaural hearing aids, (2010).
CM Nelke, P Vary, in International Workshop on Acoustic Signal Enhancement (IWAENC). Measurement, analysis and simulation of wind noise signals for mobile communication devices, (2014).
CM Nelke, N Chatlani, C Beaugeant, P Vary, in IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP). Single microphone wind noise PSD estimation using signal centroids, (2014).
S Kuroiwa, Y Mori, S Tsuge, M Takashina, F Ren, in International Conference on Communication Technology. Wind noise reduction method for speech recording using multiple noise templates and observed spectrum fine structure, (2006).
B King, L Atlas, in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC). Coherent modulation comb filtering for enhancing speech in wind noise, (2008).
E Nemer, W Leblanc, in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Singlemicrophone wind noise reduction by adaptive postfiltering, (2009).
C Hofman, T Wolff, M Buck, T Haulik, W Kellermann, in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC). A morphological approach to singlechannel windnoise suppression, (2012).
CM Nelke, N Nawroth, M Jeub, C Beaugeant, P Vary, in Proceedings of European Signal Processing Conference (EUSIPCO). Single microphone wind noise reduction using techniques of artificial bandwidth extension, (2012).
CM Nelke, P Vary, in Proceedings of Speech Communications  11. ITG Symposium. Dual microphone wind noise reduction by exploiting the complex coherence, (2014).
P Thüne, G Enzner, in ITG Conference on Speech Communication. Maximumlikelihood approach to adaptive multichannelWiener postfiltering for windnoise reduction, (2016).
KU Simmer, J Bitzer, C Marro, in Microphone Arrays: Signal Processing Techniques and Applications, ed. by MS Brandstein. Postfiltering techniques (SpringerBerlin Heidelberg, 2001), pp. 39–60.
KU Simmer, J Bitzer, in Jahrestagung für Akustik (DAGA), Aachen. Multi-microphone noise reduction: theoretical optimum and practical realization, (2003).
GM Corcos, The structure of the turbulent pressure field in boundary-layer flows. J. Fluid Mech. 18(3), 353–378 (1964). https://doi.org/10.1017/S002211206400026X.
R Martin, Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. Speech Audio Process. 9, 504–512 (2001).
J Freudenberger, S Stenzel, B Venditti, in Proc. European Signal Processing Conference (EUSIPCO), Glasgow. Spectral combining for microphone diversity systems, (2009), pp. 854–858.
J Freudenberger, S Stenzel, in IEEE Workshop on Statistical Signal Processing (SSP). Time-frequency dependent voice activity detection based on a simple threshold test (IEEE, Nice, 2011).
R Zelinski, in ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing. A microphone array with adaptive postfiltering for noise reduction in reverberant rooms, (1988), pp. 2578–2581. https://doi.org/10.1109/ICASSP.1988.197172.
CH Taal, RC Hendriks, R Heusdens, J Jensen, An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Trans. Audio Speech Lang. Process. 19(7), 2125–2136 (2011). https://doi.org/10.1109/TASL.2011.2114881.
PA Naylor, ND Gaubitch, Speech Dereverberation, 1st edn. (Springer, London, 2010).
K Kondo, Subjective Quality Measurement of Speech (Springer, Berlin, 2012).
Acknowledgements
We thank Daimler AG, Department Enabling Technologies for Communication, Ulm, for providing the measurement data.
Availability of data and materials
The measurement data were used by courtesy of Daimler AG and are not publicly available.
Authors’ information
Simon Grimm (SG) has been a member of the signal processing group at the Institute for System Dynamics at HTWG Konstanz since 2014. His work is primarily concerned with the development of signal processing algorithms for multichannel noise reduction in noisy acoustic environments. He received his B.Eng. in 2012 and his M.Eng. in 2014.
Dr. Jürgen Freudenberger (JF) has been a professor at HTWG Konstanz since 2006, where he is the head of the signal processing group at the Institute for System Dynamics. His work is primarily concerned with the development of algorithms in the field of signal processing and coding for reliable data transmission, as well as efficient algorithm implementation in hardware and software.
Author information
Contributions
Both authors developed the idea of the proposed algorithm. JF initiated the theoretical description, while SG implemented the algorithm and refined it in the simulations. SG carried out the simulations and wrote most of the manuscript, while JF supervised the simulations and helped improve the text. Both authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Grimm, S., Freudenberger, J. Wind noise reduction for a closely spaced microphone array in a car environment. J AUDIO SPEECH MUSIC PROC. 2018, 7 (2018). https://doi.org/10.1186/s13636-018-0130-z
DOI: https://doi.org/10.1186/s13636-018-0130-z
Keywords
 MVDR beamforming
 Multichannel Wiener filter
 Wind noise reduction
 Speech signal processing
 MEMS microphones