Using information theoretic distance measures for solving the permutation problem of blind source separation of speech signals
 Eugen Hoffmann^{1},
 Dorothea Kolossa^{1},
 Bert-Uwe Köhler^{1} and
 Reinhold Orglmeister^{1}
https://doi.org/10.1186/16874722201214
© Hoffmann et al; licensee Springer. 2012
Received: 31 October 2011
Accepted: 3 April 2012
Published: 3 April 2012
Abstract
The problem of blind source separation (BSS) of convolved acoustic signals is of great interest for many classes of applications. Due to the convolutive mixing process, the source separation is performed in the frequency domain, using independent component analysis (ICA). However, frequency domain BSS involves several major problems that must be solved. One of these is the permutation problem. The permutation ambiguity of ICA needs to be resolved so that each separated signal contains the frequency components of only one source signal. This article presents a class of methods for solving the permutation problem based on information theoretic distance measures. The proposed algorithms have been tested on different real-room speech mixtures with different reverberation times in conjunction with different ICA algorithms.
Keywords
blind source separation, independent component analysis, permutation problem
1 Introduction
Blind source separation (BSS) is a technique for recovering source signals from observed mixtures alone, when both the mixing process and the sources are unknown. Owing to its many applications, for example in medical and speech signal processing, BSS has attracted great attention. This article considers the case of BSS for acoustic signals observed in a real environment, i.e., convolutive mixtures, focusing on speech signals in particular. In recent years, the problem has been widely studied and a number of different approaches have been proposed [1, 2]. Many state-of-the-art unmixing methods for acoustic signals are based on independent component analysis (ICA) in the frequency domain, where the convolutions of the source signals with the room impulse response are reduced to multiplications with the corresponding transfer functions. Thus, for each frequency bin, an individual instantaneous ICA problem arises [2].
Due to the nature of ICA algorithms, obtaining a consistent ordering of the recovered signals is highly unlikely. In case of frequency domain source separation, this means that the ordering of outputs may change for each frequency bin. In order to correctly estimate source signals in the time domain, all separated frequency bins need to be put in a consistent order. This problem is also known as the permutation problem.
There exist several classes of algorithms that provide a solution for the permutation problem. The approaches presented in [3–6] try to find permutations by considering the cross-statistics (such as cross-correlation or cross-cumulants) of the spectral envelopes of adjacent frequency bins. In [7], algorithms were proposed that make use of the spectral distance between neighboring bins and try to make the impulse response of the mixing filters short, which corresponds to smooth transfer functions of the mixing system in the frequency domain. The algorithm proposed by Kamata et al. [8] solves the problem using the continuity in power between adjacent frequency components of the same source. A similar method was presented by Pham et al. [9]. Baumann et al. [10] proposed a solution by comparing the directivity patterns resulting from the estimated demixing matrix in each frequency bin. Similar algorithms were presented in [11–13]. In [14] it was suggested to use the direction of arrival (DOA) of the source signals, determined from the estimated mixing matrices, to solve the problem. The approach in [15] is to exploit the continuity of the frequency response of the mixing filter. A similar approach was presented in [16] using the minimum of the L_{1}-norm of the resulting mixing filter and in [17] using the minimum distance between the adjacent filter coefficients. In [18] the authors suggest using the cosine between the demixing coefficients of different frequencies as a cost function. Sawada et al. [19] proposed an approach based on basis vector clustering of the normalized estimated mixing matrices. In [20] a hybrid approach combines spectral continuity, temporal envelope and beamforming alignment with a psychoacoustic post-filter, and in [21] the permutation problem was solved using a maximum-likelihood ratio between the adjacent frequency bins.
However, as the number of independent components grows, so does the complexity of the solution. This is true not only because the number of permutations to be considered increases factorially, but also because the ICA performance degrades. Consequently, not all of the approaches mentioned above perform equally well for an increasing number of sources.
The goal of this article is to investigate the usefulness of information theoretic distance measures for solving the permutation ambiguity problem. For this purpose, it is assumed that the amplitudes of the estimated independent signals follow a Rayleigh distribution [22] and that the logarithms of the amplitudes follow a generalized Gaussian distribution (GGD). It should be noted that the approach in [23] is based on a similar assumption, namely that the extracted signals are generalized Gaussian distributed. The authors handle the problem by comparing the parameters of the GGD of each frequency bin. However, the resulting algorithm solves the permutation problem only partially and requires a combination with another approach, for instance [24].^{a} In contrast, the algorithms proposed in this article deal with the problem in a self-contained way and require no completion by other approaches.
The resulting approaches will be tested on different speech mixtures recorded in real environments with different reverberation times in combination with different ICA algorithms, such as JADE [25], INFOMAX [4, 26], and FastICA [27, 28].
2 Problem formulation
This section provides an introduction to the problem of blind separation of acoustic signals.
In other words, for the estimated vector y(t) and the source vector s(t), y(t) ≈ s(t) should hold.
This problem is also known as the cocktail-party problem. A common way to deal with the problem is to reduce it to a set of instantaneous separation problems, for which efficient approaches exist.
3 Permutation correction
This section gives an overview of the applied permutation correction methods. To resolve the permutations, the probability density functions (pdfs) of the magnitudes or of the logarithms of the magnitudes of the resulting frequency bins are compared. At this point, the assumption is made that adjacent frequency bins of the same source signal possess similar distributions.
3.1 Speech density modeling
3.1.1 Distribution of the speech magnitudes
where σ is a shape parameter that can be estimated, e.g., using the maximum likelihood estimator [30].
is a vector of the decorrelated random variables and $\tilde{\sigma}_{i}$ is the shape parameter for the signal $\tilde{x}_{i}$ [31, 32].^{b}
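The maximum likelihood estimate of σ mentioned above can be sketched as follows. This is a minimal illustration only, assuming the standard univariate Rayleigh pdf f(x) = (x/σ²) exp(−x²/(2σ²)); the function names `rayleigh_sigma_ml` and `draw_rayleigh` are hypothetical and do not appear in the article.

```python
import math
import random


def rayleigh_sigma_ml(samples):
    """ML estimate of sigma, assuming f(x) = x/sigma^2 * exp(-x^2 / (2 sigma^2))."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    # For this pdf the log-likelihood is maximized by sigma^2 = sum(x^2) / (2 n).
    return math.sqrt(sum(x * x for x in samples) / (2.0 * n))


def draw_rayleigh(sigma, n, seed=0):
    """Draw n Rayleigh samples by inverse-transform sampling (illustration only)."""
    rng = random.Random(seed)
    return [sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]


if __name__ == "__main__":
    magnitudes = draw_rayleigh(sigma=1.5, n=20000)
    print("estimated sigma:", rayleigh_sigma_ml(magnitudes))
```

In the setting of this article, `samples` would be the spectral magnitudes of one separated signal in one frequency bin over all frames.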
3.1.2 Distribution of the logarithms of the speech magnitudes
The β-parameter describes the distribution shape and σ is the standard deviation of x. However, the β-parameter is unknown and needs to be estimated, e.g., using the maximum likelihood estimator [33] or the moment estimator [34, 35].
where Σ is the covariance matrix of x and $\tilde{\mathbf{x}}$ is a vector of the decorrelated random variables (Equation (9)) [33].
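A moment-based estimate of β can be sketched as follows. The sketch assumes the common univariate GGD parameterization f(x) ∝ exp(−(|x − μ|/a)^β), for which the ratio E|x − μ| / sqrt(E(x − μ)²) equals Γ(2/β) / sqrt(Γ(1/β) Γ(3/β)) and increases monotonically with β. The bisection search and the names `ggd_moment_ratio` and `estimate_ggd_beta` are illustrative assumptions; they are not the inverse-function approximation of [34] used in the experiments.

```python
import math
import random


def ggd_moment_ratio(beta):
    """E|x| / sqrt(E[x^2]) of a zero-mean GGD with shape parameter beta."""
    log_ratio = math.lgamma(2.0 / beta) - 0.5 * (
        math.lgamma(1.0 / beta) + math.lgamma(3.0 / beta)
    )
    return math.exp(log_ratio)


def estimate_ggd_beta(samples, lo=0.1, hi=10.0, iters=60):
    """Match the empirical moment ratio by bisection over beta in [lo, hi]."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    m1 = sum(abs(x) for x in centered) / n
    m2 = sum(x * x for x in centered) / n
    if m2 == 0.0:
        raise ValueError("samples have zero variance")
    target = m1 / math.sqrt(m2)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # The ratio grows with beta, so move the lower bound up when it is too small.
        if ggd_moment_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    rng = random.Random(1)
    gaussian = [rng.gauss(0.0, 2.0) for _ in range(20000)]
    print("estimated beta (expected near 2 for Gaussian data):", estimate_ggd_beta(gaussian))
```

For Gaussian data (β = 2) the ratio is sqrt(2/π), and for Laplacian data (β = 1) it is 1/sqrt(2), which gives two easy sanity checks.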
3.2 Distance measures
is a permutation of Ŷ(Ω_{ k }, τ), π(x) defines a permutation of the components of the vector x and N is the number of separated signals. The total distance D between a permuted vector of frequency bins, Ŷ^{ P }(Ω_{ k }, τ), and a reference vector in bin k + 1 is the sum of the distances between each pair $\hat{Y}_{n}^{P}\left(\Omega_{k},\tau\right)$ and Ŷ_{ n }(Ω_{k+1}, τ).
Below, several information theoretic similarity measures that appear suitable for solving the permutation ambiguity problem are considered. First, however, a definition of entropy, or "self-information", is necessary.
where f(x) is the multivariate pdf.

Rényi generalized divergence between two distributions f(x) and g(x) of order α, where α ≥ 0, is defined [36] as
$d_{\alpha}\left(f(x)\,\|\,g(x)\right)=\frac{1}{\alpha-1}\log\left(\int f^{\alpha}(x)\,g^{1-\alpha}(x)\,dx\right).$ (21)

Bhattacharyya coefficient
$d_{1/2}\left(f(x)\,\|\,g(x)\right)=-2\log\left(\int\sqrt{f(x)\,g(x)}\,dx\right),$ (22)

Kullback-Leibler divergence
$d_{1}\left(f(x)\,\|\,g(x)\right)=\int f(x)\log\frac{f(x)}{g(x)}\,dx,$ (23)

Log distance
$d_{2}\left(f(x)\,\|\,g(x)\right)=\log E\left[\frac{f(x)}{g(x)}\right],$ (24)

and log of the maximum ratio
$d_{\infty}\left(f(x)\,\|\,g(x)\right)=\log\sup_{x}\frac{f(x)}{g(x)}.$ (25)
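The divergences of Equations (21)–(24) can be evaluated numerically with the trapezoidal rule, as the following sketch illustrates for two univariate Rayleigh pdfs. The Rayleigh form, the integration interval and the function names are assumptions made for this example; the article applies the measures to multivariate pdfs.

```python
import math


def rayleigh_pdf(x, sigma):
    """Univariate Rayleigh pdf (assumed form for this illustration)."""
    return (x / sigma ** 2) * math.exp(-x * x / (2.0 * sigma ** 2))


def trapezoid(ys, dx):
    """Trapezoidal rule on equally spaced samples."""
    return dx * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def renyi_divergence(f, g, alpha, a, b, n=20000):
    """d_alpha(f || g) integrated over [a, b]; alpha = 1 gives Kullback-Leibler."""
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]
    if alpha == 1.0:
        ys = [f(x) * math.log(f(x) / g(x)) if f(x) > 0.0 else 0.0 for x in xs]
        return trapezoid(ys, dx)
    ys = [f(x) ** alpha * g(x) ** (1.0 - alpha) for x in xs]
    return math.log(trapezoid(ys, dx)) / (alpha - 1.0)


if __name__ == "__main__":
    f = lambda x: rayleigh_pdf(x, 1.0)
    g = lambda x: rayleigh_pdf(x, 1.5)
    # The interval starts slightly above zero because both pdfs vanish at x = 0.
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, renyi_divergence(f, g, alpha, 1e-6, 12.0))
```

For two Rayleigh pdfs with parameters σ_1 and σ_2, the Kullback-Leibler case has the closed form 2 log(σ_2/σ_1) + σ_1²/σ_2² − 1, which can be used to check the numerical result.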

Mutual information for a vector of random variables X = (X_{1}, X_{2}, ..., X_{ K }) is defined as the Kullback-Leibler divergence between the product of the distribution functions $\prod_{i=1}^{K}f_{X_{i}}(x_{i})$ and the multivariate distribution $f_{\mathbf{X}}(\mathbf{x})$
$I(\mathbf{X})=d_{1}\left(f_{\mathbf{X}}(\mathbf{x})\,\Big\|\,\prod_{i=1}^{K}f_{X_{i}}(x_{i})\right)$ (26)
$=\int f_{\mathbf{X}}(\mathbf{x})\log\frac{f_{\mathbf{X}}(\mathbf{x})}{\prod_{i=1}^{K}f_{X_{i}}(x_{i})}\,d\mathbf{x}$ (27)
$=\int f_{\mathbf{X}}(\mathbf{x})\log f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x}-\int f_{\mathbf{X}}(\mathbf{x})\log\prod_{i=1}^{K}f_{X_{i}}(x_{i})\,d\mathbf{x}$ (28)
$=\sum_{i=1}^{K}H_{1}\left(f_{X_{i}}(x_{i})\right)-H_{1}\left(f_{\mathbf{X}}(\mathbf{x})\right)$ (29)
where $H_{1}\left(f_{X_{i}}(x_{i})\right)$ is the marginal entropy and $H_{1}\left(f_{\mathbf{X}}(\mathbf{x})\right)$ is the joint entropy of X.
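The identity in Equation (29) can be illustrated for K = 2 with a discrete joint distribution. The sketch below assumes that the pdfs have been replaced by a joint probability table (for example a normalized two-dimensional histogram); this discretization and the function names are assumptions of the example, not part of the article.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability cells contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mutual_information(joint):
    """I(X1; X2) = H(X1) + H(X2) - H(X1, X2) for a joint table summing to 1."""
    marginal_1 = [sum(row) for row in joint]
    marginal_2 = [sum(col) for col in zip(*joint)]
    joint_entropy = entropy([p for row in joint for p in row])
    return entropy(marginal_1) + entropy(marginal_2) - joint_entropy


if __name__ == "__main__":
    independent = [[0.25, 0.25], [0.25, 0.25]]
    dependent = [[0.5, 0.0], [0.0, 0.5]]
    print(mutual_information(independent))  # no shared information
    print(mutual_information(dependent))    # fully dependent variables
```

Independent variables yield zero mutual information, whereas fully dependent binary variables yield log 2, in line with the remark that higher values indicate stronger dependence.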

The Jensen-Rényi divergence of the vector of random variables X = (X_{1}, X_{2}, ..., X_{ K }) of order α, where α ≥ 0, is defined [40] as
$d_{JR_{\alpha}}(\mathbf{X})=H_{\alpha}\left(\frac{1}{K}\sum_{i=1}^{K}f_{X_{i}}(x)\right)-\frac{1}{K}\sum_{i=1}^{K}H_{\alpha}\left(f_{X_{i}}(x)\right).$ (30)
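Equation (30) can be sketched for discrete distributions as follows. The example assumes the usual Rényi entropy H_α(p) = log(Σ p^α) / (1 − α), with the Shannon entropy as the limit for α → 1, and replaces the pdfs by probability vectors defined on a common grid; both choices, as well as the function names, are assumptions of this illustration.

```python
import math


def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (nats); alpha = 1 falls back to Shannon."""
    if abs(alpha - 1.0) < 1e-12:
        return -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return math.log(sum(pi ** alpha for pi in p if pi > 0.0)) / (1.0 - alpha)


def jensen_renyi(pmfs, alpha):
    """Entropy of the averaged distribution minus the average of the entropies."""
    k = len(pmfs)
    mixture = [sum(column) / k for column in zip(*pmfs)]
    return renyi_entropy(mixture, alpha) - sum(renyi_entropy(p, alpha) for p in pmfs) / k


if __name__ == "__main__":
    same = [[0.2, 0.8], [0.2, 0.8]]
    disjoint = [[1.0, 0.0], [0.0, 1.0]]
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, jensen_renyi(same, alpha), jensen_renyi(disjoint, alpha))
```

Identical distributions give a divergence of zero, while two distributions with disjoint support give log 2 for every order α in this example.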

The modified Jensen-Rényi divergence. The Jensen-Rényi divergence from Equation (30) measures the distance between two distributions f_{ X }(x) and f_{ Y }(x) with respect to a third point in the distribution space. In this case, the third point is chosen as the average of the two distributions. This approach is justified by the concavity of the entropy in distribution space
$H_{\alpha}\left(\frac{f_{X}(x)+f_{Y}(x)}{2}\right)\ge\frac{H_{\alpha}\left(f_{X}(x)\right)+H_{\alpha}\left(f_{Y}(x)\right)}{2}.$ (31)
In principle, it is possible to define the distance with respect to any other point, provided that the concavity assumption holds for this point. Such a point can be chosen as an average over the random variables whose distributions are currently being analyzed.
where $\bar{X}=\frac{1}{K}\sum_{i=1}^{K}X_{i}$. In the way the modified Jensen-Rényi divergence is used here, this distance measure describes the amount of new information added to a spectrogram if an adjacent frequency bin Y(Ω_{k+1}, τ) is included. The less new information is provided, the closer the frequency bins are. This modification has a lower computational burden than the classical Jensen-Rényi divergence, since for $H_{\alpha}\left(f_{\tilde{X}}(x)\right)$, only one pdf has to be calculated instead of K in the Jensen-Rényi divergence. Furthermore, for the entropy $H_{\alpha}\left(f_{\bar{X}}(x)\right)$ there exists an analytical solution, which improves the accuracy of the results.
3.3 The Permutation correction algorithm
In this section the actual permutation correction algorithm is discussed. As mentioned before, it is assumed that subsequent frequency bins of the same source signal possess similar distributions. The similarity between the frequency bins is measured by applying the measures given in Equations (21), (29), (30), and (37) in the optimization of Equation (16).
where L is the number of the already corrected frequency bins to be used for the averaging. Then the correction algorithm can be implemented as described in Algorithm 1.
Algorithm 1
 1. Initialization: Start with the frequency bin k = N_{FFT}/2.^{d}
 2. Estimate the parameters of the Rayleigh distribution of Ŷ(Ω_{ k }, τ) and of the average of L already corrected bins $\frac{1}{\hat{L}}\sum_{l=k+1}^{\hat{L}}\left|\hat{Y}_{n}\left(\Omega_{l},\tau\right)\right|$, with $\hat{L}=\min\left(k+1+L,N_{\text{FFT}}/2+1\right)-\left(k+1\right)$, using Equations (6)–(9).
 3. Calculate $D\left[f\left(\left|\hat{\mathbf{Y}}^{P}\left(\Omega_{k},\tau\right)\right|\right),f\left(\frac{1}{\hat{L}}\sum_{l=k+1}^{\hat{L}}\left|\hat{\mathbf{Y}}\left(\Omega_{l},\tau\right)\right|\right)\right]$ as defined in Equation (38) for all possible permutations of Ŷ(Ω_{ k }, τ).
 4. Choose the permutation π_{+}(Ŷ(Ω_{ k }, τ)) with the most dependent value of D.
 5. Correct the current frequency bin according to the best permutation π_{+}(Ŷ(Ω_{ k }, τ)).
 6. Decrement k and, if k ≠ 0, go to Step 2.
The same scheme can be applied to the logarithms of the spectral magnitudes of the signals, log Ŷ(Ω_{ k }, τ), instead of Ŷ(Ω_{ k }, τ), using generalized Gaussian instead of Rayleigh distributions. In that case Algorithm 2 results.
Algorithm 2
 1. Initialization: Start with the frequency bin k = N_{FFT}/2.
 2. Estimate the GGD parameters of log Ŷ(Ω_{ k }, τ) and of the average of L already corrected bins $\log\left(\frac{1}{\hat{L}}\sum_{l=k+1}^{\hat{L}}\left|\hat{Y}_{n}\left(\Omega_{l},\tau\right)\right|\right)$, with $\hat{L}=\min\left(k+1+L,N_{\text{FFT}}/2+1\right)-\left(k+1\right)$, using Equations (10)–(13).^{e}
 3. Calculate $D\left[f\left(\log\left|\hat{\mathbf{Y}}^{P}\left(\Omega_{k},\tau\right)\right|\right),f\left(\log\left(\frac{1}{\hat{L}}\sum_{l=k+1}^{\hat{L}}\left|\hat{\mathbf{Y}}\left(\Omega_{l},\tau\right)\right|\right)\right)\right]$ as defined in Equation (38) for all possible permutations of Ŷ(Ω_{ k }, τ).
 4. Choose the permutation π_{+}(log Ŷ(Ω_{ k }, τ)) with the most dependent value of D.
 5. Correct the current frequency bin according to the best permutation π_{+}(log Ŷ(Ω_{ k }, τ)).
 6. Decrement k and, if k ≠ 0, go to Step 2.
Algorithms 1 and 2 will be used in the following sections for the experimental comparison of the distance measures given in Equations (21), (29), (30), and (37).
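The overall structure of Algorithm 1 can be sketched as follows. This is a strongly simplified illustration and not the implementation used in the experiments: it fits a univariate Rayleigh pdf to each signal, uses the closed-form Kullback-Leibler divergence between two Rayleigh pdfs as a stand-in for the total distance D of Equation (38), and selects the permutation with the smallest summed distance. The data layout `mags[k][n][t]` and all function names are hypothetical.

```python
import itertools
import math


def rayleigh_sigma(x):
    """ML estimate of the Rayleigh parameter (floored to avoid log(0) below)."""
    return max(math.sqrt(sum(v * v for v in x) / (2.0 * len(x))), 1e-12)


def kl_rayleigh(s1, s2):
    """Closed-form KL divergence between Rayleigh(s1) and Rayleigh(s2)."""
    return 2.0 * math.log(s2 / s1) + (s1 / s2) ** 2 - 1.0


def correct_permutations(mags, L=3):
    """mags[k][n][t]: magnitude of output n in bin k at frame t.

    The highest bin is taken as the reference order; lower bins are aligned
    one by one against the average of up to L already corrected bins.
    """
    num_bins = len(mags)
    num_src = len(mags[0])
    num_frames = len(mags[0][0])
    out = [list(bin_signals) for bin_signals in mags]
    for k in range(num_bins - 2, -1, -1):
        upper = range(k + 1, min(k + 1 + L, num_bins))
        ref = [
            [sum(out[l][n][t] for l in upper) / len(upper) for t in range(num_frames)]
            for n in range(num_src)
        ]
        ref_sigma = [rayleigh_sigma(r) for r in ref]
        sigma = [rayleigh_sigma(x) for x in out[k]]
        # Exhaustive search over all N! permutations of the current bin.
        best = min(
            itertools.permutations(range(num_src)),
            key=lambda p: sum(kl_rayleigh(sigma[p[n]], ref_sigma[n]) for n in range(num_src)),
        )
        out[k] = [out[k][i] for i in best]
    return out
```

Because this stand-in distance compares only the fitted scale parameters, it can separate sources only when their power differs; the measures of Equations (21), (29), (30), and (37) would be substituted for `kl_rayleigh` in a faithful implementation, with the maximum taken instead of the minimum for mutual information.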
4 Experiments and results
4.1 Conditions
Table 1 Mixture characteristics
| Mixture | Mix. 1 | Mix. 2 | Mix. 3 |
|---|---|---|---|
|  | TU Berlin | TU Berlin | TU Berlin |
| Reverberation time T_{ R } | 159 ms | 159 ms | 159 ms |
| Distance between two sensors d | 3 cm | 3 cm | 3 cm |
| Sampling rate f_{ S } | 11 kHz | 11 kHz | 11 kHz |
| Number of speakers N | 2 | 3 | 4 |
| Number of microphones M | 2 | 3 | 4 |
| Distance between speaker i and array center | L_{1} = L_{2} = 0.9 m | L_{1} = L_{2} = L_{3} = 0.9 m | L_{1} = L_{2} = L_{3} = L_{4} = 0.9 m |
| Angular position of the speaker i | θ_{1} = 50°, θ_{2} = 115° | θ_{1} = 30°, θ_{2} = 80°, θ_{3} = 135° | θ_{1} = 25°, θ_{2} = 80°, θ_{3} = 130°, θ_{4} = 155° |
| Mean input SIR | 0.1 dB | 3 dB | 5 dB |
Table 2 Mixture characteristics
| Mixture | Mix. 4 | Mix. 5 | Mix. 6 |
|---|---|---|---|
|  | TU Berlin | TU Berlin | TU Berlin |
| Reverberation time T_{ R } | 189 ms | 189 ms | 189 ms |
| Distance between two sensors d | 3 cm | 3 cm | 3 cm |
| Sampling rate f_{ S } | 11 kHz | 11 kHz | 11 kHz |
| Number of speakers N | 2 | 3 | 4 |
| Number of microphones M | 2 | 3 | 4 |
| Distance between speaker i and array center | L_{1} = L_{2} = 2.0 m | L_{1} = L_{2} = L_{3} = 2.0 m | L_{1} = L_{2} = L_{3} = L_{4} = 2.0 m |
| Angular position of the speaker i | θ_{1} = 75°, θ_{2} = 165° | θ_{1} = 35°, θ_{2} = 80°, θ_{3} = 165° | θ_{1} = 30°, θ_{2} = 75°, θ_{3} = 125°, θ_{4} = 165° |
| Mean input SIR | 0.04 dB | 3.4 dB | 6.9 dB |
Table 3 Mixture characteristics
| Mixture | Mix. 7 | Mix. 8 | Mix. 9 |
|---|---|---|---|
|  | NTT | NTT | NTT |
| Reverberation time T_{ R } | 130 ms | 130 ms | 130 ms |
| Distance between two sensors d | 4 cm | 4 cm | 4 cm |
| Sampling rate f_{ S } | 8 kHz | 8 kHz | 8 kHz |
| Number of speakers N | 2 | 3 | 4 |
| Number of microphones M | 2 | 3 | 4 |
| Distance between speaker i and array center | L_{1} = L_{2} = 1.2 m | L_{1} = L_{2} = L_{3} = 1.2 m | L_{1} = L_{2} = L_{3} = L_{4} = 1.2 m |
| Angular position of the speaker i | θ_{1} = 75°, θ_{2} = 165° | θ_{1} = 35°, θ_{2} = 80°, θ_{3} = 165° | θ_{1} = 30°, θ_{2} = 75°, θ_{3} = 125°, θ_{4} = 165° |
| Mean input SIR | 0.02 dB | 2.9 dB | 4.7 dB |
Table 4 Mixture characteristics
| Mixture | Mix. 10 | Mix. 11 | Mix. 12 |
|---|---|---|---|
|  | TU Berlin | TU Berlin | TU Berlin |
| Reverberation time T_{ R } | 159 ms | 159 ms | 159 ms |
| Distance between two sensors d | 12 cm | 12 cm | 12 cm |
| Sampling rate f_{ S } | 11 kHz | 11 kHz | 11 kHz |
| Number of speakers N | 2 | 3 | 4 |
| Number of microphones M | 2 | 3 | 4 |
| Distance between speaker i and array center | L_{1} = L_{2} = 0.9 m | L_{1} = L_{2} = L_{3} = 0.9 m | L_{1} = L_{2} = L_{3} = L_{4} = 0.9 m |
| Angular position of the speaker i | θ_{1} = 30°, θ_{2} = 70° | θ_{1} = 30°, θ_{2} = 70°, θ_{3} = 150° | θ_{1} = 30°, θ_{2} = 70°, θ_{3} = 115°, θ_{4} = 170° |
| Mean input SIR | 0.02 dB | 2.5 dB | 4.2 dB |
4.2 Parameter settings
The algorithms were tested on all recordings, which were first transformed to the frequency domain at a resolution of N_{FFT} = 1024. For calculating the spectrogram, the signals were divided into overlapping frames with a Hanning window and an overlap of 3/4 · N_{FFT}.
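The framing implied by these settings can be sketched as follows: an overlap of 3/4 · N_{FFT} corresponds to a hop of N_{FFT}/4 = 256 samples between consecutive frames. The periodic form of the Hann window and the function names are assumptions of this example; each windowed frame would subsequently be transformed by an FFT of length N_{FFT}.

```python
import math

N_FFT = 1024
OVERLAP = 3 * N_FFT // 4   # 768 samples shared by consecutive frames
HOP = N_FFT - OVERLAP      # 256 samples between frame starts


def hann(n):
    """Periodic Hann window of length n (assumed variant)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def frame_signal(x, n_fft=N_FFT, hop=HOP):
    """Split x into overlapping, Hann-windowed frames (no padding at the ends)."""
    window = hann(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        frames.append([x[start + i] * window[i] for i in range(n_fft)])
    return frames


if __name__ == "__main__":
    signal = [1.0] * 4096
    print(len(frame_signal(signal)), "frames with hop", HOP)
```

With this hop size the four overlapping periodic Hann windows covering any sample sum to a constant, so the frames cover the signal uniformly.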
4.3 ICA performance measurement
as a measure of the signal quality. Here ${y}_{i,{s}_{j}}$ is the ith separated signal with only the source s_{ j } active, and ${x}_{k,{s}_{j}}$ is the observation obtained by microphone k when only s_{ j } is active. α and δ are parameters for phase and amplitude chosen to optimally compensate the difference between ${y}_{i,{s}_{j}}$ and ${x}_{k,{s}_{j}}$ [19].
were used, where N is the number of speakers in the considered mixture.
4.4 Experimental results
In this section the experimental results of the signal separation will be compared. All the mixtures from Tables 1, 2, 3, and 4 were separated by JADE, INFOMAX, and the FastICA algorithm and the permutation problem was solved using either Algorithm 1 or 2 from Section 3.3 and distance measures from Equations (21), (29), (30), and (37). For each result the performance is calculated using Equations (39) and (40).
Table 5 Average values of the obtained results of Algorithm 1 in terms of ΔSIR and SDR for each distance measure
| Distance measure | ΔSIR M_{2} | ΔSIR M_{3} | ΔSIR M_{4} | SDR M_{2} | SDR M_{3} | SDR M_{4} |
|---|---|---|---|---|---|---|
| Bhattacharyya coefficient | 0.97 | 1.32 | 2.5 | 3.21 | 2.77 | 1.78 |
| Kullback-Leibler divergence | 0.86 | 2.12 | 1.93 | 5.03 | 3.01 | 1.02 |
| Log of the maximum ratio | 0.62 | 2.14 | 1.22 | 4.63 | 2.76 | 0.82 |
| Jensen-Rényi divergence, α = 0.5 | 3.00 | 3.65 | 5.44 | 5.33 | 3.17 | 1.43 |
| Jensen-Rényi divergence, α = 1 | 4.00 | 4.50 | 6.44 | 6.08 | 3.49 | 2.15 |
| Jensen-Rényi divergence, α = 2 | 4.01 | 3.94 | 6.09 | 5.75 | 3.29 | 1.45 |
| Mod. Jensen-Rényi divergence, α = 0.5 | 7.89 | 7.27 | 8.99 | 8.12 | 4.86 | 2.78 |
| Mod. Jensen-Rényi divergence, α = 1 | 7.89 | 7.27 | 8.97 | 8.12 | 4.87 | 2.74 |
| Mod. Jensen-Rényi divergence, α = 2 | 7.89 | 7.28 | 8.98 | 8.12 | 4.87 | 2.78 |
| Mutual information | 7.35 | 7.66 | 8.15 | 7.79 | 5.23 | 2.59 |
Table 6 Average values of the obtained results of Algorithm 2 in terms of ΔSIR and SDR for each distance measure
| Distance measure | ΔSIR M_{2} | ΔSIR M_{3} | ΔSIR M_{4} | SDR M_{2} | SDR M_{3} | SDR M_{4} |
|---|---|---|---|---|---|---|
| Bhattacharyya coefficient | 2.21 | 3.23 | 3.54 | 5.53 | 3.65 | 1.10 |
| Kullback-Leibler divergence | 3.78 | 5.23 | 5.47 | 5.97 | 4.2 | 1.46 |
| Log of the maximum ratio | 3.52 | 4.99 | 4.14 | 6.32 | 4.12 | 1.14 |
| Jensen-Rényi divergence, α = 0.5 | 3.83 | 4.93 | 5.77 | 6.19 | 3.92 | 1.64 |
| Jensen-Rényi divergence, α = 1 | 4.00 | 5.04 | 5.45 | 6.39 | 4.12 | 1.44 |
| Jensen-Rényi divergence, α = 2 | 2.84 | 4.42 | 5.34 | 6.04 | 4.14 | 1.41 |
| Mod. Jensen-Rényi divergence, α = 0.5 | 7.31 | 8.14 | 8.53 | 8.01 | 5.63 | 2.44 |
| Mod. Jensen-Rényi divergence, α = 1 | 7.35 | 8.15 | 8.61 | 8.07 | 5.67 | 2.47 |
| Mod. Jensen-Rényi divergence, α = 2 | 7.40 | 8.27 | 8.43 | 8.12 | 5.76 | 2.50 |
| Mutual information | 7.31 | 8.50 | 8.37 | 8.18 | 6.00 | 2.60 |
4.5 Discussion
The calculated results show the usefulness of the proposed method for permutation correction, though not all of the applied distance measures perform equally well. As already mentioned above, the best results were achieved using mutual information and the modified Jensen-Rényi divergence, while the results obtained using the generalized Rényi divergence are rather poor. This is especially the case if α = 2 is used. Of all the applied distance measures based on the generalized Rényi divergence, the best performance was achieved with the Bhattacharyya coefficient, i.e., α = 0.5. A similar tendency can be seen with the "classical" Jensen-Rényi divergence. Here the best results were achieved using α = 1. In contrast, correction based on mutual information and the modified Jensen-Rényi divergence provides consistently good results.
Furthermore, the effects of the various reverberation times on the performance of the different distance measures will be studied in the future. While, as can be seen in Figures 3 and 4, the performance of the separation system decreases with growing reverberation time of the environment, the effect of reverberation time on the different methods of permutation correction should be analyzed more precisely.
On the other hand, the assumption of the Rayleigh distribution and GGD is good enough for permutation correction with mutual information and the modified Jensen-Rényi divergence, since these distance measures are not as sensitive to the inter-frequency-bin pdf perturbations as the generalized Rényi divergence. Furthermore, in this case there exists an analytical solution for the modified Jensen-Rényi divergence, which reduces the computational burden of the algorithm and improves the accuracy of the solution.
As can be seen in Tables 5 and 6, the separation performance of Algorithm 1 is slightly better than the performance of Algorithm 2. A possible explanation is that for the Rayleigh pdf just one parameter has to be estimated, instead of two parameters in the case of the GGD. Furthermore, the estimation of the GGD parameters is more complicated than the estimation of σ in the case of the Rayleigh distribution. These factors might cause uncertainties and errors in the permutation correction.
Average values of the obtained results in terms of ΔSIR and SDR for each distance measure
| Algorithms | ΔSIR M2 | ΔSIR M3 | ΔSIR M4 | SDR M2 | SDR M3 | SDR M4 |
|---|---|---|---|---|---|---|
| Proposed Algorithm 1 with Jensen-Rényi div., α = 2 | 7.90 | 7.28 | 8.98 | 8.12 | 4.87 | 2.78 |
| Proposed Algorithm 2 with Jensen-Rényi div., α = 0.5 | 7.31 | 8.14 | 8.53 | 8.01 | 5.63 | 2.44 |
| Proposed Algorithm 1 with mutual information | 7.35 | 7.66 | 8.15 | 7.79 | 5.23 | 2.59 |
| Proposed Algorithm 2 with mutual information | 7.31 | 8.50 | 8.37 | 8.18 | 6.00 | 2.60 |
| Permutation correction based on phase difference [50] | 7.38 | 6.77 | 7.99 | 8.11 | 4.87 | 2.69 |
| Cross-correlation [4] | 7.44 | 7.61 | 7.76 | 8.16 | 5.35 | 2.63 |
| Power ratio [51] | 7.90 | 7.86 | 8.53 | 8.37 | 5.42 | 2.68 |
| Continuity of the impulse response of the calculated mixing system [15] | 3.93 | 1.83 | 2.14 | 5.67 | 2.98 | 0.89 |
| Amplitude modulation decorrelation [3] | 6.89 | 7.55 | 8.02 | 7.83 | 5.24 | 2.64 |
| Cross-cumulants [4] | 3.10 | 2.16 | 2.42 | 5.53 | 2.80 | 0.61 |
| Continuity of the mixing system [17] | 0.33 | 0.65 | 0.93 | 4.37 | 2.52 | 0.77 |
| Minimum of the L_{1}-norm of the mixing system [16] | 0.04 | 0.65 | 1.41 | 3.94 | 3.32 | 1.02 |
| p-norm distance (p = 1) [8] | 6.05 | 7.60 | 7.68 | 7.37 | 5.51 | 2.38 |
| Clustering of the amplitudes [9] | 6.93 | 5.17 | 5.22 | 8.00 | 4.56 | 1.77 |
| Likelihood ratio criterion between the frequency bins [21] | 7.47 | 8.33 | 8.74 | 8.07 | 6.17 | 2.67 |
| Basis vector clustering [19] | 7.08 | 5.40 | 6.49 | 7.63 | 4.32 | 1.92 |
| Minima of the beampattern [10] | 7.40 | 3.74 | 2.81 | 7.74 | 3.24 | 0.82 |
| Cosine distance [18] | 5.33 | 4.78 | 4.56 | 6.76 | 4.26 | 1.74 |
| GGD parameter comparison and cross-correlation [23] | 0.45 | 4.64 | 6.39 | 4.13 | 3.71 | 1.77 |
5 Conclusions
In this article, a method for permutation correction in convolutive source separation has been presented. The approach is based on the assumption that the magnitudes of speech signals adhere to a Rayleigh distribution and that the logarithms of the magnitudes can be modeled by a GGD. The assumption of Rayleigh or GG distributed signals allows the use of information theoretic similarity measures. The information theoretic distance measures are used to detect similarities in subsequent frequency bins after bin-wise source separation is completed, in order to group the frequency bins coming from the same source. Besides the existing information theoretic distance measures, a modification of the Jensen-Rényi divergence is proposed. This modified distance measure shows very good results for the considered problem.
The proposed method has been tested on different reverberant speech mixtures in connection with different ICA algorithms. The experimental results and the comparison with today's state-of-the-art approaches for permutation correction show the usefulness of the proposed method. Further, the experimental results have shown that the method performs best using either the mutual information or the modified Jensen-Rényi divergence criterion (Tables 5 and 6). This fact may be explained at least partially by the ability of the Jensen-Rényi divergence and the mutual information to utilize temporal dependence structure, which puts these two criteria ahead of the Rényi generalized divergence and its special cases of the Kullback-Leibler divergence and the log maximum ratio, which we considered as alternatives.
Appendix 1
where γ is the Euler-Mascheroni constant γ ≈ 0.57722.
The solution of Equation (46) is given in [38]. The solutions of Equations (44) and (48) were derived using MATHEMATICA. For the distance measures without an analytical solution, the trapezoidal rule for numerical integration was applied [49].
Appendix 2
Since information theoretic similarity measures make use only of the pdfs of the signals, a question may arise, as to whether temporal dependence structures of the signals are utilized at all in the suggested framework. The temporal structure is taken into account indirectly in the applied similarity measures, since each of the measures contains a term where either the joint probability, the pdf of the mean value of the random variables (Equation (37)), the mean of the pdf or a quotient of the pdfs is considered. These are the terms where the values of the distribution functions produced at the same time domain window are "compared".
Comparison of signal pairs 〈U_{1}(τ), U_{2}(τ)〉 and 〈U_{1}(τ), U_{3}(τ)〉 with each distance measure
| Distance measure | 〈U_{1}(τ), U_{2}(τ)〉 | 〈U_{1}(τ), U_{3}(τ)〉 |
|---|---|---|
| Bhattacharyya coefficient | 0.09 | 0.59 |
| Kullback-Leibler divergence | 0.00 | 0.16 |
| Log of the maximum ratio | 0.00 | 0.16 |
| Jensen-Rényi divergence with α = 0.5 | 0.43 | 0.23 |
| Jensen-Rényi divergence with α = 1 | 1.92 | 0.22 |
| Jensen-Rényi divergence with α = 2 | 68.35 | 45.4 |
| Modified Jensen-Rényi divergence with α = 0.5 | 0.12 | 0.02 |
| Modified Jensen-Rényi divergence with α = 1 | 0.35 | 0.04 |
| Modified Jensen-Rényi divergence with α = 2 | 31.06 | 2.50 |
| Mutual information | 15.53 | 16.46 |
Comparison of signal pairs 〈log(U_{1}(τ)), log(U_{2}(τ))〉 and 〈log(U_{1}(τ)), log(U_{3}(τ))〉 with each distance measure
| Distance measure | 〈log(U_{1}(τ)), log(U_{2}(τ))〉 | 〈log(U_{1}(τ)), log(U_{3}(τ))〉 |
|---|---|---|
| Bhattacharyya coefficient | 0.03 | 0.71 |
| Kullback-Leibler divergence | 0.0 | 0.03 |
| Log of the maximum ratio | 0.0 | 0.30 |
| Jensen-Rényi divergence, α = 0.5 | 17.83 | 7.48 |
| Jensen-Rényi divergence, α = 1 | 7.54 | 1.58 |
| Jensen-Rényi divergence, α = 2 | 0.99 | 0.01 |
| Mod. Jensen-Rényi divergence, α = 0.5 | 4.79 | 1.26 |
| Mod. Jensen-Rényi divergence, α = 1 | 1.23 | 0.26 |
| Mod. Jensen-Rényi divergence, α = 2 | 0.11 | 0.01 |
| Mutual information | 4.59 | 6.43 |
As can be seen, for this example each similarity measure that was considered in this article rates U_{1}(τ) more similar to U_{3}(τ) than to U_{2}(τ),^{f} which implies that the temporal dependencies and correlations were not ignored during the computation of the probability distribution functions.
In contrast to the other measures, in the case of the Rényi generalized divergence defined in Equation (21), and in its special cases of the Kullback-Leibler divergence and the log maximum ratio, the time dependency cannot be taken into account in this manner. Still, these similarity measures can also be used for permutation correction, since the situation we considered in the example above is rather artificial and cannot be expected for realistic situations with two speech signals as the desired sources.
Endnotes
^{a}In the cases where no permutation correction by means of the comparison of the GGD parameters is possible, the problem is handled by applying the correlation-based permutation correction approach.
^{b}Equation (7) is a special case of the multivariate Weibull distribution with α = 1 and c_{ i } = 2 [32, Equation (14)].
^{c}E.g. $d_{JR_{\alpha}}\left(X_{1},X_{2}\right)=d_{JR_{\alpha}}\left(X_{2},X_{1}\right)$.
^{d}The proposed algorithm solves the permutation problem starting with the higher frequency bins. The first frequency bin in this case is the bin with k = N_{FFT}/2 + 1. Since there is no other definition of the correct order of the signals, the signal order in frequency bin k = N_{FFT}/2 + 1 is assumed to be correct.
^{e}For the experiments in Section 4, the parameter β was calculated using the approximation for the inverse function as proposed in [34].
^{f}The more dependent the signals are, the higher the value of the mutual information in Equation (29) becomes, while simultaneously the values of the similarity measures from (21), (30), and (37) decrease.
References
 1. Mansour A, Kawamoto M: ICA papers classified according to their applications and performances. IEICE Trans. Fundam 2003, E86-A(3):620-633.
 2. Pedersen MS, Larsen J, Kjems U, Parra LC: Convolutive blind source separation methods. In Springer Handbook of Speech Processing and Speech Communication. Springer Verlag, Berlin/Heidelberg; 2008:1065-1094.
 3. Anemüller J, Kollmeier B: Amplitude modulation decorrelation for convolutive blind source separation. In Proc. ICA 2000. Helsinki; 2000:215-220.
 4. Mejuto C, Dapena A, Castedo L: Frequency-domain infomax for blind separation of convolutive mixtures. In Proc. ICA 2000. Helsinki; 2000:315-320.
 5. Murata N, Ikeda S, Ziehe A: An approach to blind source separation based on temporal structure of speech signals. Neurocomputing 2001, 41(1-4):1-24.
 6. Reju VG, Koh SN, Soon IY: Partial separation method for solving permutation problem in frequency domain blind source separation of speech signals. Neurocomputing 2008, 71: 2098-2112.
 7. Parra L, Spence C, De Vries B: Convolutive blind source separation based on multiple decorrelation. In Proc. IEEE NNSP Workshop. Cambridge, UK; 1998:23-32.
 8. Kamata K, Hu X, Kobatake H: A new approach to the permutation problem in frequency domain blind source separation. In Proc. ICA 2004. Granada, Spain; 2004:849-856.
 9. Pham DT, Servière C, Boumaraf H: Blind separation of speech mixtures based on nonstationarity. IEEE Signal Processing and Its Applications, Proceedings of the Seventh International Symposium 2003, 73-76.
 10. Baumann W, Kolossa D, Orglmeister R: Maximum likelihood permutation correction for convolutive source separation. ICA 2003 2003, 373-378.
 11. Kurita S, Saruwatari H, Kajita S, Takeda K, Itakura F: Evaluation of frequency-domain blind signal separation using directivity pattern under reverberant conditions. ICASSP2000 2000, 3140-3143.
 12. Ikram M, Morgan D: A beamforming approach to permutation alignment for multichannel frequency-domain blind speech separation. ICASSP02 2002, 881-884.
 13. Mitianoudis N, Davies M: Permutation alignment for frequency domain ICA using subspace beamforming methods. Proc. ICA 2004, LNCS 3195 2004, 669-676.
 14. Sawada H, Mukai R, Araki S, Makino S: A robust approach to the permutation problem of frequency-domain blind source separation. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003) 2003, V: 381-384.
 15. Pham DT, Servière C, Boumaraf H: Blind separation of convolutive audio mixtures using nonstationarity. Proc. ICA2003 2003, 981-986.
 16. Sudhakar P, Gribonval R: A sparsity-based method to solve permutation indeterminacy in frequency-domain convolutive blind source separation. In Independent Component Analysis and Signal Separation: 8th International Conference, ICA 2009, Proceedings. Paraty, Brazil; 2009.
 17. Baumann W, Köhler BU, Kolossa D, Orglmeister R: Real time separation of convolutive mixtures. In Independent Component Analysis and Blind Signal Separation: 4th International Symposium, ICA 2001, Proceedings. San Diego, USA; 2001.
 18. Asano F, Ikeda S, Ogawa M, Asoh H, Kitawaki N: Combined approach of array processing and independent component analysis for blind separation of acoustic signals. IEEE Trans. Speech Audio Proc 2003, 11(3):204-215.
 19. Sawada H, Araki S, Mukai R, Makino S: Blind extraction of a dominant source from mixtures of many sources using ICA and time-frequency masking. Proc. ISCAS 2005 2005, 5882-5885.
 20. Wang W, Chambers JA, Sanei S: A novel hybrid approach to the permutation problem of frequency domain blind source separation. In Proc. 5th International Conference on Independent Component Analysis and Blind Signal Separation, ICA 2004. Granada, Spain; 2004:530-537.
 21. Mitianoudis N, Davies ME: Audio source separation of convolutive mixtures. IEEE Trans. Audio Speech Process 2003, 11(5):489-497.
 22. Ephraim Y, Malah D: Speech enhancement using a minimum mean square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process 1985, 33: 443-445.
 23. Mazur R, Mertins A: Solving the permutation problem in convolutive blind source separation. Proc. ICA 2007, LNCS 4666 2007, 512-519.
 24. Ikeda S, Murata N: A method of blind separation based on temporal structure of signals. Proc. Int. Conf. on Neural Information Processing 1998, 737-742.
 25. Cardoso JF: High order contrasts for independent component analysis. Neural Comput 1999, 11: 157-192.
 26. Bell A, Sejnowski T: An information-maximization approach to blind separation and blind deconvolution. Neural Comput 1995, 7: 1129-1159.
 27. Hyvärinen A, Oja E: A fast fixed-point algorithm for independent component analysis. Neural Comput 1997, 9: 1483-1492.
 28. Mitianoudis N, Davies M: New fixed-point solutions for convolved mixtures. In Proc. ICA2001. San Diego, CA; 2001:633-638.
 29. Allen JB, Rabiner LR: A unified approach to short-time Fourier analysis and synthesis. Proc. IEEE 1977, 65: 1558-1564.
 30. Lee KR, Kapadia CH, Brock DB: On estimating the scale parameter of the Rayleigh distribution from doubly censored samples. Statistische Hefte 1980, 21(1):14-29.
 31. Hoffman WC: The joint distribution of n successive outputs of a linear detector. J. Appl. Phys 1954, 25: 1006-1007.
 32. Darbellay GA, Vajda I: Entropy expressions for multivariate continuous distributions. IEEE Trans. Inf. Theory 2000, 46(2):709-712.
 33. Boubchir L, Fadili JM: Multivariate statistical modeling of images with the curvelet transform. IEEE Signal Processing and Its Applications, 2005. Proc. of the Eighth International Symposium 2005, 2: 747-750.
 34. Dominguez-Molina JA, Gonzalez-Farias G, Rodriguez-Dagnino RM: A practical procedure to estimate the shape parameter in the generalized Gaussian distribution. CIMAT Tech. Rep. I0118_eng.pdf [Online] [http://www.cimat.mx/reportes/enlinea/I0118_eng.pdf]
 35. Prasad R: Fixed-point ICA based speech signal separation and enhancement with generalized Gaussian model. PhD Thesis 2005. [http://citeseer.ist.psu.edu/prasad05fixedpoint.html]
 36. Rényi A: On measures of entropy and information. In Selected Papers of Alfred Rényi. Volume 2. Akademiai Kiado, Budapest; 1976:565-580.
 37. Principe JC, Xu D, Fisher JW III: Information-theoretic learning. In Unsupervised Adaptive Filtering. Edited by: Haykin S. Wiley, New York; 2000:265-319.
 38. Cover TM, Thomas JA: Elements of Information Theory. Wiley, New York; 1991.
 39. Hero AO, Ma B, Michel O, Gorman JD: Alpha divergence for classification, indexing and retrieval. Technical Report 328, Comm. and Sig. Proc. Lab., Dept. EECS, Univ. Michigan 2001.
 40. Hamza AB, Krim H: Jensen-Rényi divergence measure: theoretical and computational perspectives. In Proc. ISIT 2003. Yokohama, Japan; 2003.
 41. Martins AFT, Figueiredo MAT, Aguiar PMQ, Smith NA, Xing EP: Nonextensive entropic kernels. ICML 08: Proc. of the 25th International Conference on Machine Learning, ACM 2008, 307: 640-647.
 42. He Y, Hamza AB, Krim H: A generalized divergence measure for robust image registration. IEEE Trans. Signal Process 2003, 51(5):1211-1220.
 43. Arndt C: Information measures: information and its description in science and engineering. In Signals and Communication Technology. 2nd edition. Springer, Berlin; 2004.
 44. Barthe F: Optimal Young's inequality and its converse: a simple proof. Geom. Funct. Anal 1998, 8(2):234-242.
 45. Bercher JF, Vignat C: A Renyi entropy convolution inequality with application. In Proc. EUSIPCO. Toulouse, France; 2002.
 46. Leonard RG: A Database for speaker-independent digit recognition. Proc. ICASSP 84 1984, 3: 42.11.
 47. Sawada H [http://www.kecl.ntt.co.jp/icl/signal/sawada/demo/bss2to4/index.html]
 48. Jafari MG, Plumbley MD: The role of high frequencies in convolutive blind source separation of speech signals. In Proc. 7th Int. Conf. on Independent Component Analysis and Signal Separation, ICA 2007. London, UK; 2007.
 49. Schwarz HR: Numerische Mathematik. B.G. Teubner, Stuttgart; 1997.
 50. Hoffmann E, Kolossa D, Orglmeister R: A batch algorithm for blind source separation of acoustic signals using ICA and time-frequency masking. In Proc. 7th Int. Conf. on Independent Component Analysis and Signal Separation, ICA 2007. London, UK; 2007.
 51. Sawada H, Araki S, Makino S: Measuring dependence of bin-wise separated signals for permutation alignment in frequency-domain BSS. Circuits and Systems, 2007. ISCAS 2007. IEEE International Symposium on (2007) 2007, 3247-3250.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.