Bayesian group sparse learning for music source separation
- Jen-Tzung Chien^{1} and
- Hsin-Lung Hsieh^{1}
https://doi.org/10.1186/1687-4722-2013-18
© Chien and Hsieh; licensee Springer. 2013
Received: 28 October 2012
Accepted: 13 May 2013
Published: 5 July 2013
Abstract
Nonnegative matrix factorization (NMF) is developed for parts-based representation of nonnegative signals with the sparseness constraint. The signals are adequately represented by a set of basis vectors and the corresponding weight parameters. NMF has been successfully applied for blind source separation and many other signal processing systems. Typically, controlling the degree of sparseness and characterizing the uncertainty of model parameters are two critical issues for model regularization using NMF. This paper presents the Bayesian group sparse learning for NMF and applies it for single-channel music source separation. This method reconstructs the rhythmic or repetitive signal from a common subspace spanned by the shared bases for the whole signal and simultaneously decodes the harmonic or residual signal from an individual subspace consisting of separate bases for different signal segments. A Laplacian scale mixture distribution is introduced for sparse coding given a sparseness control parameter. The relevance of basis vectors for reconstructing two groups of music signals is automatically determined. A Markov chain Monte Carlo procedure is presented to infer two sets of model parameters and hyperparameters through a sampling procedure based on the conditional posterior distributions. Experiments on separating single-channel audio signals into rhythmic and harmonic source signals show that the proposed method outperforms baseline NMF, Bayesian NMF, and other group-based NMF in terms of signal-to-interference ratio.
1 Introduction
Many problems in audio, speech and music processing can be tackled through matrix factorization. Different cost functions and constraints may lead to different factorized matrices. This procedure can identify underlying sources from the mixed signals through blind source separation [1]. Nonnegative matrix factorization (NMF) is designed to find an approximate factorization X≈A S for a data matrix X into a basis matrix A and a weight matrix S which are all nonnegative [2]. Some divergence measures have been proposed to derive solutions to NMF [3, 4]. NMF provides a useful learning tool for clustering as well as for classification. When a portion of labeled data are available, the semi-supervised NMF was developed for an improved classification system [5]. Different from standard principal component analysis (PCA) and independent component analysis (ICA), NMF only allows additive combination due to the nonnegative constraints on matrices A and S. Nevertheless, nonnegative PCA and nonnegative ICA were proposed for blind source separation in the presence of nonnegative image and music sources [6].
On the other hand, NMF conducts a parts-based sparse representation where only a few components or bases are relevant for representation of input nonnegative matrix X. The sparseness constraint is imposed in objective function [2]. An automatic relevance determination (ARD) scheme [7–9] is developed to determine relevant bases for sparse representation. Such sparse coding is efficient and robust. However, controlling the sparseness or smoothness is influential for system performance. Bayesian learning is beneficial to deal with sparse representation [9] and model regularization [7]. In [10], Bayesian learning was performed for sparse representation of image data where Laplacian distribution was used as prior density. The ℓ_{1}-regularized optimization was comparably performed. In addition, the group-based NMF [11] was proposed to capture the intra-subject variations and the inter-subject variations in EEG signals. In [12], the group sparse NMF was proposed by minimizing the Itakura-Saito divergence between X and AS. In [13], NMF was applied for drum source separation where the factorized components were partitioned into rhythmic sources and harmonic sources. No Bayesian learning was performed in [11–13].
More recently, a Bayesian NMF approach [14] was proposed for model selection and image reconstruction. This approach inferred the NMF model by a variational Bayes method and a Markov chain Monte Carlo (MCMC) algorithm. In [15], a Bayesian NMF with gamma priors for source signals and mixture weights was implemented through an MCMC algorithm. In [16], the Bayesian NMF with Gaussian likelihood and exponential prior was constructed for image feature extraction where the posterior distribution was approximated by a Gibbs sampling procedure. In [17], a Bayesian approach for blind separation of linear mixtures of sources was developed. The Student t distribution for mixture weights was introduced to achieve sparse basis representation. The underdetermined noisy mixtures were separated. However, the case of nonnegative sources was not considered. Besides, single-channel source separation is known as an underdetermined problem. In [18], the harmonic structure information was adopted to estimate the demixed instrumental sources. In [19], NMF was applied for single-channel speech separation where the speech of the target speaker was enhanced over that of the masking speaker by using sparse dictionaries learned on a phoneme level for individual speakers.
This paper addresses the problem of underdetermined source separation based on NMF for an application to music source separation [20]. The uses of NMF and Bayesian theory for source separation are not new, since there have been many papers on them [11–13, 15]. However, to the best of our knowledge, the novelty of this paper is to propose Bayesian group sparse (BGS) learning using the Laplacian distribution and the Laplacian scale mixture (LSM) distribution and to apply it to single-channel music signal separation. We present a group-based NMF where the groups of common bases and individual bases are estimated for blind separation of rhythmic sources and harmonic sources, respectively. Bayesian sparse learning is developed by introducing LSM distributions as the priors for two groups of reconstruction weights. Gamma priors are used to represent two groups of nonnegative basis components. The BGS-NMF algorithm is accordingly established. An MCMC algorithm is derived to infer BGS-NMF parameters and hyperparameters according to full Bayesian theory. The rhythmic sources and harmonic sources are reconstructed through the relevant bases in the common subspace and the individual subspace, respectively. In the experiments, the proposed BGS-NMF is evaluated and compared with the other NMF methods for single-channel separation of audio signals into rhythmic signals and harmonic signals. From the comparative study, we find that the improvement of separation performance benefits from Bayesian modeling, group basis representation, and sparse signal reconstruction. Sparser priors identify fewer but more relevant bases and correspondingly lead to a better performance in terms of signal-to-interference ratio.
The remainder of this paper is organized as follows. In the next section, the related studies on NMF and group basis representation are surveyed. Some Bayesian learning approaches are addressed. Section 3 highlights the construction of the BGS-NMF model as well as the inference procedure based on the MCMC algorithm. The conditional posterior distributions of different parameters and hyperparameters are derived in the sampling procedure. Section 4 reports a series of experiments on underdetermined music source separation with different music sources. The convergence condition in MCMC sampling is investigated. The evaluation of demixed signals in terms of signal-to-interference ratio is reported. Finally, the conclusions drawn from this study are provided in Section 5.
2 Background survey
In what follows, nonnegative matrix factorization (NMF) and its extensions to different regularization functions are introduced. Several approaches to group basis representation are addressed. Group sparse coding is surveyed. Then Bayesian learning methods for matrix factorization and other related tasks are introduced.
2.1 Nonnegative matrix factorization
where ⊗ and ⊘ denote element-wise multiplication and division, respectively.
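To make the element-wise notation concrete, the following is a minimal sketch of multiplicative updating for X≈A S. It assumes the widely used squared-error updates of Lee and Seung, which may differ in detail from the update rule referred to as (4) in this paper. The function name `nmf_multiplicative`, the small constant `eps`, and the random test matrix are illustrative assumptions, not part of the original method.

```python
import numpy as np

def nmf_multiplicative(X, n_bases, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative matrix X (M x N) as A (M x D) times S (D x N).

    Assumption: squared-error multiplicative updates (Lee-Seung style).
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    A = rng.random((n_rows, n_bases)) + eps
    S = rng.random((n_bases, n_cols)) + eps
    errors = []
    for _ in range(n_iter):
        # S <- S (element-wise times) (A^T X) (element-wise divided by) (A^T A S)
        S *= (A.T @ X) / (A.T @ A @ S + eps)
        # A <- A (element-wise times) (X S^T) (element-wise divided by) (A S S^T)
        A *= (X @ S.T) / (A @ S @ S.T + eps)
        errors.append(np.linalg.norm(X - A @ S) ** 2)
    return A, S, errors

# Hypothetical nonnegative data with an exact rank-4 factorization.
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 30))
A, S, errors = nmf_multiplicative(X, n_bases=4)
```

Because every factor in the updates is nonnegative, A and S stay nonnegative as long as they are initialized that way, which is why no explicit projection step is needed.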
2.2 Group basis representation
In (5), the second and third terms are seen as the ℓ_{2} regularization functions, the fourth term enforces the distance between different common bases to be small, and the fifth term enforces the distance between different individual bases to be large. Regularization parameters $\{{\eta}_{\mathrm{a}},{\eta}_{{\mathrm{a}}_{\mathrm{r}}},{\eta}_{{\mathrm{a}}_{\mathrm{h}}}\}$ are used. The NMPCFs in [21, 22] and GNMF in [11] did not consider sparsity in group basis representation.
All the instances within a group $\mathcal{G}$ share the same dictionary D with basis vectors ${\left\{{A}_{j}\right\}}_{j=1}^{\left|D\right|}$. The weight matrix ${\left\{{S}_{j}\right\}}_{j=1}^{\left|D\right|}$ consists of nonnegative vectors ${S}_{j}={[{S}_{j}^{1},\dots ,{S}_{j}^{\left|\mathcal{G}\right|}]}^{T}$. The weight parameters $\left\{{S}_{j}^{k}\right\}$ are estimated for different group instances $k\in \mathcal{G}$ using different bases $j\in \mathcal{D}$. In (6), ℓ_{1} regularization term is incorporated to carry out group sparse coding. The group sparsity was further extended to structural sparsity for dictionary learning and basis representation. Nevertheless, nonnegative constraints were not imposed on bases {A_{ j }} and observed signals {X_{ k }}. Basically, all the above-mentioned methods [2, 11, 21–24] did not apply probabilistic framework. No Bayesian learning was considered.
2.3 Bayesian learning approaches
Model regularization is critical for improving the generalization of a learning machine to new data [7]. Conducting Bayesian learning compensates for the variations of the estimated parameters and accordingly improves model regularization. Typically, NMF and group basis representation are viewed as learning machines which are based on a set of bases. Following the perspective of relevance vector machines [8, 9], Bayesian sparse learning is beneficial to identify relevant bases for regularized basis representation. To do so, sparse priors based on the Student t distribution [17] and the Laplacian distribution [10, 25] could act as regularization functions and be merged with the likelihood function to obtain the a posteriori probability. Maximizing the logarithm of the a posteriori probability is equivalent to minimizing the ℓ_{1}-regularized error function if a Laplacian prior is applied. Hyperparameters of sparse priors are then used as the regularization parameters which control the trade-off between a reconstruction error function and a sparsity-favorable penalty function.
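The equivalence between maximum a posteriori estimation with a Laplacian prior and ℓ_{1}-regularized minimization can be written out in a few lines. The sketch below assumes an isotropic Gaussian likelihood with variance $\sigma^{2}$ and an independent Laplacian prior with parameter $\lambda$ on each weight; the symbols $\sigma^{2}$, $\lambda$, and $\eta$ are introduced here for illustration only.

```latex
\begin{aligned}
p(X \mid A, S) &\propto \exp\!\Big(-\tfrac{1}{2\sigma^{2}} \lVert X - AS \rVert_F^{2}\Big),
\qquad
p(S) = \prod_{j,k} \tfrac{\lambda}{2} \exp\big(-\lambda \lvert S_{jk} \rvert\big), \\
-\ln p(S \mid X, A) &= \tfrac{1}{2\sigma^{2}} \lVert X - AS \rVert_F^{2}
  + \lambda \sum_{j,k} \lvert S_{jk} \rvert + \text{const}, \\
\widehat{S}_{\text{MAP}} &= \arg\min_{S} \; \lVert X - AS \rVert_F^{2} + \eta \, \lVert S \rVert_{1},
\qquad \eta = 2\sigma^{2}\lambda .
\end{aligned}
```

Under these assumptions the regularization parameter $\eta$ is fixed by the noise variance and the prior hyperparameter, which is the sense in which hyperparameters act as regularization parameters.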
The regularization terms are determined from hyperparameters by η_{a}=α_{a}/α and η_{s}=α_{s}/α. Bayesian learning of PMF was performed through MCMC algorithm where Gaussian-Wishart priors for Gaussian mean vectors and precision matrices were assumed. There was no constraint on nonnegative matrices by using PMF. No sparse learning was considered.
In [27], a full Bayesian NMF was implemented to determine the number of bases according to the marginal likelihood. Furthermore, Bayesian nonparametric approach to NMF was proposed in [28] where model structure was determined through Gamma process NMF. This method was applied to find both latent sources in spectrograms and their number. In [25], the group sparse coding [23] was upgraded with Bayesian interpretation. Bayesian sparse learning was only developed for single-sample basis representation. In [29], the group sparse priors were presented for maximum a posteriori estimation of covariance matrix which was used in Gaussian graphical model. More recently, the group sparse hidden Markov models (HMMs) [30] were proposed to represent a sequence of observations and have been successfully applied for speech recognition. A set of common bases were shared for representation of speech samples across HMM states, while a set of individual bases were employed to represent speech samples within individual HMM states. Bayesian group sparse learning was performed for speech recognition [30] and signal separation [20] by using Laplacian scale mixture distribution.
3 Bayesian group sparse matrix factorization
Previous NMF methods [11, 13, 21] were developed to extract task-specific nonnegative factors, but they did not simultaneously consider the uncertainty of model parameters and control the sparsity of weight parameters. In [23, 25], the group sparse coding and its Bayesian extension did not impose nonnegative constraints on the data matrix X and the factorized matrices A and S. This paper presents a new Bayesian group sparse learning for NMF (denoted by BGS-NMF) and applies it to single-channel music source separation.
3.1 Model construction
BGS-NMF model is therefore constructed with parameters ${\Theta}^{\left(l\right)}=\{{A}_{\mathrm{r}},{A}_{\mathrm{h}}^{\left(l\right)},{S}_{\mathrm{r}}^{\left(l\right)},{S}_{\mathrm{h}}^{\left(l\right)},{\Sigma}^{\left(l\right)}\}$.
3.2 Priors for Bayesian group sparse learning
where ${\Phi}_{\mathrm{a}}^{\left(l\right)}=\{\{{\alpha}_{\mathit{\text{rj}}},{\beta}_{\mathit{\text{rj}}}\},\{{\alpha}_{\mathit{\text{hj}}}^{\left(l\right)},{\beta}_{\mathit{\text{hj}}}^{\left(l\right)}\}\}$ denotes the hyperparameters of gamma distributions and {D_{r},D_{h}} denote the numbers of common bases and individual bases, respectively. Gamma distribution is an exponential family distribution for nonnegative data. Its two parameters {α,β} can be adjusted to fit different shapes of distributions. In (11) and (12), all entries in matrices A_{r} and ${A}_{\mathrm{h}}^{\left(l\right)}$ are assumed to be independent.
which is controlled by a positive continuous mixture parameter ${\lambda}_{\mathit{\text{rj}}}^{\left(l\right)}\ge 0$. Considering a gamma distribution for inverse scale parameter, i.e., $p\left({\lambda}_{\mathit{\text{rj}}}^{\left(l\right)}\right)=\mathcal{G}\left({\lambda}_{\mathit{\text{rj}}}^{\left(l\right)}\right|{\gamma}_{\mathit{\text{rj}}}^{\left(l\right)},{\delta}_{\mathit{\text{rj}}}^{\left(l\right)})$, the marginal distribution of a reconstruction weight can be calculated by [25]
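The marginalization over the inverse scale parameter can be checked numerically. The sketch below is written under stated assumptions rather than taken from the paper: the weight s is nonnegative with a one-sided density λ exp(−λ s), and the gamma prior on λ uses shape γ and rate δ. Under these assumptions the marginal has the closed form γ δ^γ/(δ+s)^{γ+1}, which the code compares against a trapezoidal integration. All function names are hypothetical.

```python
import math
import numpy as np

def lsm_marginal_closed_form(s, gamma_shape, delta_rate):
    """Closed-form marginal of s >= 0 after integrating out lambda (assumed form)."""
    return gamma_shape * delta_rate ** gamma_shape / (delta_rate + s) ** (gamma_shape + 1)

def lsm_marginal_numeric(s, gamma_shape, delta_rate, n_grid=200001, lam_max=200.0):
    """Integrate p(s | lambda) p(lambda) over lambda with the trapezoidal rule."""
    lam = np.linspace(1e-12, lam_max, n_grid)
    gamma_pdf = (delta_rate ** gamma_shape / math.gamma(gamma_shape)
                 * lam ** (gamma_shape - 1) * np.exp(-delta_rate * lam))
    integrand = lam * np.exp(-lam * s) * gamma_pdf
    widths = np.diff(lam)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * widths))

closed = lsm_marginal_closed_form(0.7, gamma_shape=2.0, delta_rate=1.5)
numeric = lsm_marginal_numeric(0.7, gamma_shape=2.0, delta_rate=1.5)
```

The polynomial tail of the closed form is heavier than the exponential tail of a fixed-λ prior, while the density is still sharply peaked at zero.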
where $\{{\eta}_{\mathrm{a}},{\eta}_{{\mathrm{s}}_{\mathrm{r}}},{\eta}_{{\mathrm{s}}_{\mathrm{h}}}\}$ denote the regularization parameters for two groups of bases and reconstruction weights. Some BGS-NMF parameters or hyperparameters have been absorbed in these regularization parameters. Comparing with the objective functions (3) for NMPCF, (5) for GNMF, and (8) for PMF, the optimization of (15) for BGS-NMF shall lead to two groups of signals which are reconstructed from the sparse common bases A_{r} and sparse individual bases ${A}_{\mathrm{h}}^{\left(l\right)}$. The regularization terms due to two gamma bases are additionally considered. Different from the Bayesian NMF (BNMF) [15], BGS-NMF conducts group sparse learning which does not only characterize the within-segment harmonic information but also represent the across-segment rhythmic regularity. Sparse sets of basis vectors are further determined for sparse representation. Basically, BGS-NMF follows a general objective function. By applying different hyperparameter values $\{{\alpha}_{\mathit{\text{rj}}},{\beta}_{\mathit{\text{rj}}},{\alpha}_{\mathit{\text{hj}}}^{\left(l\right)},{\beta}_{\mathit{\text{hj}}}^{\left(l\right)}\}$, probability structures, and prior distributions for $\{{A}_{\mathrm{r}},{A}_{\mathrm{h}}^{\left(l\right)},{S}_{\mathrm{r}}^{\left(l\right)},{S}_{\mathrm{h}}^{\left(l\right)}\}$, BGS-NMF can be realized to find solutions to NMF [2], NMPCF [21], GNMF [11], PMF [26], and BNMF [15]. Notably, the objective function in (15) is written for comparative study among different methods. This function only considers BGS-NMF based on Laplacian prior. BGS-NMF algorithms with Laplacian prior and LSM prior shall be both implemented in the experiments. Nevertheless, in what follows, we address the model inference procedure for BGS-NMF with LSM prior.
3.3 Model inference
The full Bayesian framework for the BGS-NMF model based on the posterior distribution of parameters and hyperparameters p(Θ,Φ|X) is not analytically tractable. A stochastic optimization scheme is adopted. We develop an MCMC sampling algorithm for approximate inference through iteratively generating samples of parameters Θ and hyperparameters Φ according to the posterior distribution. The sampling is run until the generated samples converge. The key idea of MCMC sampling is to simulate a stationary ergodic Markov chain whose samples asymptotically follow the posterior distribution p(Θ,Φ|X). The estimates of parameters Θ and hyperparameters Φ are then computed via Monte Carlo integrations on the simulated Markov chains. For simplicity, the segment index l is neglected in the derivation of the MCMC algorithm for BGS-NMF. At each new iteration t+1, the BGS-NMF parameters Θ^{(t+1)} and hyperparameters Φ^{(t+1)} are sequentially sampled in the order {A_{r},S_{r},A_{h},S_{h},Σ,α_{r},β_{r},α_{h},β_{h},λ_{r},λ_{h},γ_{r},δ_{r},γ_{h},δ_{h}} according to their corresponding conditional posterior distributions. In this subsection, we describe the calculation of conditional posterior distributions for the BGS-NMF parameters {A_{r},S_{r},A_{h},S_{h},Σ}. The conditional posterior distributions for hyperparameters {α_{r},β_{r},α_{h},β_{h},λ_{r},λ_{h},γ_{r},δ_{r},γ_{h},δ_{h}} are derived in the Appendix.
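As an illustration of a single sampling step of this kind, the following sketch runs an independence Metropolis-Hastings sampler for one nonnegative scalar. The target density (a gamma-type prior times a Gaussian likelihood), its numerical constants, the truncated Gaussian instrumental distribution, and the burn-in length are all assumptions chosen for the example; they are not the paper's conditional posteriors.

```python
import numpy as np

def log_target(a, alpha=2.0, beta=1.0, mu=3.0, sigma=1.0):
    """Unnormalized log density on a > 0 (assumed form: gamma-type prior x Gaussian)."""
    return (alpha - 1.0) * np.log(a) - beta * a - 0.5 * ((a - mu) / sigma) ** 2

def sample_truncated_normal(rng, mean, std):
    """Draw from N(mean, std^2) restricted to (0, inf) by simple rejection."""
    while True:
        x = rng.normal(mean, std)
        if x > 0.0:
            return x

def independence_mh(n_samples, prop_mean, prop_std, seed=0):
    """Independence Metropolis-Hastings with a truncated Gaussian proposal."""
    rng = np.random.default_rng(seed)

    def log_q(x):
        # Proposal log density up to a constant that cancels in the ratio.
        return -0.5 * ((x - prop_mean) / prop_std) ** 2

    current = sample_truncated_normal(rng, prop_mean, prop_std)
    samples = np.empty(n_samples)
    accepted = 0
    for t in range(n_samples):
        proposal = sample_truncated_normal(rng, prop_mean, prop_std)
        log_ratio = (log_target(proposal) - log_target(current)
                     + log_q(current) - log_q(proposal))
        if np.log(rng.random()) < log_ratio:
            current = proposal
            accepted += 1
        samples[t] = current
    return samples, accepted / n_samples

# Proposal centred at the mode of the assumed target, 1 + sqrt(2).
samples, acceptance_rate = independence_mh(20000, prop_mean=1.0 + np.sqrt(2.0), prop_std=1.0)
burn_in = 2000
posterior_mean = samples[burn_in:].mean()  # Monte Carlo estimate of the mean
```

In a full sampler, one such step would be applied in turn to each parameter and hyperparameter, conditioning on the most recent values of all the others.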
In (19), the mode ${\mu}_{{A}_{\mathit{\text{rij}}}}^{\text{inst}}$ is obtained by finding the roots of a quadratic equation of [A_{r}]_{ i j } which appears in the exponent of the posterior distribution in (18). Derivation for the mode ${\mu}_{{A}_{\mathit{\text{rij}}}}^{\text{inst}}$ is detailed in the Appendix. In case of complex-valued root or negative-valued root, the mode is forced by ${\mu}_{{A}_{\mathit{\text{rij}}}}^{\text{inst}}=0$. The width of instrumental distribution is controlled by ${\left[{\sigma}_{{A}_{\mathit{\text{rij}}}}^{\text{inst}}\right]}^{2}={\left[{\sigma}_{{A}_{\mathit{\text{rij}}}}^{\text{post}}\right]}^{2}$.
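The root-finding rule for the mode can be sketched as follows. Since (18) is not reproduced here, the code assumes that the exponent combines a gamma prior with shape α and rate β and a Gaussian likelihood term with mean `mu_likel` and variance `var_likel`; setting the derivative to zero then gives the quadratic a² − (μ − βσ²)a − (α − 1)σ² = 0. The function name and this specific quadratic are illustrative assumptions.

```python
import math

def instrumental_mode(alpha, beta, mu_likel, var_likel):
    """Mode of exp((alpha-1) ln a - beta a - (a - mu)^2 / (2 var)) on a >= 0.

    Assumed stationarity condition: a^2 - (mu - beta*var) a - (alpha - 1) var = 0.
    """
    b = mu_likel - beta * var_likel
    c = (alpha - 1.0) * var_likel
    discriminant = b * b + 4.0 * c
    if discriminant < 0.0:
        return 0.0  # complex-valued roots: force the mode to zero
    root = 0.5 * (b + math.sqrt(discriminant))  # larger root is the local maximum
    return root if root > 0.0 else 0.0  # negative-valued root: force to zero

mode_regular = instrumental_mode(2.0, 1.0, 3.0, 1.0)
mode_complex = instrumental_mode(0.5, 1.0, 1.0, 1.0)
mode_negative = instrumental_mode(0.5, 1.0, -1.0, 1.0)
```

The three calls cover the regular case and the two fallback cases described in the text.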
where ${\mu}_{{S}_{\mathit{\text{rjk}}}}^{\text{post}}={\mu}_{{S}_{\mathit{\text{rjk}}}}^{\text{likel}}-{\lambda}_{\mathit{\text{rj}}}^{\left(t\right)}{\left[{\sigma}_{{S}_{\mathit{\text{rjk}}}}^{\text{likel}}\right]}^{2}$ and ${\left[{\sigma}_{{S}_{\mathit{\text{rjk}}}}^{\text{post}}\right]}^{2}={\left[{\sigma}_{{S}_{\mathit{\text{rjk}}}}^{\text{likel}}\right]}^{2}$. Notably, the hyperparameters $\{{\gamma}_{\mathit{\text{rj}}}^{(t+1)},{\delta}_{\mathit{\text{rj}}}^{(t+1)}\}$ in LSM prior are also sampled and used to sample LSM parameter ${\lambda}_{\mathit{\text{rj}}}^{(t+1)}$ based on a gamma distribution. Here, Metropolis-Hastings algorithm is applied again. The best instrumental distribution q([S_{ r }]_{ j k }) is selected to fit (22). This distribution is derived as a truncated Gaussian distribution ${\mathcal{N}}_{+}\left({\left[{S}_{r}\right]}_{\mathit{\text{jk}}}\right|{\mu}_{{S}_{\mathit{\text{rjk}}}}^{\text{inst}},{\left[{\sigma}_{{S}_{\mathit{\text{rjk}}}}^{\text{inst}}\right]}^{2})$ where the mode ${\mu}_{{S}_{\mathit{\text{rjk}}}}^{\text{inst}}$ is derived by finding the root of a quadratic equation of [S_{r}]_{ j k } and the width is obtained by ${\left[{\sigma}_{{S}_{\mathit{\text{rjk}}}}^{\text{inst}}\right]}^{2}={\left[{\sigma}_{{S}_{\mathit{\text{rjk}}}}^{\text{post}}\right]}^{2}$. In addition, the conditional posterior distributions for sampling the individual basis parameter ${\left[{A}_{\mathrm{h}}^{(t+1)}\right]}_{\mathit{\text{ij}}}$ and its reconstruction weight ${\left[{S}_{\mathrm{h}}^{(t+1)}\right]}_{\mathit{\text{jk}}}$ are similar to those for sampling ${\left[{A}_{\mathrm{r}}^{(t+1)}\right]}_{\mathit{\text{ij}}}$ and ${\left[{S}_{\mathrm{r}}^{(t+1)}\right]}_{\mathit{\text{jk}}}$, respectively. We do not address these two distributions.
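The shifted mean and unchanged variance stated above follow from completing the square. The short derivation below assumes that, for a fixed $\lambda$, the prior on a nonnegative weight $s$ is proportional to $\exp(-\lambda s)$ and the likelihood is Gaussian in $s$ with mean $\mu$ and variance $\sigma^{2}$; these generic symbols stand in for the subscripted quantities of the text.

```latex
\begin{aligned}
-\frac{(s-\mu)^{2}}{2\sigma^{2}} - \lambda s
 &= -\frac{1}{2\sigma^{2}}\Big[s^{2} - 2\big(\mu - \lambda\sigma^{2}\big)s + \mu^{2}\Big] \\
 &= -\frac{\big(s - (\mu - \lambda\sigma^{2})\big)^{2}}{2\sigma^{2}}
    - \lambda\mu + \frac{\lambda^{2}\sigma^{2}}{2}, \\
\Rightarrow\quad
\mathcal{N}(s \mid \mu, \sigma^{2})\,\lambda e^{-\lambda s}
 &\propto \mathcal{N}_{+}\big(s \mid \mu - \lambda\sigma^{2},\, \sigma^{2}\big), \qquad s \ge 0 .
\end{aligned}
```

A larger $\lambda$ therefore pulls the posterior mean toward zero without changing its width, which is how the LSM parameter controls sparsity of the weights in this simplified setting.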
With these posterior estimates, the rhythmic source and the harmonic source are calculated by ${\widehat{A}}_{\mathrm{r}}{\widehat{S}}_{\mathrm{r}}$ and ${\widehat{A}}_{\mathrm{h}}{\widehat{S}}_{\mathrm{h}}$, respectively. The BGS-NMF algorithm is completed. Different from BNMF [15], the proposed BGS-NMF conducts a group sparse learning based on LSM distribution. Common bases A_{r} are shared for different data segments l. The group sparse learning performs well in our experiments.
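The final reconstruction step can be sketched as below. The code assumes that the posterior estimates are sample averages of the MCMC draws after a burn-in period and that the two sources add up to the mixture estimate. The array sizes, the burn-in length, and the randomly generated draws are placeholders, not outputs of an actual BGS-NMF run.

```python
import numpy as np

def monte_carlo_estimate(draws, burn_in):
    """Average MCMC draws (stacked on axis 0) after discarding the burn-in."""
    return draws[burn_in:].mean(axis=0)

rng = np.random.default_rng(0)
n_draws, burn_in = 60, 20                               # hypothetical chain length
n_freq, n_frames, n_common, n_individual = 8, 12, 3, 2  # hypothetical sizes

# Placeholder draws standing in for sampled bases and weights of one segment.
A_r_draws = rng.gamma(2.0, 1.0, size=(n_draws, n_freq, n_common))
S_r_draws = rng.exponential(1.0, size=(n_draws, n_common, n_frames))
A_h_draws = rng.gamma(2.0, 1.0, size=(n_draws, n_freq, n_individual))
S_h_draws = rng.exponential(1.0, size=(n_draws, n_individual, n_frames))

A_r_hat = monte_carlo_estimate(A_r_draws, burn_in)
S_r_hat = monte_carlo_estimate(S_r_draws, burn_in)
A_h_hat = monte_carlo_estimate(A_h_draws, burn_in)
S_h_hat = monte_carlo_estimate(S_h_draws, burn_in)

rhythmic = A_r_hat @ S_r_hat        # source from the common subspace
harmonic = A_h_hat @ S_h_hat        # source from the individual subspace
mixture_estimate = rhythmic + harmonic
```

In a multi-segment setting, `A_r_hat` would be shared across segments while the other three estimates would be recomputed for each segment.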
4 Experiments
In this study, BGS-NMF is implemented to estimate two audio source signals from a single-channel mixed signal. One source signal contains rhythmic pattern which is constructed by the bases shared for all audio segments while the other source contains harmonic information which is represented via bases from individual segments. Bayesian sparse learning is performed to conduct probabilistic reconstruction based on the relevant group bases. Some experiments are reported to evaluate the performance of model inference and signal reconstruction.
4.1 Experimental setup
The interference was measured by the Euclidean distance between original signal $\left\{{X}_{k}^{\left(l\right)}\right\}$ and reconstructed signal $\left\{{\widehat{X}}_{k}^{\left(l\right)}\right\}$ for different samples k in different segments l. These signals include rhythmic signals $\left\{{\left[{\widehat{A}}_{\mathrm{r}}{\widehat{S}}_{\mathrm{r}}^{\left(l\right)}\right]}_{k}\right\}$ and harmonic signals $\left\{{\left[{\widehat{A}}_{\mathrm{h}}^{\left(l\right)}{\widehat{S}}_{\mathrm{h}}^{\left(l\right)}\right]}_{k}\right\}$.
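A signal-to-interference ratio built on this Euclidean distance can be sketched as follows. The exact SIR formula of the paper is not reproduced in this text, so the code assumes the common energy-ratio form, 10 log10 of the source energy over the squared distance between source and reconstruction. The function name `sir_db` and the test signal are illustrative.

```python
import numpy as np

def sir_db(source, estimate, eps=1e-12):
    """SIR in dB, assuming interference = squared Euclidean distance to the source."""
    source = np.asarray(source, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_energy = np.sum(source ** 2)
    interference_energy = np.sum((source - estimate) ** 2)
    return 10.0 * np.log10(signal_energy / (interference_energy + eps))

# Hypothetical check: a reconstruction with a 10% amplitude error gives 20 dB.
time = np.linspace(0.0, 1.0, 1000)
reference = np.sin(2.0 * np.pi * 5.0 * time)
example_sir = sir_db(reference, 0.9 * reference)
```

With this definition, a higher SIR means the reconstruction lies closer to the original source relative to the source energy.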
For system initialization at t=0, we detected two short segments with only rhythmic signal and harmonic signal and applied them for finding rhythmic parameters $\{{A}_{\mathrm{r}}^{\left(0\right)},{S}_{\mathrm{r}}^{\left(0\right)}\}$ and harmonic parameters $\{{A}_{\mathrm{h}}^{\left(0\right)},{S}_{\mathrm{h}}^{\left(0\right)}\}$, respectively. This prior information was used to implement five NMF methods for single-channel source separation. We carried out baseline NMF [2], Bayesian NMF (BNMF) [15], group-based NMF (GNMF) [11] (or NMPCF [22]), and the proposed BGS-NMF under consistent experimental conditions. To evaluate the effect of sparse priors in BGS-NMF for music source separation, we additionally realized BGS-NMF by applying Laplacian distribution. For this realization, the sampling steps of LSM parameters $\{{\gamma}_{{r}_{j}},{\delta}_{{r}_{j}},{\gamma}_{{h}_{j}},{\delta}_{{h}_{j}}\}$ were ignored. The BGS-NMFs with Laplacian distribution (denoted by BGS-NMF-LP) and LSM distribution (BGS-NMF-LSM) were compared. All these NMFs were implemented for different segments l. Basically, the NMF model [2] was realized by using multiplicative updating algorithm in (4). The BNMF [15] conducted Bayesian learning of NMF model where MCMC sampling was performed, and gamma distributions were assumed for bases and reconstruction weights. No group sparse learning was considered in NMF and BNMF. Using NMPCF [22] or GNMF [11], the common bases and individual bases were constructed by applying multiplicative updating algorithm. No probabilistic framework was involved. The ℓ_{2}-norm regularization for basis parameters A_{r} and ${A}_{\mathrm{h}}^{\left(l\right)}$ was considered. There was no sparseness constraint imposed on reconstruction weight parameters ${S}_{\mathrm{r}}^{\left(l\right)}$ and ${S}_{\mathrm{h}}^{\left(l\right)}$. Only the result of GNMF method was reported. 
Using GNMF, the regularization parameters in (5) were empirically determined as $\{{\eta}_{\mathrm{a}}=0.35,{\eta}_{{\mathrm{a}}_{\mathrm{r}}}=0.2,{\eta}_{{\mathrm{a}}_{\mathrm{h}}}=0.2\}$. In contrast, Bayesian group sparse learning is performed in the BGS-NMF-LP and BGS-NMF-LSM algorithms. Using these algorithms, the uncertainties of bases and reconstruction weights are represented by gamma distributions and LSM distributions, respectively. The MCMC algorithm is developed to sample BGS-NMF parameters Θ^{(t+1)} and hyperparameters Φ^{(t+1)}. The groups of common bases A_{r} and individual bases A_{h} are estimated to capture between-segment repetitive patterns and within-segment residual information, respectively. The relevant bases are detected via sparse priors in accordance with Laplacian or LSM distributions. Using BGS-NMF-LP, we sampled the parameters and hyperparameters by using different frames from six music signals and automatically calculated the averaged values of regularization parameters in (15) as $\{{\eta}_{\mathrm{a}}=0.41,{\eta}_{{\mathrm{s}}_{\mathrm{r}}}=0.31,{\eta}_{{\mathrm{s}}_{\mathrm{h}}}=0.26\}$. The regularization parameters in (5) and (15) reflect different physical meanings in the objective function. The computational cost and the model size are also examined. The computation times of running MATLAB code were measured on a personal computer with an Intel Core 2 Duo 2.4-GHz CPU and 4 GB of RAM. In our investigation, the computation times of demixing a 21-s audio signal were measured as 3.1, 12.1, 16.2, 20.9, and 21.2 min by using NMF, BNMF, GNMF, and the proposed BGS-NMF-LP and BGS-NMF-LSM, respectively. In addition, the model sizes of BNMF, GNMF, BGS-NMF-LP, and BGS-NMF-LSM were measured to be 2.5, 4.5, 5.2, and 5.3 times that of the baseline NMF, respectively.
4.2 Evaluation for MCMC iterative procedure
4.3 Evaluation for single-channel music source separation
Comparison of SIR (in dB) of the reconstructed rhythmic signal and harmonic signal based on NMF, BNMF, GNMF, BGS-NMF-LP and BGS-NMF-LSM
Signal | NMF rhythmic | NMF harmonic | BNMF rhythmic | BNMF harmonic | GNMF rhythmic | GNMF harmonic | BGS-NMF-LP rhythmic | BGS-NMF-LP harmonic | BGS-NMF-LSM rhythmic | BGS-NMF-LSM harmonic |
---|---|---|---|---|---|---|---|---|---|---|
Music 1 | 6.47 | 4.17 | 6.33 | 4.29 | 9.19 | 6.10 | 9.61 | 8.32 | 9.86 | 8.63 |
Music 2 | 6.30 | 1.10 | 8.08 | 5.18 | 8.22 | 3.03 | 8.33 | 7.13 | 8.55 | 7.45 |
Music 3 | 3.89 | -1.11 | 5.16 | 3.80 | 6.01 | 3.22 | 8.44 | 8.52 | 8.63 | 8.79 |
Music 4 | 2.66 | 6.03 | 3.28 | 6.28 | 3.59 | 8.36 | 7.97 | 9.52 | 8.20 | 9.78 |
Music 5 | 1.85 | 3.71 | 3.03 | 2.55 | 3.97 | 6.44 | 8.11 | 8.22 | 8.35 | 8.50 |
Music 6 | 1.06 | 6.37 | 3.34 | 5.56 | 2.78 | 7.10 | 5.00 | 6.93 | 5.19 | 7.23 |
Average | 3.71 | 3.38 | 4.87 | 4.61 | 5.63 | 5.71 | 7.91 | 8.11 | 8.13 | 8.40 |
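The Average row can be reproduced directly from the six per-signal rows. The sketch below copies the SIR values from the table, recomputes the column means, and averages the rhythmic and harmonic means per method; the variable names are illustrative, and the per-method overall mean is a summary added here rather than a figure reported in the paper.

```python
import numpy as np

methods = ["NMF", "BNMF", "GNMF", "BGS-NMF-LP", "BGS-NMF-LSM"]

# Rows: Music 1 to Music 6. Columns: (rhythmic, harmonic) for each method in order.
sir = np.array([
    [6.47, 4.17, 6.33, 4.29, 9.19, 6.10, 9.61, 8.32, 9.86, 8.63],
    [6.30, 1.10, 8.08, 5.18, 8.22, 3.03, 8.33, 7.13, 8.55, 7.45],
    [3.89, -1.11, 5.16, 3.80, 6.01, 3.22, 8.44, 8.52, 8.63, 8.79],
    [2.66, 6.03, 3.28, 6.28, 3.59, 8.36, 7.97, 9.52, 8.20, 9.78],
    [1.85, 3.71, 3.03, 2.55, 3.97, 6.44, 8.11, 8.22, 8.35, 8.50],
    [1.06, 6.37, 3.34, 5.56, 2.78, 7.10, 5.00, 6.93, 5.19, 7.23],
])
reported_average = np.array([3.71, 3.38, 4.87, 4.61, 5.63, 5.71, 7.91, 8.11, 8.13, 8.40])

column_means = sir.mean(axis=0)
# Mean of the rhythmic and harmonic averages for each method.
overall_by_method = column_means.reshape(len(methods), 2).mean(axis=1)
best_method = methods[int(np.argmax(overall_by_method))]
```

The recomputed column means agree with the reported Average row to within rounding.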
5 Conclusions
This paper has presented the Bayesian group sparse learning and applied it for single-channel nonnegative source separation. The basis vectors in NMF were grouped into two partitions. The first group was the common bases which were used to explore the inter-segment repetitive characteristics, while the second was the individual bases which were applied to represent the intra-segment harmonic information. The LSM distribution was introduced to express sparse reconstruction weights for two groups of basis vectors. Bayesian learning was incorporated into group basis representation with model regularization. The MCMC algorithm or the Metropolis-Hastings algorithm was developed to conduct approximate inference of model parameters and hyperparameters. Model parameters were used to find the decomposed rhythmic signals and harmonic signals. Hyperparameters were used to control the sparsity of reconstructed weights and the generation of basis parameters. In the experiments, we implemented the proposed BGS-NMFs for underdetermined source separation. The convergence condition of sampling procedure for approximate inference was investigated. The performance of BGS-NMF-LP and BGS-NMF-LSM was shown to be robust to the different kinds of rhythmic and harmonic sources and mixing conditions. BGS-NMF-LSM outperformed the other NMFs in terms of SIRs. The BGS-NMF controlled by LSM distribution performed better than that controlled by Laplacian distribution. In the future, the system performance of BGS-NMF may be further improved by some other considerations. For example, the numbers of common bases and individual bases could be automatically selected according to Bayesian framework by using marginal likelihood. The group sparse learning could be extended for constructing hierarchical NMF where hierarchical grouping of basis vectors is examined. The underdetermined separation under different number of sources and sensors could be tackled. 
Also, online learning could be involved to update segment-based parameters and hyperparameters [33, 34]. The evolutionary BGS-NMFs shall work for nonstationary single-channel blind source separation. In addition, more evaluations shall be conducted by using realistic data with a larger number of mixed speech signals from different application domains, such as meetings and call centers.
Appendix
Derivations for inference of BGS-NMF parameters and hyperparameters
On the other hand, following the model inference in Section 3.3, we continue to describe the MCMC sampling algorithm and the calculation of conditional posterior distributions for the remaining BGS-NMF hyperparameters {α_{r},β_{r},α_{h},β_{h},λ_{r},λ_{h},γ_{r},δ_{r},γ_{h},δ_{h}}.
where ${\lambda}_{{\alpha}_{\mathit{\text{rj}}}}^{\text{post}}=\ln {\beta}_{\mathit{\text{rj}}}^{\left(t\right)}+(1/{D}_{\mathrm{r}})\sum _{j=1}^{{D}_{\mathrm{r}}}\ln {\left[{A}_{\mathrm{r}}^{(t+1)}\right]}_{\mathit{\text{ij}}}-(1/{D}_{\mathrm{r}}){\lambda}_{{\alpha}_{\mathit{\text{rj}}}}$. This distribution does not belong to a known family, so the Metropolis-Hastings algorithm is applied. An instrumental distribution q(α_{ r j }) is obtained by fitting the term within the brackets of (30) through a gamma distribution as detailed in [15].
The resulting distribution is arranged as a new gamma distribution $\mathcal{G}\left({\beta}_{\mathit{\text{rj}}}\right|{\alpha}_{{\beta}_{\mathit{\text{rj}}}}^{\text{post}},{\beta}_{{\beta}_{\mathit{\text{rj}}}}^{\text{post}})$ where ${\alpha}_{{\beta}_{\mathit{\text{rj}}}}^{\text{post}}=1+{D}_{\mathrm{r}}{\alpha}_{\mathit{\text{rj}}}^{(t+1)}+{\alpha}_{{\beta}_{\mathit{\text{rj}}}}$ and ${\beta}_{{\beta}_{\mathit{\text{rj}}}}^{\text{post}}=\sum _{j=1}^{{D}_{\mathrm{r}}}{\left[{A}_{r}^{(t+1)}\right]}_{\mathit{\text{ij}}}+{\beta}_{{\beta}_{\mathit{\text{rj}}}}$. Here, we do not describe the sampling of ${\alpha}_{\mathit{\text{hj}}}^{(t+1)}$ and ${\beta}_{\mathit{\text{hj}}}^{(t+1)}$ since the conditional posterior distributions for sampling these two hyperparameters are similar to those for sampling of ${\alpha}_{\mathit{\text{rj}}}^{(t+1)}$ and ${\beta}_{\mathit{\text{rj}}}^{(t+1)}$.
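Because this conditional posterior is a gamma distribution, it can be sampled directly without a Metropolis-Hastings step. The sketch below follows the posterior shape and rate stated above, with the number of supplied basis entries playing the role of D_{r}. The function names, the example values, and the reading of the second gamma parameter as a rate are assumptions made for illustration.

```python
import numpy as np

def beta_posterior_params(basis_entries, alpha_r, prior_shape, prior_rate):
    """Shape and rate of the gamma conditional posterior for beta_rj (assumed rate form)."""
    n_entries = basis_entries.size  # plays the role of D_r in the text
    shape = 1.0 + n_entries * alpha_r + prior_shape
    rate = float(np.sum(basis_entries)) + prior_rate
    return shape, rate

def sample_beta(basis_entries, alpha_r, prior_shape, prior_rate, rng):
    """Draw beta_rj from its gamma conditional posterior."""
    shape, rate = beta_posterior_params(basis_entries, alpha_r, prior_shape, prior_rate)
    # NumPy parameterizes the gamma distribution by shape and scale = 1 / rate.
    return rng.gamma(shape, 1.0 / rate)

rng = np.random.default_rng(0)
entries = np.array([0.5, 1.0, 1.5])  # hypothetical sampled basis entries
shape, rate = beta_posterior_params(entries, alpha_r=2.0, prior_shape=1.0, prior_rate=1.0)
beta_draw = sample_beta(entries, 2.0, 1.0, 1.0, rng)
```

The same pattern would apply to the other hyperparameters whose conditional posteriors are stated to be gamma distributions.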
where ${\lambda}_{{\gamma}_{\mathit{\text{rj}}}}^{\text{post}}=\ln {\delta}_{\mathit{\text{rj}}}^{\left(t\right)}+\frac{{\gamma}_{\mathit{\text{rj}}}-1}{{\gamma}_{\mathit{\text{rj}}}}\ln {\lambda}_{\mathit{\text{rj}}}^{(t+1)}-{\lambda}_{{\gamma}_{\mathit{\text{rj}}}}$. Again, we need to find an instrumental distribution q(γ_{ r j }) which optimally fits the conditional posterior distribution $p\left({\gamma}_{\mathit{\text{rj}}}\right|{\lambda}_{\mathit{\text{rj}}}^{(t+1)},{\delta}_{\mathit{\text{rj}}}^{\left(t\right)})$. An approximate gamma distribution is found accordingly. The Metropolis-Hastings algorithm is then applied.
This distribution can be arranged as a new gamma distribution $\mathcal{G}\left({\delta}_{\mathit{\text{rj}}}\right|{\alpha}_{{\delta}_{\mathit{\text{rj}}}}^{\text{post}},{\beta}_{{\delta}_{\mathit{\text{rj}}}}^{\text{post}})$ where ${\alpha}_{{\delta}_{\mathit{\text{rj}}}}^{\text{post}}={D}_{\mathrm{r}}{\gamma}_{\mathit{\text{rj}}}^{(t+1)}+{\alpha}_{{\delta}_{\mathit{\text{rj}}}}$ and ${\beta}_{{\delta}_{\mathit{\text{rj}}}}^{\text{post}}={\lambda}_{\mathit{\text{rj}}}^{(t+1)}+{\beta}_{{\delta}_{\mathit{\text{rj}}}}$. Similarly, the conditional posterior distributions for sampling ${\gamma}_{\mathit{\text{hj}}}^{(t+1)}$ and ${\delta}_{\mathit{\text{hj}}}^{(t+1)}$ can be formulated by referring to those for sampling ${\gamma}_{\mathit{\text{rj}}}^{(t+1)}$ and ${\delta}_{\mathit{\text{rj}}}^{(t+1)}$, respectively.
Declarations
Acknowledgments
The authors acknowledge anonymous reviewers for their constructive feedback and helpful suggestions. This work has been partially supported by the National Science Council, Taiwan, Republic of China, under contract NSC 100-2628-E-009-028-MY3.
References
- Cichocki A, Zdunek R, Amari S: New algorithms for non-negative matrix factorization in applications to blind source separation. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE; 2006:621-624.
- Hoyer PO: Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res 2004, 5: 1457-1469.
- Chien J-T, Hsieh H-L: Convex divergence ICA for blind source separation. IEEE Trans. Audio, Speech, Language Process 2012, 20(1):290-301.
- Kompass R: A generalized divergence measure for nonnegative matrix factorization. Neural Comput 2007, 19: 780-791. doi:10.1162/neco.2007.19.3.780
- Lee H, Yoo J, Choi S: Semi-supervised nonnegative matrix factorization. IEEE Signal Process. Lett 2010, 17(1):4-7.
- Plumbley MD: Algorithms for nonnegative independent component analysis. IEEE Trans. Neural Netw 2003, 14(3):534-543. doi:10.1109/TNN.2003.810616
- Bishop CM: Pattern Recognition and Machine Learning. New York: Springer Science; 2006.
- Saon G, Chien J-T: Bayesian sensing hidden Markov models. IEEE Trans. Audio, Speech, Language Process 2012, 20(1):43-54.
- Tipping ME: Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res 2001, 1: 211-244.
- Babacan SD, Molina R, Katsaggelos AK: Bayesian compressive sensing using Laplace priors. IEEE Trans. Image Process 2010, 19(1):53-63.
- Lee H, Choi S: Group nonnegative matrix factorization for EEG classification. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS). JMLR; 2009:320-327.
- Lefevre A, Bach F, Fevotte C: Itakura-Saito nonnegative matrix factorization with group sparsity. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Prague Congress Center; 22–27 May 2011:21-24.
- Kim M, Yoo J, Kang K, Choi S: Blind rhythmic source separation: nonnegativity and repeatability. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE; 2010:2006-2009.
- Cemgil AT: Bayesian inference for nonnegative matrix factorization models. University of Cambridge, Technical Report CUED/F-INFENG/TR.609, 2008.
- Moussaoui S, Brie D, Mohammad-Djafari A, Carteret C: Separation of non-negative mixture of non-negative sources using a Bayesian approach and MCMC sampling. IEEE Trans. Signal Process 2006, 54(11):4133-4145.
- Schmidt MN, Winther O, Hansen LK: Bayesian non-negative matrix factorization. In Proceedings of the International Conference on Independent Component Analysis and Signal Separation, Paraty, March 2009. Lecture Notes in Computer Science. Heidelberg: Springer; 2009:540-547.
- Fevotte C, Godsill SJ: A Bayesian approach for blind separation of sparse sources. IEEE Trans. Audio, Speech, Language Process 2006, 14(6):2174-2188.
- Duan Z, Zhang Y, Zhang C, Shi Z: Unsupervised single-channel music source separation by average harmonic structure modeling. IEEE Trans. Audio, Speech, Language Process 2008, 16(4):766-778.
- Schmidt MN, Olsson RK: Single-channel speech separation using sparse non-negative matrix factorization. In Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Pittsburgh; 17–21 September 2006:2614-2617.
- Chien J-T, Hsieh H-L: Bayesian group sparse learning for nonnegative matrix factorization. In Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Portland; 9–13 September 2012:1552-1555.
- Yoo J, Kim M, Kang K, Choi S: Nonnegative matrix partial co-factorization for drum source separation. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE; 2010:1942-1945.
- Kim M, Yoo J, Kang K, Choi S: Nonnegative matrix partial co-factorization for spectral and temporal drum source separation. IEEE J. Sel. Top. Signal Process 2011, 5(6):1192-1204.
- Bengio S, Pereira F, Singer Y, Strelow D: Group sparse coding. In Advances in Neural Information Processing Systems (NIPS). La Jolla: NIPS; 2009:82-89.
- Jenatton R, Mairal J, Obozinski G, Bach F: Proximal methods for sparse hierarchical dictionary learning. In Proceedings of the International Conference on Machine Learning (ICML). Haifa; 21–25 June 2010.
- Garrigues PJ, Olshausen BA: Group sparse coding with a Laplacian scale mixture prior. In Advances in Neural Information Processing Systems (NIPS). La Jolla: NIPS; 2010:676-684.
- Salakhutdinov R, Mnih A: Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the International Conference on Machine Learning (ICML). Helsinki; 5–9 July 2008:880-887.
- Zhong M, Girolami M: Reversible jump MCMC for non-negative matrix factorization. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS). Clearwater Beach; 16–18 April 2009:663-670.
- Hoffman MD, Blei DM, Cook PR: Bayesian nonparametric matrix factorization for recorded music. In Proceedings of the International Conference on Machine Learning (ICML). Haifa; 21–24 June 2010.
- Marlin BM, Schmidt M, Murphy KP: Group sparse priors for covariance estimation. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI). Montreal; 18–21 June 2009:383-392.
- Chien J-T, Chiang C-C: Group sparse hidden Markov models for speech recognition. In Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Portland; 9–13 September 2012:2646-2649.
- Chien J-T, Ting C-W: Factor analyzed subspace modeling and selection. IEEE Trans. Audio, Speech, Language Process 2008, 16(1):239-248.
- Chib S, Greenberg E: Understanding the Metropolis-Hastings algorithm. Am. Statistician 1995, 49(4):327-335.
- Chien J-T, Hsieh H-L: Nonstationary source separation using sequential and variational Bayesian learning. IEEE Trans. Neural Netw. Learn. Syst 2013, 24(5):681-694.
- Hsieh H-L, Chien J-T: Nonstationary and temporally-correlated source separation using Gaussian process. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). Prague Congress Center; 22–27 May 2011:2120-2123.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.