Basis approach to estimate the instantaneous frequencies in multicomponent AM-FM signals
EURASIP Journal on Audio, Speech, and Music Processing volume 2014, Article number: 8 (2014)
Abstract
In this paper, an analytical approach to estimating the instantaneous frequencies of a multicomponent signal is presented. A nonstationary signal composed of oscillation modes or resonances is described by a multicomponent AM-FM model. The proposed method has two main stages. First, the signal is decomposed into its oscillation components. Then, the instantaneous frequency of each component is estimated. The decomposition stage is performed through a basis expansion that exploits orthogonal rational functions in the complex plane. Orthogonal rational bases are generalized to expand linear time-varying systems. To decompose the nonstationary signal, its equivalent time-varying system is sought. The time-varying poles of this system are required to construct appropriate basis functions, and an adaptive data segmentation algorithm is provided for this purpose. The effect of noise is analyzed theoretically and evaluated experimentally to verify the robustness of the new method. The performance of the method in extracting embedded instantaneous frequencies is demonstrated by simulations on both synthetic data and a real-world audio signal.
1 Introduction
Nonstationary signals composed of constituents with time-varying amplitudes and frequencies can be characterized by amplitude-modulated frequency-modulated (AM-FM) models. Such models are used to represent real signals in communications [1, 2], acoustic and speech processing [3–6], biomedical signal processing [7], and image processing [8]. To estimate the instantaneous amplitudes (IAs) and instantaneous frequencies (IFs) embedded in a multicomponent signal [9], the first step is to decompose it into oscillation components. This procedure is termed ‘demodulation’, ‘separation’, or ‘decomposition’ in the literature. Each component should represent a valid oscillation for which the definition of instantaneous frequency is physically meaningful [10].
Methods for multicomponent AM-FM signal decomposition and parameter estimation can be categorized as nonparametric or parametric. One class of nonparametric approaches is based on joint time-frequency processing. Widespread time-frequency distributions (TFDs) such as the short-time Fourier transform (STFT), the Wigner-Ville distribution, and the Choi-Williams distribution are employed [11]. These methods are limited by the well-known compromise between time and frequency resolution. Moreover, cross-terms are troublesome. The energy separation algorithm (ESA), which uses a nonlinear differential operator called the Teager-Kaiser energy operator (TKEO), is another method [12]. The ESA, which tracks the energy of the source producing the signal, is originally applicable to monocomponent AM-FM signals. Nevertheless, it has been modified for multicomponent cases by designing a bank of filters. Multiband-ESA (MESA), which consists of bandpass filtering followed by monocomponent energy separation, is based on this concept [13]. Separation by bandpass filtering is appropriate when the components are spectrally far enough apart.
Huang et al. [14] proposed an iterative technique known as empirical mode decomposition (EMD). This technique is an algorithmic way to extract the oscillation modes embedded in the signal, named intrinsic mode functions (IMFs). Each IMF gives a valid IF, which is estimated by applying the Hilbert transform (HT) [14]. The algorithm proposed for implementing EMD, called the sifting process, showed several drawbacks such as sensitivity to perturbation and the mode mixing problem [15]. To overcome these difficulties, the original EMD was modified into a new algorithm named ensemble empirical mode decomposition (EEMD) [15]. This method repeats EMD on noise-added versions of the signal. In each iteration, controlled white noise is added to the data, and EMD is applied. Each individual trial may generate noisy results, but the noise is canceled out by averaging the results. Thus, the true solution is taken as the ensemble mean of a sufficient number of trials. Although each trial produces a set of IMFs, the sum of IMFs is not necessarily an IMF. An empirical solution for this issue is suggested in [15]. The EMD and EEMD methods suffer from the lack of an analytic foundation. Some research has attempted to establish and improve the analytical aspects of these empirical approaches [16]. In [17], an alternative algorithm for EMD is introduced based on iterating certain filters, such as Toeplitz filters. The results of iterative filtering are similar to those of the conventional sifting process. Although the authors of [17] do not claim superiority for their method, it lays down a mathematical framework for an alternative approach to EMD. The convergence of iterative filtering EMD is studied in [18]. A variant of EMD to decompose multiscale data is proposed in [19]. This work provides some theoretical understanding of EMD for a class of multiscale data and introduces two algorithms, Newton-Raphson-based EMD and ODE-based EMD, as variations of the sifting process.
The decomposition of multiscale data based on EMD is pursued in [20], inspired by compressed sensing theory. The sparsest representation of multiscale data is sought within the largest possible dictionary constructed of IMFs. The problem is formulated as a nonlinear L^{1} optimization, and an iterative algorithm is proposed to solve it. Noise and perturbation in the data may cause numerical instability in this method. Daubechies et al. developed a method which captures the philosophy of EMD and decomposes special functions in a defined class [21]. This method employs a combination of wavelet analysis and a reallocation technique, called the synchrosqueezing transform, which aims to sharpen a time-frequency representation. The synchrosqueezed wavelet transform is also investigated for signal sampling and denoising applications in multicomponent signal analysis [22]. In [23], an algorithm for AM-FM parameter estimation is proposed based on the iterated application of the Hilbert transform to amplitude envelopes obtained by adaptive lowpass filtering. Furthermore, the IF of AM-FM components can be calculated by a posteriori adaptive segmentation of the acquired phase signal. Another iterative AM-FM decomposition is suggested in [24] using the quasi-harmonic model (QHM) for quasi-harmonic signals such as voiced speech.
There are various parametric approaches to extracting the IFs of multicomponent signals. One common approach is based on signal segmentation, under simplifying assumptions, such as constant frequency, that are reasonable within each segment. An estimator is then designed to estimate the model parameters segment by segment. In [25], the maximum windowed likelihood (MWL) criterion is used to estimate the AM-FM components. The high nonlinearity of this method makes the necessary optimization difficult. Another parametric approach is based on statistical modeling of the signal according to its statistical attributes and assumptions. Speech signals are statistically modeled as AM-FM signals, and the extended Kalman filter (EKF) is applied for demodulation [3]. The idea of the EKF is also exploited in [26]. Polynomial phase signal (PPS) modeling is another parametric approach employed for AM-FM signals [27].
The interpretation and estimation of the instantaneous frequencies embedded in a multicomponent signal have been controversial [28]. Three different approaches have been proposed to estimate the IFs after the decomposition stage. In the first approach, the Hilbert transform is used to obtain the analytic signal, whose phase is differentiated to find the IF [10]. The energy operator (TKEO) is utilized in the second approach [12], and the third is based on TFDs [11, 29]. These approaches rely on different definitions of IF, and consequently, their results are not necessarily equivalent. In [28] and [30], the different definitions and estimation methods are compared and discussed. The main contribution of this paper is a novel approach based on the expansion of time-varying systems by orthogonal rational functions. The method introduced in [31] is extended and improved to serve as the core of the new method for IF estimation. An adaptive segmentation procedure in the proposed algorithm allows the IFs to be estimated locally. The decomposition is performed using orthogonal rational functions.
2 Problem statement
Multicomponent signals were first introduced in [9]. A multicomponent AM-FM model describes a nonstationary signal as a combination of oscillation terms with time-varying amplitudes and frequencies:
where N_{c} is the number of components, A_{k}(t) and θ_{k}(t) are the time-varying envelope and time-varying phase of the kth component, respectively, and the instantaneous frequency, denoted by f_{k}(t), is defined from θ_{k}(t):
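As an illustration of the model (1) and the definition (2), the following sketch assumes the forms x(t) = Σ_{k} A_{k}(t) cos θ_{k}(t) and f_{k}(t) = (1/2π) dθ_{k}(t)/dt, which are consistent with the description above but are written here as assumptions. The sampling rate, envelopes, and phases are hypothetical and are not the test signal of Section 7.

```python
import numpy as np

def synthesize_am_fm(t, amplitudes, phases):
    """Sum of A_k(t) * cos(theta_k(t)); the real cosine form is an assumption."""
    return sum(A(t) * np.cos(theta(t)) for A, theta in zip(amplitudes, phases))

def instantaneous_frequency(theta, t):
    """f_k(t) = (1 / (2*pi)) * d(theta_k)/dt, evaluated by finite differences."""
    return np.gradient(theta(t), t) / (2.0 * np.pi)

fs = 8000.0                              # hypothetical sampling rate in Hz
t = np.arange(0.0, 0.05, 1.0 / fs)

# Two hypothetical components: a decaying linear chirp and a steady tone.
amplitudes = [lambda t: np.exp(-5.0 * t),
              lambda t: 0.5 * np.ones_like(t)]
phases = [lambda t: 2.0 * np.pi * (400.0 * t + 2000.0 * t**2),
          lambda t: 2.0 * np.pi * 1200.0 * t]

x = synthesize_am_fm(t, amplitudes, phases)
f1 = instantaneous_frequency(phases[0], t)   # analytically 400 + 4000 t Hz
f2 = instantaneous_frequency(phases[1], t)   # analytically 1200 Hz
```

The IFs here are read from the known phases; the rest of the paper addresses the harder problem of recovering them from x alone.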
The general model in (1) can be interpreted as a signal expansion on a generalized complex exponential basis, whose elements are exponential functions with time-varying amplitudes and frequencies. The decomposition of the multicomponent AM-FM signal is investigated from this point of view. Therefore, we seek an appropriate basis to expand the nonstationary signal x(t):
The functions {g_{k}; k = 1, 2, ⋯, K} should represent the oscillation modes in the signal, for which the instantaneous frequency is meaningfully definable. Accordingly, each term represents a valid IF of the multicomponent signal. The main idea for attaining such a decomposition is to expand the system corresponding to the AM-FM signal in the complex plane. Since the transfer function of a realistic linear system has a rational representation, it can be expanded by orthogonal rational functions in the complex z-plane. Returning to the time domain, each rational function is equivalent to a generalized exponential basis function and represents one valid oscillation term or resonance. To perform this procedure, we should specify the generating system of the AM-FM signal. The system corresponding to a nonstationary signal is modeled by a linear time-varying (LTV) system [32]. LTV models have been applied to describe nonstationary signals [33, 34]. Our proposed method is developed based on this modeling approach. Let us consider the discrete-time AM-FM signal x[n], obtained by sampling x(t) at the rate f_{s}. Its generating system is modeled as an LTV system, which can be described by a bivariate function that characterizes the linear input-output relationship [32]. Hence, a bivariate discrete-time impulse response, h[m, n], is considered, where n and m are two independent time indices representing the time variable of the signal and the time variable of the system, respectively. Taking the Z-transform of h[m, n] with respect to the time variable of the signal yields H(m, z), which denotes the generating system of x[n]. The orthogonal rational basis has been investigated for the decomposition of linear time-invariant (LTI) systems [35] and should be generalized to expand the time-varying generating system of the AM-FM signal as follows:
where {G_{k}(m, z); k = 1, 2, ⋯, K} is a rational basis. Fortunately, finding the orthogonal rational functions for the system expansion does not require the system’s transfer function in full detail. Knowledge of the poles, or reasonable assumptions about them, is sufficient to extract a proper basis [35].
3 Orthogonal rational functions
The decomposition step in the proposed method is in fact an expansion of the AM-FM signal, accomplished by expanding the equivalent time-varying system in orthogonal rational functions. Generally, knowledge of the poles is sufficient a priori information to describe the space spanned by a rational basis. Let us consider a set of M time-varying poles, {ξ_{k}[m], k = 1, …, M}. A first-order IIR transfer function can be formed from each pole. The resulting basis set includes all specified poles but is not orthogonal. The Blaschke products [35] formed by these poles are two-dimensional functions,
Applying the Gram-Schmidt procedure to these rational functions with respect to z, two-dimensional functions are obtained:
This is the same routine used to find the Takenaka-Malmquist functions [36]. Here, these functions are generalized to two-dimensional functions for the LTV system expansion. The resultant functions in (6) are orthogonal with respect to z in the complex domain. The inner product of each pair of these functions is a function of time, m:
Using the Cauchy integral, it follows that d_{kl}[m] is zero at each snapshot, m, for k ≠ l. Taking the inverse Z-transform of G_{k}(m, z) produces g_{k}[m, n]. Since the Z-transform and its inverse are homomorphic transforms, these functions preserve orthogonality with respect to n. Two problems remain. First, the underlying time-varying poles of the corresponding LTV system should be determined. Second, univariate terms should be extracted from the bivariate functions, g_{k}[m, n], to express the oscillation modes of x[n]. These issues are resolved simultaneously by adaptive segmentation, which is addressed in the following section.
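The orthogonality at a single snapshot m can be checked numerically. The sketch below assumes the standard Takenaka-Malmquist form, G_{k}(z) = sqrt(1 − |ξ_{k}|^{2}) / (z − ξ_{k}) · Π_{i<k} (1 − ξ_{i}^{*} z) / (z − ξ_{i}), which may differ in normalization from (6), and approximates the unit-circle inner product by a uniform grid average. The pole values are hypothetical.

```python
import numpy as np

def takenaka_malmquist(poles, z):
    """Evaluate the assumed orthonormal rational functions G_k(z) for fixed poles."""
    z = np.asarray(z, dtype=complex)
    blaschke = np.ones_like(z)          # running Blaschke product over earlier poles
    basis = []
    for xi in poles:
        basis.append(np.sqrt(1.0 - abs(xi) ** 2) / (z - xi) * blaschke)
        blaschke = blaschke * (1.0 - np.conj(xi) * z) / (z - xi)
    return np.array(basis)

def gram_matrix(poles, n_grid=4096):
    """Approximate (1/2pi) * integral of G_k * conj(G_l) over the unit circle."""
    omega = 2.0 * np.pi * np.arange(n_grid) / n_grid
    G = takenaka_malmquist(poles, np.exp(1j * omega))
    return G @ G.conj().T / n_grid

# Hypothetical poles of one snapshot: two conjugate pairs inside the unit circle.
poles_snapshot = [0.9 * np.exp(1j * 0.3), 0.9 * np.exp(-1j * 0.3),
                  0.8 * np.exp(1j * 1.1), 0.8 * np.exp(-1j * 1.1)]
D = gram_matrix(poles_snapshot)         # expected to be close to the identity
```

Repeating the check for the poles of every segment would give the time-dependent inner products d_{kl}[m] discussed above.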
4 Time-varying modeling
The concepts of poles and zeros are also generalized to linear time-varying systems. Concerning the stability and behavior of LTV systems, several definitions of the poles or eigenvalues of such systems have been proposed [37], depending on how the LTV system is characterized. The notion of time-varying poles in this paper is founded on the time-varying autoregressive model. Parametric models for LTI systems can be generalized to LTV ones by allowing the model parameters to vary with time. The AM-FM signal, x[n], is modeled by a time-varying autoregressive (TVAR) model of order M:
{a_{m}[n], m = 1, …, M} are the time-varying parameters, and ν[n] is the zero-mean innovation process, also regarded as the modeling error. The most general case of this model is where the parameters are completely uncorrelated from one time sample to the next. Each time sample of x[n] would then be represented by M unknown coefficients, which is not a practical approach. Based on a common practical assumption, the nonstationary signal is regarded as approximately locally stationary, or quasi-stationary. This assumption implies that the parameters of the TVAR model are correlated, and the coefficients are taken to be constant over subintervals of the total time span, referred to as segments. This model is called a block stationary AR model [34]. The segmentation strategy is applicable to multicomponent AM-FM signals whose IAs and IFs are slowly time-varying or piecewise-constant. By virtue of this assumption, a real multicomponent AM-FM signal over its support is considered a superposition of temporally limited signals with constant frequencies. These intervals can generally have various lengths, and different methods, from fixed-length windowing to adaptive segmentation algorithms, have been introduced to determine the borders of the segments [23]. In the proposed method, the segmentation is performed adaptively on the basis of TVAR parameter estimation.
4.1 Segmentation procedure
The entire signal of N samples is segmented into L blocks with various lengths:
where ℓ = 1, …, L and n_{0} = 0. The TVAR coefficients are assumed to be constant in each segment. The mean square error in the ℓth segment is given by
where {a_{ℓ,m}, m = 1, …, M} are the TVAR coefficients of the ℓth segment. The boundaries of each segment are determined such that the error J_{ℓ} remains under a specified threshold. The segmentation algorithm operates as follows. At the start of each stage, the length of the current segment (say ℓ) is set to the minimum possible length, equal to the order of the TVAR model, i.e., n_{ℓ} = n_{ℓ-1} + M. The TVAR coefficients, a_{ℓ,m}, are estimated by the recursive least squares (RLS) technique, and the error in (10) is computed. If it is still greater than the prespecified threshold, the length of the segment is increased by one sample, and the calculations are repeated. This procedure continues in one-sample increments until the error falls below the threshold. At that point, the boundaries and the length of the current segment are fixed, and the procedure starts over at the next time sample to establish another segment. The algorithm proceeds in this way through the entire signal and stops at the end of the data batch. The question arises of how the threshold should be set and how it affects the accuracy of the IF estimation. This issue is examined separately in the following subsection. Once the TVAR parameters are estimated, the corresponding time-varying poles, denoted by {ξ_{k}[m], k = 1, …, M}, are obtained by applying the Z-transform to (8) with respect to n.
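The last step, from the AR coefficients of one segment to its poles, can be sketched as follows. Two simplifications are assumed and should be read as such: batch least squares is used in place of RLS, and the model is written as x[n] ≈ Σ_{m} a_{m} x[n − m], whose sign convention may differ from (8). The function names and the single-tone test segment are hypothetical.

```python
import numpy as np

def ar_least_squares(x, order):
    """Fit x[n] ~ sum_{m=1..order} a_m * x[n-m] by batch least squares.

    Batch least squares stands in for RLS here (an assumption).
    """
    # Each row holds the past samples [x[n-1], x[n-2], ..., x[n-order]].
    rows = [x[n - order:n][::-1] for n in range(order, len(x))]
    a, *_ = np.linalg.lstsq(np.array(rows), x[order:], rcond=None)
    return a

def segment_poles(a):
    """Roots of z^M - a_1 z^(M-1) - ... - a_M for one segment."""
    return np.roots(np.concatenate(([1.0], -a)))

fs = 20000.0                                   # sampling frequency in Hz
n = np.arange(200)
x = np.cos(2.0 * np.pi * 1500.0 * n / fs + 0.4)  # hypothetical constant-frequency segment

a = ar_least_squares(x, order=2)
poles = segment_poles(a)
freqs_hz = np.angle(poles) * fs / (2.0 * np.pi)  # one conjugate pair at +/- 1500 Hz
```

For a single real tone, the conjugate pole pair lies on the unit circle and its angle gives the frequency, which is the piecewise-constant picture assumed within each segment.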
4.2 Error analysis
It is worth examining the relation between the error caused by segmentation and the error of IF estimation. This analysis leads to a reliable choice of error threshold in the adaptive segmentation procedure. Let us consider a discrete-time AM-FM component:
The error in the instantaneous phase imposed by the TVAR modeling in each segment, denoted by ε_{θ}, induces an error in the signal:
For very small phase errors, the following approximation is considered using the Maclaurin series:
Substituting this approximation in (12), we have
where
The error e[n], whose instantaneous amplitude is the absolute modeling error, is also an AM-FM signal:
The error in phase is thus translated into an error in amplitude. Let us define a time-varying threshold, denoted by η[n], such that |e[n]| is kept below it, i.e., |e[n]| < η[n]. If we substitute |e[n]| from (16), the following inequality holds:
So, the absolute phase error depends on the signal envelope. This means that for a fixed threshold, where η[n] is constant over the entire signal, larger phase errors can occur when the IA becomes smaller. Therefore, the threshold should vary adaptively, adjusted to the envelope of the observed signal. In other words, the locally normalized error for each segment is a proper threshold. Since the IA evolves slowly, its mean or minimum value over the segment can be used for normalization. The normalized threshold is denoted by \bar{\eta} for brevity:
Thus, the inequality (17) is practically used as the following one:
The square of |ε[n]| is obtained from (15):
Exploiting this relation in the inequality (19) and performing some mathematical reformulations result in a bound for the phase error:
When \bar{\eta} → 0, the right-hand side of the above inequality is approximately equal to \bar{\eta}^{2}. Keeping the phase error (ε_{θ}) under control consequently controls the IF error. By definition (2), the IF is the derivative of the instantaneous phase, which becomes a difference equation in discrete time:
where ω[n] = 2πf[n] is the instantaneous frequency in radians per second, and T_{s} denotes the sampling period. In the worst case, the maximum phase errors of two consecutive instants accumulate. Thus, the maximum error of IF is 2ε_{θ}f_{s}. For example, if \bar{\eta} = 10^{-3}, then from (21), the maximum phase error is almost 10^{-3}, and the absolute error of IF is at most 0.2% of the sampling frequency. This error can be controlled through the choice of \bar{\eta}. A smaller threshold leads to wider segments, in which the assumption of constant frequency is no longer respected. Our experiments verified that the condition of piecewise-constant frequency for slowly varying IFs is satisfied for \bar{\eta} on the order of 10^{-3} to 10^{-2}.
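The chain from the segmentation threshold to the IF error can be followed in a simplified first-order form. The sketch below assumes a complex-exponential component x[n] = A[n] e^{jθ[n]}, a phase-only modeling error, and a normalized threshold taken as \bar{\eta} = η[n]/A[n]. It is one reading that agrees with the conclusions stated above, not a reproduction of equations (11) to (22).

```latex
\begin{align*}
e[n] &= A[n]\,e^{j(\theta[n]+\varepsilon_\theta[n])} - A[n]\,e^{j\theta[n]}
      = A[n]\,e^{j\theta[n]}\left(e^{j\varepsilon_\theta[n]}-1\right),\\
|e[n]| &= 2A[n]\left|\sin\frac{\varepsilon_\theta[n]}{2}\right|
        \approx A[n]\,|\varepsilon_\theta[n]|
        \qquad (|\varepsilon_\theta[n]| \ll 1),\\
|e[n]| < \eta[n] &\;\Rightarrow\;
        |\varepsilon_\theta[n]| \lesssim \frac{\eta[n]}{A[n]} = \bar{\eta},\\
|\Delta\omega[n]| &= \frac{\left|\varepsilon_\theta[n]-\varepsilon_\theta[n-1]\right|}{T_s}
        \le \frac{2\bar{\eta}}{T_s} = 2\,\bar{\eta}\,f_s .
\end{align*}
```

With \bar{\eta} = 10^{-3}, the last line gives |Δω[n]| ≤ 2 × 10^{-3} f_{s}, which matches the 0.2% figure quoted above.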
5 Estimation framework
The main algorithm of IF estimation takes the extracted time-varying poles to construct G_{k}(m, z) in (6). Then, the bivariate functions, {g_{k}[m, n], k = 1, …, M}, are produced by taking the inverse Z-transform. Next, a one-dimensional basis is extracted from these bivariate functions to achieve a one-dimensional expansion for x[n] as in (3). The basis g_{k}[n] is constructed by concatenating truncated pieces of the bivariate functions, g_{k}[m, n], based on the result of the segmentation procedure:
where L is the total number of segments, and W_{ℓ}[n] is an arbitrary weighting window over the ℓth segment. It is assumed that during this block, the corresponding pole remains equal to ξ_{k}[m_{ℓ}]. Each resultant function, g_{k}[n], is a valid oscillation mode for which the IF is definable. Thus, the embedded IFs are estimated through the IFs of the extracted functions. To estimate the IF, a linear regression of the phase is computed for each segment by applying the weighted linear least squares technique to a first-order polynomial model. Abrupt changes in the phase, which can severely affect the IF estimation, are an important issue. Because consecutive segments derived from different rows of g_{k}[m, n] are concatenated, phase discontinuities may occur in the resultant bases at the junctions of segments. Such discontinuities in the phase trajectory seriously degrade the IF estimation and appear as spikes in the resultant IFs. To remedy this problem, a proper data window such as the Hamming window is chosen as W_{ℓ}[n] in (23), which controls the effect of borderline samples. The Hamming window is commonly used as an analysis window in audio and speech processing [24, 27].
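The per-segment phase regression can be sketched as below. Using the Hamming window directly as the regression weights is an assumption made for illustration; the text applies W_{ℓ}[n] to the basis in (23) and does not spell out the weights of the least squares fit. The function name and the test segment are hypothetical.

```python
import numpy as np

def segment_if(g_segment, fs):
    """Estimate one IF value (Hz) from the phase slope of a complex segment."""
    n = np.arange(len(g_segment))
    phase = np.unwrap(np.angle(g_segment))
    # np.polyfit applies w to the unsquared residuals, so sqrt(window)
    # makes the Hamming window the weight on the squared error.
    w = np.sqrt(np.hamming(len(g_segment)))
    slope, _intercept = np.polyfit(n, phase, 1, w=w)   # radians per sample
    return slope * fs / (2.0 * np.pi)

fs = 20000.0
n = np.arange(64)
# Hypothetical oscillation mode with a constant frequency of 2500 Hz.
g = 0.7 * np.exp(1j * (2.0 * np.pi * 2500.0 * n / fs + 1.0))
f_hat = segment_if(g, fs)
```

Because the window down-weights the first and last samples, a phase jump at a segment junction has less influence on the fitted slope than it would have on a sample-by-sample phase difference.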
When the signal is contaminated by noise, the time-varying poles estimated from noisy observations are misplaced. Thus, the estimated IF incurs additional error due to the noise-induced error in the pole estimates. This issue is alleviated by increasing the order of the TVAR model. Each resonance of a clean signal is represented by a pair of time-varying poles; hence, the order of the TVAR model (M) is twice the number of components (N_{c}). Nonetheless, to improve the estimation of the time-varying poles in the presence of noise, we should have M > 2N_{c}. Extraneous poles therefore appear alongside the valid poles. A minimum distance classifier [38] is applied to assign the poles of the resonances in each segment and distinguish them from the invalid poles. The perturbation of the poles due to the estimation error of the TVAR coefficients is investigated mathematically in the following section. The steps of the proposed algorithm are summarized as follows:

1. Adaptive segmentation of the AM-FM signal based on TVAR modeling and computation of the underlying time-varying poles.
2. Assignment of the poles to the components by the minimum distance classifier.
3. Employing the time-varying poles to construct the oscillation terms, g_{k}[n].
4. Fitting a linear model to the phase of each segment of g_{k}[n] to estimate the IF.
In this method of IF extraction, the Hilbert transform, which is a global operator, is not employed. Additionally, a linear model is fitted to the phase of the components segment by segment instead of differentiating the phase throughout. This makes the proposed method less sensitive to phase changes. Therefore, the adaptive segmentation is advantageous for both decomposition and frequency estimation.
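The pole assignment in step 2 can be illustrated with a nearest-neighbor rule. The sketch assumes that the reference for each component is its pole from the previous segment and that each estimated pole is used at most once; the text specifies only a minimum distance classifier [38], so both choices, and the function name, are hypothetical.

```python
import numpy as np

def assign_poles(estimated, references):
    """For each reference pole, pick the nearest unused estimated pole."""
    estimated = np.asarray(estimated, dtype=complex)
    available = list(range(len(estimated)))
    chosen = []
    for ref in references:
        dists = [abs(estimated[i] - ref) for i in available]
        best = available[int(np.argmin(dists))]
        chosen.append(best)
        available.remove(best)          # each estimated pole is assigned once
    return estimated[chosen]

# Hypothetical example: one resonance tracked among four estimated poles.
previous = np.array([0.95 * np.exp(1j * 0.5), 0.95 * np.exp(-1j * 0.5)])
estimated = np.array([0.30 + 0.10j,
                      0.94 * np.exp(-1j * 0.52),
                      -0.40 + 0.00j,
                      0.96 * np.exp(1j * 0.49)])
tracked = assign_poles(estimated, previous)   # picks entries 3 and 1, in that order
```

The two poles far from the previous pair are left unassigned, which is how the extraneous poles introduced by M > 2N_{c} would be discarded under this rule.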
6 Pole perturbation
The coefficients of the TVAR model (8), estimated through the RLS technique, are affected by noise. Perturbation of these coefficients leads to perturbation of the resultant time-varying poles. Let p(ℓ, z) be the polynomial of the AR model in the ℓth segment, whose roots are the time-varying poles:
where the coefficients are normalized, i.e., a_{ℓ,0} = 1. If the perturbation Δa_{ℓ,i} occurs in the ith coefficient, a_{ℓ,i}, the polynomial (24) changes to
The roots of this new polynomial differ from ξ_{k}[m_{ℓ}] by Δξ_{k}[m_{ℓ}], which may be real or complex. Let us denote the perturbed roots by \tilde{\xi}_{k}[m_{\ell}] = \xi_{k}[m_{\ell}] + \Delta\xi_{k}[m_{\ell}]:
From now on, the kth pole and its perturbation are denoted by ξ_{k} and Δξ_{k}, respectively, and their arguments are omitted for brevity. When Δξ_{k} → 0, the above equation is simplified by employing Taylor’s expansion:
where p^{′}(ℓ, z) is the first derivative of p(ℓ, z) with respect to z. This equation expresses the linear relation between perturbation in poles and perturbation in coefficients:
Considering Δa_{ℓ,i} and Δξ_{k} as random variables, their variances, \sigma_{a_{\ell,i}}^{2} and \sigma_{\xi_{k}}^{2}, respectively, are related linearly,
Obtaining p^{′}(ℓ, ξ_{ k }) from (24), the ratio of the variances is given by
This ratio indicates the sensitivity of the poles to perturbation of the coefficients. A smaller ratio represents less sensitivity, that is, a more robust pole. When all poles are inside the unit circle, i.e., |ξ_{k}| < 1, k = 1, …, M, the ratio in (30) takes its maximum value for i = M. This means that the perturbation in the coefficient of the highest order in the polynomial has the greatest effect on misplacing the poles. This maximum value for each pole is defined as its variance ratio:
γ_{k} is the variance ratio of ξ_{k}, which indicates the robustness of this pole. This parameter depends only on the positions of the underlying poles, which are determined by the AM-FM signal characteristics.
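The first-order sensitivity can be checked numerically. The sketch assumes the monic form p(z) = z^{M} + a_{1} z^{M−1} + ⋯ + a_{M}, for which Δξ_{k} ≈ −Δa_{i} ξ_{k}^{M−i} / p′(ξ_{k}) and, for i = M, the variance ratio becomes γ_{k} = 1 / Π_{j≠k} |ξ_{k} − ξ_{j}|^{2}. These expressions follow from that assumed form and may differ in notation from (24) to (31); the pole values are hypothetical.

```python
import numpy as np

def variance_ratios(poles):
    """gamma_k = 1 / prod_{j != k} |xi_k - xi_j|^2 (assumed form of the ratio)."""
    poles = np.asarray(poles, dtype=complex)
    return np.array([1.0 / np.prod(np.abs(xi - np.delete(poles, k)) ** 2)
                     for k, xi in enumerate(poles)])

def first_order_shift(poles, k, i, delta_a):
    """Predicted shift of pole k when coefficient a_i changes by delta_a."""
    poles = np.asarray(poles, dtype=complex)
    M = len(poles)
    dp = np.prod(poles[k] - np.delete(poles, k))    # p'(xi_k) for a monic p
    return -delta_a * poles[k] ** (M - i) / dp

# Hypothetical poles: two conjugate pairs inside the unit circle.
poles = np.array([0.9 * np.exp(1j * 0.4), 0.9 * np.exp(-1j * 0.4),
                  0.8 * np.exp(1j * 1.3), 0.8 * np.exp(-1j * 1.3)])
coeffs = np.real(np.poly(poles))      # [1, a_1, ..., a_M], real for conjugate pairs

delta = 1e-6
perturbed = coeffs.copy()
perturbed[-1] += delta                # perturb a_M, the most sensitive case
new_roots = np.roots(perturbed)

k = 0
actual = new_roots[np.argmin(np.abs(new_roots - poles[k]))] - poles[k]
predicted = first_order_shift(poles, k, i=len(poles), delta_a=delta)
gammas = variance_ratios(poles)
```

In this example, the two poles of a conjugate pair share the same γ, and poles that lie close to their neighbors have larger γ and are therefore displaced more by the same coefficient error.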
7 Experimental evaluation
The proposed method is applied to both synthetic and real-world signals, and its performance is compared with the results of two previously introduced methods.
7.1 Synthetic data
A two-component AM-FM signal is considered as below:
The parameters of the first and the second components are α_{1} = 2, β_{1,1} = 10^{3}, β_{1,2} = 5 × 10^{3}, β_{1,3} = 20 × 10^{3} and α_{2} = 5, β_{2,1} = 5 × 10^{3}, β_{2,2} =  3 × 10^{3}, β_{2,3} = 0. The two following IFs are embedded in this AM-FM signal:
The signal in (32) is sampled at T_{s} = 50 μs intervals, so the sampling frequency is f_{s} = 20 kHz. Since the absolute values of the instantaneous frequencies increase over time, a limited span of the signal is observed to avoid aliasing. The algorithm is run over N = 2,000 samples of the signal. The corresponding system has four time-varying complex poles {ξ_{1}[m], …, ξ_{4}[m]} forming two conjugate pairs, and each pair represents one oscillation component. Consequently, four complex functions, g_{1}[n], …, g_{4}[n], are extracted whose phases yield the desired IFs. g_{1}[n] and g_{2}[n] belong to a common resonance, which means that their IFs have equal absolute values but opposite signs. g_{3}[n] and g_{4}[n] determine the second IF likewise. The adaptive segmentation procedure divides the data batch into L = 45 segments with different lengths. The error threshold in the segmentation procedure is set to 0.001. The coefficients of the TVAR model and, accordingly, the time-varying poles are estimated through the RLS algorithm with a forgetting factor of 0.98. Figure 1 shows the estimated IFs, denoted by \hat{f}_{i}, i = 1, 2, alongside the original IFs. \hat{f}_{1} and \hat{f}_{2} track the original IFs very closely. The step-like variation produced by segmentation is evident. For quantitative evaluation, the relative mean absolute error (MAE) of IF estimation is computed for \hat{f}_{1} and \hat{f}_{2} and reported in Tables 1 and 2, respectively.
The proposed method is compared with two previous methods, EEMD-HT and QHM. EEMD-HT is a nonparametric method which decomposes the components by the EEMD algorithm and estimates the IFs of the resultant IMFs using the Hilbert transform [14]. QHM is a parametric method which has been evaluated on speech signals [24]. The decomposition of the original signal by the EEMD procedure is displayed in Figure 2. The relative standard deviation of the added noise is 0.2, and the ensemble number for each run is 100. Nine IMFs are derived, although there are just two embedded oscillation modes. The first and the second IMFs are the expected oscillation modes, and the others are false IMFs generated by deficiencies in the iterative algorithm. EEMD extracts faster oscillations first. Thus, the first IMF has the higher frequency and represents our second component; conversely, the second IMF corresponds to the first component. The valid IMFs are selected and assigned subjectively or based on the result of estimation [15]. In this simulation, the estimated IFs of the first and the second IMFs are closer to f_{2} and f_{1}, respectively. The MAEs of the IF estimates are recorded in Tables 1 and 2. The estimation errors of QHM are also provided in these tables for comparison. An analysis window of 64 samples and a hop step of one sample are used for this algorithm.
The simulation is repeated in the presence of additive white Gaussian noise, and the errors for different SNRs are reported in the same tables. Each value is the average of 100 iterations. The proposed method outperforms the other algorithms, especially in stronger noise. These results verify the robustness of the proposed algorithm. Although QHM has a smaller error for the clean signal, it is more sensitive to noise. EEMD, which is more robust than EMD, is still affected by perturbations of the amplitude of the data. Therefore, EEMD-HT has the worst performance in the presence of noise. For the clean signal (SNR = ∞), the order of the TVAR model, M, is equal to 4. Since stronger noise imposes a higher error on pole estimation, M should be increased to alleviate this problem. We have M = 24 for SNR = 30 dB and SNR = 10 dB. Figure 3 depicts the MAE of IF estimation with respect to M for both components. It demonstrates that the error decreases remarkably as the order increases. However, higher orders produce more poles, which raises the error of the classification routine and consequently the total estimation error. Thus, selecting the optimum order involves a compromise. In Figure 3, the MAEs of \hat{f}_{1} and \hat{f}_{2} are minimum at M = 24 and M = 28, respectively.
The effect of noise is illustrated by the perturbation variance ratios, γ_{1}, …, γ_{4} in (31), for each segment in Figure 4. The variance ratios of the poles in a conjugate pair are very close to each other or even equal in some segments. γ_{3} and γ_{4}, corresponding to ξ_{3} and ξ_{4}, i.e., the second resonance, are smaller than γ_{1} and γ_{2}. This means that the poles corresponding to the first oscillation component are more sensitive to noise. This observation explains why the MAE of \hat{f}_{1} (Table 1) deteriorates more than the MAE of \hat{f}_{2} (Table 2) due to the noise.
The proposed method can distinguish components and estimate closely spaced embedded IFs, even when they cross. To illustrate this capability, the simulation is repeated for a similar two-component signal, except that the frequency of the second component, f_{2}, is reduced by 4.8 kHz, i.e., β_{2,1} = 1,200 in (32). In this case, the two IFs are closer and intersect each other. Both IFs are depicted in Figure 5. The proposed method is applied under the same conditions, and the estimation results are shown in Figure 5 alongside the original IFs. The method tracks both IFs, and the intersection of their trajectories is handled properly. The degradation of the estimates, especially around the crossover, is evident when Figures 5 and 1 are compared, but the trajectories are not lost and the estimated IFs improve with time. This advantage arises from the pole classification step in the proposed algorithm.
7.2 Real-world signal
Acoustic studies have revealed that many natural acoustic signals, such as the sounds of songbirds and oceanic mammals, fit the AM-FM model [27, 39]. A 68-ms excerpt of the songs of two songbirds, a canary and a kinglet, recorded at a sampling frequency of 22.05 kHz, is considered for the experiment. The spectrum of this signal, estimated by the STFT with a sliding window of 128 samples, an overlap of 100 samples, and a frequency bin of 2 Hz, is depicted in Figure 6. The signal is composed of two AM-FM components and real-world disturbances such as ambient and instrumental noise. Four complex poles, and consequently four bases, are extracted, which represent the two embedded dominant oscillations. Through the adaptive segmentation procedure, the signal is divided into L = 76 unequal blocks. The threshold of the segmentation procedure, \bar{\eta}, is set to 0.01. The extracted IFs are overlaid on the STFT spectrum in the same figure as black marks. The estimated IFs track the frequencies of the dominant oscillations. In Figure 7, these IFs are shown together with the results of EEMD-HT and QHM. Nine IMFs are extracted through EEMD, but only the first and second ones are valid resonances, and their IFs are displayed. The relative standard deviation of the added noise is set to 0.2, and the number of ensembles is 100. It takes several samples for EEMD-HT to resolve the two IFs. Furthermore, local variations and spikes are noticeable in the results of EEMD-HT and QHM. By contrast, the IFs of the proposed method are piecewise-constant, and no spikes appear, owing to the segmentation. The lengths of the segments, and consequently the variation of the resultant IFs (flatness or volatility), are controllable through the threshold selected in the segmentation procedure.
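The figures quoted for this experiment imply a few derived quantities that help in reading the STFT plot and the segmentation result. The short calculation below works them out. The variable names are illustrative, the frame count assumes no padding at the signal edges, and reading the 2 Hz bin as the result of zero-padding the FFT is an assumption, since the text does not say how that bin spacing is obtained.

```python
fs = 22050.0      # sampling frequency in Hz (from the text)
duration = 0.068  # excerpt length in seconds (from the text)
window = 128      # STFT window length in samples
overlap = 100     # overlap between consecutive windows in samples
bin_hz = 2.0      # frequency bin spacing in Hz
segments = 76     # number of adaptive segments L

n_samples = round(fs * duration)            # 1499 samples in the excerpt
hop = window - overlap                      # 28 samples between frames
n_frames = 1 + (n_samples - window) // hop  # 49 full frames without padding
nfft = round(fs / bin_hz)                   # 11025-point FFT for 2 Hz bins
window_ms = 1000.0 * window / fs            # about 5.8 ms per window
native_bin_hz = fs / window                 # about 172 Hz without zero-padding
mean_segment = n_samples / segments         # about 19.7 samples per segment
```

Under these assumptions the 2 Hz bin interpolates the spectrum of a window whose own bin spacing is roughly 172 Hz, so the STFT ridge in the plot is smooth but not resolved to 2 Hz. The adaptive segments, at about 20 samples on average, are much shorter than the 128-sample STFT window.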
8 Conclusion
The analytical approach developed for extracting the oscillation modes of multicomponent nonstationary signals is founded on several ideas, which are summarized here. Firstly, the decomposition of a multicomponent signal is treated as a signal expansion and investigated from this point of view. Secondly, the equivalent time-varying system of the nonstationary signal is modeled by TVAR and expanded by orthogonal rational functions. Finally, the oscillation modes are constructed from these bivariate rational functions, and their IFs are estimated. For this purpose, the evolution of the time-varying poles is assumed to be piecewise-constant, which imposes the same approximation on the frequency estimates. The threshold parameter of the adaptive segmentation algorithm controls this approximation and governs the trade-off between flatness and volatility of the estimated IFs.
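The link between piecewise-constant poles and a piecewise-constant IF can be illustrated with a deliberately simplified sketch. It is not the proposed algorithm: it fits an ordinary least-squares AR(2) model to fixed-length blocks of a single-component signal, whereas the paper uses a TVAR model, adaptive segmentation, and orthogonal rational bases. The function names `ar2_pole_frequency` and `piecewise_constant_if` and the test chirp are assumptions made for illustration.

```python
import cmath
import math

def ar2_pole_frequency(segment, fs):
    """Least-squares AR(2) fit of one segment; return the pole-angle frequency in Hz."""
    r11 = r12 = r22 = b1 = b2 = 0.0
    for n in range(2, len(segment)):
        x0, x1, x2 = segment[n], segment[n - 1], segment[n - 2]
        r11 += x1 * x1
        r12 += x1 * x2
        r22 += x2 * x2
        b1 += x0 * x1
        b2 += x0 * x2
    det = r11 * r22 - r12 * r12
    if det == 0.0:
        raise ValueError("segment too short or degenerate for an AR(2) fit")
    # Solve the 2x2 normal equations for x[n] ~ a1*x[n-1] + a2*x[n-2].
    a1 = (b1 * r22 - b2 * r12) / det
    a2 = (b2 * r11 - b1 * r12) / det
    # The poles are the roots of z^2 - a1*z - a2 = 0; one root of the pair suffices.
    pole = (a1 + cmath.sqrt(a1 * a1 + 4.0 * a2)) / 2.0
    return abs(cmath.phase(pole)) * fs / (2.0 * math.pi)

def piecewise_constant_if(x, fs, seg_len):
    """Estimate one frequency per fixed-length block and hold it over the block."""
    track = []
    for start in range(0, len(x) - seg_len + 1, seg_len):
        f = ar2_pole_frequency(x[start:start + seg_len], fs)
        track.extend([f] * seg_len)
    return track

# Hypothetical test signal: a linear chirp from 1000 Hz rising at 4000 Hz/s.
fs = 8000.0
rate = 4000.0
chirp = [math.cos(2 * math.pi * (1000.0 * n / fs + 0.5 * rate * (n / fs) ** 2))
         for n in range(800)]
track = piecewise_constant_if(chirp, fs, 40)
```

A discrete-time pole at angle θ corresponds to the frequency θ·fs/(2π), so holding the pole fixed over a block yields a staircase approximation of the true IF. Shorter blocks follow the IF more closely but leave fewer samples for each fit, which is the flatness-versus-volatility trade-off that the segmentation threshold controls in the proposed method.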
The order of the TVAR model, or equivalently the number of time-varying poles, is another parameter, which must be set properly to counter the noise. Because this method utilizes the poles of the generating system of the AM-FM signal, amplitude distortions such as noise affect the estimation results only mildly in comparison with empirical methods. Although noise degrades the estimation of the underlying poles, this degradation can be controlled through the order of the TVAR model. Simulations reveal the superiority of this method in the presence of noise. The controlling parameters of the proposed method provide some degrees of freedom to adjust it to different realistic applications. Its capability to extract the embedded IFs in audio signals is illustrated.
References
Santhanam B, Maragos P: Multicomponent AM-FM demodulation via periodicity-based algebraic separation and energy-based demodulation. IEEE Trans. Commun 2000, 48(3):473-490. 10.1109/26.837050
Jang S, Loughlin P: AM-FM interference excision in spread spectrum communications via projection filtering. EURASIP J. Appl. Signal Process 2001, 4: 239-248.
Lu S, Doerschuk P: Nonlinear modeling and processing of speech based on sums of AM-FM formant models. IEEE Trans. Signal Process 1996, 44(4):773-782. 10.1109/78.492530
Betser M, Collen P, Richard G, David B: Estimation of frequency for AM-FM models using the phase vocoder framework. IEEE Trans. Signal Process 2008, 56(2):505-517.
Grimaldi M, Cummins F: Speaker identification using instantaneous frequencies. IEEE Trans. Audio Speech Lang. Process 2008, 16(6):1097-1111.
Kubo Y, Okawa S, Kurematsu A, Shirai K: Temporal AM-FM combination for robust speech recognition. Speech Commun 2011, 53: 716-725. 10.1016/j.specom.2010.08.012
Loizou C, Murray V, Pattichis M, Seimenis I, Pantziaris M, Pattichis C: Multiscale amplitude-modulation frequency-modulation (AM-FM) texture analysis of multiple sclerosis in brain MRI images. IEEE Trans. Inform. Technol. Biomed 2011, 15(1):119-129.
Murray V, Rodriguez P, Pattichis M: Multiscale AM-FM demodulation and image reconstruction methods with improved accuracy. IEEE Trans. Image Process 2010, 19(5):1138-1152.
Cohen L: What is a multicomponent signal? Paper presented at the IEEE international conference on acoustic, speech and signal processing (ICASSP). USA; 23–26 Mar 1992:113-116.
Boashash B: Estimating and interpreting the instantaneous frequency of a signal-part 1: fundamentals. Proc. IEEE 1992, 80(4):520-538. 10.1109/5.135376
Lovell B, Williamson R, Boashash B: The relationship between instantaneous frequency and time-frequency representations. IEEE Trans. Signal Process 1993, 41(3):1458-1461. 10.1109/78.205756
Maragos P, Kaiser F, Quatieri T: Energy separation in signal modulations with application to speech analysis. IEEE Trans. Signal Process 1993, 41(10):3024-3051. 10.1109/78.277799
Bovik A, Maragos P, Quatieri T: AM-FM energy detection and separation in noise using multiband energy operator. IEEE Trans. Signal Process 1993, 41(12):3245-3265. 10.1109/78.258071
Huang N, Shen Z, Long S, Wu M, Shih H, Zheng Q, Yen N, Tung C, Liu H: The empirical mode decomposition and the Hilbert spectrum for nonlinear and nonstationary time series analysis. Proc. R. Soc. London 1998, 454: 903-995. 10.1098/rspa.1998.0193
Wu Z, Huang N: Ensemble empirical mode decomposition: a noise-assisted data analysis method. Adv. Adaptive Data Anal 2009, 1(1):1-41. 10.1142/S1793536909000047
Deléchelle E, Lemoine J, Niang O: Empirical mode decomposition: an analytical approach for sifting process. IEEE Signal Process. Lett 2005, 12(11):764-767.
Lin L, Wang Y, Zhou H: Iterative filtering as an alternative algorithm for empirical mode decomposition. Adv. Adaptive Data Anal 2009, 1(4):543-560. 10.1142/S179353690900028X
Wang Y, Zhou Z: On the convergence of iterative filtering empirical mode decomposition. Excursions in Harmonic Anal 2013, 2: 157-172.
Hou T, Yan M: A variant of the EMD method for multiscale data. Adv. Adaptive Data Anal 2009, 1(4):483-516. 10.1142/S179353690900031X
Hou T, Shi Z: Adaptive data analysis via sparse time-frequency representation. Adv. Adaptive Data Anal 2011, 3(1):1-28.
Daubechies I, Lu J, Wu H: Synchrosqueezed wavelet transforms: an empirical mode decomposition-like tool. Appl. Comput. Harmonic Anal 2011, 30(2):243-261. 10.1016/j.acha.2010.08.002
Meignen S, Oberlin T, McLaughlin S: A new algorithm for multicomponent signals analysis based on synchrosqueezing: with an application to signal sampling and denoising. IEEE Trans. Signal Process 2012, 60(11):5787-5798.
Gianfelici F, Biagetti G, Crippa P, Turchetti C: Multicomponent AM-FM representations: an asymptotically exact approach. IEEE Trans. Audio Speech Lang. Process 2007, 15(3):823-837.
Pantazis Y, Rosec O, Stylianou Y: Adaptive AM-FM signal decomposition with application to speech analysis. IEEE Trans. Audio Speech Lang. Process 2011, 19(2):290-300.
Gazor S, Rashidi Far R: Adaptive maximum windowed likelihood multicomponent AM-FM signal decomposition. IEEE Trans. Audio Speech Lang. Process 2006, 14(2):479-491.
Pai W, Doerschuk C: Statistical AM-FM models, extended Kalman filter demodulation, Cramer-Rao bounds and speech analysis. IEEE Trans. Signal Process 2000, 48(8):2300-2313. 10.1109/78.852011
Jabloun M, Leonard F, Vieira M, Martin N: A new flexible approach to estimate the IA and IF of nonstationary signals of long-time duration. IEEE Trans. Signal Process 2007, 55(7):3633-3644.
Huang N, Wu Z, Long S, Arnold K, Chen Z, Blank K: On instantaneous frequency. Adv. Adaptive Data Anal 2009, 1(2):177-229. 10.1142/S1793536909000096
Khan N, Boashash B: Instantaneous frequency estimation of multicomponent nonstationary signals using multiview time-frequency distributions based on the adaptive fractional spectrogram. IEEE Signal Process. Lett 2013, 20(2):157-160.
Vakman D: On the analytic signal, the Teager-Kaiser energy algorithm, and other methods for defining amplitude and frequency. IEEE Trans. Signal Process 1996, 44(4):791-797. 10.1109/78.492532
Sebghati M, Amindavar H: A novel analytical approach to orthogonal bases extraction from AM-FM signals. Paper presented at the IEEE international conference on acoustic, speech and signal processing (ICASSP). Czech Republic; 22–27 May 2011:3820-3823.
D’Angelo H: Linear Time-Varying Systems, Analysis and Synthesis. Boston: Allyn and Bacon; 1970.
Martin N: An AR spectral analysis of nonstationary signals. Signal Process 1986, 10: 61-74. 10.1016/0165-1684(86)90065-4
Hopgood J, Rayner P: Blind single channel deconvolution using nonstationary signal processing. IEEE Trans. Speech Audio Process 2003, 11(5):476-488. 10.1109/TSA.2003.815522
Heuberger P, Van den Hof P, Wahlberg B: Modeling and Identification with Rational Orthogonal Basis Functions. London: Springer; 2005.
Ninness B, Gustafsson F: A unifying construction of orthonormal bases for system identification. IEEE Trans. Automatic Control 1997, 42(4):515-521. 10.1109/9.566661
O’Brien Jr. R, Iglesias P: On the poles and zeros of linear time-varying systems. IEEE T Circuits-I 2001, 48(5):565-577. 10.1109/81.922459
Duda R, Hart P, Stork D: Pattern Classification. New York: Wiley; 2000.
Meng Q, Yuan M, Zhao J: Empirical AM-FM decomposition of auditory signals. J. Acoust. Soc. Am 2012, 131(4). doi:10.1121/1.4708214
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Sebghati, M., Amindavar, H. & Ritcey, J.A. Basis approach to estimate the instantaneous frequencies in multicomponent AMFM signals. J AUDIO SPEECH MUSIC PROC. 2014, 8 (2014). https://doi.org/10.1186/1687472220148
DOI: https://doi.org/10.1186/1687472220148