On the Characterization of Slowly Varying Sinusoids
EURASIP Journal on Audio, Speech, and Music Processing volume 2010, Article number: 941732 (2010)
We give a brief discussion of the amplitude and frequency variation rates of the sinusoid representation of signals. In particular, we derive three inequalities showing that these rates are upper bounded by the 2nd and 4th spectral moments, which, in a loose sense, indicates that every complex signal with narrow short-time bandwidths is a slowly varying sinusoid. Further discussions show how this result helps provide extra insights into relevant signal processing techniques.
Sinusoid representations of signals have been widely used in various signal processing areas, including speech [1], music [2, 3], and telecommunications [4]. Given any complex variable $x$, its sinusoid representation is
$$x = a e^{j\varphi}, \tag{1}$$
where the real variables $a$ and $\varphi$ are the amplitude and phase angle (or phase for short) of $x$. By this definition every nonzero complex variable has a unique sinusoid representation, up to the polarization (sign) of $a$ and a corresponding shift of $\varphi$ by a multiple of $\pi$. In practice, these ambiguities are relieved by assuming that $a$ and $\varphi$ are continuous and smooth [5].
The parameter variations considered in this paper are the first and second derivatives of $a$, and the second derivative of $\varphi$, which, respectively, characterize amplitude and frequency modulations. We say that a sinusoid representation is slowly varying if these derivatives have small absolute values. Slowly varying sinusoids include many signals in speech, music, and telecommunications that we technically handle as sinusoids [4, 6, 7], and are the central elements of sinusoid modelling.
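Spectral properties of slowly varying sinusoids in terms of parameter variation have been well studied, and have been compared to those of stationary sinusoids [8–10]. One result [8] relates the 2nd moment of the energy spectrum of a time-varying sinusoid to its amplitude and phase variations by
$$\int |X(\omega)|^2 (\omega-\omega_0)^2 \, d\omega = 2\pi \int \left[ a'^2 + a^2 (\varphi'-\omega_0)^2 \right] dt, \tag{2}$$
where $X$ is the Fourier transform of $x$, and $\omega_0$ is a real constant. Given Parseval's equation $\int |X(\omega)|^2 \, d\omega = 2\pi \int a^2 \, dt$, we divide both sides of (2) by $\int |X(\omega)|^2 \, d\omega$ and get
$$\frac{\int |X(\omega)|^2 (\omega-\omega_0)^2 \, d\omega}{\int |X(\omega)|^2 \, d\omega} = \frac{\int \left[ a'^2 + a^2 (\varphi'-\omega_0)^2 \right] dt}{\int a^2 \, dt}. \tag{3}$$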
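We introduce an operator $\langle \cdot \rangle_x$, where $\cdot$ stands for the operand, to simplify expressions like (3) that will appear throughout this paper. Given $x$ and its Fourier transform $X$, we define the operator $\langle \cdot \rangle_x$, which maps a real function of $t$ or $\omega$ to a real number, by
$$\langle f \rangle_x = \frac{\int f(t) |x(t)|^2 \, dt}{\int |x(t)|^2 \, dt}, \qquad \langle g \rangle_x = \frac{\int g(\omega) |X(\omega)|^2 \, d\omega}{\int |X(\omega)|^2 \, d\omega}. \tag{4}$$
$\langle \cdot \rangle_x$ is interpreted as linear averaging weighted by the energy density of $x$ in the time or frequency domain, depending on the operand. For example, $\langle t \rangle_x$ and $\langle \omega \rangle_x$ are the time and spectral centroids of $x$, respectively. Using this notation, (3) can be written as
$$\langle (\omega-\omega_0)^2 \rangle = \left\langle (a'/a)^2 \right\rangle + \left\langle (\varphi'-\omega_0)^2 \right\rangle, \tag{5}$$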
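where we have omitted the subscript $x$ from $\langle \cdot \rangle_x$. At $\omega_0 = \langle \omega \rangle$, the term $\langle (\omega-\omega_0)^2 \rangle$ reaches its minimum, whose square root defines the $L^2$-bandwidth of $x$ (likewise we call $\langle |\omega-\omega_0|^p \rangle^{1/p}$ the $L^p$-bandwidth of $x$ at $\omega_0$, as it is the $|X|^2$-weighted $L^p$-norm of $\omega-\omega_0$). According to (5), the $L^2$-bandwidth of $x$ is upper-bounded if $\langle (a'/a)^2 \rangle$ and the range of $\varphi'$ are. However, as the latter may grow very large over a long time span, (5) does not imply that slowly varying sinusoids have concentrated spectra, as stationary sinusoids do.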
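As a rough numerical illustration (not part of the original text), the relation between the 2nd spectral moment and the parameter variations can be checked on a synthetic signal. The sketch below assumes the standard bandwidth identity $\langle(\omega-\omega_0)^2\rangle = \langle(a'/a)^2\rangle + \langle(\varphi'-\omega_0)^2\rangle$ and the Fourier convention $X(\omega) = \int x(t) e^{-j\omega t}\,dt$. The Gaussian-envelope chirp, the sampling rate, and the function names are all illustrative choices.

```python
import numpy as np

def spectral_moment(x, dt, omega0, p=2):
    """p-th moment of the normalised energy spectrum of x about omega0 (rad/s)."""
    energy = np.abs(np.fft.fft(x)) ** 2
    omega = 2 * np.pi * np.fft.fftfreq(len(x), d=dt)
    return np.sum(energy * (omega - omega0) ** p) / np.sum(energy)

def modulation_side(a, phi, dt, omega0):
    """Energy-weighted mean of (a'/a)^2 + (phi' - omega0)^2 from sampled a and phi."""
    da = np.gradient(a, dt)        # finite-difference estimate of a'
    dphi = np.gradient(phi, dt)    # finite-difference estimate of phi'
    return np.sum(da ** 2 + a ** 2 * (dphi - omega0) ** 2) / np.sum(a ** 2)

# Hypothetical test signal: Gaussian envelope, linear chirp (phi' = 40 + 10 t rad/s).
dt = 1.0 / 512
t = np.arange(-6.0, 6.0, dt)
a = np.exp(-t ** 2 / 2)
phi = 40.0 * t + 5.0 * t ** 2
x = a * np.exp(1j * phi)
omega0 = 40.0

lhs = spectral_moment(x, dt, omega0)       # frequency-domain side
rhs = modulation_side(a, phi, dt, omega0)  # time-domain side
print(lhs, rhs)  # both close to 0.5 + 50 = 50.5 for this signal
```

For this signal the amplitude term contributes 0.5 and the frequency term 50, so the 2nd moment is dominated by the frequency sweep, in line with the remark that the range of the frequency, rather than its rate of change, controls the long-time bandwidth.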
On the other hand, if $\varphi''$ is small, then $\varphi'$ does not vary much over a short time. In this case, (2) guarantees that $x$ has a concentrated short-time Fourier transform (STFT), provided that this is calculated with a low-pass window function $w$. In fact, using (2), it is trivial to show that
for any fixed , where is the Fourier transform of (i.e., STFT of with window ) and . We call the bandwidth of the short-time bandwidth of with window . Figure 1 illustrates the concept of short-time bandwidth applied to a linear chirp in the time-frequency plane. The solid line depicts the angular frequency of the chirp as a function of time. Its short-time spectra are evaluated using two rectangular windows, and , whose durations are marked along the time axis. Although the linear chirp is not band limited, each window captures a band-limited portion of it. The frequency content captured by window is distributed uniformly over (, ) while that captured by window is distributed uniformly over (, ). If both windows contain plenty of periods of the sinusoid, then the bandwidths of the two spectra, and , are roughly proportional to and , which are in turn proportional to the length of and and the chirp rate of the sinusoid.
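The proportionality between short-time bandwidth, window length, and chirp rate can be illustrated numerically. The following is a minimal sketch under stated assumptions, not a reproduction of Figure 1: it uses Hann windows instead of rectangular ones, because the sharp edges of a rectangular window dominate the 2nd spectral moment of a sampled signal; the chirp rate, window lengths, and function names are hypothetical.

```python
import numpy as np

def short_time_bandwidth(x, w, dt):
    """L2-bandwidth (rad/s) of the windowed signal x*w about its spectral centroid."""
    energy = np.abs(np.fft.fft(x * w)) ** 2
    omega = 2 * np.pi * np.fft.fftfreq(len(x), d=dt)
    centroid = np.sum(energy * omega) / np.sum(energy)
    return np.sqrt(np.sum(energy * (omega - centroid) ** 2) / np.sum(energy))

def hann(t, length):
    """Continuous-time Hann window of the given length (s), centred at t = 0."""
    return np.where(np.abs(t) < length / 2, np.cos(np.pi * t / length) ** 2, 0.0)

dt = 1.0 / 2048
t = np.arange(-0.5, 0.5, dt)
chirp_rate = 2000.0                          # rad/s^2, illustrative value
x = np.exp(1j * 0.5 * chirp_rate * t ** 2)   # unit-amplitude linear chirp

bw_short = short_time_bandwidth(x, hann(t, 0.25), dt)  # 0.25 s window
bw_long = short_time_bandwidth(x, hann(t, 0.5), dt)    # 0.5 s window
ratio = bw_long / bw_short
print(bw_short, bw_long, ratio)
```

Doubling the window length should roughly double the short-time bandwidth; the ratio falls slightly short of 2 because the window's own bandwidth, which shrinks as the window gets longer, also contributes.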
A loosely related result is given in  by direct comparison of the Fourier transforms of and where is a stationary sinusoid, with , and , and is centred at . It is shown that their difference is upper bounded by
where is the Fourier transform of . Since is concentrated at , (7) shows that is also concentrated at as long as and remain small. Mallat [9] used the term windowed Fourier ridge to describe this time-frequency distribution, as it involves a spectral peak that evolves in time. We call it a spectral ridge for short.
Despite all these studies on sinusoid representations, one question has been overlooked: what type of signals can be modelled as slowly varying sinusoids? From the results above, it is obvious that signals with wide short-time bandwidths, such as wide-band noise, cannot be slowly varying sinusoids. In this paper, we consider the converse: do narrow short-time bandwidths always imply a slowly varying sinusoid? In other words, does a concentrated short-time spectrum necessarily set certain upper bounds on and ? The concentration of a spectrum is measured by the moments of the spectral energy distribution (i.e., normalized energy spectrum), or spectral moments for short. The spectral moment of x with centre is given by and can be interpreted as the biased-bandwidth, as it becomes the -bandwidth if . From (5), it is obvious that upper bounds the average amplitude derivative and average frequency departure from . However, the 2nd moment is not enough to set an upper bound on . In what follows, we provide a new result that employs higher spectral moments to upper bound as well as and .
2. Parameter Variation Rate Upper Bounds in Terms of 2nd and 4th Spectral Moments
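Higher spectral moments are less often used than the 2nd moment. Notably, [10] employs an operator approach that relates an arbitrary spectral moment to the derivatives of amplitude and phase. In particular, regarding the 4th moment, we have
$$\langle (\omega-\omega_0)^4 \rangle = \left\langle \left( \frac{a''}{a} - (\varphi'-\omega_0)^2 \right)^2 \right\rangle + \left\langle \left( \frac{2a'(\varphi'-\omega_0)}{a} + \varphi'' \right)^2 \right\rangle, \tag{8}$$
where $\omega_0$ can take an arbitrary value. From (5) and (8) we can prove that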
Equation (9) states that the average parameter variation rates are upper bounded by the 2nd and 4th spectral moments. Two bounds respectively regarding amplitude and phase can also be obtained as
A detailed proof of (9)–(11) is given in the appendix. Identities hold in these inequalities if specific couplings exist between amplitude and phase variations (see the appendix). From a physical point of view, (9)–(11) state that finite 2nd and 4th spectral moments can only "contain" a limited amount of amplitude and frequency modulation; to allow faster modulations one has to increase the spectral moments. The 2nd and 4th spectral moments are further connected through the biased kurtosis at $\omega_0$, defined as
$$\kappa = \frac{\langle (\omega-\omega_0)^4 \rangle}{\langle (\omega-\omega_0)^2 \rangle^2}. \tag{12}$$
We call it "biased" because it is evaluated using an arbitrary centre $\omega_0$ instead of the true centroid $\langle \omega \rangle$. (12) gives
$$\langle (\omega-\omega_0)^4 \rangle = \kappa \, \langle (\omega-\omega_0)^2 \rangle^2. \tag{13}$$
The kurtosis is generally understood to represent the "peakedness" of the spectral energy distribution: a small kurtosis indicates a bulky peak and light tails; a large kurtosis indicates a narrow peak and heavy tails. In the context of (13), (9)–(11) state that, for the same 2nd moment, a larger kurtosis allows more modulation.
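The 2nd and 4th spectral moments and the biased kurtosis can be estimated directly from a sampled signal. The sketch below is only an illustration under assumptions: it takes the biased kurtosis to be the 4th moment divided by the squared 2nd moment, and it compares the spectral 4th moment with a time-domain expression obtained by differentiating the demodulated signal twice, namely the energy-weighted mean of $(a''/a - (\varphi'-\omega_0)^2)^2 + (2a'(\varphi'-\omega_0)/a + \varphi'')^2$. The Gaussian chirp and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical test signal: Gaussian-envelope linear chirp centred at 40 rad/s.
dt = 1.0 / 512
t = np.arange(-6.0, 6.0, dt)
a = np.exp(-t ** 2 / 2)
phi = 40.0 * t + 5.0 * t ** 2
omega0 = 40.0
x = a * np.exp(1j * phi)

# Frequency-domain moments of the normalised energy spectrum about omega0.
energy = np.abs(np.fft.fft(x)) ** 2
omega = 2 * np.pi * np.fft.fftfreq(len(x), d=dt)
m2 = np.sum(energy * (omega - omega0) ** 2) / np.sum(energy)
m4 = np.sum(energy * (omega - omega0) ** 4) / np.sum(energy)
kurtosis = m4 / m2 ** 2            # assumed definition of the biased kurtosis

# Time-domain counterpart of the 4th moment (finite-difference derivatives).
da = np.gradient(a, dt)
d2a = np.gradient(da, dt)
dpsi = np.gradient(phi, dt) - omega0   # frequency departure from omega0
d2psi = np.gradient(dpsi, dt)          # chirp rate
m4_time = np.sum((d2a - a * dpsi ** 2) ** 2
                 + (2 * da * dpsi + a * d2psi) ** 2) / np.sum(a ** 2)

print(m2, m4, m4_time, kurtosis)
```

A Gaussian-envelope linear chirp has a Gaussian energy spectrum, so its kurtosis about the centroid should come out close to 3; signals with heavier spectral tails would give larger values for the same 2nd moment.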
Inequalities (9)–(11) can also be directly applied to windowed Fourier transforms by replacing with where is the window function. As , if and are upper bounded, then so is . Inequalities (9)–(11) indicate that the sinusoid representation of a signal whose STFT forms a spectral ridge is necessarily slowly varying in terms of short-time averages of parameter variation rates. This, together with our comments in the introduction, completes the following statement.
A complex signal has a slowly varying sinusoid representation if and only if it has narrow short-time bandwidths. (*)
Here we have changed the term "spectral moment" to "bandwidth" considering that the -bandwidth is simply the $p$th spectral moment computed with . In (*) the "only if" part comes from the previous studies we summarized by (6) in the introduction; the "if" part comes from our results (9)–(11). The plural form in "bandwidths" refers to the values evaluated in both - and -norms at different points over the whole duration. A quantitative presentation of (*) is given by rewriting (6) and (10), (11) employing a sliding window
where is the window function centred at .
We notice that is measured differently in (14) and (16), giving a double meaning to "slowly varying frequency" in (*). For this reason, (*) does not actually give a pair of strictly converse statements, and the equivalence between slowly varying sinusoids and spectral ridges, as established by (14)–(16), should only be considered qualitatively. Nevertheless, by these results we have partially answered the question of what kind of signals can be modelled as slowly varying sinusoids. In the rest of this paper, we focus on (*) as a guideline and see how it relates to various sinusoid modelling practices.
3. Discussions and Conclusion
3.1. Combining Sinusoids with Close Frequencies
Beating [11] is a well-known effect observed when adding two sinusoids with similar frequencies, in which they "melt" into a single tone with additional modulation. This phenomenon can be easily explained by (*): as slowly varying sinusoids have short-time spectral energy concentrated near their angular frequencies, if the frequencies are close, then their sum also has concentrated short-time spectral energy, and is therefore also a slowly varying sinusoid. Additional modulation may be introduced as the result of a wider bandwidth contributed by the small interval between the participant frequencies. A quantitative proof of this argument is given in [12], which leads to an additive re-estimation algorithm for measuring parameters of slowly varying sinusoids.
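The beating argument can be made concrete with a small numerical sketch. It sums two complex sinusoids with hypothetical frequencies of 440 Hz and 444 Hz and a hypothetical amplitude ratio of 0.5, then reads off the amplitude and instantaneous frequency of the single-sinusoid representation of the sum. The instantaneous frequency is estimated by finite differences of the unwrapped phase, which is an implementation choice of this sketch.

```python
import numpy as np

fs = 8000.0                       # sampling rate in Hz (illustrative)
t = np.arange(8000) / fs          # one second of samples
f1, f2, r = 440.0, 444.0, 0.5     # hypothetical frequencies (Hz) and amplitude ratio

# Sum of two stationary complex sinusoids with close frequencies.
x = np.exp(2j * np.pi * f1 * t) + r * np.exp(2j * np.pi * f2 * t)

# Single-sinusoid representation x = a * exp(j * phi) of the sum.
a = np.abs(x)
phi = np.unwrap(np.angle(x))
freq_hz = np.gradient(phi, 1.0 / fs) / (2 * np.pi)

print(a.min(), a.max())              # amplitude beats between 1 - r and 1 + r
print(freq_hz.min(), freq_hz.max())  # frequency stays within a few Hz of f1
```

The amplitude oscillates at the 4 Hz difference frequency, while the frequency of the combined sinusoid wanders between about 436 Hz and 441.3 Hz, so the sum behaves as one tone with slow amplitude and frequency modulation.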
Statement (*) also reveals the difficulty in separating close sinusoids by the slow variation criterion alone. Since there are an infinite number of ways to divide a spectral ridge into two or more subridges, and since all narrow ridges are necessarily slowly varying sinusoids, there are an infinite number of separations that are slowly varying.
3.2. Atomic Decomposition
In time-frequency analysis, the term atom refers to basic waveforms with concentrated time and frequency localization into which a signal is decomposed. Windowed sinusoid atoms have been used in short-time Fourier and Gabor transforms [13], matching pursuits [14], auditory scene analysis [15], and methods for approximating time-varying sinusoids [16, 17]. An overlap-add sinusoidal model was proposed [16] in the typical form of atomic decomposition
where is a slowly varying sinusoid, , and are constants for given , and is the overlap-add window centred at the i th reference point, say . Adjacent windows are arranged to have considerable overlap.
The use of overlapping stationary sinusoids to approximate time-varying sinusoids is partially justified by (*). Since windowed sinusoid atoms have concentrated spectral energy at the sinusoid frequencies, their sum will form a narrow spectral ridge as long as frequencies of adjacent atoms are close enough so that the result represents a slowly varying sinusoid. It is also apparent that if there is a large frequency jump between any adjacent atoms, then the sum is no longer a slowly varying sinusoid, indicating that (17) is not a suitable representation of .
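The effect of a frequency jump between adjacent atoms can be sketched numerically. The example below is a hedged illustration rather than the overlap-add model of the cited work: it sums seven Hann-windowed stationary atoms at 50% overlap, with atom frequencies given in DFT bins of a 256-sample window, and compares the $L^2$-bandwidth of the result with and without a three-bin jump between the 4th and 5th atoms. The window length, bin numbers, common phase reference, and function names are all assumptions.

```python
import numpy as np

def l2_bandwidth(x):
    """L2-bandwidth (rad/sample) of x about its spectral centroid."""
    energy = np.abs(np.fft.fft(x)) ** 2
    omega = 2 * np.pi * np.fft.fftfreq(len(x))
    centroid = np.sum(energy * omega) / np.sum(energy)
    return np.sqrt(np.sum(energy * (omega - centroid) ** 2) / np.sum(energy))

def overlap_add(bins, win_len=256):
    """Sum Hann-windowed stationary atoms, one per entry of `bins`, at 50% overlap."""
    hop = win_len // 2
    n = np.arange(win_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / win_len)  # periodic Hann window
    out = np.zeros(hop * (len(bins) - 1) + win_len, dtype=complex)
    for i, k in enumerate(bins):
        start = i * hop
        omega = 2 * np.pi * k / win_len                    # atom frequency, rad/sample
        # Phase is referenced to the global sample index so equal-frequency atoms add coherently.
        out[start:start + win_len] += window * np.exp(1j * omega * (start + n))
    return out

smooth = overlap_add([20] * 7)               # no frequency jump
jump = overlap_add([20] * 4 + [23] * 3)      # three-bin jump after the 4th atom

bw_smooth = l2_bandwidth(smooth)
bw_jump = l2_bandwidth(jump)
print(bw_smooth, bw_jump)
```

Without the jump the atoms merge into a single stationary sinusoid with a fade-in and fade-out, and the bandwidth is set by the envelope alone; with the jump the bandwidth grows several-fold, consistent with the ridge breaking into two.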
Figure 2 illustrates atomic synthesis with 3D spectrograms, in which the sinusoids are directly visible as ridges. Images in the top row show the atoms in the time-frequency plane, in which frequency bins are marked out by dashed lines; images in the bottom row are the corresponding spectrograms. Figure 2(a) shows a single atom whose spectrogram consists of a single peak. Figure 2(b) shows a signal constructed from seven overlapping atoms without a frequency jump between adjacent atoms. A spectral ridge is clearly observed in its spectrogram. In Figure 2(c), we include a frequency jump of three bins between the 4th and 5th atoms, which is enough to break the ridge in Figure 2(b) into two separate ridges. Representing this signal as two sinusoids allows much slower modulation rates than a single-sinusoid representation.
3.3. Real Sinusoids and Analytic Signals
A real slowly varying sinusoid can always be written as the sum of two conjugate complex slowly varying sinusoids. According to (*), its spectrogram is made up of two spectral ridges. To find a slowly varying double-sinusoid representation for a real sinusoid, one only needs to separate the spectrogram into two parts, each containing one ridge. This separation is generally not unique. If the two parts are conjugate to each other, then the real part of the corresponding complex sinusoids equals half of the real sinusoid.
Most of the real sinusoids we encounter in practice have always-positive frequencies, so that each spectral ridge lies in a half plane on either side of the time axis. In this case, the most natural separation is obtained by splitting the spectrum along the time axis, which leads to analytic complex sinusoids. We observe by (*) that the analytic representation is slowly varying by design, provided the sinusoid concerned has a slowly varying representation at all.
As an example, we illustrate the spectrogram of a real linear chirp in Figure 3(a) and that of the corresponding complex linear chirp in Figure 3(b). Amplitude values are warped to the 5th root to make small amplitudes visible. Figure 3(c) shows a nearly analytic sinusoid obtained by setting the spectrogram in Figure 3(a), rather than the spectrum, to zero over negative frequencies. Although Figure 3(b) and Figure 3(c) look different, both are slowly varying complex sinusoids with the real part equalling half the linear chirp in Figure 3(a).
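The spectrum-splitting construction can be sketched with a discrete Fourier transform. The example below is an assumption-laden illustration, not the procedure used for Figure 3: it keeps the positive-frequency half of the DFT of a real linear chirp, sharing the DC and Nyquist bins equally between the two conjugate halves so that the real part of the result is exactly half the real signal. The chirp parameters and the function name are hypothetical.

```python
import numpy as np

def positive_frequency_half(s):
    """Keep the positive-frequency half of the DFT of a real signal s.

    DC and Nyquist bins are split equally with the conjugate half, so the
    result z satisfies z + conj(z) = s, i.e. Re(z) = s / 2.
    """
    n = len(s)
    freqs = np.fft.fftfreq(n)
    mask = np.where(freqs > 0, 1.0, 0.0)
    mask[0] = 0.5                  # DC bin shared between both halves
    if n % 2 == 0:
        mask[n // 2] = 0.5         # Nyquist bin shared between both halves
    return np.fft.ifft(np.fft.fft(s) * mask)

fs = 8000
t = np.arange(fs) / fs
phi = 2 * np.pi * (500.0 * t + 500.0 * t ** 2)   # 500 Hz rising to 1500 Hz
s = np.cos(phi)                                  # real linear chirp

z = positive_frequency_half(s)
middle = slice(fs // 4, 3 * fs // 4)             # stay away from the block edges
amplitude_error = np.max(np.abs(np.abs(z[middle]) - 0.5))
print(np.allclose(z.real, s / 2), amplitude_error)
```

Away from the block edges the amplitude of `z` stays close to 0.5, so the complex half is itself a slowly varying sinusoid whose real part is half the chirp.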
3.4. More on Slowly Varying Real Sinusoids
Although the analytic representation has almost guaranteed slow variation, it is generally not the slowest varying, judged by the values of and . In , we have measured the parameter variation rates by
where is a balancing factor. A necessary condition for a and φ to give the slowest varying representation, in the sense of minimizing I in (18), is given as
where is the 4th-order derivative of . This condition is automatically satisfied regardless of if is exponential and is trinomial, but can be more constraining in other cases.
Nonunique representations of real sinusoids may cause problems in evaluating sinusoid estimators. For example, while a complex linear chirp defines a linear frequency for its corresponding real chirp, the latter's analytic counterpart defines a nonlinear frequency which is no less convincing. Fortunately, in  we have shown that the difference between various sinusoid representations of the same real signal is bounded by their parameter variation rates. Consequently, if a signal has multiple slowly varying sinusoid representations, then they are close to each other.
In this paper, we have given three inequalities that bound the parameter variation rates of the sinusoid representation of a complex signal by its 2nd and 4th spectral moments, indicating that every complex signal with narrow short-time bandwidths is necessarily a slowly varying sinusoid. This, together with several previous results, serves to argue towards the equivalence between slowly varying sinusoids and signals with narrow short-time bandwidths, which, in return, provides extra insights into various aspects of sinusoid modelling.
[1] McAulay RJ, Quatieri TF: Speech analysis/synthesis based on a sinusoidal representation. IEEE Transactions on Acoustics, Speech, and Signal Processing 1986, 34(4):744-754. 10.1109/TASSP.1986.1164910
[2] Serra X: Musical sound modeling with sinusoids plus noise. Musical Signal Processing 1997, 91-122.
[3] Peeters G, Rodet X: SINOLA: a new analysis/synthesis method using spectrum peak shape distortion, phase and reassigned spectrum. Proceedings of the International Computer Music Conference (ICMC '99), 1999, Beijing, China 153-156.
[4] Carlson AB: Communication Systems. 2nd edition. McGraw-Hill, New York, NY, USA; 1981.
[5] Cohen L, Loughlin P, Vakman D: On an ambiguity in the definition of the amplitude and phase of a signal. Signal Processing 1999, 79(3):301-307. 10.1016/S0165-1684(99)00103-6
[6] Fant G: The acoustics of speech. Proceedings of the 3rd International Congress on Acoustics, 1959, Stuttgart, Germany 188-201.
[7] Fletcher NH, Rossing TD: The Physics of Musical Instruments. 2nd edition. Springer, New York, NY, USA; 1998.
[8] Cohen L, Lee C: Standard deviation of instantaneous frequency. Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP '89), May 1989 4: 2238-2241.
[9] Mallat S: A Wavelet Tour of Signal Processing. 2nd edition. Academic Press; 1999.
[10] Davidson KL, Loughlin PJ: Instantaneous spectral moments. Journal of the Franklin Institute 2000, 337(4):421-436. 10.1016/S0016-0032(00)00034-X
[11] Jeffress LA: Beating sinusoids and pitch changes. Journal of the Acoustical Society of America 1968, 43(6):1464. 10.1121/1.1911027
[12] Wen X, Sandler M: Additive and multiplicative reestimation schemes for the sinusoid modeling of audio. Proceedings of 17th European Signal Processing Conference (EUSIPCO '09), 2009, Glasgow, UK
[13] Gabor D: Theory of communication. Journal of the Institute of Electronics Engineers 1946, 3: 429-459.
[14] Mallat SG, Zhang Z: Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing 1993, 41(12):3397-3415. 10.1109/78.258082
[15] Brown GJ, Cooke M: Computational auditory scene analysis. Computer Speech and Language 1994, 8(4):297-336. 10.1006/csla.1994.1016
[16] George EB, Smith MJT: Analysis-by-synthesis/overlap-add sinusoidal modeling applied to the analysis and synthesis of musical tones. Journal of the Audio Engineering Society 1992, 40(6):497-516.
[17] Davy M, Godsill SJ: Bayesian harmonic models for musical signal analysis. In Bayesian Statistics 7. Oxford University Press, Oxford, UK; 2003.
[18] Wen X: Harmonic sinusoid modelling of tonal music events, Ph.D. thesis. University of London, London, UK; 2007.
This paper was supported by the EPSRC EP/E017614/1 project OMRAS2 (Online Music Recognition and Searching).
A. Proof of (9)–(11)
All variables in the appendix, with the exception of , are functions of , and all operators are defined for the same . For simplicity, we omit from all function notations and subscript x from operator .
We first summarize without proof a few properties of the operator defined by (4) as follows:
and immediate results of and :
We quote (5) and (8) here as (A.1) and (A.2)
From (A.2) using , we get
in which identity holds if there exist , , , . (A.3) yields three inequalities
in which identity holds if there exists $C$ such that
in which identity holds if there exists $C$ such that
in which identity holds if there exists $C$ such that
Using and (A.1), we get
in which identity holds if
using , , and (A.1), we get
in which identity holds if
Substituting (A.10) into (A.6), we get (10), (A.12) into (A.8), we get (11), and both into (A.4), we get (9). This concludes the proof.
To find out when identity holds in (9), we jointly solve (A.5a), (A.5b), (A.11a), (A.13a), and (A.13b). From (A.5a) and (A.10), we have
and from (A.13a), we have
These two together give
which, together with (A.13b), gives , and consequently , ; that is, the sinusoid is of constant amplitude and angular frequency at . Notice that this condition implies being infinitely large so that is not properly defined by (4). However, identity still holds in (9) for stationary sinusoids if we accept that .
Similarly, we derive that identity holds in (11) if the sinusoid has constant amplitude and angular frequency at , and in (10) if and there exist , .
Wen, X., Sandler, M. On the Characterization of Slowly Varying Sinusoids. J AUDIO SPEECH MUSIC PROC. 2010, 941732 (2010). https://doi.org/10.1155/2010/941732
- Atomic Decomposition
- Frequency Jump
- Spectral Moment
- Sinusoid Modelling
- Auditory Scene Analysis