 Research
 Open access
An MDCT domain three-point interpolation-based low-complexity frequency estimator
EURASIP Journal on Audio, Speech, and Music Processing volume 2017, Article number: 8 (2017)
Abstract
Signal frequency estimation is an important problem in many applications, including audio signal processing. Compressed-domain audio frequency estimators that directly use the modified discrete cosine transform (MDCT) coefficients are suitable for low-complexity audio applications. This paper proposes a new frequency estimation approach that obtains the estimate from a simple combination of three MDCT coefficients. It exploits the underlying relation among adjacent MDCT values and provides a general form for this type of estimator. The estimator has clear computational advantages over other MDCT domain estimators and is suitable for high signal-to-noise ratio (SNR) conditions.
1 Introduction
Frequency estimation is a basic problem in signal processing research and is widely used in applications such as economics, meteorology, astronomy, industry, and consumer electronics [1]. Alongside so-called high-resolution (or even super-resolution) frequency estimation techniques such as Pisarenko [2], MUSIC [3], and ESPRIT [4], low-complexity frequency estimators suitable for low-cost applications have been proposed in recent years. A typical class of low-complexity algorithms operates in the frequency domain (via the discrete Fourier transform, DFT) and uses several DFT bins to obtain the estimate [5–8].
For audio signals, frequency estimation plays a crucial role in parametric audio processing, which has been reported in applications such as synthesis [9, 10], recognition [11], enhancement [12], and frame-loss concealment [13, 14]. In audio coding in particular, two major MPEG-4 audio coding profiles are based on sinusoidal analysis of the audio signal: HILN (Harmonic and Individual Lines plus Noise) [15] and SSC (SinuSoidal Coding) [16]. A low-complexity frequency estimator can effectively lower the resource requirements of the entire processing system. This matters both for processing massive amounts of multimedia data and for portable ultra-low-power media devices. However, the frequency estimation algorithms mentioned above are not directly applicable to most low-cost audio applications.
In most audio applications, audio data are stored and transmitted in compressed form, and the compression is not based on the DFT. Estimating the parameters of such an audio signal, including its frequency, is therefore considerably more complex. The time-domain samples must first be recovered from the compressed data, and this recovery generally has a relatively high computational complexity. In high-quality audio compression standards such as MPEG-2/4 AAC, Dolby AC-3, WMA, and IETF Opus, compression is performed in the modified discrete cosine transform (MDCT) [17] domain. These standards use a 50% overlap between successive blocks and time-domain aliasing cancellation (TDAC) to mitigate blocking effects. To recover one block of time samples, the inverse MDCT (IMDCT) of three successive blocks is required. Thus, even when the frequency estimation algorithm itself is simple, the IMDCT adds significantly to the computational cost of recovering the time-domain samples.
Several approaches have been proposed to reduce this complexity. One is to compute the DFT directly from the MDCT with a fast algorithm [18] and to perform frequency estimation on these DFT values. However, computing the DFT of each block requires the MDCT values of that block, the previous block, and the succeeding block, which causes an unavoidable algorithmic delay of one block. Another approach uses the odd-DFT as an intermediate domain between the time domain and the MDCT domain. The frequency is estimated from the odd-DFT coefficients, and the MDCT is then obtained from the odd-DFT by a simple conversion [19–21]. The odd-DFT can effectively reduce the system complexity of an audio application, but this scheme does not suit applications that take compressed audio as their input. A third approach is to estimate the frequency directly from the MDCT coefficients. Building on the analysis of the MDCT coefficients of a sinusoid [22], several MDCT domain estimators have been proposed in the last decade [23–25], and they lend themselves well to low-complexity implementation. All of these estimators rely on the ratio of two coefficients and on the mapping between the frequency value and this ratio, so effective estimation is restricted to the region where the mapping is monotonic. In practice, however, noise is unavoidable. It can push the ratio into the nonmonotonic region and produce a wrong result.
The major objective of this paper is to propose a three-point interpolation-based estimator. It avoids the effect of nonmonotonic mapping and further reduces the complexity of MDCT domain frequency estimation, yielding a simple method for various applications. The contributions are as follows: (i) we derive an analytical expression for the MDCT of a single-tone sinusoid based on the centered DFT (CDFT) of the sine window; (ii) we propose an MDCT domain, three-point interpolation-based, low-complexity approach to the signal frequency estimation problem. The proposed algorithm estimates the frequency from three MDCT bin values using only simple calculations and is significantly less complex than existing methods. The method is effective for the sine window case and yields an estimation error below 1 Hz when the signal-to-noise ratio (SNR) is above 20 dB.
This paper is organized as follows. Section 2 presents the MDCT analysis of a sinusoid, which is the basis of MDCT domain estimators. Section 3 presents the proposed algorithm. Section 4 reports Monte Carlo simulation results and a complexity analysis. Section 5 concludes the paper.
2 MDCT analysis of sinusoids
2.1 Signal model of the estimation
Audio signals are commonly modeled as a combination of several sinusoidal frequency components, which can be expressed as
where n is the sample index, P is the number of components, and \( A_m \), \( f_m \), and \( \phi_m \) are the amplitude, normalized frequency, and phase of component \( s_m(n) \), respectively. Audio signal parameter estimation is the problem of obtaining the parameter set \( \{A_m, f_m, \phi_m\} \) for m = 0, 1, …, P − 1, and the frequencies are generally the most important of these parameters. The frequencies can be estimated jointly, as most time-domain methods do, or one by one, as frequency-domain methods commonly do. When the components are well separated in frequency, each component can be estimated in the frequency domain as a single-frequency problem in which all other components act as interference noise. The signal model can thus be simplified to a single-component model, and this paper concentrates on the frequency estimation of a single tone.
For a discrete sinusoid, the single-tone signal is expressed as
where A, f, and ϕ are the magnitude, frequency, and initial phase of the sinusoid, respectively. In the noisy case, the observed signal is
where w(n) is generally assumed to be additive white Gaussian noise (AWGN) with zero mean and variance \( \sigma^2 \). The SNR is \( A^2/(2\sigma^2) \).
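The noise model can be made concrete with a short generator that draws AWGN with the variance implied by a target SNR. This sketch assumes the SNR is the tone power \( A^2/2 \) divided by \( \sigma^2 \), expressed in dB. Because Eq. (2) is not reproduced in this copy, the form A cos(ωn + ϕ), with ω in radians per sample, is also an assumption. `noise_sigma` and `noisy_tone` are hypothetical helper names.

```python
import math
import random


def noise_sigma(amp, snr_db):
    """Noise standard deviation for SNR = (amp**2 / 2) / sigma**2, with SNR in dB."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    return math.sqrt(amp ** 2 / (2.0 * snr_linear))


def noisy_tone(num_samples, amp, omega, phase, snr_db, seed=None):
    """Return x(n) = amp*cos(omega*n + phase) + w(n) for n = 0..num_samples-1.

    w(n) is zero-mean white Gaussian noise (assumed signal model).
    """
    rng = random.Random(seed)
    sigma = noise_sigma(amp, snr_db)
    return [amp * math.cos(omega * n + phase) + rng.gauss(0.0, sigma)
            for n in range(num_samples)]


# A = 1 at 40 dB SNR gives sigma = sqrt(1 / (2 * 10**4)), about 0.00707.
x = noisy_tone(2048, amp=1.0, omega=0.1, phase=0.3, snr_db=40.0, seed=1)
```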
To estimate the parameters in the MDCT domain, the signal x(n) is framed with a window function h(n) of length 2N that satisfies the Princen-Bradley perfect reconstruction conditions [17]. Each frame is then converted to its N-point MDCT coefficients,
where k = 0, 1, …, N − 1 is the MDCT bin index. MDCT domain frequency estimation is the problem of estimating f from the MDCT coefficients X(k). The frequency f is commonly expressed as
where \( f_s \) is the sampling frequency, and \( l_0\in \mathbf{Z}_0^{+} \) and δ ∈ [0, 1) are, respectively, the integer and fractional parts of the digital frequency \( l = l_0 + \delta \). Estimating l thus amounts to obtaining the values of \( l_0 \) and δ.
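A direct O(N²) implementation makes the MDCT step concrete. Eq. (4) is not reproduced in this copy, so this sketch assumes the common convention X(k) = √(2/N) Σₙ x(n)h(n)cos[(π/N)(n + 1/2 + N/2)(k + 1/2)]. It also writes a tone at digital frequency l as cos(πln/N + ϕ), which corresponds to f = l·f_s/(2N). `mdct` is a hypothetical helper name; codecs use fast FFT-based algorithms instead of this direct sum.

```python
import math


def mdct(frame, window):
    """Direct O(N^2) MDCT of one 2N-sample frame weighted by `window`.

    Assumed convention (not quoted from Eq. (4)):
        X(k) = sqrt(2/N) * sum_n x(n) h(n) cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
    for k = 0, ..., N-1.
    """
    two_n = len(frame)
    if two_n % 2 or len(window) != two_n:
        raise ValueError("frame and window must share the same even length 2N")
    n_bins = two_n // 2
    n0 = 0.5 + n_bins / 2.0              # phase offset of the MDCT kernel
    scale = math.sqrt(2.0 / n_bins)
    weighted = [x * h for x, h in zip(frame, window)]
    return [scale * sum(v * math.cos(math.pi / n_bins * (n + n0) * (k + 0.5))
                        for n, v in enumerate(weighted))
            for k in range(n_bins)]


# Usage: a tone at digital frequency l (in bins), i.e. f = l * fs / (2N),
# written as cos(pi * l * n / N + phi) with the angle in radians per sample.
N, l, phi = 64, 20.3, 0.7
frame = [math.cos(math.pi * l * n / N + phi) for n in range(2 * N)]
window = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]  # sine window
X = mdct(frame, window)
peak = max(range(N), key=lambda k: abs(X[k]))  # lands within about one bin of l
```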
2.2 Generalized MDCT analysis
The MDCT analysis of a sinusoid is the basis of MDCT domain frequency estimators, because it reveals the underlying relationship between the MDCT coefficients and the parameters of the sinusoidal signal. Daudet [22] first explored this relationship for the sine window, and Zhang [25] generalized it to other windows. Here, we briefly describe the generalized MDCT analysis. It is similar to that of [25] but uses the signal model of Eq. (3).
In the noiseless case, the signal is given by (2), and the MDCT coefficient X(k) of the signal with window h(n) is the real part of an expression Z(k) of the form [25]
where
H(ξ) is the centered discrete Fourier transform (CDFT) of a window function h(n),
where ξ is not restricted to integers. If h(n) is even-symmetric (the common case in MDCT analysis), its CDFT H(ξ) is real-valued. The MDCT coefficient of the signal in (2) is then expressed as
where ϕ _{0} is defined as
Equation (9) gives the exact MDCT coefficient of a given sinusoidal signal for an arbitrary symmetric window function.
To obtain a simple relation between the sinusoidal frequency and the MDCT coefficients, we must simplify (9), using the properties of the window and its CDFT H(ξ). Because the window function has fast-decaying sidelobes, the significant values of its CDFT appear only near ξ = 0 [25]. For k = 0, 1, …, N − 1 and l far from 0 and N − 1, only the first term in (9) is significant, and (9) simplifies to
2.3 MDCT analysis for sine window case
The sine window is commonly used in audio signal processing and coding, so a frequency estimator for the sine window case is important for practical applications. For the sine window, an analytical expression for the CDFT H(ξ), and hence for the MDCT coefficient X(k), can be derived. This expression for X(k) is the basis of the proposed three-point interpolation-based low-complexity frequency estimator.
The sine window is defined as
$$ h(n)=\sin\left[\frac{\pi}{2N}\left(n+\frac{1}{2}\right)\right], \qquad (12) $$
where n = 0, 1, …, 2N − 1, so the window has the same length as the MDCT input frame. The sine window is even-symmetric, and its CDFT is therefore real-valued. Substituting (12) into (8) and simplifying, we obtain the following expression for the CDFT
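The Princen-Bradley conditions mentioned in Section 2.1 can be checked for the sine window numerically. This sketch uses the usual form for identical analysis and synthesis windows: symmetry h(n) = h(2N − 1 − n) and power complementarity h(n)² + h(n + N)² = 1. The helper names are hypothetical.

```python
import math


def sine_window(two_n):
    """Sine window h(n) = sin(pi*(n + 1/2)/(2N)) for n = 0, ..., 2N-1."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def satisfies_princen_bradley(window, tol=1e-12):
    """Check h(n) = h(2N-1-n) and h(n)^2 + h(n+N)^2 = 1 for n = 0, ..., N-1."""
    two_n = len(window)
    n_half = two_n // 2
    symmetric = all(abs(window[n] - window[two_n - 1 - n]) <= tol
                    for n in range(two_n))
    power_complementary = all(
        abs(window[n] ** 2 + window[n + n_half] ** 2 - 1.0) <= tol
        for n in range(n_half))
    return symmetric and power_complementary


print(satisfies_princen_bradley(sine_window(2048)))  # True
```

A rectangular window fails the second condition (1 + 1 ≠ 1), which is one reason windows such as the sine window are used in MDCT coding.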
For ξ near 0, which implies that the bin index k is near the digital frequency l, Eq. (13) can be approximated as
The values at ξ ∈ {0, −1} are obtained using L'Hospital's rule. This approximation introduces an error of less than \( 1.25\times 10^{-7} \). Substituting (14) into (11) yields a simplified MDCT bin value X(k):
This result is the basis of the proposed frequency estimator.
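The small-ξ behaviour of the sine-window CDFT can be checked numerically. Equations (13) and (14) are not reproduced in this copy, so the closed form N sin(πξ)/(πξ(ξ + 1)) below is our own small-angle derivation, offered as a sketch rather than a quotation of (14). It assumes time is measured from the window centre n = N − 1/2 and that the kernel is shifted by half a bin. That shift places the removable singularities at ξ = 0 and ξ = −1, as stated in the text. The function names are hypothetical.

```python
import math


def sine_window_cdft(xi, n_bins):
    """Direct sum of the sine window against a centred cosine kernel.

    Assumed convention (ours): time is measured from the window centre
    n = N - 1/2 and the kernel frequency is nu = xi + 1/2 bins.
    """
    two_n = 2 * n_bins
    nu = xi + 0.5
    total = 0.0
    for n in range(two_n):
        h = math.sin(math.pi * (n + 0.5) / two_n)   # sine window
        total += h * math.cos(math.pi * (n - n_bins + 0.5) * nu / n_bins)
    return total


def sine_window_cdft_approx(xi, n_bins):
    """Closed form N*sin(pi*xi) / (pi*xi*(xi + 1)); valid for xi not in {0, -1}."""
    return n_bins * math.sin(math.pi * xi) / (math.pi * xi * (xi + 1.0))


for xi in (0.3, -0.7, 1.25, -2.4, 2.6):
    exact = sine_window_cdft(xi, 1024)
    approx = sine_window_cdft_approx(xi, 1024)
    print(f"xi={xi:+.2f}  exact={exact:.4f}  approx={approx:.4f}")
```

Multiplying this closed form by the MDCT scale and tone amplitude gives the factor \( \frac{A}{\pi}\sqrt{\frac{N}{2}} \) together with the 1/((k − l)(k − l + 1)) dependence that Section 3 builds on.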
3 Proposed frequency estimator
3.1 General form
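To obtain the estimator, we rewrite (15) as
$$ X(k)=\frac{A}{\pi}\sqrt{\frac{N}{2}}\sin\left(\pi l\right)\cdot \frac{1}{\left(k-l\right)\left(k-l+1\right)}\cdot {\left(-1\right)}^{k+1}\cos\left({\phi}_0-\frac{3\pi}{2}k\right). \qquad (16) $$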
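In (16), X(k) consists of three factors: a constant part \( \frac{A}{\pi}\sqrt{\frac{N}{2}}\sin\left(\pi l\right) \), a variable part \( \frac{1}{\left(k-l\right)\left(k-l+1\right)} \), and a phase modulation factor \( {\left(-1\right)}^{k+1}\cos\left({\phi}_0-\frac{3\pi}{2}k\right) \). The phase modulation factor has a period of 4 and takes the values
$$ {\left(-1\right)}^{k+1}\cos\left({\phi}_0-\frac{3\pi}{2}k\right)=\begin{cases}-\cos\phi_0, & k\equiv 0 \pmod 4,\\ -\sin\phi_0, & k\equiv 1 \pmod 4,\\ \cos\phi_0, & k\equiv 2 \pmod 4,\\ \sin\phi_0, & k\equiv 3 \pmod 4.\end{cases} $$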
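The period-4 pattern can be checked numerically. The property that matters most is that bins two apart have opposite signs, F(k + 2) = −F(k), so they share a common factor up to sign. Here F(k) = (−1)^{k+1} cos(ϕ₀ − 3πk/2) is the phase modulation factor as we read it from (16), and `phase_factor` is a hypothetical helper.

```python
import math


def phase_factor(k, phi0):
    """Phase modulation factor F(k) = (-1)**(k+1) * cos(phi0 - 3*pi*k/2)."""
    return (-1) ** (k + 1) * math.cos(phi0 - 1.5 * math.pi * k)


phi0 = 0.4
for k in range(8):
    # Expected pattern: -cos(phi0), -sin(phi0), +cos(phi0), +sin(phi0), repeating.
    print(k, round(phase_factor(k, phi0), 6))
```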
Thus, taking \( M(k)=\frac{1}{X(k)} \) for a given \( k_0 \) and denoting \( M_-=M(k_0-2) \), \( M_0=M(k_0) \), and \( M_+=M(k_0+2) \), we construct a combination of these three values in the form of
where \( a_i \) and \( b_i \) (i = 1, 2, 3) are real-valued coefficients. The constant part and the phase modulation factor in (15) then cancel out, and only combinations of (k − l)(k − l + 1) remain. Defining \( \delta_0 = l - k_0 \) and substituting it into (17), we obtain
where
If the coefficients \( a_i \) and \( b_i \) are chosen appropriately, λ and \( \delta_0 \) satisfy a simple relation from which \( \delta_0 \) can be estimated. For example, if \( a_i \) and \( b_i \) are selected such that \( A_2 = A_1 = 0 \) and \( B_2 = B_0 = 0 \), then \( \lambda = \delta_0 \cdot B_1/A_0 \), where \( B_1/A_0 \) is a constant determined by \( a_i \) and \( b_i \). An estimate of \( \delta_0 \) is then \( \lambda/(B_1/A_0) \). The frequency (we use \( \widehat{\cdot} \) to denote an estimated value) is then estimated by \( \widehat{l}=k_0+{\widehat{\delta}}_0 \).
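One concrete choice of weights can be worked out by hand from (16). This is our own derivation, not a quotation of (20) or (21), which are not reproduced in this copy. It is, however, consistent with the operation count reported in Section 4.2.2: three coefficient products and multiplications by 3 and 2. Write \( C=\frac{A}{\pi}\sqrt{\frac{N}{2}}\sin(\pi l) \), \( F(k)=(-1)^{k+1}\cos\left(\phi_0-\frac{3\pi}{2}k\right) \), and \( D=C\,F(k_0) \). Using \( F(k_0\pm 2)=-F(k_0) \), the three reciprocals become quadratics in \( \delta_0 \). The weights below cancel the quadratic and constant terms, which is the kind of cancellation described above:

$$
\begin{aligned}
M_- &= -\frac{(\delta_0+1)(\delta_0+2)}{D},\qquad
M_0 = \frac{\delta_0(\delta_0-1)}{D},\qquad
M_+ = -\frac{(\delta_0-2)(\delta_0-3)}{D},\\
M_- + 2M_0 + M_+ &= -\frac{8}{D},\qquad
3M_- + 2M_0 - M_+ = -\frac{16\,\delta_0}{D},\\
\widehat{\delta}_0 &= \frac{3M_- + 2M_0 - M_+}{2\,(M_- + 2M_0 + M_+)}
= \frac{3X_0X_+ + 2X_-X_+ - X_-X_0}{2\,(X_0X_+ + 2X_-X_+ + X_-X_0)}.
\end{aligned}
$$

The last form uses \( X_-=X(k_0-2) \), \( X_0=X(k_0) \), \( X_+=X(k_0+2) \). It follows from \( M=1/X \) by multiplying the numerator and denominator by \( X_-X_0X_+ \).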
3.2 Proposed estimator
In the proposed estimator, \( k_0 \) is set to the index of the maximum MDCT magnitude \( |X(k)| \), and \( \delta_0 \) is estimated using the following formula:
To simplify the computation, we convert formula (20) into a form that uses X(k) directly. Denoting \( X(k_0+i) \) for i = −2, 0, 2 as \( X_- \), \( X_0 \), and \( X_+ \), respectively, we obtain an equivalent form of (20)
The key steps of the proposed estimator are summarized as follows:

(1) Find the bin index of the MDCT magnitude peak,
$$ {\widehat{k}}_0=\underset{k}{\arg\max}\,\left|X(k)\right|. \qquad (22) $$

(2) Estimate \( \delta_0 \) from the MDCT values \( X_- \), \( X_0 \), and \( X_+ \) according to formula (21).

(3) Finally, obtain the estimated value of l,
$$ \widehat{l}={\widehat{k}}_0+{\widehat{\delta}}_0. $$
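The three steps can be sketched in Python. Equations (20) and (21) are not reproduced in this copy, so the three-point combination in `estimate_delta` is our own derivation from model (16) and should be read as an assumption rather than as the authors' formula. It does match the operation count in Section 4.2.2. The function names and the boundary guard on the peak search are also ours.

```python
import math


def estimate_delta(x_minus, x_zero, x_plus):
    """Fractional offset delta_0 = l - k0 from X(k0-2), X(k0), X(k0+2).

    Assumed three-point combination (our derivation from model (16)):
        delta_0 = (3*X0*X+ + 2*X-*X+ - X-*X0) / (2*(X0*X+ + 2*X-*X+ + X-*X0))
    """
    a = x_zero * x_plus
    b = x_minus * x_plus
    c = x_minus * x_zero
    return (3.0 * a + 2.0 * b - c) / (2.0 * (a + 2.0 * b + c))


def estimate_bin_frequency(coeffs):
    """Steps (1)-(3): peak search, three-point estimate, l_hat = k0 + delta_0.

    The peak search is limited to 2 <= k <= N-3 so that X(k0 +/- 2) exists;
    this boundary guard is our own addition.
    """
    n_bins = len(coeffs)
    k0 = max(range(2, n_bins - 2), key=lambda k: abs(coeffs[k]))         # step (1)
    delta0 = estimate_delta(coeffs[k0 - 2], coeffs[k0], coeffs[k0 + 2])  # step (2)
    return k0 + delta0                                                   # step (3)


# Usage: coefficients generated directly from model (16) with l = 40.37.
N, A, l, phi0 = 256, 1.0, 40.37, 0.9
C = A / math.pi * math.sqrt(N / 2) * math.sin(math.pi * l)
model = [C * (-1) ** (k + 1) * math.cos(phi0 - 1.5 * math.pi * k)
         / ((k - l) * (k - l + 1)) for k in range(N)]
l_hat = estimate_bin_frequency(model)   # equals l up to rounding error
```

The combination is homogeneous of degree zero in the X values, so the MDCT normalization does not affect the estimate. No branch is taken beyond the peak search.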
Note that (20) is not the only formula for estimating \( \delta_0 \); we have derived a set of such formulas, for example,
However, the coefficients in (20) are the most suitable for a simple calculation.
4 Results and discussion
4.1 Comparison benchmarks
Four reported MDCT domain estimators [23–26] and one simplified estimator were used as the performance comparison benchmarks. The four reported estimators are as follows:

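Merdjani [23], a method based on the analytical expression of the MDCT coefficient;

Zhu [24], a low-complexity method that uses additional decisions to select the two largest MDCT values;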

Zhang [25], an envelope-function-based method with a lookup table (the single-frame-based envelope method without iteration is used); and

Dun [26], an improved version of the above envelope function method.
We implemented the estimators of Merdjani and Zhu ourselves and obtained Zhang's from its author. From our previous work (Dun), we noticed that all of these estimators involve conditional constructs, i.e., the specific algorithm is chosen according to one or several criteria. The decision logic that checks these criteria and the conditional branch instructions that select the algorithm complicate the program flow, especially in pipelined processing. Therefore, our verification tests include one additional benchmark, a simplified estimator derived from Merdjani [23], labeled "Simplified" in the following tests. Like the proposed estimator, this simplified estimator has no conditional branch, and the frequency is estimated by
where \( k_0 \) is the frequency bin at which the so-called pseudo-spectrum S(k) reaches its maximum,
and α is the ratio of two MDCT coefficients,
4.2 Complexity comparison
4.2.1 General
Complexity refers to the resources that an executable program of the algorithm requires, in terms of both time and space. Here, time complexity is compared by counting the operations required to estimate the frequency, and space complexity refers to the storage required by the algorithm.
To compare time complexity, we count operations such as addition, multiplication, division, square root, comparison, and bit shift for each algorithm. Most existing MDCT domain frequency estimation algorithms [23–26] consist of two steps: finding the frequency bin \( k_0 \) that corresponds to the integer part \( l_0 \), and estimating the fractional part δ using a decision method. Finding the bin index of the peak is a step common to all algorithms and requires identical operations, so these operations are excluded from the comparison.
To compare space complexity, we count the storage required for the lookup table. The storage for variables and intermediate results is not included in the comparison.
4.2.2 The proposed estimator
For the proposed frequency estimator of Section 3.2, once the bin index of the maximum |X(k)| is known, the operations needed to obtain \( \widehat{l} \) are those in (21). These are three multiplications of MDCT coefficients (\( X_-X_0 \), \( X_0X_+ \), and \( X_-X_+ \)), three multiplications by constants (3 and 2), four additions, and one division. A multiplication by 2 is usually replaced by a bit shift, and a multiplication by 3 by a bit shift and an addition. Thus, in practice, three multiplications, five additions, one division, and three bit shifts are used. No additional information or other operations are required.
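This operation count can be reproduced with integer (fixed-point) inputs, where the constant multiplications become shifts and adds. The numerator 3X₀X₊ + 2X₋X₊ − X₋X₀ and denominator 2(X₀X₊ + 2X₋X₊ + X₋X₀) used here come from our own derivation from model (16), and treating them as (21) is an assumption. The function name is hypothetical. Python integers are used because left shifts of negative values are well defined there; in C they would need care.

```python
def delta_num_den_shift_add(x_minus, x_zero, x_plus):
    """Numerator and denominator of the three-point estimate for integer inputs.

    Assumed formula (our reading of the estimator):
        num = 3*X0*X+ + 2*X-*X+ - X-*X0
        den = 2*(X0*X+ + 2*X-*X+ + X-*X0)
    Cost: 3 multiplications, 5 additions/subtractions, 3 left shifts.
    """
    a = x_zero * x_plus       # multiplication 1
    b = x_minus * x_plus      # multiplication 2
    c = x_minus * x_zero      # multiplication 3
    a3 = (a << 1) + a         # 3a: shift 1, addition 1
    b2 = b << 1               # 2b: shift 2, reused in num and den
    num = a3 + b2 - c         # additions 2 and 3
    den = (a + b2 + c) << 1   # additions 4 and 5, shift 3
    return num, den


num, den = delta_num_den_shift_add(-3, 7, 2)  # (51, -38)
delta0 = num / den                            # the single division
```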
4.2.3 Other MDCT domain estimators
First, all compared estimators locate a peak. The estimators in [24–26] apply further criteria after locating the initial maximum to obtain \( {\widehat{l}}_0 \). Merdjani [23] and the simplified estimator instead locate the maximum of a pseudo-spectrum converted from the MDCT spectrum. The pseudo-spectrum helps to find the correct \( {\widehat{l}}_0 \), but it also adds operations that must be included in the comparison. Then, usually with some decision algorithm (particularly in Zhu [24] and Dun [26]), \( \widehat{\delta} \) is solved from a quadratic equation or computed from a lookup table with polynomial fitting.
Table 1 compares the complexity of these methods; the numbers given are typical values for each algorithm. The size of the lookup table depends on the step size. The table lists the number of values that must be stored for a step of \( 2^{-13} \), as reported in [26].
Table 1 shows that, apart from three bit shifts (the simplest operation in the list), the proposed estimator requires only a few additions and multiplications and one division. It requires neither comparisons nor lookup-table storage. The proposed estimator therefore has the lowest complexity. The simplified algorithm has a complexity similar to that of the proposed estimator if the calculation of S(k) is not counted.
4.3 Simulation results and discussion
Simulations were conducted to verify the proposed frequency estimator and to compare it with the other estimators. Results are presented for both noiseless and noisy cases.
In all simulations, the parameters were chosen to match audio applications. The block size and window length were 2N = 2048, the sampling frequency was \( f_s \) = 44.1 kHz, and the magnitude was A = 1. The initial phase ϕ was drawn from a uniform distribution on (−π, π). The frequency estimation error is \( \varepsilon =\widehat{f}-f \) in hertz (Hz), where f is the sinusoidal frequency and \( \widehat{f} \) its estimate. It was measured by its maximum value \( \varepsilon_{\max} \) and by the mean square error (MSE). An MSE of 0 dB corresponds to an error of 1 Hz.
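Both error measures can be computed in a few lines. This sketch takes \( \varepsilon_{\max} \) as the largest |ε| (our assumption) and reports the MSE in dB as 10 log₁₀(MSE). Thus 0 dB corresponds to 1 Hz², i.e., an RMS error of 1 Hz. `error_stats` is a hypothetical helper name.

```python
import math


def error_stats(estimates_hz, truths_hz):
    """Return (eps_max, mse, mse_db) for the errors eps = f_hat - f in Hz.

    eps_max is taken as the largest |eps| (assumption); mse_db = 10*log10(mse),
    so 0 dB corresponds to an MSE of 1 Hz^2, i.e. an RMS error of 1 Hz.
    """
    errors = [fh - f for fh, f in zip(estimates_hz, truths_hz)]
    if not errors:
        raise ValueError("need at least one estimate")
    eps_max = max(abs(e) for e in errors)
    mse = sum(e * e for e in errors) / len(errors)
    mse_db = 10.0 * math.log10(mse) if mse > 0 else float("-inf")
    return eps_max, mse, mse_db


print(error_stats([2.0], [1.0]))  # (1.0, 1.0, 0.0)
```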
Using expression (5), we compared the precision of the estimators as δ varied from 0 to 1 in steps of 0.05. The signal frequency l partly determines the model error introduced by simplifying (9) to (11), so two values, 46 and 510, were used for its integer part \( l_0 \). Given \( f_s \) and N, bin 46 corresponds to approximately 1 kHz. The value 510 is approximately half the number of MDCT bins, which minimizes the interference caused by the negative-frequency component of a real-valued sinusoidal input. The results for the noise-free condition are shown in Fig. 1.
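The bin-to-hertz figures quoted above can be reproduced with a one-line conversion. The sketch assumes the relation f = l·f_s/(2N), which is consistent with the 1 kHz and 32.3 Hz figures quoted in this section. `bin_to_hz` is a hypothetical helper name.

```python
def bin_to_hz(l, fs=44_100.0, two_n=2048):
    """Frequency in Hz of the digital frequency l (in bins): f = l * fs / (2N)."""
    return l * fs / two_n


for l0 in (46, 510):
    print(l0, "->", round(bin_to_hz(l0), 1), "Hz")
# 46 -> 990.5 Hz
# 510 -> 10981.9 Hz   (510 is close to N/2 = 512)
```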
As expected, both the MSE and the maximum error are larger for all estimators when \( l_0 \) = 46. In this frequency region, the proposed estimator has a slightly larger MSE and maximum error than the methods of Merdjani, Zhu, and Dun, but significantly smaller ones than Zhang's method and the simplified method. In other words, although it uses no conditional construct, the proposed estimator achieves precision similar to that of the estimators with conditional branches. Zhang's method and the simplified method, by contrast, lose considerable accuracy. When \( l_0 \) = 510, the maximum error of the proposed estimator remains similar to that of the estimators with conditional branches.
In both cases, the proposed estimator has a slightly larger MSE than the branched methods. This degradation is mainly caused by the use of the third coefficient. In [23, 24, 26], additional decisions select the two largest values, whereas the proposed estimator always uses three values and needs neither a decision algorithm nor a conditional branch instruction. This is what makes the approach ultra-low-complexity. Fortunately, the MSE remains near or below \( 10^{-10} \) for most frequencies.
Next, the same test was repeated with added noise to show how each estimator performs under noisy interference. In frequency estimation for real audio signals, the noise comes from other sound sources, environmental noise, and the other frequency components of the audio signal. For multicomponent signals, interference from the other frequency components is a major noise source. The results for SNR = 40 dB are shown in Fig. 2. The precision of all estimators degrades significantly, with MSEs increasing from below \( 10^{-10} \) to above \( 10^{-3} \). The proposed estimator reaches an MSE level of \( 10^{-2} \), which corresponds to an error of 0.1 Hz.
A test of MSE versus SNR was also conducted. In this test, \( l_0 \) was set to 46 (approximately 1 kHz), and δ was drawn uniformly from (0, 1). The results are shown in Fig. 3. For SNRs above 20 dB, the MSE of the proposed estimator is below 0 dB, i.e., the error is below 1 Hz. The maximum sidelobe level of the sine window is −23 dB. Thus, for two frequency components, a separation of more than one and a half bins keeps the interference below −23 dB. With the parameter settings above, this 1.5-bin separation corresponds to a frequency offset of 32.3 Hz. This is similar to the frequency difference between two notes a whole tone apart, such as C4 (261.6 Hz) and D4 (293.7 Hz). In practice, the notes of a chord are usually separated by more than this. The proposed estimator is therefore suitable for low-complexity frequency estimation in such high-SNR situations.
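The separation argument can be checked with simple arithmetic. The sketch converts 1.5 bins to hertz using a bin spacing of f_s/(2N). For comparison, it computes C4 and D4 assuming twelve-tone equal temperament with A4 = 440 Hz (our assumption). The helper names are hypothetical.

```python
FS = 44_100.0   # sampling frequency (Hz), as in the simulations
TWO_N = 2048    # window length 2N


def bins_to_hz(num_bins, fs=FS, two_n=TWO_N):
    """Width in Hz of num_bins MDCT bins, assuming a bin spacing of fs / (2N)."""
    return num_bins * fs / two_n


def equal_tempered_hz(semitones_from_a4, a4_hz=440.0):
    """Pitch in 12-tone equal temperament relative to A4 (assumed 440 Hz)."""
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)


separation_hz = bins_to_hz(1.5)      # about 32.30 Hz
c4 = equal_tempered_hz(-9)           # about 261.63 Hz
d4 = equal_tempered_hz(-7)           # about 293.66 Hz
print(round(separation_hz, 2), round(d4 - c4, 2))
```

The whole-tone step C4 to D4 (about 32.0 Hz) is just under the 1.5-bin separation, which is why the text treats the two as similar.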
4.4 Evaluation with real audio signals
This section evaluates the proposed algorithm with real audio signals. The major components of an audio signal are estimated as sinusoidal model parameters (frequency, amplitude, and phase), and the signal is then reconstructed from the estimated components. The methods are evaluated by comparing the original and reconstructed signals.
In general, the major components of an audio signal are obtained in two steps. First, find the largest peak in the spectrum and estimate the single-tone parameters from it. Second, subtract the estimated tone from the spectrum. These two steps are repeated until all major tones have been estimated. This procedure is recommended in multicomponent estimation algorithms because it enables the detection of tones that are initially masked by leakage from nearby large peaks.
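The peak-picking and subtraction loop can be sketched independently of the particular estimator. In this sketch, `estimate_tone` and `synthesize_tone` are caller-supplied placeholders (hypothetical interfaces). The first would be a frequency estimator followed by amplitude and phase estimation; the second would return the MDCT of the estimated tone. The stopping values follow Section 4.4, and treating the energy threshold as an absolute sum of squares is our assumption.

```python
def extract_components(spectrum, estimate_tone, synthesize_tone,
                       max_components=30, min_residual_energy=1e-4):
    """Greedy peak-picking and subtraction over one frame's spectrum.

    estimate_tone(residual) -> params and synthesize_tone(params, n_bins) -> list
    are caller-supplied (hypothetical interfaces). Returns the extracted
    parameter sets and the final residual spectrum.
    """
    residual = list(spectrum)
    components = []
    while len(components) < max_components:
        if sum(v * v for v in residual) < min_residual_energy:
            break                                           # negligible energy left
        params = estimate_tone(residual)                    # largest remaining peak
        tone = synthesize_tone(params, len(residual))
        residual = [r - t for r, t in zip(residual, tone)]  # subtract the tone
        components.append(params)
    return components, residual


# Toy usage: each "tone" occupies a single bin, so peaks are peeled off in order.
def pick_largest(residual):
    return max(enumerate(residual), key=lambda kv: abs(kv[1]))


def one_hot(params, n_bins):
    k0, value = params
    return [value if k == k0 else 0.0 for k in range(n_bins)]


comps, res = extract_components([0.0, 3.0, 0.0, -2.0, 0.0, 0.5],
                                pick_largest, one_hot)
```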
Specifically, the frequency of each component is estimated first, and then the amplitude and phase are estimated with the method given in Merdjani [23]. The proposed algorithm and the five benchmarks are used to obtain the estimated frequencies. To compare them within a uniform framework, all algorithms estimate the components of an audio signal in the same order.
The test was conducted with the audio set used in the MPEG audio verification tests, which contains the 12 mono audio files listed in Table 2. With a sampling frequency of 48 kHz and a frame length of 1024 samples, each frame lasts about 21.3 ms. Component extraction for a frame stops when 30 components have been extracted or the residual energy falls below \( 10^{-4} \). A 50% overlap between successive frames is used in both MDCT analysis and waveform reconstruction. Figure 4 compares a detail of the reconstructed signal of "es01", obtained with the proposed frequency estimation algorithm, with the original signal. The reconstructed waveform is almost identical to the original audio.
To evaluate the proposed algorithm, we compare both the errors between the original and reconstructed signals and the audio quality of the reconstructed signals. The errors are measured by the MSE between the original and reconstructed audio signals, and the results are plotted in Fig. 5. Audio quality is evaluated objectively with the PQevalAudio software, which implements the perceptual evaluation of audio quality (PEAQ) method specified in ITU-R BS.1387-1. The Objective Difference Grade (ODG), which ranges from 0 to −4, indicates the audio quality. A score of 0 means no perceptible difference from the reference audio, and a score of −4 means a very annoying degradation. The test results are shown in Fig. 6.
The results in Figs. 5 and 6 show that the proposed algorithm greatly reduces complexity while keeping reconstruction quality similar to the other estimators, except for the two most complex ones. The proposed algorithm avoids the spectrum conversion (from MDCT to pseudo-spectrum) used in Merdjani [23] and the simplified algorithm. Its complexity is therefore independent of the frame length N (see Table 1; typical audio frame lengths are 1024, 512, or similar). The proposed algorithm also avoids conditional constructs, which speeds up a frequency estimator on a pipelined processor.
5 Conclusions
This paper proposes a low-complexity frequency estimator that operates on three MDCT coefficients with only a few simple calculations. It also presents the analytical expression of the MDCT coefficients on which the estimator is based. The proposed estimator greatly reduces complexity compared with other MDCT domain estimators and provides a good complexity/performance trade-off. Because it uses no conditional branch instructions, the estimator is especially well suited to pipelined processors.
References
[1] P Stoica, RL Moses, Spectral Analysis of Signals (Pearson/Prentice Hall, Upper Saddle River, 2005)
[2] VF Pisarenko, The retrieval of harmonics from a covariance function. Geophys. J. Int. 33(3), 347–366 (1973)
[3] RO Schmidt, Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag. 34(3), 276–280 (1986)
[4] R Roy, T Kailath, ESPRIT-estimation of signal parameters via rotational invariance techniques. IEEE Trans. Acoust. Speech Signal Process. 37(7), 984–995 (1989)
[5] BG Quinn, Estimating frequency by interpolation using Fourier coefficients. IEEE Trans. Signal Process. 42(5), 1264–1268 (1994)
[6] MD Macleod, Fast nearly ML estimation of the parameters of real or complex single tones or resolved multiple tones. IEEE Trans. Signal Process. 46(1), 141–148 (1998)
[7] E Jacobsen, P Kootsookos, Fast, accurate frequency estimators [DSP Tips & Tricks]. IEEE Signal Process. Mag. 24(3), 123–125 (2007)
[8] C Candan, Analysis and further improvement of fine resolution frequency estimation method from three DFT samples. IEEE Signal Process. Lett. 20(9), 913–916 (2013)
[9] H Kawahara, I Masuda-Katsuse, A De Cheveigne, Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Comm. 27(3), 187–207 (1999)
[10] EB George, MJ Smith, Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model. IEEE Trans. Speech Audio Process. 5(5), 389–406 (1997)
[11] A Eronen, A Klapuri, Musical instrument recognition using cepstral coefficients and temporal features (IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '00), Istanbul, 2000), vol. 2, pp. II-753–II-756
[12] DPN Rodríguez, JA Apolinário, LWP Biscainho, Audio authenticity: detecting ENF discontinuity with high precision phase analysis. IEEE Trans. Inf. Forensics Secur. 5(3), 534–543 (2010)
[13] SU Ryu, K Rose, An MDCT domain frame-loss concealment technique for MPEG advanced audio coding (IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), Honolulu, 2007), pp. I-273–I-276
[14] MY Zhu, N Chen, XQ Yu, WG Wan, Packet loss concealment for compressed audio stream using sinusoidal frequency estimation (2010 IEEE International Conference on Multimedia and Expo (ICME), Suntec City, 2010), pp. 316–321
[15] H Purnhagen, N Meine, HILN - the MPEG-4 parametric audio coding tools (The 2000 IEEE International Symposium on Circuits and Systems, Geneva, 2000), pp. 201–204
[16] AC Den Brinker, J Breebaart, P Ekstrand, J Engdegård, F Henn, K Kjörling, W Oomen, H Purnhagen, An overview of the coding standard MPEG-4 audio amendments 1 and 2: HE-AAC, SSC, and HE-AAC v2. EURASIP J. Audio Speech Music Process. 2009(3) (2009)
[17] JP Princen, AB Bradley, Analysis/synthesis filter bank design based on time domain aliasing cancellation. IEEE Trans. Acoust. Speech Signal Process. 34(5), 1153–1161 (1986)
[18] S Zhang, L Girin, Fast and accurate direct MDCT to DFT conversion with arbitrary window functions. IEEE Trans. Audio Speech Lang. Process. 21(3), 567–578 (2013)
[19] AJS Ferreira, Accurate estimation in the ODFT domain of the frequency, phase and magnitude of stationary sinusoids (2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, 2001), pp. 47–50
[20] AJ Ferreira, D Sinha, Accurate and robust frequency estimation in the ODFT domain (2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, 2005), pp. 16–19
[21] Y Dun, G Liu, A fine-resolution frequency estimator in the odd-DFT domain. IEEE Signal Process. Lett. 22(12), 2489–2493 (2015)
[22] L Daudet, M Sandler, MDCT analysis of sinusoids: exact results and applications to coding artifacts reduction. IEEE Trans. Speech Audio Process. 12(3), 302–312 (2004)
[23] S Merdjani, L Daudet, Direct estimation of frequency from MDCT-encoded files (Proceedings of the 6th International Conference on Digital Audio Effects, London, 2003), pp. 8–11
[24] MY Zhu, W Zheng, DX Li, M Zhang, An accurate low complexity algorithm for frequency estimation in MDCT domain. IEEE Trans. Consum. Electron. 54(3), 1022–1028 (2008)
[25] S Zhang, W Dou, H Yang, MDCT sinusoidal analysis for audio signals analysis and processing. IEEE Trans. Audio Speech Lang. Process. 21(7), 1403–1414 (2013)
[26] Y Dun, G Liu, An improved MDCT domain frequency estimation method (2014 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP), Xi'an, 2014), pp. 120–123
Funding
This research was supported in part by the National Natural Science Foundation of China under Grants NSFC61173110, NSFC61373113, NSFC61372091, NSFC61671365 and NSFC U1531141.
Authors’ contributions
YD was responsible for proposing the algorithm and drafting the manuscript. GL and XH provided the comments on the verification tests and the drafts. All authors have read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Dun, Y., Liu, G. & Hou, X. An MDCT domain three-point interpolation-based low-complexity frequency estimator. J AUDIO SPEECH MUSIC PROC. 2017, 8 (2017). https://doi.org/10.1186/s13636-017-0105-5
DOI: https://doi.org/10.1186/s13636-017-0105-5