
# An MDCT domain three-point interpolation-based low-complexity frequency estimator

Yujie Dun^{1}, Guizhong Liu^{1} and Xingsong Hou^{1}

**2017**:8

https://doi.org/10.1186/s13636-017-0105-5

© The Author(s). 2017

**Received:** 8 September 2016 **Accepted:** 29 March 2017 **Published:** 4 April 2017

## Abstract

Signal frequency estimation is a significant problem in many applications, including audio signal processing. Compressed domain audio frequency estimators that directly use the modified discrete cosine transform (MDCT) coefficients are suitable for low-complexity audio applications. A new frequency estimation approach, which obtains the estimated value from a simple combination of three MDCT coefficients, is proposed herein. It exploits the underlying relation among adjacent MDCT values and provides a general form for this type of estimator. The estimator has clear computational advantages over other MDCT domain estimators and is suitable for high signal-to-noise ratio (SNR) conditions.

## Keywords

- Frequency estimation
- MDCT
- Audio
- Low complexity

## 1 Introduction

Frequency estimation is a basic problem in signal processing research and has been widely used in various applications such as economics, meteorology, astronomy, industry, and consumer electronics [1]. In recent years, low-complexity frequency estimators, which are suitable for low-cost applications, have been proposed in addition to so-called high-resolution (or even super-resolution) frequency estimation techniques such as Pisarenko [2], MUSIC [3] and ESPRIT [4]. A typical class of the low-complexity algorithms operates in the frequency domain (via discrete Fourier transform, DFT) and uses several DFT bins to obtain the estimated value [5–8].

For audio signals, frequency estimation plays a crucial role in parametric audio processing, which has been reported in various applications such as synthesis [9, 10], recognition [11], enhancement [12], and frame-loss concealment [13, 14]. In particular, two major profiles in MPEG-4 audio coding are based on the sinusoidal analysis of an audio signal: HILN (Harmonic and Individual Lines plus Noise) [15] and SSC (SinuSoidal Coding) [16]. Using a low-complexity frequency estimator can effectively lower the resource requirements of the entire processing system, which is significant for processing massive amounts of multimedia data and for portable ultra-low-power media devices. However, the aforementioned frequency estimation algorithms are not applicable to most low-cost audio applications.

Audio data used in most audio applications are stored and transmitted in compressed format, but the compression is not based on the DFT. Thus, estimating the parameters of an audio signal, including its frequency, is considerably complex: the time-domain samples must first be recovered from the compressed data before the estimation, and this recovery generally has a relatively high computational complexity. For high-quality audio compression standards such as MPEG-2/4 AAC, Dolby AC-3, WMA, and IETF Opus, the compression is conducted in the modified discrete cosine transform (MDCT) [17] domain, where an overlap of 50% between successive blocks and time domain alias cancellation (TDAC) are used to mitigate the block effect. To recover one block of the time samples, the inverse MDCT (IMDCT) of three successive blocks is required. Even if the frequency estimation algorithm itself is simple, the IMDCT needed to recover the time-domain samples adds significantly to the computational complexity.

To reduce the complexity, several approaches have been proposed. One is to directly calculate the DFT from the MDCT with a fast algorithm [18] and perform the frequency estimation with these DFT values. However, computing the DFT of every block requires the MDCT values of the corresponding block, the previous block, and the succeeding block, which causes an inevitable algorithmic delay of one block. Another approach is to use the odd-DFT as an intermediate domain between the time domain and the MDCT domain. The frequency is estimated with the odd-DFT coefficients; then, the MDCT is obtained from the odd-DFT by a simple conversion [19–21]. Using the odd-DFT can effectively decrease the system complexity of an audio application, but this scheme is not suitable for applications that take compressed audio as their input. A third approach is to estimate the frequency directly from the MDCT coefficients. Based on the analysis of the MDCT coefficients of a sinusoid [22], several MDCT domain estimators have been proposed in the last decade [23–25], which are convenient for low-complexity implementation. All of these estimators are based on the ratio of two coefficients and use the mapping between the frequency value and the coefficient ratio. Effective estimation is restricted to the region where this mapping is monotonic. In practice, however, noise is unavoidable; it can push the estimation into the non-monotonic region and produce a wrong result.

The major objective of this paper is to propose a three-point interpolation-based estimator, which avoids the effect of non-monotonic mapping and further reduces the complexity of the MDCT domain frequency estimator to render a simple method for various applications. The contributions are summarized as follows: (i) derive an analytical expression of the MDCT of a single-tone sinusoid based on the sine window’s centered DFT (CDFT); (ii) propose an MDCT domain three-point interpolation-based low-complexity approach for the signal frequency estimation problem. The proposed algorithm estimates the frequency from three MDCT bin values with only simple calculations and is significantly less complex than the existing methods. The method is effective for the sine window case and exhibits an estimation error lower than 1 Hz when the signal-to-noise ratio (SNR) is above 20 dB.

This paper is organized as follows. In Section 2, we provide the MDCT analysis of a sinusoid, which is the basis of the MDCT domain estimators. The proposed algorithm is presented in Section 3. In Section 4, the Monte-Carlo simulation results are shown and the complexity is analyzed. The conclusions are summarized in Section 5.

## 2 MDCT analysis of sinusoids

### 2.1 Signal model of the estimation

An audio signal is modeled as a sum of *P* sinusoidal components *s*_{m}(*n*), where *n* is the signal index; *P* is the number of components; and *A*_{m}, *f*_{m}, and *ϕ*_{m} are the amplitude, normalized frequency, and phase of each component *s*_{m}(*n*), respectively. The problem of the audio signal parameter estimation is to obtain the values of each parameter set {*A*_{m}, *f*_{m}, *ϕ*_{m}} for *m* = 0, 1, …, *P* − 1. In general, the frequency estimation is the most important. These frequencies can be estimated together, as most time domain methods do, or one by one, as the frequency domain methods commonly do. When these components are well separated in the frequency scale, the estimation of each component in the frequency domain can be treated as the problem of estimating a single frequency component where all other components act as interference noise. Thus, the signal model may be simplified to a single-component model. In this paper, we concentrate on the frequency estimation of a single tone.

For a single sinusoid, *A*, *f*, and *ϕ* are its magnitude, frequency, and initial phase, respectively. Considering the noisy case, the observed signal *x*(*n*) is the sum of this sinusoid and a noise term *w*(*n*), where *w*(*n*) is generally assumed to be additive white Gaussian noise (AWGN) with zero mean and variance *σ*^{2}. The SNR is *A*^{2}/(2*σ*^{2}).

The signal *x*(*n*) is framed by weighting it with a window function *h*(*n*) of length 2*N*, which satisfies the Princen-Bradley perfect reconstruction conditions [17], and converted to its *N*-point MDCT coefficients *X*(*k*), where *k* = 0, 1, …, *N* − 1 is the MDCT bin index. The problem of MDCT domain frequency estimation is to estimate the value of *f* from the MDCT coefficients *X*(*k*).

The frequency *f* is commonly expressed as \( f= l{f}_s/(2 N) \) with \( l={l}_0+\delta \), where *f*_{s} is the sampling frequency, and \( {l}_0\in {\mathbf{Z}}_0^{+} \) and *δ* ∈ [0, 1) are the integer and fractional parts of the digital frequency *l*, respectively. Thus, the estimation of *l* is to obtain the values of *l*_{0} and *δ*.
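As a minimal sketch of the quantities defined in this section, the following Python fragment converts between a frequency in Hz and the digital frequency *l*, splits *l* into *l*_{0} and *δ*, and generates one noisy frame. It assumes that one bin spans *f*_{s}/(2*N*) Hz (consistent with the 1.5-bin, 32.3 Hz figure quoted in Section 4.3), that the SNR is the power ratio *A*^{2}/(2*σ*^{2}), and that the tone is written in cosine form; the function names are illustrative and do not come from the paper.

```python
import math
import random
from typing import List, Tuple


def hz_to_bin(f_hz: float, fs: float, n: int) -> float:
    """Digital frequency l (in bins) of a tone at f_hz, assuming l = 2*N*f/fs."""
    return 2.0 * n * f_hz / fs


def bin_to_hz(l: float, fs: float, n: int) -> float:
    """Inverse mapping: frequency in Hz of the digital frequency l."""
    return l * fs / (2.0 * n)


def split_bin(l: float) -> Tuple[int, float]:
    """Split l into its integer part l0 and fractional part delta in [0, 1)."""
    l0 = math.floor(l)
    return l0, l - l0


def noise_sigma(amplitude: float, snr_db: float) -> float:
    """Noise standard deviation, assuming SNR = A^2 / (2 * sigma^2)."""
    snr = 10.0 ** (snr_db / 10.0)
    return math.sqrt(amplitude ** 2 / (2.0 * snr))


def noisy_tone(amplitude: float, l: float, phase: float, n: int,
               sigma: float, rng: random.Random) -> List[float]:
    """One frame of 2N samples: a cosine tone plus white Gaussian noise."""
    return [amplitude * math.cos(math.pi * l * i / n + phase) + rng.gauss(0.0, sigma)
            for i in range(2 * n)]
```

With the simulation settings used later (*f*_{s} = 44.1 kHz, *N* = 1024), `bin_to_hz(46, 44100.0, 1024)` gives roughly 990.5 Hz, which is the "approximately 1 kHz" bin used in the tests.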

### 2.2 Generalized MDCT analysis

The MDCT analysis of a sinusoid is the basis of the frequency estimator in the MDCT domain. It exhibits the underlying relationship between the MDCT coefficients and the parameters of the sinusoidal signal. This relationship was first explored by Daudet [22] for the sine window case and generalized by Zhang [25] to other window cases. Here, we briefly describe the generalized MDCT analysis. The analysis is similar to that of [25], but the signal model uses Eq. (3).

The MDCT coefficient *X*(*k*) of the signal with window *h*(*n*) is the real part of an expression *Z*(*k*) given in [25], in which *H*(*ξ*), defined in Eq. (8), is the centered discrete Fourier transform (CDFT) of the window function *h*(*n*) and *ξ* is not restricted to integers. If *h*(*n*) is even-symmetric (a common case in MDCT analysis), the values of its CDFT *H*(*ξ*) are real. The MDCT coefficient of the signal in (2) is then expressed by Eq. (9), which contains a phase term *ϕ*_{0}.

Equation (9) provides the precise result of the MDCT coefficient for a given sinusoidal signal and an arbitrary symmetric window function.

Equation (9) can be simplified by considering the behavior of *H*(*ξ*). The window function has fast-decaying sidelobes, so the significant values of its CDFT coefficients appear only near *ξ* = 0 [25]. For *k* = 0, 1, …, *N* − 1 and *l* far from 0 or *N* − 1, only the first term in (9) is significant. Keeping only this term gives the simplified expression (11).

### 2.3 MDCT analysis for sine window case

The sine window is commonly used in audio signal processing and coding. The frequency estimator for the sine window case is important for practical applications. The analytical expression of the CDFT coefficient *H*(*ξ*) for the sine window can be derived; thus, the analytical expression of the MDCT coefficient *X*(*k*) can also be derived. The expression of *X*(*k*) is the basis of the proposed three-point interpolation-based low-complexity frequency estimator.

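The sine window is defined as

$$ h(n)= \sin \left[\frac{\pi }{2 N}\left( n+\frac{1}{2}\right)\right], $$(12)

where *n* = 0, 1, …, 2*N* − 1; it has the same length as the MDCT input data. The sine window is even-symmetric, and its CDFT is real-valued. Substituting (12) into (8) and simplifying, we obtain the expression (13) of the CDFT. For *ξ* near 0, which implies that the bin index *k* is near the digital frequency *l*, Eq. (13) can be approximated as Eq. (14), in which the values at *ξ* = {0, −1} are obtained using L’Hospital’s rule. This approximation leads to an error less than 1.25 × 10^{−7}. Substituting (14) into (11), a simplified MDCT bin value *X*(*k*) is obtained:

$$ X(k)\approx \frac{A}{\pi}\sqrt{\frac{N}{2}} \sin \left(\pi l\right)\cdot \frac{1}{\left( k- l\right)\left( k- l+1\right)}\cdot {\left(-1\right)}^{k+1} \cos \left({\phi}_0-\frac{3\pi}{2} k\right). $$(15)

This result is the basis of the proposed frequency estimator.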
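The closed-form bin value can be checked numerically against a direct MDCT. The sketch below is illustrative only and rests on several assumptions that are not taken from the text: the MDCT is the common definition with phase shift 1/2 + *N*/2 and scaling \( \sqrt{2/ N} \), the tone is in cosine form, and the phase term `phi0` is the expression that follows from those two choices (the paper's own definition of *ϕ*_{0} may differ by convention). The function names are hypothetical.

```python
import math
from typing import List


def sine_window(n: int) -> List[float]:
    """Sine window of length 2N."""
    return [math.sin(math.pi * (i + 0.5) / (2 * n)) for i in range(2 * n)]


def mdct(frame: List[float], n: int) -> List[float]:
    """Direct O(N^2) sine-windowed MDCT (assumed definition, scaled by sqrt(2/N))."""
    h = sine_window(n)
    n0 = 0.5 + n / 2.0
    scale = math.sqrt(2.0 / n)
    return [scale * sum(frame[i] * h[i] * math.cos(math.pi / n * (i + n0) * (k + 0.5))
                        for i in range(2 * n))
            for k in range(n)]


def mdct_model(k: int, amplitude: float, l: float, phase: float, n: int) -> float:
    """Closed-form approximation of X(k) near the peak, as a three-part product."""
    # phi0 follows from the assumed cosine tone and MDCT phase shift (an assumption).
    phi0 = phase + math.pi * l * (1.0 - 1.0 / (2 * n)) - 0.75 * math.pi
    constant = amplitude / math.pi * math.sqrt(n / 2.0) * math.sin(math.pi * l)
    variable = 1.0 / ((k - l) * (k - l + 1.0))
    modulation = (-1.0) ** (k + 1) * math.cos(phi0 - 1.5 * math.pi * k)
    return constant * variable * modulation
```

For a tone placed away from the band edges, for example *l* = 50.3 with *N* = 128, comparing `mdct(...)[k]` with `mdct_model(k, ...)` for the bins around *k* = 50 should show agreement to within a small fraction of the peak value; the residual comes from the neglected second term and the small-angle approximation.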

## 3 Proposed frequency estimator

### 3.1 General form

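Equation (15) shows that *X*(*k*) is composed of three parts: a constant valued part \( \frac{A}{\pi}\sqrt{\frac{N}{2}} \sin \left(\pi l\right) \), a variable value part \( \frac{1}{\left( k- l\right)\left( k- l+1\right)} \), and a phase modulation factor \( {\left(-1\right)}^{k+1}\cdot \cos \left({\phi}_0-\frac{3\pi}{2} k\right) \). The phase modulation factor has a period of 4 in *k*: it takes the values −cos *ϕ*_{0}, −sin *ϕ*_{0}, cos *ϕ*_{0}, and sin *ϕ*_{0} for *k* mod 4 = 0, 1, 2, and 3, respectively.

For a given bin index *k*_{0}, denoting *M*_{−} = *M*(*k*_{0} − 2), *M*_{0} = *M*(*k*_{0}), and *M*_{+} = *M*(*k*_{0} + 2), we construct a combination *λ* of these three values in the form of Eq. (17), where *a*_{i} and *b*_{i} (*i* = 1, 2, 3) are real-valued coefficients. Then, the constant part and phase modulation factor in (15) are canceled out, and only combinations of (*k* − *l*)(*k* − *l* + 1) remain. Defining *δ*_{0} = *l* − *k*_{0} and substituting it into (17), we obtain *λ* as a ratio of two quadratic polynomials in *δ*_{0},

$$ \lambda =\frac{B_2{\delta}_0^2+{B}_1{\delta}_0+{B}_0}{A_2{\delta}_0^2+{A}_1{\delta}_0+{A}_0}, $$

where the coefficients *A*_{j} and *B*_{j} (*j* = 0, 1, 2) are determined by *a*_{i} and *b*_{i}.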

If the coefficients *a*_{i} and *b*_{i} are properly set, a simple relation between *λ* and *δ*_{0} can be obtained and *δ*_{0} can be estimated. For example, if we set *A*_{2} = *A*_{1} = 0 and *B*_{2} = *B*_{0} = 0 by properly selecting the coefficients *a*_{i} and *b*_{i}, then *λ* = *δ*_{0} ⋅ *B*_{1}/*A*_{0}, where *B*_{1}/*A*_{0} is a constant determined by *a*_{i} and *b*_{i}. An estimate of *δ*_{0} is then *λ*/(*B*_{1}/*A*_{0}). Thus, the frequency value \( \widehat{l} \) (we use \( \widehat{\cdot} \) to denote an estimated value) can be estimated by \( \widehat{l}={k}_0+{\widehat{\delta}}_0 \).
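To make the coefficient selection concrete, the following worked example derives one admissible choice. It rests on two assumptions that are ours rather than statements of the text above: that *M*(*k*) denotes the reciprocal 1/*X*(*k*), and that *X*(*k*) follows the three-part product form described at the start of this section. Here *C* stands for the constant part and *P* for the phase modulation factor at *k*_{0}; because that factor has a period of 4, it equals −*P* at *k*_{0} ± 2.

$$
\begin{aligned}
M_0 &= \frac{\delta_0^2-\delta_0}{C P}, \qquad
M_{-} = -\frac{\delta_0^2+3\delta_0+2}{C P}, \qquad
M_{+} = -\frac{\delta_0^2-5\delta_0+6}{C P}, \\
M_{-}+2 M_0+M_{+} &= -\frac{8}{C P}, \\
3 M_{-}+2 M_0-M_{+} &= -\frac{16\,\delta_0}{C P}, \\
\frac{3 M_{-}+2 M_0-M_{+}}{2\left(M_{-}+2 M_0+M_{+}\right)} &= \delta_0 .
\end{aligned}
$$

In this example the quadratic and linear terms vanish in the denominator and the quadratic and constant terms vanish in the numerator, which is the situation *A*_{2} = *A*_{1} = 0 and *B*_{2} = *B*_{0} = 0 described above. Multiplying the numerator and the denominator by *X*(*k*_{0} − 2) *X*(*k*_{0}) *X*(*k*_{0} + 2) turns the ratio into an expression in pairwise products of the three MDCT coefficients, so no reciprocal has to be computed explicitly.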

### 3.2 Proposed estimator

In the proposed estimator, *k*_{0} is set to the index of the maximum MDCT magnitude |*X*(*k*)|, and *δ*_{0} is estimated using formula (20). For *i* = −2, 0, 2, denoting *X*(*k*_{0} + *i*) as *X*_{−}, *X*_{0}, and *X*_{+}, respectively, formula (20) is rewritten in terms of these three MDCT coefficients as formula (21).

The proposed estimator proceeds in three steps:

- (1) Find the bin index of the MDCT magnitude peak, $$ {\widehat{k}}_0=\underset{k}{ \arg \max}\left(\left| X(k)\right|\right). $$(22)
- (2) Estimate *δ*_{0} with the MDCT values of *X*_{−}, *X*_{0}, and *X*_{+} according to formula (21).
- (3) Finally, obtain the estimated value of *l* as \( \widehat{l}={\widehat{k}}_0+{\widehat{\delta}}_0 \).

The coefficients are not unique for estimating *δ*_{0}; we have derived a set of such formulas. However, the coefficients in (20) are the most suitable for a simple calculation.
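The three steps above can be sketched in Python as follows. This is a hedged illustration, not a transcription of formula (21): the coefficient pattern (3, 2, −1 in the numerator; 1, 2, 1 in the denominator, doubled) is one that we derived from the three-part model of *X*(*k*) under the assumption that *M*(*k*) = 1/*X*(*k*), and it happens to match the operation count reported in Section 4.2.2. The function name `estimate_bin` is hypothetical.

```python
from typing import Sequence


def estimate_bin(X: Sequence[float]) -> float:
    """Estimate the digital frequency l (in bins) from MDCT coefficients X(k).

    Three-point interpolation on X(k0-2), X(k0), X(k0+2); the coefficients
    are an assumption derived from the three-part model, not copied from (21).
    """
    # Step (1): peak bin, kept away from the edges so that k0 +/- 2 exist.
    k0 = max(range(2, len(X) - 2), key=lambda k: abs(X[k]))
    xm, x0, xp = X[k0 - 2], X[k0], X[k0 + 2]

    # Step (2): fractional part from pairwise products of the three values.
    p_m0, p_0p, p_mp = xm * x0, x0 * xp, xm * xp
    delta0 = (3.0 * p_0p + 2.0 * p_mp - p_m0) / (2.0 * (p_m0 + 2.0 * p_mp + p_0p))

    # Step (3): combine the peak index and the fractional part.
    return k0 + delta0
```

Note that the sketch contains no branch after the peak search: the same expression is evaluated whether the peak lands on the bin below or above the true frequency, which is the property the paper emphasizes.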

## 4 Results and discussion

### 4.1 Comparison benchmarks

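The proposed estimator is compared with five MDCT domain estimators: those of Merdjani [23], Zhu [24], Zhang [25], and Dun [26], and a simplified estimator. In the estimators that use it, *k*_{0} is the frequency bin that locates the maximum of the so-called pseudo-spectrum *S*(*k*), and *α* is the ratio of two MDCT coefficients.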

### 4.2 Complexity comparison

#### 4.2.1 General

Complexity refers to the resources that an executable program of the algorithm requires; it includes time complexity and space complexity. Here, the time complexity is compared by counting the operations required to estimate the frequency, and the space complexity refers to the storage space required by the algorithm.

To compare the time complexity, operations such as addition, multiplication, division, square root, comparison, and bit-shift are counted for each algorithm. Most existing MDCT domain frequency estimation algorithms [23–26] consist of two steps: find the frequency bin *k*_{0} that corresponds to the integer part *l*_{0} and estimate the fractional part *δ* using a decision method. Note that finding the bin index of the peak location is a common step for all algorithms and the operations are identical, so the operations to find this peak are not included in the comparison.

To compare the space complexity, the space required to store the look-up table is counted. The space required to hold the variables and intermediate results is not included in the comparison.

#### 4.2.2 The proposed estimator

According to the proposed frequency estimator in Section 3.2, given the bin index of the maximum |*X*(*k*)|, the operations to obtain the estimated value \( \widehat{l} \) are those in (21): three MDCT-coefficient multiplications (*X*_{−}*X*_{0}, *X*_{0}*X*_{+}, and *X*_{−}*X*_{+}), three constant-coefficient multiplications (by 3 and 2), four additions, and one division. A multiplication by a small constant such as 2 or 3 is usually replaced by a bit-shift, plus one addition in the case of 3. Thus, in practice, three multiplications, five additions, one division, and three bit-shifts are used. Neither additional information nor any other operation is required.
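The operation count described above can be made explicit with integer (fixed-point) inputs, where the constant multiplications become shifts and additions. The sketch below is an illustration under an assumption: the coefficient pattern is the one we derived from the three-part model of *X*(*k*), not a copy of formula (21). The function name is hypothetical.

```python
def delta0_shift_add(xm: int, x0: int, xp: int) -> float:
    """Fractional-bin estimate from integer (fixed-point) MDCT values.

    Uses 3 multiplications, 5 additions/subtractions, 3 bit-shifts and
    1 division. The coefficient pattern is an assumption (see lead-in).
    """
    p_m0 = xm * x0                       # multiplication 1
    p_0p = x0 * xp                       # multiplication 2
    p_mp = xm * xp                       # multiplication 3

    three_p_0p = (p_0p << 1) + p_0p      # shift 1 + addition 1: 3 * p_0p
    two_p_mp = p_mp << 1                 # shift 2: 2 * p_mp

    num = three_p_0p + two_p_mp - p_m0   # additions 2 and 3
    den = (p_m0 + two_p_mp + p_0p) << 1  # additions 4 and 5, shift 3

    return num / den                     # the single division
```

The shared term `two_p_mp` is computed once and reused in the numerator and the denominator, which is why three shifts suffice.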

#### 4.2.3 Other MDCT domain estimators

First, all compared estimators find a peak location. The estimators in [24–26] use other criteria after locating the initial maximum to obtain \( {\widehat{l}}_0 \), whereas Merdjani [23] and the simplified estimator locate the maximum of a pseudo-spectrum that is converted from the MDCT spectrum. The use of a pseudo-spectrum helps to find the exact \( {\widehat{l}}_0 \), but it also adds a certain number of operations, which must be counted in the comparison. Then, usually with the help of decision algorithms (particularly in Zhu [24] and Dun [26]), the value of \( \widehat{\delta} \) is solved from a quadratic equation or computed from a look-up table with polynomial fitting.

^{−13} as reported in [26].

Table 1 Comparison of the complexity (time complexity as operation counts; space complexity in the last column)

Estimator | Additions | Multiplications | Divisions | Comparisons | Square roots | Bit-shifts | Space complexity |
---|---|---|---|---|---|---|---|
Merdjani | 6 + 2*N* | 4 + 2*N* | 3 | 4 | 1 + *N* | – | – |
Zhu | 8 | 3 | 5 | 5 | 1 | – | – |
Zhang | 7 | 1 | 3 | 2 | – | – | 4096 |
Dun | 10 | 1 | 5 | 5 | – | – | 6144 |
Simplified | 5 + 2*N* | 2 + 2*N* | 2 | – | 1 + *N* | 1 | – |
Proposed | 5 | 3 | 1 | – | – | 3 | – |

Table 1 shows that the proposed estimator requires only a few addition, multiplication, and division operations in addition to three bit-shift operations (the simplest operation in the list). Neither comparisons nor storage space are required. The proposed estimator therefore has the lowest complexity. The simplified algorithm has a complexity similar to that of the proposed estimator if the calculation of *S*(*k*) is not considered.
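To see how the counts in Table 1 scale with the frame length, the short Python sketch below evaluates them for a given *N*. The numbers are transcribed from Table 1; treating a dash as zero and summing all operation types with equal weight are simplifying assumptions of ours, since the paper does not assign relative costs to the operations.

```python
from typing import Dict, Tuple


def table1_counts(n: int) -> Dict[str, Tuple[int, ...]]:
    """Operation counts from Table 1 for frame length N = n.

    Order: additions, multiplications, divisions, comparisons,
    square roots, bit-shifts. A dash in the table is taken as zero.
    """
    return {
        "Merdjani":   (6 + 2 * n, 4 + 2 * n, 3, 4, 1 + n, 0),
        "Zhu":        (8, 3, 5, 5, 1, 0),
        "Zhang":      (7, 1, 3, 2, 0, 0),
        "Dun":        (10, 1, 5, 5, 0, 0),
        "Simplified": (5 + 2 * n, 2 + 2 * n, 2, 0, 1 + n, 1),
        "Proposed":   (5, 3, 1, 0, 0, 3),
    }


def total_operations(n: int) -> Dict[str, int]:
    """Unweighted sum of all counted operations per estimator."""
    return {name: sum(counts) for name, counts in table1_counts(n).items()}
```

For *N* = 1024 the pseudo-spectrum based estimators need several thousand operations, whereas the count of the proposed estimator does not depend on *N*. The look-up tables of Zhang and Dun are a separate, space-related cost that this tally does not capture.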

### 4.3 Simulation results and discussion

Simulations were conducted to verify the proposed frequency estimator and compare with other estimators. Herein, the results for both noiseless and noise-polluted cases are presented.

In all simulations, parameters were set according to the audio applications. The block size and window length were set to 2*N* = 2048, the sampling frequency was *f*_{s} = 44.1 kHz, and the magnitude was *A* = 1. The initial phase *ϕ* was randomly generated from a uniform distribution over (−*π*, *π*). The estimation error of the frequency value, i.e., \( \varepsilon =\widehat{f}- f \) in Hertz (Hz), where *f* is the sinusoidal frequency and \( \widehat{f} \) is the estimated value, was measured by the maximum value *ε*_{max} and mean square error (MSE). An MSE of 0 dB represents an error of 1 Hz.
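The two error measures can be computed as in the following minimal Python sketch. It assumes that *ε*_{max} is the largest absolute error and that the MSE is reported in decibels relative to 1 Hz^{2}, so that 0 dB corresponds to a 1 Hz error; the function name is illustrative.

```python
import math
from typing import Sequence, Tuple


def error_metrics(estimates_hz: Sequence[float], true_hz: float) -> Tuple[float, float]:
    """Return (maximum absolute error in Hz, MSE in dB relative to 1 Hz^2)."""
    errors = [est - true_hz for est in estimates_hz]
    eps_max = max(abs(e) for e in errors)
    mse = sum(e * e for e in errors) / len(errors)
    # A perfect estimate has zero MSE, which maps to minus infinity in dB.
    mse_db = 10.0 * math.log10(mse) if mse > 0.0 else float("-inf")
    return eps_max, mse_db
```

On this scale, an MSE of 10^{−2} (that is, −20 dB) corresponds to an error of about 0.1 Hz.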

In the noise-free test, *δ* varied from 0 to 1 with a step of 0.05. The signal frequency *l* partially determines the model error introduced when the original form (9) is simplified to expression (11); therefore, two values, 46 and 510, were used for its integer part *l*_{0} in this test. The value of 46 is a bin number that corresponds to approximately 1 kHz according to the values of *f*_{s} and *N*. The value of 510 is approximately half of the maximum MDCT bin index, which minimizes the interference caused by the negative frequency of a real-valued sinusoidal input. The results of the noise-free condition are shown in Fig. 1.

As expected, both the MSE and the maximum error are larger for all estimators when *l*_{0} = 46. At this frequency, the proposed estimator exhibits a slightly larger MSE and maximum error than Merdjani’s, Zhu’s, and Dun’s methods but significantly smaller ones than Zhang’s method and the simplified method. In other words, although no conditional construct is used, the proposed estimator exhibits similar precision to the estimators that have conditional branches, whereas the other existing estimators lose significant accuracy. When *l*_{0} = 510, the maximum error of the proposed estimator remains similar to that of the other estimators that have conditional branches.

For both cases, the proposed estimator has a slightly larger MSE than the branched methods. The degradation in performance is mainly caused by the third coefficient. In [23, 24, 26], additional decisions are made to select the largest two values. In the proposed estimator, three values are required; neither a decision algorithm nor a conditional branch instruction is used. Thus, an ultra-low-complexity approach is obtained. Fortunately, the MSE remains near or below 10^{−10} for most frequencies.

^{−10} to greater than 10^{−3}. A level of 10^{−2} is shown for the proposed estimator, which corresponds to an error of 0.1 Hz.

For the test over different SNRs, *l*_{0} was set to 46, which corresponds to approximately 1 kHz, and *δ* was drawn from a uniform distribution over (0, 1). The results are shown in Fig. 3. For SNRs higher than 20 dB, the error of the proposed estimator is below 1 Hz (an MSE below 0 dB). The maximum sidelobe level of the sine window is −23 dB; thus, for two frequency components, a distance greater than one and a half bins guarantees that the interference is less than −23 dB. According to the parameter settings, this 1.5-bin distance corresponds to a frequency offset of 32.3 Hz, which is similar to the frequency difference between two neighboring musical notes: C4 (middle C, 261.6 Hz) and D4 (293.7 Hz). In practice, the distance between the notes of a chord is greater than this value. Thus, the proposed estimator is suitable for low-complexity frequency estimation in such high-SNR situations.

### 4.4 Evaluation with real audio signals

In this part, the proposed algorithm is evaluated with real audio signals. After estimating the major components of an audio signal with sinusoidal model parameters (frequency, amplitude, and phase), the signal is reconstructed by the estimated components. The performances of the various methods are evaluated by comparing the original and the reconstructed signals.

In general, the major components of an audio signal are obtained by the following steps: firstly, finding the largest peak in the spectrum and estimating single-tone parameters from it; secondly, subtracting this estimated tone from the spectrum. These two steps are repeated until all major tones are estimated. This procedure is recommended in multiple component estimation algorithms because it enables detection of any tones that are initially masked by leakage from nearby large peaks.
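The two-step loop described above can be sketched as a generic skeleton in Python. Everything specific here is hypothetical: the callables `estimate_tone` and `synthesize` stand in for a single-tone estimator and for a routine that renders the spectrum of one tone, and the stopping rule (a maximum number of tones plus a residual-to-initial energy ratio) is an assumed example rather than the criteria used in the paper.

```python
from typing import Callable, List, Sequence, Tuple

Tone = Tuple[float, float, float]  # (frequency, amplitude, phase)


def extract_tones(
    spectrum: Sequence[float],
    estimate_tone: Callable[[Sequence[float]], Tone],
    synthesize: Callable[[Tone, int], List[float]],
    max_tones: int,
    stop_ratio: float,
) -> List[Tone]:
    """Peak-pick, estimate, subtract; repeat until the residual is small."""
    residual = list(spectrum)
    initial_energy = sum(v * v for v in residual)
    tones: List[Tone] = []
    while len(tones) < max_tones and initial_energy > 0.0:
        # Assumed stopping rule: residual energy relative to the input energy.
        if sum(v * v for v in residual) <= stop_ratio * initial_energy:
            break
        tone = estimate_tone(residual)            # step 1: largest remaining peak
        model = synthesize(tone, len(residual))   # spectrum of the estimated tone
        residual = [r - m for r, m in zip(residual, model)]  # step 2: subtract
        tones.append(tone)
    return tones
```

Because each tone is removed before the next peak is searched, a weaker component that was initially hidden under the leakage of a stronger neighbor can surface in a later iteration.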

Specifically, the frequency of each component is estimated first; then, the amplitude and phase are estimated with the method given in Merdjani [23]. The proposed algorithm and the five benchmarks are used to obtain the estimated frequencies. To make the comparison in a uniform framework, the components of an audio signal are estimated in the same order by all of the algorithms.

^{−4} are used as criteria to stop component extraction of a frame. An overlap of 50% is used between subsequent frames both in MDCT analysis and in waveform reconstruction. Figure 4 presents a detailed part of the reconstructed signal of “es01” obtained with the proposed frequency estimation algorithm and compares it with the original signal. The reconstructed waveform is almost identical to the original audio.

Table 2 The description of the 12 MPEG mono sequences

Name | Duration (s) | Type |
---|---|---|
es01 | 10.73 | Suzanne Vega |
es02 | 8.7 | Male speech, German |
es03 | 7.6 | Female speech, English |
sc01 | 10.97 | Haydn trumpet concert |
sc02 | 12.73 | Classical orchestral music |
sc03 | 11.55 | Contemporary pop music |
si01 | 8 | Harpsichord/cembalo |
si02 | 7.73 | Castanets |
si03 | 27.89 | Pitch pipe |
sm01 | 11.15 | Bagpipe |
sm02 | 10.1 | Glockenspiel |
sm03 | 13.99 | Plucked strings |

The results in Figs. 5 and 6 show that, although the proposed algorithm greatly reduces the complexity, the quality of the reconstructed audio signal remains similar to that of the other estimators, except for the two most complex ones. The proposed algorithm avoids the spectrum conversion (from MDCT to pseudo-spectrum) used in Merdjani [23] and the simplified algorithm, so its complexity is independent of the frame length *N* (as shown in Table 1; typical frame lengths for audio signals are values such as 1024 or 512). At the same time, the proposed algorithm avoids conditional constructs, which benefits the speed of a frequency estimator on a pipelined processor.

## 5 Conclusions

A low-complexity frequency estimator that operates on three MDCT coefficients with only a few simple calculations is proposed in this paper. The analytical expression of the MDCT coefficients, which is the basis of the proposed estimator, is also presented. The proposed estimator shows a great reduction in complexity compared to other MDCT domain estimators and provides a good complexity/performance tradeoff. Because it uses no conditional branch instructions, this estimator is especially suited to pipelined processors.

## Declarations

### Funding

This research was supported in part by the National Natural Science Foundation of China under Grants NSFC61173110, NSFC61373113, NSFC61372091, NSFC61671365 and NSFC U1531141.

### Authors’ contributions

YD was responsible for proposing the algorithm and drafting the manuscript. GL and XH provided the comments on the verification tests and the drafts. All authors have read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


## References

1. P Stoica, RL Moses, *Spectral analysis of signals* (Pearson/Prentice Hall, Upper Saddle River, 2005)
2. VF Pisarenko, The retrieval of harmonics from a covariance function. Geophys. J. Int. **33**(3), 347–366 (1973)
3. RO Schmidt, Multiple emitter location and signal parameter estimation. Antennas and Propagation IEEE Transactions on **34**(3), 276–280 (1986)
4. R Roy, T Kailath, ESPRIT-estimation of signal parameters via rotational invariance techniques. Acoustics, Speech and Signal Processing IEEE Transactions on **37**(7), 984–995 (1989)
5. BG Quinn, Estimating frequency by interpolation using Fourier coefficients. Signal Processing, IEEE Transactions on **42**(5), 1264–1268 (1994)
6. MD Macleod, Fast nearly ML estimation of the parameters of real or complex single tones or resolved multiple tones. Signal Processing, IEEE Transactions on **46**(1), 141–148 (1998)
7. E Jacobsen, P Kootsookos, Fast, accurate frequency estimators [DSP Tips & Tricks]. Signal Processing Magazine, IEEE **24**(3), 123–125 (2007)
8. C Candan, Analysis and further improvement of fine resolution frequency estimation method from three DFT samples. Signal Processing Letters, IEEE **20**(9), 913–916 (2013)
9. H Kawahara, I Masuda-Katsuse, A De Cheveigne, Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Comm. **27**(3), 187–207 (1999)
10. EB George, MJ Smith, Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model. Speech and Audio Processing, IEEE Transactions on **5**(5), 389–406 (1997)
11. A Eronen, A Klapuri, Musical instrument recognition using cepstral coefficients and temporal features (Acoustics, Speech, and Signal Processing, ICASSP’00. 2000 IEEE International Conference on, Istanbul, 2000), pp. II753-II756 vol. 2
12. DPN Rodríguez, JA Apolinário, LWP Biscainho, Audio authenticity: detecting ENF discontinuity with high precision phase analysis. Information Forensics and Security, IEEE Transactions on **5**(3), 534–543 (2010)
13. S-U Ryu, K Rose, An mdct domain frame-loss concealment technique for mpeg advanced audio coding (Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, Honolulu, 2007), pp. I-273-I-276
14. M-Y Zhu, N Chen, X-Q Yu, W-G Wan, Packet Loss Concealment for compressed audio stream using sinusoidal frequency estimation (Multimedia and Expo (ICME), 2010 IEEE International Conference on, Suntec City, 2010), pp. 316–321
15. H Purnhagen, N Meine, HILN - the MPEG-4 parametric audio coding tools (Circuits and Systems, The 2000 IEEE International Symposium on, Geneva, 2000), pp. 201–204
16. AC Den Brinker, J Breebaart, P Ekstrand, J Engdegård, F Henn, K Kjörling, W Oomen, H Purnhagen, An overview of the coding standard MPEG-4 audio amendments 1 and 2: HE-AAC, SSC, and HE-AAC v2. EURASIP Journal on Audio, Speech, and Music Processing 2009(3) (2009)
17. JP Princen, AB Bradley, Analysis/synthesis filter bank design based on time domain aliasing cancellation. Acoustics, Speech and Signal Processing, IEEE Transactions on **34**(5), 1153–1161 (1986)
18. S Zhang, L Girin, Fast and accurate direct MDCT to DFT conversion with arbitrary window functions. Audio, Speech, and Language Processing, IEEE Transactions on **21**(3), 567–578 (2013)
19. AJS Ferreira, *Accurate estimation in the ODFT domain of the frequency, phase and magnitude of stationary sinusoids* (Applications of Signal Processing to Audio and Acoustics, 2001 IEEE Workshop, New Paltz, 2001), pp. 47–50
20. AJ Ferreira, D Sinha, Accurate and robust frequency estimation in the ODFT domain (Applications of Signal Processing to Audio and Acoustics, 2005 IEEE Workshop on, New Paltz, NY, 2005), pp. 16–19
21. Y Dun, G Liu, A fine-resolution frequency estimator in the odd-DFT domain. IEEE Signal Processing Letters **22**(12), 2489–2493 (2015)
22. L Daudet, M Sandler, MDCT analysis of sinusoids: exact results and applications to coding artifacts reduction. Speech and Audio Processing, IEEE Transactions on **12**(3), 302–312 (2004)
23. S Merdjani, L Daudet, *Direct estimation of frequency from MDCT-encoded files* (Proceedings of the 6th International Conference on Digital Audio Effects, London, 2003), pp. 8–11
24. M-Y Zhu, W Zheng, D-X Li, M Zhang, An accurate low complexity algorithm for frequency estimation in MDCT domain. IEEE Trans. Consum. Electron. **54**(3), 1022–1028 (2008)
25. S Zhang, W Dou, H Yang, MDCT sinusoidal analysis for audio signals analysis and processing. Audio, Speech, and Language Processing, IEEE Transactions on **21**(7), 1403–1414 (2013)
26. Y Dun, G Liu, *An improved MDCT domain frequency estimation method* (Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference, Xi’an, 2014), pp. 120–123