New aliasing cancelation algorithm for the transition between non-aliased and TDAC-based coding modes
© Song and Kang; licensee Springer. 2014
Received: 1 August 2013
Accepted: 7 January 2014
Published: 27 January 2014
This paper proposes a new aliasing cancelation algorithm for the transition between non-aliased coding and transform coding with time domain aliasing cancelation (TDAC). It is applicable to unified speech and audio coding (USAC), which was recently standardized by the Moving Picture Experts Group (MPEG). Because USAC combines two coding methods with entirely different structures, a special process called forward aliasing cancelation (FAC) is needed in the transition region. Unlike the FAC algorithm in the current standard, the proposed algorithm requires no additional bits to encode aliasing cancelation terms because it uses adjacent decoded samples. Consequently, around 5% of the total bits are saved at the 16- and 24-kbps operating modes for speech-like signals. For performance verification, the proposed algorithm is integrated into the decoding module of the USAC common encoder (JAME), which follows the standard process exactly. Both objective and subjective experimental results confirm the feasibility of the proposed algorithm, especially for content that requires a high percentage of mode switching.
Unified speech and audio coding (USAC; ISO/IEC 23003-3), standardized in early 2012, shows the best performance for speech, music, and mixed input signals. Verification tests confirmed its superior quality, especially at low bit rates. In the initial stage of designing the coding structure, it was not possible to obtain high-quality output for all input content because only a single type of traditional audio or speech coding structure was adopted. The best result could be obtained by running two types of codecs side by side: Adaptive Multi-rate Wideband plus (AMR-WB+) for speech signals and high-efficiency advanced audio coding (HE-AAC) for audio signals. When encoding signals with mixed characteristics, one of the two coding modes is chosen depending on the characteristics of the input content. Although this approach improves the quality for all types of content, many problems occur at transition frames, where the coder must switch between entirely different types of codecs. For example, the segment of the perceptually weighted signal encoded by the speech codec needs to be smoothly combined with the segment encoded by the audio codec. Because the characteristics of the speech and audio codecs differ, however, the overlapped segment between the two codecs may not be similar to the input signal. How to determine the encoding mode for the various types of input signal is also important. These problems are mostly solved by adopting novel technologies such as a signal classifier, frequency domain noise shaping (FDNS), and the forward aliasing cancelation (FAC) technique.
The FAC algorithm is one of the key technologies in USAC. It enables the successful combination of two different types of codecs, especially at transition frames. To remove the aliasing terms caused by cascading different types of codecs, FAC generates additional aliasing cancelation signals, which are quantized and transmitted to the decoder. In the earlier version of USAC, which had not yet introduced the FAC technique, the frame boundary of a transition frame was variable; thus, a special windowing operation was needed to compensate for the aliased signal in the overlap region. Although FAC addresses this problem, it still requires additional bits.
This paper proposes a new aliasing cancelation algorithm that needs no additional bits because it uses the decoded signal of the adjacent frames. First, the algorithm generates the relevant aliasing cancelation part by considering the error caused by switching the encoding mode. Then, the output signal is reconstructed by adding the generated aliasing cancelation part to the decoded signal and by normalizing the weight introduced by the encoding window. In the overall process, the most important point is how to obtain the aliasing cancelation part by properly using the adjacent signal.
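The two steps above can be illustrated with a small time-domain model. This is only a sketch under stated assumptions, not the authors' implementation: it assumes that the non-aliased coder has already decoded the left half of the overlap region, that the TDAC decoder holds the folded signal w[n]x[n] - w[L-1-n]x[L-1-n] there (synthesis windowing omitted), and that a sine window is used. The function names are hypothetical.

```python
import numpy as np

def sine_window(length):
    """Sine window of the given length (assumed here for illustration)."""
    n = np.arange(length)
    return np.sin(np.pi * (n + 0.5) / length)

def folded_overlap(x_overlap, window_rise):
    """Model of the aliased signal in the rising overlap region:
    windowed signal minus its time-reversed copy."""
    windowed = window_rise * x_overlap
    return windowed - windowed[::-1]

def cancel_with_neighbor(folded, decoded_left, window_rise):
    """Recover the right half of the overlap region without side information.

    decoded_left: samples of the left half, assumed already available
    from the adjacent non-aliased frame.
    """
    half = len(folded) // 2
    # Step 1: build the aliasing cancelation part from the adjacent signal.
    alias = (window_rise[:half] * decoded_left)[::-1]
    # Step 2: add it to the decoded signal and normalize the window weight.
    return (folded[half:] + alias) / window_rise[half:]

rng = np.random.default_rng(0)
L = 128
x = rng.standard_normal(L)
w = sine_window(2 * L)[:L]          # rising half of a length-2L window
recovered = cancel_with_neighbor(folded_overlap(x, w), x[:L // 2], w)
print(np.max(np.abs(recovered - x[L // 2:])))
```

In this model the division is well conditioned because the rising window is at least sin(pi/4) in the right half of the overlap, and any coding error in the adjacent samples reaches the output scaled by a window ratio no larger than one.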
The aliasing cancelation process of the proposed algorithm is conceptually similar to the block switching compensation scheme proposed for low delay advanced audio coding (AAC-LD) [6, 7]. That scheme introduced time domain weightings, applicable as post-processing in the decoder, to remove the look-ahead delay otherwise unavoidable for a window transition from the long window to the short window. These weightings play a role similar to the aliasing cancelation signal described in this paper. However, the application and the resulting form of the aliasing are different.
The new aliasing cancelation algorithm is integrated into the decoding module of the USAC common encoder (JAME), which our team designed as an open-source platform. Objective and subjective test results show that the proposed method has quality comparable to the FAC algorithm while saving the bits that FAC spends on encoding the aliasing signal component.
Section 2 gives an overview of the USAC techniques and the FAC algorithm. Section 3 explains the proposed algorithm in detail. Section 4 describes the experiments and evaluation results.
2 USAC overview and FAC algorithm
2.2 Forward aliasing cancelation algorithm
Since USAC consists of two different types of coding methods, it is very important to handle the transition frame, where the encoding mode is switched from the FD codec to the TD codec or vice versa. Note that the MDCT removes the aliasing part of the current frame by combining it with the signal decoded in the following frame. However, if the encoding mode of the next frame is the TD codec, the aliasing term generally cannot be canceled. In an initial version of USAC, this problem was solved by discarding the aliased signal and using an inconsistent frame length. When the frame length of the TD codec is decreased because of the aliased signal, the following frame length is increased to synchronize the starting position of the FD codec.
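The cancelation property described above can be checked numerically. The following is a minimal sketch, assuming a direct (non-fast) MDCT, a sine window, and a random test signal; none of these choices come from the USAC reference software, and the function names are hypothetical. It compares overlap-add with all MDCT frames present against the case where the last MDCT frame is missing, as happens when the next frame is coded by the TD codec.

```python
import numpy as np

def sine_window(length):
    n = np.arange(length)
    return np.sin(np.pi * (n + 0.5) / length)

def mdct(frame):
    """Direct O(N^2) MDCT: 2N input samples -> N coefficients."""
    n_half = len(frame) // 2
    n = np.arange(2 * n_half)
    k = np.arange(n_half)
    basis = np.cos(np.pi / n_half * np.outer(k + 0.5, n + 0.5 + n_half / 2))
    return basis @ frame

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n_half = len(coeffs)
    n = np.arange(2 * n_half)
    k = np.arange(n_half)
    basis = np.cos(np.pi / n_half * np.outer(n + 0.5 + n_half / 2, k + 0.5))
    return (2.0 / n_half) * (basis @ coeffs)

def overlap_add(x, n_half, num_frames):
    """Window, transform, inverse transform, window, and overlap-add."""
    w = sine_window(2 * n_half)
    out = np.zeros(len(x))
    for t in range(num_frames):
        start = t * n_half
        frame = x[start:start + 2 * n_half]
        out[start:start + 2 * n_half] += w * imdct(mdct(w * frame))
    return out

def tdac_demo(n_half=64, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(4 * n_half)
    full = overlap_add(x, n_half, num_frames=3)   # every frame is MDCT coded
    cut = overlap_add(x, n_half, num_frames=2)    # last MDCT frame is missing
    interior = np.max(np.abs(full[n_half:3 * n_half] - x[n_half:3 * n_half]))
    tail = np.max(np.abs(cut[2 * n_half:3 * n_half] - x[2 * n_half:3 * n_half]))
    return interior, tail

interior_err, tail_err = tdac_demo()
print(interior_err, tail_err)
```

With all three frames the interior error is at the level of floating-point rounding, whereas the region that lost its following frame keeps a large error. That residual is the aliasing term that a transition to the TD codec leaves uncanceled.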
3 Proposed aliasing cancelation algorithm
Theoretically, if there is no quantization error, both the FAC algorithm and the new aliasing cancelation algorithm can perfectly reconstruct the original signal in the transition frame. In practice, because the quantization error passes through several non-linear filters in the time and frequency domains, it is very difficult to model its impact mathematically. However, it is clear that the FAC method has a quantization error in the frequency domain, while the proposed algorithm includes the error caused by ACELP encoding and inverse windowing. Accordingly, the amount of quantization error can be evaluated and compared by measuring signal-to-noise ratio (SNR) values. As the experimental results in the next section show, the proposed and the conventional FAC algorithms differ little in SNR. A subjective listening test also confirms this result.
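As a reference for the SNR comparison mentioned above, the sketch below computes a plain global SNR in decibels between an original and a decoded signal. It is an illustration only: this section does not state the exact SNR definition used in the experiments (for example, whether it is global or segmental), so the formula and the function name `snr_db` are assumptions.

```python
import numpy as np

def snr_db(reference, decoded):
    """Global SNR in dB: signal energy over the energy of the coding error."""
    reference = np.asarray(reference, dtype=float)
    decoded = np.asarray(decoded, dtype=float)
    noise_energy = np.sum((reference - decoded) ** 2)
    if noise_energy == 0.0:
        return float("inf")          # identical signals: no coding error
    return 10.0 * np.log10(np.sum(reference ** 2) / noise_energy)

# A decoded signal whose amplitude is 10% too small gives an SNR of about 20 dB.
original = np.ones(100)
print(snr_db(original, 0.9 * original))
```

Applying the same measure to the output of both decoders on the same items would put the frequency-domain quantization error of FAC and the ACELP and inverse-windowing error of the proposed method on a common scale.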
4 Performance evaluation
4.1 Simulation setup and implementation
Test items for the evaluation of the proposed algorithm
Actual achieved bit rates by each item in the operating mode
4.2 Objective test
SNR at 12-, 16-, and 24-kbps operating modes
where Nfac is the number of FAC frames, and N is the number of total frames.
4.3 Subjective test
Subjective test environment
Number of subjects
Systems under the test:
- ref: Hidden reference
- lp35: 3.5 kHz low-pass anchor
- Conv.: JAME with FAC
- Prop.-B: JAME with new AC
12, 16, and 24 kbps mono
The signal synthesized with the proposed algorithm (Prop.-B) has performance comparable to that of the FAC algorithm (Conv.). Note again that the proposed method needs no additional bits to remove the aliasing term.
Although the FAC algorithm solves the switching problem caused by combining two heterogeneous types of coders, i.e., a time domain coder and a frequency domain coder, it needs additional bits to cancel the aliasing components at every transition frame. The proposed aliasing cancelation algorithm needs no additional bits because it uses the decoded signals in the adjacent frames. The proposed algorithm is integrated into the recently released open-source platform. For speech-like signals, it saves over 5% of the total bits compared with the conventional FAC algorithm. Both subjective listening tests and objective tests confirmed that the proposed algorithm achieves quality comparable to the conventional FAC algorithm.
JS received his B.S. and M.S. degrees in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2004 and 2008, respectively. He is currently pursuing his Ph.D. degree at Yonsei University. His research interests include speech coding, unified speech and audio coding, spatial audio coding, and 3D audio. HGK (M94) received his B.S., M.S., and Ph.D. degrees in electronic engineering from Yonsei University, Seoul, South Korea, in 1989, 1991, and 1995, respectively. He was a Senior Member of the Technical Staff at AT&T, Labs-Research, from 1996 to 2002. In 2002, he joined the Department of Electrical and Electronic Engineering, Yonsei University, where he is currently a professor. His research interests include speech signal processing, array signal processing, and pattern recognition.
The authors would like to thank the reviewers for their suggestions, which greatly improved the manuscript.
- Neuendorf M, Multrus M, Rettelbach N, Fuchs G, Robilliard J, Lecomte J, Wilde S, Bayer S, Disch S, Helmrich C, Lefebvre R, Gournay P, Bessette B, Lapierre J, Kjörling K, Purnhagen H, Villemoes L, Oomen W, Schuijers E, Kikuiri K, Chinen T, Norimatsu T, Seng CK, Oh E, Kim M, Quackenbush S, Grill B: MPEG unified speech and audio coding - the ISO/MPEG standard for high-efficiency audio coding of all content types. In 130th AES Convention. Budapest; 26–29 April 2012.
- ISO/IEC JTC1/SC29/WG11: Unified Speech and Audio Coding Verification Test Report N12232. ISO/IEC JTC 1, New York; 2011.
- Makinen J, Bessette B, Bruhn S, Ojala P, Salami R, Taleb A: AMR-WB+: a new audio coding standard for 3RD generation mobile audio services. IEEE Int. Conf. Acoustics Speech Signal Process. (ICASSP ‘05) 2005, 2: 1109-1112.
- Wolters M, Kjorling K, Homm D, Purnhagen H: Closer look into MPEG-4 high efficiency AAC. In 115th AES Convention. Jacob K Javits Convention Center, New York; 10–13 October 2003.
- ISO/IEC JTC1/SC29/WG11: Proposal for Unification of USAC Windowing and Frame Transitions M17020. ISO/IEC JTC 1, New York; 2009.
- Virette D, Kövesi B, Philippe P: Adaptive time-frequency resolution in modulated transform at reduced delay. IEEE Int. Conf. Acoustics Speech Signal Process. (ICASSP ‘08) 2008, 2: 3781-3784.
- ISO/IEC JTC1/SC29/WG11: Proposed Core Experiment for Enhanced Low Delay AAC M14237. ISO/IEC JTC 1, New York; 2007.
- ISO/IEC JTC1/SC29/WG11: Unified Speech and Audio Coder Common Encoder Reference Software N12022. ISO/IEC JTC 1, New York; 2011.
- ISO/IEC JTC1/SC29/WG11: ISO/IEC 23003-3/FDIS, Unified Speech and Audio Coding N12231. ISO/IEC JTC 1, New York; 2011.
- Princen JP, Bradley AB: Analysis/synthesis filter bank design based on time domain aliasing cancellation. IEEE Trans. Acoustics Speech Signal Process 1986, 34(5):1153-1161. 10.1109/TASSP.1986.1164954
- Brandenburg K, Bosi M: Overview of MPEG audio: current and future standards for low-bit-rate audio coding. J. Audio Eng. Soc 1997, 45(1–2):4-21.
- Johnston JD: Estimation of perceptual entropy using noise masking criteria. IEEE Int. Conf. Acoustics Speech Signal Process. (ICASSP ‘98) 1998, 5: 2524-2527.
- Fuchs G, Multrus M, Neuendorf M, Geiger R: MDCT-based coder for highly adaptive speech and audio coding. In European Signal Processing Conference (EUSIPCO 2009). Glasgow; August 2009:24-28.
- Fuchs G, Subbaraman V, Multrus M: Efficient context adaptive entropy coding for real-time applications. In IEEE International Conference on Acoustics Speech Signal Process. (ICASSP ‘11). IEEE, Piscataway; 2011:493-496.
- Lecomte J, Gournay P, Geiger R, Bessette B, Neuendorf M: Efficient cross-fade windows for transitions between LPC-based and non-LPC based audio coding. In 126th AES Convention. Munich; 7–10 May 2009.
- Liu C-M, Lee W-C: Unified fast algorithm for cosine modulated filter banks in current audio coding standards. J. Audio Eng. Soc 1999, 47(12):1061-1075.
- Horn RA: The Hadamard product. Symp. Appl. Math 1990, 40: 87-169.
- ISO/IEC JTC1/SC29/WG11: Verification Test Report on USAC Common Encoder, JAME N13215. ISO/IEC JTC 1, New York; 2012.
- ITU: Recommendation ITU-R BS.1534-1. Method for the Subjective Assessment of Intermediate Quality Level of Coding Systems 2001–2003. International Telecommunication Union, Geneva; 2003.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.