Perceptual Coding of Audio Signals Using Adaptive Time-Frequency Transform

Umapathy, Karthikeyan; Krishnan, Sridhar

doi:10.1155/2007/51563

Research Article
Open access
Published: 19 August 2007

EURASIP Journal on Audio, Speech, and Music Processing volume 2007, Article number: 051563 (2007)

Abstract

Wideband digital audio signals have very high data rates because of their complex nature and the demand for high-quality reproduction. Although recent technological advances have significantly reduced the cost of bandwidth and miniaturized storage, the rapid growth in the volume of digital audio content continues to drive the need for better compression algorithms. Various perceptually lossless compression techniques have been introduced over the years, and transform-based techniques have made a significant impact recently. In this paper, we propose one such transform-based technique, in which the joint time-frequency (TF) properties of nonstationary audio signals are exploited to represent the signal energy compactly in fewer coefficients. The decomposition coefficients are processed and perceptually filtered to retain only the relevant ones. Perceptual filtering (psychoacoustics) is applied in a novel way, based on TF-specific psychoacoustic experiments. An added advantage of the proposed technique is that, because it is signal adaptive, it does not need predetermined segmentation of the audio signals for processing. Eight stereo audio samples of different varieties were used in the study. Subjective listening tests (mean opinion score, MOS) were performed, and subjective difference grades (SDG) were used to compare the performance of the proposed coder with MP3, AAC, and HE-AAC encoders. The proposed technique achieved compression ratios in the range of 8 to 40, with SDGs ranging from –0.53 to –2.27.
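The abstract does not name the decomposition itself, so the following is only a minimal sketch of one signal-adaptive TF decomposition: greedy matching pursuit over a small dictionary of Gaussian-windowed cosines, in the spirit of the Mallat and Zhang entry in the reference list. The dictionary grid, the toy signal, and every function name here are hypothetical choices made for illustration, and the perceptual filtering stage is not modeled.

```python
import math


def gabor_atom(n, scale, shift, freq):
    """Unit-norm Gaussian-windowed cosine of length n (hypothetical atom shape)."""
    atom = [
        math.exp(-math.pi * ((t - shift) / scale) ** 2)
        * math.cos(2 * math.pi * freq * (t - shift))
        for t in range(n)
    ]
    norm = math.sqrt(sum(v * v for v in atom))
    return [v / norm for v in atom]


def build_dictionary(n):
    """Assumed grid: four scales, half-scale shifts, nine frequencies up to 0.5."""
    atoms = []
    for scale in (4, 8, 16, 32):
        for shift in range(0, n, max(1, scale // 2)):
            for k in range(9):  # frequency in cycles per sample: k / 16
                atoms.append(gabor_atom(n, scale, shift, k / 16))
    return atoms


def energy(x):
    return sum(v * v for v in x)


def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition: repeatedly subtract the best-correlated atom."""
    residual = list(signal)
    picks = []  # (atom index, coefficient) pairs, in order of selection
    for _ in range(n_iter):
        best_idx, best_coef = 0, 0.0
        for idx, atom in enumerate(atoms):
            coef = sum(r * a for r, a in zip(residual, atom))
            if abs(coef) > abs(best_coef):
                best_idx, best_coef = idx, coef
        best_atom = atoms[best_idx]
        residual = [r - best_coef * a for r, a in zip(residual, best_atom)]
        picks.append((best_idx, best_coef))
    return picks, residual


def toy_signal(n):
    """Hypothetical test input: a short tone burst plus a sharp click."""
    sig = [
        math.exp(-math.pi * ((t - 20) / 16) ** 2)
        * math.cos(2 * math.pi * 0.125 * (t - 20))
        for t in range(n)
    ]
    sig[45] += 0.8
    return sig


if __name__ == "__main__":
    n = 64
    sig = toy_signal(n)
    picks, residual = matching_pursuit(sig, build_dictionary(n), 8)
    print("fraction of energy captured:", 1.0 - energy(residual) / energy(sig))
```

Because the atoms have unit norm, each iteration moves exactly the squared coefficient from the residual energy into the representation. A coder built on such a decomposition would then quantize and store only the selected (index, coefficient) pairs. No predetermined segmentation is needed in this sketch because atoms of different scales and positions are chosen directly from the whole signal.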

References

Painter T, Spanias A: Perceptual coding of digital audio. Proceedings of the IEEE 2000,88(4):451-515. 10.1109/5.842996
Mallat SG, Zhang Z: Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing 1993,41(12):3397-3415. 10.1109/78.258082
Umapathy K, Krishnan S: Joint time-frequency coding of audio signals. Proceedings of the 5th WSES/IEEE Multiconference on Circuits, Systems, Communications, and Computers (CSCC '01), July 2001, Crete, Greece 32-36.
Umapathy K, Krishnan S: Low bit-rate coding of wideband audio signals. Proceedings of IASTED International Conference on Signal Processing, Pattern Recognition and Applications (SPPRA '01), July 2001, Rhodes, Greece 101-105.
Heusdens R, Vafin R, Kleijn WB: Sinusoidal modeling using psychoacoustic-adaptive matching pursuits. IEEE Signal Processing Letters 2002,9(8):262-265. 10.1109/LSP.2002.802999
Verma TS, Meng THY: Sinusoidal modeling using frame-based perceptually weighted matching pursuits. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '99), March 1999, Phoenix, Ariz, USA 2: 981-984.
Heusdens R, Jensen J, Korten P, Vafin R: Rate-distortion optimal high-resolution differential quantisation for sinusoidal coding of audio and speech. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (ASPAA '05), October 2005, New Paltz, NY, USA 243-246.
Heusdens R, Jensen J: Jointly optimal time segmentation, component selection and quantization for sinusoidal coding of audio and speech. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '05), March 2005, Philadelphia, Pa, USA 3: 193-196.
I. JTC1/SC29/WG11 : Overview of the MPEG-4 standard. in International Organization for Standardisation, March 2002
Herre J, Grill B: Overview of MPEG-4 audio and its applications in mobile communications. Proceedings of the 5th International Conference on Signal Processing (ICSP '00), August 2000, Beijing, China 1: 11-20.
Herre J, Brandenburg K, et al.: Second generation ISO/MPEG audio layer-3 coding. The 98th Audio Engineering Society Convention (AES '95), February 1995, Paris, France
Meltzer S, Moser G: MPEG-4 HE-AAC v2—audio coding for today's digital media world. EBU Technical Review, Geneva, Switzerland; 2006.
Ryden T: Using listening tests to assess audio codecs. In Collected Papers on Digital Audio Bit Rate Reduction. Edited by: Gilchrist N, Grewin C. Audio Engineering Society, New York, NY, USA; 1996:115-125.
Mallat S: A Wavelet Tour of Signal Processing. Academic Press, San Diego, Calif, USA; 1998.
Cohen L: Time-frequency distributions: a review. Proceedings of the IEEE 1989,77(7):941-981. 10.1109/5.30749
Brandenburg K, Bosi M: MPEG-2 advanced audio coding: overview and applications. The 103rd Audio Engineering Society Convention, August 1997, New York, NY, USA 4641.
http://www.apple.com/MPEG4/aac/
Eberlein E, Popp H: Layer-3, a flexible coding standard. The 94th Audio Engineering Society Convention, March 1993, Berlin, Germany 3493.
http://www.iis.fraunhofer.de/bf/amm/
http://lame.sourceforge.net/index.php
http://www.mp3dev.org/
http://www.winamp.com/
Orfanidis SJ: Introduction to Signal Processing. Prentice-Hall, Englewood Cliffs, NJ, USA; 1996.
Goodwin MM: Adaptive Signal Models: Theory, Algorithms and Audio Applications. Kluwer Academic Publishers, Boston, Mass, USA; 1998.
ISO/IEC JTC 1/SC 29/WG 11N6675 : Report on the verification tests of MPEG-4 parametric coding for high quality audio. in International Organization for Standardisation, July 2004
Heusdens R, Jensen J, Kleijn WB, et al.: Bit-rate scalable intraframe sinusoidal audio coding based on rate-distortion optimization. Journal of the Audio Engineering Society 2006,54(3):167-188.
van de Par S, Kohlrausch A, Heusdens R, Jensen J, Jensen SH: A perceptual model for sinusoidal audio coding based on spectral integration. EURASIP Journal on Applied Signal Processing 2005,2005(9):1292-1304. 10.1155/ASP.2005.1292
Korten P, Jensen J, Heusdens R: High-resolution spherical quantization of sinusoidal parameters. IEEE Transactions on Audio, Speech, and Language Processing 2007,15(3):966-981.
Christensen MG, van de Par S: Efficient parametric coding of transients. IEEE Transactions on Audio, Speech and Language Processing 2006,14(4):1340-1351.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Ryerson University, 350 Victoria Street, Toronto, ON, M5B 2K3, Canada
Karthikeyan Umapathy & Sridhar Krishnan

Corresponding author

Correspondence to Karthikeyan Umapathy.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Umapathy, K., Krishnan, S. Perceptual Coding of Audio Signals Using Adaptive Time-Frequency Transform. J AUDIO SPEECH MUSIC PROC. 2007, 051563 (2007). https://doi.org/10.1155/2007/51563

Received: 22 January 2006
Revised: 10 November 2006
Accepted: 05 July 2007
Published: 19 August 2007
DOI: https://doi.org/10.1155/2007/51563
