Wideband Speech Recovery Using Psychoacoustic Criteria

Berisha, Visar; Spanias, Andreas

doi:10.1155/2007/16816

Research Article
Open access
Published: 29 August 2007

Visar Berisha & Andreas Spanias

EURASIP Journal on Audio, Speech, and Music Processing, volume 2007, Article number: 016816 (2007)

Abstract

Many modern speech bandwidth extension techniques predict the high-frequency band based on features extracted from the lower band. While this method works for certain types of speech, problems arise when the correlation between the low and the high bands is not sufficient for adequate prediction. These situations require that additional high-band information be sent to the decoder. This overhead information, however, can be cleverly quantized using human auditory system models. In this paper, we propose a novel speech compression method that relies on bandwidth extension. The novelty of the technique lies in an elaborate perceptual model that determines a quantization scheme for wideband recovery and synthesis. Furthermore, a source/filter bandwidth extension algorithm based on spectral spline fitting is proposed. Results reveal that the proposed system improves the quality of narrowband speech while performing at a lower bitrate. When compared to other wideband speech coding schemes, the proposed algorithms provide comparable speech quality at a lower bitrate.

[1-48]

References

Spanias A: Speech coding: a tutorial review. Proceedings of the IEEE 1994,82(10):1541-1582. 10.1109/5.326413
Unno T, McCree A: A robust narrowband to wideband extension system featuring enhanced codebook mapping. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), March 2005, Philadelphia, Pa, USA 1: 805-808.
Jax P, Vary P: Enhancement of band-limited speech signals. Proceedings of the 10th Aachen Symposium on Signal Theory, September 2001, Aachen, Germany 331-336.
Jax P, Vary P: Artificial bandwidth extension of speech signals using MMSE estimation based on a hidden Markov model. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '03), April 2003, Hong Kong 1: 680-683.
Nilsson M, Kleijn WB: Avoiding over-estimation in bandwidth extension of telephony speech. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '01), May 2001, Salt Lake, Utah, USA 2: 869-872.
Jax P, Vary P: An upper bound on the quality of artificial bandwidth extension of narrowband speech signals. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 1: 237-240.
Nilsson M, Andersen S, Kleijn W: On the mutual information between frequency bands in speech. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '00), June 2000, Istanbul, Turkey 3: 1327-1330.
Nilsson M, Gustafsson H, Andersen SV, Kleijn WB: Gaussian mixture model based mutual information estimation between frequency bands in speech. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 1: 525-528.
Chan C-F, Hui W-K: Wideband re-synthesis of narrowband CELP coded speech using multiband excitation model. Proceedings of the International Conference on Spoken Language Processing (ICSLP '96), October 1996, Philadelphia, Pa, USA 1: 322-325.
Berisha V, Spanias A: Enhancing the quality of coded audio using perceptual criteria. Proceedings of the 7th IEEE Workshop on Multimedia Signal Processing (MMSP '05), October 2005, Shanghai, China 1-4.
Berisha V, Spanias A: Enhancing vocoder performance for music signals. Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS '05), May 2005, Kobe, Japan 4: 4050-4053.
Berisha V, Spanias A: Bandwidth extension of audio based on partial loudness criteria. Proceedings of the 8th IEEE Workshop on Multimedia Signal Processing (MMSP '06), October 2006, Victoria, BC, Canada 146-149.
Edler B, Schuller G: Audio coding using a psychoacoustic pre- and post-filter. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '00), June 2000, Istanbul, Turkey 2: 881-885.
ITU-T Recommendation G.729.1: G.729 based Embedded variable bit-rate coder: an 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729. 2006.
Moore BCJ, Glasberg BR, Baer T: A model for the prediction of thresholds, loudness, and partial loudness. Journal of the Audio Engineering Society 1997,45(4):224-240.
AMR Narrowband Speech Codec: Transcoding Functions. 2001.
AMR Wideband Speech Codec: Transcoding Functions. 2003.
Yasukawa H: Enhancement of telephone speech quality by simple spectrum extrapolation method. Proceedings of the 4th European Conference on Speech Communication and Technology (EUROSPEECH '95), September 1995, Madrid, Spain 1545-1548.
Yasukawa H: Signal restoration of broad band speech using nonlinear processing. Proceedings of European Signal Processing Conference (EUSIPCO '96), September 1996, Trieste, Italy 987-990.
Yasukawa H: Wideband speech recovery from bandlimited speech in telephone communications. Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '98), May-June 1998, Monterey, Calif, USA 4: 202-205.
Larson E, Aarts R: Audio Bandwidth Extension. 1st edition. John Wiley & Sons, West Sussex, UK; 2005.
Carl H, Heute U: Bandwidth enhancement of narrow-band speech signals. Proceedings of the 7th European Signal Processing Conference (EUSIPCO '94), September 1994, Edinburgh, Scotland 2: 1178-1181.
Yoshida Y, Abe M: An algorithm to reconstruct wideband speech from narrowband speech based on codebook mapping. Proceedings of the 3rd International Conference on Spoken Language Processing (ICSLP '94), September 1994, Yokohama, Japan 1591-1594.
Cheng Y, O'Shaughnessy D, Mermelstein P: Statistical recovery of wideband speech from narrowband speech. IEEE Transactions on Speech and Audio Processing 1994,2(4):544-548. 10.1109/89.326637
Yao S, Chan CF: Block-based bandwidth extension of narrowband speech signal by using CDHMM. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), March 2005, Philadelphia, Pa, USA 1: 793-796.
Nakatoh Y, Tsushima M, Norimatsu T: Generation of broadband speech from narrowband speech using piecewise linear mapping. Proceedings of the 5th European Conference on Speech Communication and Technology (EUROSPEECH '97), September 1997, Rhodes, Greece 3: 1643-1646.
Avendano C, Hermansky H, Wan E: Beyond Nyquist: towards the recovery of broad-bandwidth speech from narrow-bandwidth speech. Proceedings of the 4th European Conference on Speech Communication and Technology (EUROSPEECH '95), September 1995, Madrid, Spain 1: 165-168.
Epps J: Wideband extension of narrowband speech for enhancement and coding, Ph.D. dissertation. 2000.
Dietz M, Liljeryd L, Kjorling K, Kunz O: Spectral band replication, a novel approach in audio coding. Proceedings of 112th AES Audio Engineering Society, May 2002, Munich, Germany 5553.
Kroon P, Kleijn W: Linear prediction-based analysis-by-synthesis coding. In Speech Coding and Synthesis. Elsevier Science, New York, NY, USA; 1995:81-113.
Hermansky H: Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America 1990,87(4):1738-1752. 10.1121/1.399423
Strube HW: Linear prediction on a warped frequency scale. Journal of the Acoustical Society of America 1980,68(4):1071-1076. 10.1121/1.384992
Information Technology-Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/sec 1992.
Moore BC: An Introduction to the Psychology of Hearing. 5th edition. Academic Press, New York, NY, USA; 2003.
The digital theater systems (dts) http://www.dtsonline.com/
Davidson G: Digital audio coding: Dolby AC-3. In The Digital Signal Processing Handbook. CRC Press, New York, NY, USA; 1998:41.1-41.21.
Painter T, Spanias A: Perceptual segmentation and component selection for sinusoidal representations of audio. IEEE Transactions on Speech and Audio Processing 2005,13(2):149-162.
Atti V, Spanias A: Speech analysis by estimating perceptually relevant pole locations. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), March 2005, Philadelphia, Pa, USA 1: 217-220.
Purnhagen H, Meine N, Edler B: Sinusoidal coding using loudness-based component selection. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 2: 1817-1820.
Paillard B, Mabilleau P, Morissette S, Soumagne J: Perceval: perceptual evaluation of the quality of audio signals. Journal of the Audio Engineering Society 1992,40(1-2):21-31.
Colomes C, Lever M, Rault J, Dehery Y: A perceptual model applied to audio bit-rate reduction. Journal of the Audio Engineering Society 1995,43(4):233-240.
Rix A, Hollier M, Hekstra A, Beerends J: Perceptual evaluation of speech quality (PESQ) the new ITU standard for end-to-end speech quality assessment, part I: time-delay compensation. Journal of the Audio Engineering Society 2002,50(10):755-764.
Rix A, Hollier M, Hekstra A, Beerends J: Perceptual evaluation of speech quality (PESQ) the new ITU standard for end-to-end speech quality assessment, part II: psychoacoustic model. Journal of the Audio Engineering Society 2002,50(10):765-778.
Zwicker E, Fastl H: Psychoacoustics. Springer, New York, NY, USA; 1990.
Gray R: Vector quantization. IEEE ASSP Magazine 1984,1(2, part 2):4-29.
Unser M: Splines: a perfect fit for signal and image processing. IEEE Signal Processing Magazine 1999,16(6):22-38. 10.1109/79.799930
Durbin J: The fitting of time series models. Review of the International Statistical Institute 1960, 28: 233-244. 10.2307/1401322
Garofolo JS, Lamel LF, Fisher WM, Fiscus JG, Pallett DS, Dahlgren NL: The DARPA TIMIT acoustic-phonetic continuous speech corpus CD ROM. In Tech. Rep. NISTIR 4930 /NTIS Order No. PB93-173938. National Institute of Standards and Technology, Gaithersburg, Md, USA; 1993.

Author information

Authors and Affiliations

Department of Electrical Engineering, Arizona State University, Tempe, AZ, 85287, USA
Visar Berisha & Andreas Spanias

Corresponding author

Correspondence to Visar Berisha.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Berisha, V., Spanias, A. Wideband Speech Recovery Using Psychoacoustic Criteria. J AUDIO SPEECH MUSIC PROC. 2007, 016816 (2007). https://doi.org/10.1155/2007/16816

Received: 01 December 2006
Revised: 07 March 2007
Accepted: 29 June 2007