- Research Article
- Open access
Wideband Speech Recovery Using Psychoacoustic Criteria
EURASIP Journal on Audio, Speech, and Music Processing volume 2007, Article number: 016816 (2007)
Abstract
Many modern speech bandwidth extension techniques predict the high-frequency band from features extracted from the lower band. While this approach works for certain types of speech, problems arise when the correlation between the low and high bands is insufficient for adequate prediction. In such cases, additional high-band information must be sent to the decoder. This overhead information, however, can be quantized efficiently using models of the human auditory system. In this paper, we propose a novel speech compression method that relies on bandwidth extension. The novelty of the technique lies in an elaborate perceptual model that determines a quantization scheme for wideband recovery and synthesis. Furthermore, we propose a source/filter bandwidth extension algorithm based on spectral spline fitting. Results reveal that the proposed system improves the quality of narrowband speech while operating at a lower bitrate. Compared with other wideband speech coding schemes, the proposed algorithms provide comparable speech quality at a lower bitrate.
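The abstract mentions a source/filter bandwidth extension algorithm based on spectral spline fitting. The sketch below illustrates only the general spline-fitting idea, under assumptions that are not stated in the source: the high-band spectral envelope is described by a handful of log-magnitude control points (in dB) at fixed frequencies, and a natural cubic spline through those points yields a smooth per-bin gain curve that could shape a synthetic high-band excitation. The helper names `natural_cubic_spline` and `highband_envelope`, the 4 to 8 kHz band, and all control values are hypothetical illustrations, not the authors' algorithm or parameters.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a callable evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing. Natural boundary conditions force the
    second derivative to zero at both end points; queries outside
    [xs[0], xs[-1]] are extrapolated with the nearest end segment's cubic.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two matching (x, y) control points")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    if any(step <= 0 for step in h):
        raise ValueError("xs must be strictly increasing")

    # Second derivatives at the knots; natural spline => m[0] = m[-1] = 0.
    m = [0.0] * n
    size = n - 2
    if size > 0:
        # Tridiagonal system for the interior second derivatives.
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(size)]
        rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1] - (ys[k + 1] - ys[k]) / h[k])
               for k in range(size)]
        # Thomas algorithm, forward sweep: row k has sub-diagonal h[k],
        # and row k - 1 has super-diagonal h[k].
        for k in range(1, size):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        # Back substitution; row k has super-diagonal h[k + 1].
        interior = [0.0] * size
        interior[-1] = rhs[-1] / diag[-1]
        for k in range(size - 2, -1, -1):
            interior[k] = (rhs[k] - h[k + 1] * interior[k + 1]) / diag[k]
        m[1:-1] = interior

    def evaluate(x):
        # Locate the segment [xs[i], xs[i + 1]] containing x (clamped to the ends).
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        hi = h[i]
        left, right = xs[i + 1] - x, x - xs[i]
        return (m[i] * left ** 3 / (6.0 * hi)
                + m[i + 1] * right ** 3 / (6.0 * hi)
                + (ys[i] / hi - m[i] * hi / 6.0) * left
                + (ys[i + 1] / hi - m[i + 1] * hi / 6.0) * right)

    return evaluate


def highband_envelope(control_hz, control_db, bin_hz):
    """Hypothetical helper: spline the dB control points, return linear gains per bin."""
    spline = natural_cubic_spline(control_hz, control_db)
    return [10.0 ** (spline(f) / 20.0) for f in bin_hz]


if __name__ == "__main__":
    # Illustrative values only: five control points spanning a 4-8 kHz high band.
    control_hz = [4000.0, 5000.0, 6000.0, 7000.0, 8000.0]
    control_db = [-10.0, -14.0, -13.0, -20.0, -30.0]
    bins = [4000.0 + 250.0 * k for k in range(17)]
    for f, g in zip(bins, highband_envelope(control_hz, control_db, bins)):
        print(f"{f:7.1f} Hz  gain {g:.4f}")
```

A spline is used here because a few transmitted control points are enough to describe a smooth envelope, which keeps side information small; how the actual paper places, quantizes, or converts these points into filter coefficients is not specified in this abstract.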
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Berisha, V., Spanias, A. Wideband Speech Recovery Using Psychoacoustic Criteria. J AUDIO SPEECH MUSIC PROC. 2007, 016816 (2007). https://doi.org/10.1155/2007/16816
DOI: https://doi.org/10.1155/2007/16816