Open Access

Wideband Speech Recovery Using Psychoacoustic Criteria

EURASIP Journal on Audio, Speech, and Music Processing 2007, 2007:016816

DOI: 10.1155/2007/16816

Received: 1 December 2006

Accepted: 29 June 2007

Published: 29 August 2007

Abstract

Many modern speech bandwidth extension techniques predict the high-frequency band from features extracted from the lower band. While this approach works for certain types of speech, problems arise when the correlation between the low and high bands is too weak for adequate prediction. In these situations, additional high-band information must be sent to the decoder. This overhead information, however, can be quantized efficiently using models of the human auditory system. In this paper, we propose a novel speech compression method that relies on bandwidth extension. The novelty of the technique lies in an elaborate perceptual model that determines a quantization scheme for wideband recovery and synthesis. Furthermore, a source/filter bandwidth extension algorithm based on spectral spline fitting is proposed. Results reveal that the proposed system improves the quality of narrowband speech while operating at a lower bitrate. When compared to other wideband speech coding schemes, the proposed algorithms provide comparable speech quality at a lower bitrate.

Authors’ Affiliations

(1) Department of Electrical Engineering, Arizona State University

Copyright

© V. Berisha and A. Spanias. 2007

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.