  • Research Article
  • Open access

Wideband Speech Recovery Using Psychoacoustic Criteria

Abstract

Many modern speech bandwidth extension techniques predict the high-frequency band from features extracted from the lower band. While this approach works for certain types of speech, problems arise when the correlation between the low and high bands is too weak for adequate prediction. In these situations, additional high-band information must be sent to the decoder. This overhead information, however, can be quantized efficiently using models of the human auditory system. In this paper, we propose a novel speech compression method that relies on bandwidth extension. The novelty of the technique lies in an elaborate perceptual model that determines a quantization scheme for wideband recovery and synthesis. Furthermore, a source/filter bandwidth extension algorithm based on spectral spline fitting is proposed. Results reveal that the proposed system improves the quality of narrowband speech while operating at a lower bitrate. Compared with other wideband speech coding schemes, the proposed algorithms provide comparable speech quality at a lower bitrate.

[1-48]

References

  1. Spanias A: Speech coding: a tutorial review. Proceedings of the IEEE 1994,82(10):1541-1582. 10.1109/5.326413

  2. Unno T, McCree A: A robust narrowband to wideband extension system featuring enhanced codebook mapping. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), March 2005, Philadelphia, Pa, USA 1: 805-808.

  3. Jax P, Vary P: Enhancement of band-limited speech signals. Proceedings of the 10th Aachen Symposium on Signal Theory, September 2001, Aachen, Germany 331-336.

  4. Jax P, Vary P: Artificial bandwidth extension of speech signals using MMSE estimation based on a hidden Markov model. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '03), April 2003, Hong Kong 1: 680-683.

  5. Nilsson M, Kleijn WB: Avoiding over-estimation in bandwidth extension of telephony speech. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '01), May 2001, Salt Lake City, Utah, USA 2: 869-872.

  6. Jax P, Vary P: An upper bound on the quality of artificial bandwidth extension of narrowband speech signals. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 1: 237-240.

  7. Nilsson M, Andersen S, Kleijn W: On the mutual information between frequency bands in speech. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '00), June 2000, Istanbul, Turkey 3: 1327-1330.

  8. Nilsson M, Gustafsson H, Andersen SV, Kleijn WB: Gaussian mixture model based mutual information estimation between frequency bands in speech. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 1: 525-528.

  9. Chan C-F, Hui W-K: Wideband re-synthesis of narrowband CELP coded speech using multiband excitation model. Proceedings of the International Conference on Spoken Language Processing (ICSLP '96), October 1996, Philadelphia, Pa, USA 1: 322-325.

  10. Berisha V, Spanias A: Enhancing the quality of coded audio using perceptual criteria. Proceedings of the 7th IEEE Workshop on Multimedia Signal Processing (MMSP '05), October 2005, Shanghai, China 1-4.

  11. Berisha V, Spanias A: Enhancing vocoder performance for music signals. Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS '05), May 2005, Kobe, Japan 4: 4050-4053.

  12. Berisha V, Spanias A: Bandwidth extension of audio based on partial loudness criteria. Proceedings of the 8th IEEE Workshop on Multimedia Signal Processing (MMSP '06), October 2006, Victoria, BC, Canada 146-149.

  13. Edler B, Schuller G: Audio coding using a psychoacoustic pre- and post-filter. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '00), June 2000, Istanbul, Turkey 2: 881-885.

  14. ITU-T Recommendation G.729.1: G.729 based embedded variable bit-rate coder: an 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729. 2006.

  15. Moore BCJ, Glasberg BR, Baer T: A model for the prediction of thresholds, loudness, and partial loudness. Journal of the Audio Engineering Society 1997,45(4):224-240.

  16. AMR Narrowband Speech Codec: Transcoding Functions. 2001.

  17. AMR Wideband Speech Codec: Transcoding Functions. 2003.

  18. Yasukawa H: Enhancement of telephone speech quality by simple spectrum extrapolation method. Proceedings of the 4th European Conference on Speech Communication and Technology (EUROSPEECH '95), September 1995, Madrid, Spain 1545-1548.

  19. Yasukawa H: Signal restoration of broad band speech using nonlinear processing. Proceedings of European Signal Processing Conference (EUSIPCO '96), September 1996, Trieste, Italy 987-990.

  20. Yasukawa H: Wideband speech recovery from bandlimited speech in telephone communications. Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '98), May-June 1998, Monterey, Calif, USA 4: 202-205.

  21. Larson E, Aarts R: Audio Bandwidth Extension. 1st edition. John Wiley & Sons, West Sussex, UK; 2005.

  22. Carl H, Heute U: Bandwidth enhancement of narrow-band speech signals. Proceedings of the 7th European Signal Processing Conference (EUSIPCO '94), September 1994, Edinburgh, Scotland 2: 1178-1181.

  23. Yoshida Y, Abe M: An algorithm to reconstruct wideband speech from narrowband speech based on codebook mapping. Proceedings of the 3rd International Conference on Spoken Language Processing (ICSLP '94), September 1994, Yokohama, Japan 1591-1594.

  24. Cheng Y, O'Shaughnessy D, Mermelstein P: Statistical recovery of wideband speech from narrowband speech. IEEE Transactions on Speech and Audio Processing 1994,2(4):544-548. 10.1109/89.326637

  25. Yao S, Chan CF: Block-based bandwidth extension of narrowband speech signal by using CDHMM. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), March 2005, Philadelphia, Pa, USA 1: 793-796.

  26. Nakatoh Y, Tsushima M, Norimatsu T: Generation of broadband speech from narrowband speech using piecewise linear mapping. Proceedings of the 5th European Conference on Speech Communication and Technology (EUROSPEECH '97), September 1997, Rhodes, Greece 3: 1643-1646.

  27. Avendano C, Hermansky H, Wan E: Beyond Nyquist: towards the recovery of broad-bandwidth speech from narrow-bandwidth speech. Proceedings of the 4th European Conference on Speech Communication and Technology (EUROSPEECH '95), September 1995, Madrid, Spain 1: 165-168.

  28. Epps J: Wideband extension of narrowband speech for enhancement and coding, Ph.D. dissertation. 2000.

  29. Dietz M, Liljeryd L, Kjorling K, Kunz O: Spectral band replication, a novel approach in audio coding. Proceedings of 112th AES Audio Engineering Society, May 2002, Munich, Germany 5553.

  30. Kroon P, Kleijn W: Linear prediction-based analysis-by-synthesis coding. In Speech Coding and Synthesis. Elsevier Science, New York, NY, USA; 1995:81-113.

  31. Hermansky H: Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America 1990,87(4):1738-1752. 10.1121/1.399423

  32. Strube HW: Linear prediction on a warped frequency scale. Journal of the Acoustical Society of America 1980,68(4):1071-1076. 10.1121/1.384992

  33. Information Technology-Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/sec 1992.

  34. Moore BC: An Introduction to the Psychology of Hearing. 5th edition. Academic Press, New York, NY, USA; 2003.

  35. Digital Theater Systems (DTS). http://www.dtsonline.com/

  36. Davidson G: Digital audio coding: Dolby AC-3. In The Digital Signal Processing Handbook. CRC Press, New York, NY, USA; 1998:41.1-41.21.

  37. Painter T, Spanias A: Perceptual segmentation and component selection for sinusoidal representations of audio. IEEE Transactions on Speech and Audio Processing 2005,13(2):149-162.

  38. Atti V, Spanias A: Speech analysis by estimating perceptually relevant pole locations. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), March 2005, Philadelphia, Pa, USA 1: 217-220.

  39. Purnhagen H, Meine N, Edler B: Sinusoidal coding using loudness-based component selection. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 2: 1817-1820.

  40. Paillard B, Mabilleau P, Morissette S, Soumagne J: Perceval: perceptual evaluation of the quality of audio signals. Journal of the Audio Engineering Society 1992,40(1-2):21-31.

  41. Colomes C, Lever M, Rault J, Dehery Y: A perceptual model applied to audio bit-rate reduction. Journal of the Audio Engineering Society 1995,43(4):233-240.

  42. Rix A, Hollier M, Hekstra A, Beerends J: Perceptual evaluation of speech quality (PESQ), the new ITU standard for end-to-end speech quality assessment, part I: time-delay compensation. Journal of the Audio Engineering Society 2002,50(10):755-764.

  43. Rix A, Hollier M, Hekstra A, Beerends J: Perceptual evaluation of speech quality (PESQ), the new ITU standard for end-to-end speech quality assessment, part II: psychoacoustic model. Journal of the Audio Engineering Society 2002,50(10):765-778.

  44. Zwicker E, Fastl H: Psychoacoustics. Springer, New York, NY, USA; 1990.

  45. Gray R: Vector quantization. IEEE ASSP Magazine 1984,1(2, part 2):4-29.

  46. Unser M: Splines: a perfect fit for signal and image processing. IEEE Signal Processing Magazine 1999,16(6):22-38. 10.1109/79.799930

  47. Durbin J: The fitting of time series models. Review of the International Statistical Institute 1960, 28: 233-244. 10.2307/1401322

  48. Garofolo JS, Lamel LF, Fisher WM, Fiscus JG, Pallett DS, Dahlgren NL: The DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. In Tech. Rep. NISTIR 4930/NTIS Order No. PB93-173938. National Institute of Standards and Technology, Gaithersburg, Md, USA; 1993.

Author information

Corresponding author

Correspondence to Visar Berisha.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Berisha, V., Spanias, A. Wideband Speech Recovery Using Psychoacoustic Criteria. J AUDIO SPEECH MUSIC PROC. 2007, 016816 (2007). https://doi.org/10.1155/2007/16816

  • DOI: https://doi.org/10.1155/2007/16816
