Linear Prediction Using Refined Autocorrelation Function

Abstract

This paper proposes a new technique for improving the performance of linear prediction analysis by utilizing a refined version of the autocorrelation function. Problems in analyzing voiced speech with linear prediction often arise from the harmonic structure of the excitation source, which causes the autocorrelation function to be an aliased version of that of the vocal tract impulse response. To estimate the vocal tract characteristics accurately, the effect of this aliasing must be eliminated. In this paper, we employ a homomorphic deconvolution technique in the autocorrelation domain to eliminate the aliasing effect caused by periodicity. The resulting autocorrelation function of the vocal tract impulse response is found to yield a significant improvement in estimating formant frequencies. The accuracy of formant estimation is verified on synthetic vowels over a wide range of pitch frequencies typical of male and female speakers. The validity of the proposed method is also illustrated by inspecting the spectral envelopes of natural speech spoken by a high-pitched female speaker. The synthesis filter obtained by the proposed method is guaranteed to be stable, which makes the method superior to many of its alternatives.
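The general idea described in the abstract can be sketched in code: homomorphic (cepstral) smoothing suppresses the harmonic structure that periodicity imprints on the power spectrum, the inverse DFT of the smoothed spectrum gives a refined autocorrelation, and the Levinson-Durbin recursion runs on that refined autocorrelation. This is a minimal illustration of the technique's flavor, not the authors' implementation; the function name, window choice, lifter length, and all parameter values are assumptions.

```python
import numpy as np

def refined_autocorrelation_lp(x, order=10, lifter_len=None, fs=8000):
    """Linear prediction on a 'refined' autocorrelation: the harmonic
    structure of voiced excitation is suppressed by cepstral (low-time)
    liftering of the log power spectrum, and the inverse DFT of the
    smoothed spectrum is the autocorrelation fed to Levinson-Durbin.
    All names and parameter choices are illustrative."""
    N = 4 * len(x)                              # zero-pad the DFT
    X = np.fft.rfft(x * np.hanning(len(x)), N)
    log_power = np.log(np.abs(X) ** 2 + 1e-12)  # log power spectrum
    cep = np.fft.irfft(log_power)               # cepstrum of the power spectrum
    if lifter_len is None:
        lifter_len = int(0.002 * fs)            # ~2 ms: below typical pitch periods
    lifter = np.zeros(N)
    lifter[:lifter_len] = 1.0                   # keep low quefrencies only,
    lifter[N - lifter_len + 1:] = 1.0           # symmetrically
    smooth_power = np.exp(np.fft.rfft(cep * lifter).real)
    r = np.fft.irfft(smooth_power)[:order + 1]  # refined autocorrelation
    # Levinson-Durbin recursion; since smooth_power > 0 everywhere, r is
    # positive definite and the resulting all-pole filter is stable
    a, E = np.array([1.0]), r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:] @ r[i - 1:0:-1]) / E
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
        E *= 1.0 - k ** 2
    return a                                    # LP coefficients [1, a1, ..., ap]
```

Because the liftered spectrum is strictly positive, the refined autocorrelation remains positive definite, which is what underlies the stability guarantee claimed in the abstract.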

Author information

Corresponding author

Correspondence to M Shahidur Rahman.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 Generic License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Rahman, M.S., Shimamura, T. Linear Prediction Using Refined Autocorrelation Function. J AUDIO SPEECH MUSIC PROC. 2007, 045962 (2007). https://doi.org/10.1155/2007/45962

Keywords

  • Acoustics
  • Autocorrelation Function
  • Aliasing
  • Linear Prediction
  • Vocal Tract