
Research Article | Open Access

Linear Prediction Using Refined Autocorrelation Function

Abstract

This paper proposes a new technique for improving the performance of linear prediction analysis by using a refined version of the autocorrelation function. Problems in analyzing voiced speech with linear prediction often arise from the harmonic structure of the excitation source, which makes the autocorrelation function an aliased version of that of the vocal tract impulse response. To estimate the vocal tract characteristics accurately, this aliasing effect must be eliminated. In this paper, we apply a homomorphic deconvolution technique in the autocorrelation domain to remove the aliasing caused by periodicity. The resulting autocorrelation function of the vocal tract impulse response is found to yield a significant improvement in formant frequency estimation. The accuracy of formant estimation is verified on synthetic vowels over a wide range of pitch frequencies typical of male and female speakers. The validity of the proposed method is also illustrated by inspecting the spectral envelopes of natural speech spoken by a high-pitched female speaker. The synthesis filter obtained by the proposed method is guaranteed to be stable, which makes it superior to many of its alternatives.
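The core idea described above (removing the harmonic aliasing from the autocorrelation before the linear prediction step) can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' exact algorithm: it lifters the real cepstrum of the frame's power spectrum below the pitch period, reconstructs a smoothed autocorrelation, and runs the Levinson-Durbin recursion on it. All function and parameter names (`refined_lp`, `lifter_cut`, etc.) are hypothetical.

```python
import numpy as np

def refined_lp(x, order=10, lifter_cut=32, nfft=1024):
    """Hypothetical sketch: LP analysis on a 'refined' autocorrelation
    obtained by homomorphic (cepstral) liftering of the power spectrum.
    lifter_cut should be smaller than the pitch period in samples, so
    that the harmonic ripple of the excitation is discarded."""
    x = x * np.hamming(len(x))
    # Power spectrum of the frame (the DFT of its autocorrelation).
    P = np.abs(np.fft.rfft(x, nfft)) ** 2
    # Homomorphic step: real cepstrum of the power spectrum.
    c = np.fft.irfft(np.log(P + 1e-12), nfft)
    # Low-quefrency lifter: keep the slowly varying vocal-tract part,
    # drop the periodic (aliasing-causing) excitation component.
    lift = np.zeros(nfft)
    lift[:lifter_cut] = 1.0
    if lifter_cut > 1:
        lift[-(lifter_cut - 1):] = 1.0  # symmetric half of the cepstrum
    P_vt = np.exp(np.fft.rfft(c * lift, nfft).real)
    # Refined autocorrelation of the vocal-tract impulse response.
    r = np.fft.irfft(P_vt, nfft)[:order + 1]
    # Levinson-Durbin recursion; since P_vt > 0, the refined
    # autocorrelation is positive definite and the resulting
    # all-pole synthesis filter is stable.
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / e
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        e *= 1.0 - k * k
    return a
```

For a frame whose pitch period is N0 samples, `lifter_cut` would typically be chosen well below N0 (e.g. around N0/2), so the envelope survives while the harmonic fine structure is removed before the autocorrelation is recomputed.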


References

  1. Atal BS, Hanauer SL: Speech analysis and synthesis by linear prediction of the speech wave. The Journal of the Acoustical Society of America 1971, 50(2B):637-655. 10.1121/1.1912679

  2. Makhoul J: Linear prediction: a tutorial review. Proceedings of the IEEE 1975, 63(4):561-580.

  3. El-Jaroudi A, Makhoul J: Discrete all-pole modeling. IEEE Transactions on Signal Processing 1991, 39(2):411-423. 10.1109/78.80824

  4. Vallabha GK, Tuller B: Systematic errors in the formant analysis of steady-state vowels. Speech Communication 2002, 38(1-2):141-160. 10.1016/S0167-6393(01)00049-8

  5. Wong DY, Markel JD, Gray AH Jr.: Least squares glottal inverse filtering from the acoustic speech waveform. IEEE Transactions on Acoustics, Speech, and Signal Processing 1979, 27(4):350-355. 10.1109/TASSP.1979.1163260

  6. Krishnamurthy A, Childers DG: Two-channel speech analysis. IEEE Transactions on Acoustics, Speech, and Signal Processing 1986, 34(4):730-743. 10.1109/TASSP.1986.1164909

  7. Miyoshi Y, Yamato K, Mizoguchi R, Yanagida M, Kakusho O: Analysis of speech signals of short pitch period by a sample-selective linear prediction. IEEE Transactions on Acoustics, Speech, and Signal Processing 1987, 35(9):1233-1240. 10.1109/TASSP.1987.1165282

  8. Pinto NB, Childers DG, Lalwani AL: Formant speech synthesis: improving production quality. IEEE Transactions on Acoustics, Speech, and Signal Processing 1989, 37(12):1870-1887. 10.1109/29.45534

  9. Lee C-H: On robust linear prediction of speech. IEEE Transactions on Acoustics, Speech, and Signal Processing 1988, 36(5):642-650. 10.1109/29.1574

  10. Yanagida M, Kakusho O: A weighted linear prediction analysis of speech signals by using the Givens reduction. Proceedings of the IASTED International Symposium on Applied Signal Processing and Digital Filtering, June 1985, Paris, France, 129-132.

  11. Miyanaga Y, Miki N, Nagai N, Hatori K: A speech analysis algorithm which eliminates the influence of pitch using the model reference adaptive system. IEEE Transactions on Acoustics, Speech, and Signal Processing 1982, 30(1):88-96. 10.1109/TASSP.1982.1163856

  12. Fujisaki H, Ljungqvist M: Estimation of voice source and vocal tract parameters based on ARMA analysis and a model for the glottal source waveform. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '87), April 1987, Dallas, Tex, USA, 637-640.

  13. Ding W, Kasuya H: A novel approach to the estimation of voice source and vocal tract parameters from speech signals. Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP '96), October 1996, Philadelphia, Pa, USA, 2:1257-1260.

  14. Rahman MS, Shimamura T: Speech analysis based on modeling the effective voice source. IEICE Transactions on Information and Systems 2006, E89-D(3):1107-1115. 10.1093/ietisy/e89-d.3.1107

  15. Deng H, Ward RK, Beddoes MP, Hodgson M: A new method for obtaining accurate estimates of vocal-tract filters and glottal waves from vowel sounds. IEEE Transactions on Audio, Speech, and Language Processing 2006, 14(2):445-455.

  16. Hermansky H, Fujisaki H, Sato Y: Spectral envelope sampling and interpolation in linear predictive analysis of speech. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '84), 1984, San Diego, Calif, USA, 9:53-56.

  17. Hermansky H: Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America 1990, 87(4):1738-1752. 10.1121/1.399423

  18. Varho S, Alku P: Separated linear prediction—a new all-pole modelling technique for speech analysis. Speech Communication 1998, 24(2):111-121. 10.1016/S0167-6393(98)00003-X

  19. Kabal P, Kleijn B: All-pole modelling of mixed excitation signals. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), May 2001, Salt Lake City, Utah, USA, 1:97-100.

  20. Oppenheim A, Schafer R: Homomorphic analysis of speech. IEEE Transactions on Audio and Electroacoustics 1968, 16(2):221-226. 10.1109/TAU.1968.1161965

  21. Rahman MS, Shimamura T: Linear prediction using homomorphic deconvolution in the autocorrelation domain. Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS '05), May 2005, Kobe, Japan, 3:2855-2858.

  22. Quatieri TF: Discrete-Time Speech Signal Processing: Principles and Practice. Prentice-Hall, Upper Saddle River, NJ, USA; 2002.

  23. Lim JS: Spectral root homomorphic deconvolution system. IEEE Transactions on Acoustics, Speech, and Signal Processing 1979, 27(3):223-233. 10.1109/TASSP.1979.1163234

  24. Kobayashi T, Imai S: Spectral analysis using generalised cepstrum. IEEE Transactions on Acoustics, Speech, and Signal Processing 1984, 32(6):1235-1238. 10.1109/TASSP.1984.1164454

  25. Verhelst W, Steenhaut O: A new model for the short-time complex cepstrum of voiced speech. IEEE Transactions on Acoustics, Speech, and Signal Processing 1986, 34(1):43-51. 10.1109/TASSP.1986.1164787

  26. Kay SM: Modern Spectral Estimation: Theory and Application. Prentice-Hall, Upper Saddle River, NJ, USA; 1988.

  27. Stoica P, Moses RL: Introduction to Spectral Analysis. Prentice-Hall, Upper Saddle River, NJ, USA; 1997.

  28. Fant G, Liljencrants J, Lin QG: A four parameter model of glottal flow. Speech Transmission Laboratory Quarterly Progress and Status Report, Royal Institute of Technology, Stockholm, Sweden; 1985:1-13.

  29. Klatt DH: Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America 1980, 67(3):971-995. 10.1121/1.383940


Author information

Correspondence to M Shahidur Rahman.



Keywords

  • Acoustics
  • Autocorrelation Function
  • Aliasing
  • Linear Prediction
  • Vocal Tract