Linear Prediction Using Refined Autocorrelation Function
EURASIP Journal on Audio, Speech, and Music Processing, volume 2007, Article number: 045962 (2007)
This paper proposes a new technique for improving the performance of linear prediction analysis by utilizing a refined version of the autocorrelation function. Problems in analyzing voiced speech with linear prediction often arise from the harmonic structure of the excitation source, which causes the autocorrelation function to be an aliased version of that of the vocal tract impulse response. To estimate the vocal tract characteristics accurately, the effect of this aliasing must be eliminated. In this paper, we employ a homomorphic deconvolution technique in the autocorrelation domain to eliminate the aliasing caused by the periodicity of the excitation. The resulting autocorrelation function of the vocal tract impulse response is found to produce a significant improvement in estimating formant frequencies. The accuracy of formant estimation is verified on synthetic vowels for a wide range of pitch frequencies typical of male and female speakers. The validity of the proposed method is also illustrated by inspecting the spectral envelopes of natural speech spoken by a high-pitched female speaker. The synthesis filter obtained by the proposed method is guaranteed to be stable, which makes the method superior to many of its alternatives.
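The abstract does not spell out the algorithm, so the following Python sketch is only one plausible reading of "homomorphic deconvolution in the autocorrelation domain", not necessarily the authors' procedure. It assumes a simple low-quefrency lifter applied to the cepstrum of the power spectrum (the DFT of the two-sided autocorrelation), followed by the standard Levinson-Durbin recursion. All function names, the synthetic test frame, the lifter cutoff `q_cut`, and the prediction order are illustrative assumptions.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT, unscaled; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def refined_autocorrelation(frame, nfft, q_cut):
    """Assumed refinement: lifter the cepstrum of the power spectrum."""
    padded = list(frame) + [0.0] * (nfft - len(frame))
    power = [abs(v) ** 2 for v in fft(padded)]        # DFT of the autocorrelation
    log_power = [math.log(p + 1e-12) for p in power]  # small floor avoids log(0)
    ceps = [v.real / nfft for v in fft(log_power, inverse=True)]
    # Symmetric lifter: keep quefrencies below q_cut, where q_cut is chosen
    # smaller than the pitch period so that the pitch peaks are discarded.
    lift = [c if (q < q_cut or q > nfft - q_cut) else 0.0
            for q, c in enumerate(ceps)]
    smooth_power = [math.exp(v.real) for v in fft(lift)]  # strictly positive
    return [v.real / nfft for v in fft(smooth_power, inverse=True)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns A(z) coefficients, reflection
    coefficients, and the final prediction error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    refl = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        refl.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, refl, err

# Hypothetical test frame: 200 Hz pulse train through one 700 Hz resonator.
fs, f0, n = 8000, 200, 256
period = fs // f0                                   # 40 samples
rho = math.exp(-math.pi * 100 / fs)                 # pole radius, ~100 Hz bandwidth
b1, b2 = 2 * rho * math.cos(2 * math.pi * 700 / fs), -rho ** 2
y = []
for i in range(n):
    e = 1.0 if i % period == 0 else 0.0
    y.append(e + b1 * (y[-1] if i >= 1 else 0.0) + b2 * (y[-2] if i >= 2 else 0.0))
frame = [v * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, v in enumerate(y)]                  # Hamming window

r_ref = refined_autocorrelation(frame, nfft=512, q_cut=30)
a, refl, err = levinson(r_ref, order=10)
stable = all(abs(k) < 1.0 for k in refl)
```

In this sketch the liftered log spectrum is exponentiated, so the smoothed power spectrum is strictly positive and the resulting autocorrelation sequence is positive definite. The Levinson-Durbin recursion then yields reflection coefficients of magnitude less than one, which is the usual condition for a stable all-pole synthesis filter and is consistent with the stability property claimed in the abstract.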