- Open Access
Physical task stress and speaker variability in voice quality
© Godin and Hansen. 2015
- Received: 23 December 2013
- Accepted: 23 December 2013
- Published: 8 October 2015
The presence of physical task stress induces changes in the speech production system which in turn produces changes in speaking behavior. This results in measurable acoustic correlates including changes to formant center frequencies, breath pause placement, and fundamental frequency. Many of these changes are due to the subject’s internal competition between speaking and breathing during the performance of the physical task, which has a corresponding impact on muscle control and airflow within the glottal excitation structure as well as vocal tract articulatory structure. This study considers the effect of physical task stress on voice quality. Three signal processing-based values which include (i) the normalized amplitude quotient (NAQ), (ii) the harmonic richness factor (HRF), and (iii) the fundamental frequency are used to measure voice quality. The effects of physical stress on voice quality depend on the speaker as well as the specific task. While some speakers do not exhibit changes in voice quality, a subset exhibits changes in NAQ and HRF measures of similar magnitude to those observed in studies of soft, loud, and pressed speech. For those speakers demonstrating voice quality changes, the observed changes tend toward breathy or soft voicing as observed in other studies. The effect of physical stress on the fundamental frequency is correlated with the effect of physical stress on the HRF (r = −0.34) and the NAQ (r = −0.53). Also, the inter-speaker variation in baseline NAQ is significantly higher than the variation in NAQ induced by physical task stress. The results illustrate systematic changes in speech production under physical task stress, which in theory will impact subsequent speech technology such as speech recognition, speaker recognition, and voice diarization systems.
- Physical task stress
- Glottal waveform analysis
- Speech variability
- Speaker variability
1.1 Background on speaking and exercise
Speaking and exercising compete for some of the same resources, and exercise affects the speech production system. Conversely, speaking during exercise affects exercise performance, influencing heart rate, ventilation, tidal volumes, and perception of dyspnea or air hunger. During exercise, speakers decrease their ventilation while speaking in order to make controlled utterances [3–7], then compensate in the period between utterances by significantly increasing their ventilation past baseline [4, 8]. When speaking segments are so long that sufficiently long recovery periods do not occur often enough, the speaker is forced to place breathing pauses at linguistically inappropriate places. The effect of exercise on speech breathing is significant and consistent enough that it may be used as a feature in the automatic detection of exercise from the speech signal. Studies conflict on whether speaking increases or does not increase heart rate relative to the exercise-only heart rate at the same VO2 task level. Finally, speech production during exercise results in a reduction of oxygen intake and an increase in blood lactic acid, decreasing physical performance and hastening fatigue.
Significant inter-speaker variability was observed across these physiological variables including oxygen uptake, heart rate, and blood lactate. Also, while perceived speech production difficulty is strongly correlated with the difficulty of the exercise task [10–12], significant inter-speaker variability has been observed despite these correlations. The strength of the correlation may be increased when the subject pool is more uniformly fit and more generally homogeneous, such as the expert cyclists studied in Rodriguez-Marroyo et al. [11, 13].
Physical stress causes behavioral changes in the speech production system, resulting in acoustical differences compared to speech produced in neutral conditions. The most commonly studied acoustic parameter is the fundamental frequency (F0), which typically increases in physical task stress. In Godin and Hansen, mean F0 increased for 60 % of speakers, and similarly for 7 of 10 subjects in Koblick, while Johannes et al. observed increases for all speakers, with a more uniformly fit subject pool. Furthermore, Johannes et al. designed their study to include a task of increasing difficulty with measurements of F0 throughout, and proposed a nonlinear plateau model for the change in F0 due to stress. They noted that the anchor frequencies and the height of each plateau in their model were speaker-dependent. In contrast, Mohler observed a linear increase in F0 with increases in VO2. While most studies considering speech during physical tasks use aerobic exercises as stimuli, Orlikoff measured speech production characteristics before and during a weightlifting task. Mean F0, phonatory airflow, and the pitch perturbation coefficient were not affected, but the F0 coefficient of variation increased.
Studies have also considered vocal intensity, noise-to-harmonics ratio, and jitter, which all may increase in physical task stress. One study suggests that these increases are correlated with the underlying increase in heart rate (Orlikoff and Baken). Godin and Hansen found that the standard deviation of F0 increased for 2 % of speakers and decreased for 24 % of speakers, suggesting a reduced prosodic range in physical task stress. They found that utterance duration increased for 30 % of the speakers and decreased for 43 % of the remaining speakers. Changes in duration may be related to the breathing strategies discussed above, and the inter-speaker differences here suggest that different speakers employ different strategies. The glottal open quotient and the first two formants are also affected by physical task stress. A qualitative comparison of low and high vowels to plosives and fricatives suggested that the vowels were more affected by physical task stress than the plosives and fricatives, and further that nasal phones are more affected by physical task stress than plosives and fricatives. This may be caused by the decline in nasal resistance during physical stress, which might in turn affect the acoustic properties of the upper vocal tract. Variability across speakers in response to physical task stress is a theme across these studies, where, as cited above, Koblick, Godin and Hansen, Baker et al., and Godin et al. observed parameter shifts for a majority but not all speakers. Godin and Hansen observed changes for all speakers but found statistically significant differences in shift of these parameters across speakers, and Johannes et al. observed shifts in F0 for all speakers but noted that the parameters of their model were speaker-dependent.
The significant inter-speaker variability in the physiological and behavioral effects of stress as observed in, e.g., [3, 5], should result in significant inter-speaker variability in the acoustic correlates of stress. Significant speaker variability in acoustic correlates has also been noted for other types of stress [23, 24].
A recent study by Godin et al. examined the effects of physical task stress on voice quality. That study measured six parameters: the harmonic richness factor (HRF), normalized amplitude quotient (NAQ), H1–H2 ratio (H1H2), F1F3syn, harmonics-to-noise ratio (HNR), and spectral slope (SS). Each of these six parameters is sensitive primarily to changes in the vocal fold behavior or related acoustical properties, rather than to the upper vocal tract. In plotting the distribution of each parameter in neutral and stress across all speakers, they found very little change in the overall distribution of the parameter sample values. However, when focusing on measurements from individual speakers, they observed effects of physical stress on these parameters for a subset of speakers. As with any examination of the effects of an outside influence on the behavior of the speech production system, we must approach our analysis from a speaker-dependent perspective. This study expands on Godin et al. to look more closely at a subset of these voice quality measurements.
Voice quality is the acoustic result of phonatory behavior. Voice qualities include modal (neutral), creaky, breathy, whispery, tense, and lax and depend on the tension and compression of the vocal folds, among other factors. Voice quality varies naturally throughout speech, carries paralinguistic information, and may depend on social context, mood, and intent [27, 28]. Variations in vocal fold health, tension, temperature, configuration, and other aspects result in significant acoustic differences as well as different voice qualities. These changes may be made consciously, as in the case of loud or soft vocal effort [29, 30], may be the result of emotions or stressors [29, 31, 32], or may be the result of unconscious communication habits. Thus, acoustic measures over the speech signal may be strongly associated with particular classes of vocal fold behavior and physiology.
Estimation of the glottal flow waveshape through inverse filtering of the speech waveform, and parameterization of the waveshape estimate, is the primary method by which to derive acoustic parameters that measure voice quality. Care is needed to ensure an effective vocal tract model from traditional linear prediction (LP): the error residual from LP analysis is not guaranteed to represent the true glottal flow waveform, since it also encodes any error residual from poor vocal tract spectral modeling. The study by Gavidia-Ceballos and Hansen explored this issue for subjects with various forms of vocal fold cancer and successfully employed estimates of vocal tract structure from parallel speakers to more accurately suppress vocal tract structure for glottal flow waveform analysis. The study by Cummings and Clements considered inverse filtering with a parametric model of the resulting glottal waveform shape for speech under stress and emotion. Earlier analysis of the glottal source structure suggested this would be possible (Hansen and Clements). With respect to quality, glottal pulse width, glottal pulse skewness, abruptness of glottal closure, and turbulent noise component may be indicative of voice type variation. Higher open quotient and closing quotient are related to breathy voice, and lower closing quotient is related to pressed voice. Higher AC flow and increased subglottal pressure are associated with loud voice, while lower AC flow and lower subglottal pressure are associated with soft voice. Harmonics-to-noise ratio has been extensively studied and is strongly correlated with breathy and rough voice quality. Aspiration noise is also a significant factor in voice quality and may be estimated using the F1F3syn parameter.
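To make the linear prediction step concrete, the following minimal sketch applies plain autocorrelation-method LP inverse filtering to a single frame. It is not the procedure used in this study, which relies on the GLOAT toolkit; the function names `lp_coefficients` and `lp_inverse_filter` and the default order of 18 are illustrative assumptions. As cautioned above, the residual it returns also carries any vocal tract modeling error and should not be read as the true glottal flow.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP. Returns the inverse filter
    A(z) = 1 - sum_k a_k z^-k as the array [1, -a_1, ..., -a_order]."""
    x = np.asarray(frame, dtype=float)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # Symmetric Toeplitz normal equations: R a = r[1:]
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], r[1:order + 1])
    return np.concatenate(([1.0], -a))

def lp_inverse_filter(frame, order=18):
    """Filter the frame with A(z); the output is the LP residual."""
    x = np.asarray(frame, dtype=float)
    inv = lp_coefficients(x, order)
    return np.convolve(x, inv)[:len(x)]
```

In practice the frame would usually be pre-emphasized and windowed first; those steps are omitted here to keep the sketch short.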
More recently, the normalized amplitude quotient (NAQ) has been demonstrated to be strongly correlated with voice quality variations and robust to noise and estimation errors [28, 37, 40]. Drugman et al. showed that NAQ, H1H2 ratio, and harmonic richness factor (HRF) measured for soft, modal, and loud speech resulted in significantly different distributions in these parameters for a corpus of a single speaker. On that corpus, NAQ had a higher distribution mean for soft speech, a middle distribution mean for modal speech, and a sharper, lower-mean distribution for loud speech. While there was less separation across speech types for the H1–H2 ratio, the harmonic richness factor was lower for soft speech.
Speech, even in the absence of stressors or significant emotions, has variations in voice quality that carry paralinguistic cues such as affirm, deny, or backchannel [28, 41], and many other external influences can affect voice quality, such as depression, circadian rhythm and fatigue [43, 44], cognitive load [29, 45, 46], emotions [29, 31, 45, 47, 48], and aging. Also, baseline (modal) values of voice quality measurements such as NAQ vary significantly across speakers. Spontaneous, continuous speech, typical of conversations, has voice quality characteristics that differ significantly enough across speakers that they may be used as features for automatic speaker identification systems [50, 51]. For these reasons, in order to measure voice quality of a given speech segment, the measure must be normalized for the underlying speaker variation regarding age, mood, conversational context, fatigue, and other factors.
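One simple way to carry out such a normalization is to express each measurement relative to the same speaker's neutral baseline. The sketch below uses a z-score for this; the choice of z-scoring and the name `normalize_to_baseline` are assumptions made for illustration and are not a description of the procedure used in this study.

```python
import numpy as np

def normalize_to_baseline(values, baseline_values):
    """Express measurements (e.g., NAQ) in units of the speaker's own
    neutral-speech standard deviation, centered on the neutral mean."""
    baseline = np.asarray(baseline_values, dtype=float)
    mu = baseline.mean()
    sigma = baseline.std(ddof=1)  # sample standard deviation
    return (np.asarray(values, dtype=float) - mu) / sigma
```

A value of 0 then means "at this speaker's neutral mean", regardless of how the speaker's baseline compares with that of other speakers.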
Like depression, emotions, circadian rhythm, and conversational context, physical task stress can be expected to induce changes in voice quality, driven by the physical demands of exercise and the competition between the speaking and breathing tasks. As physical task stress is an external factor that drives behavior and physiology rather than a specific phonatory behavior itself, we may not expect a direct link between the parameters of the physical task or the fitness of the speaker and the resulting acoustic measures.
In the analysis of speech under stress, a range of speech parameters are possible. Hansen [29, 52] considered 200 speech parameters spanning the domains of glottal spectrum, pitch/fundamental frequency, duration, intensity, and vocal tract spectral structure. Further analysis was considered for military communication applications of speech under stress by NATO RSG.10 and the USAF. These feature analysis studies led to advancements in robust speech recognition under stress [29, 55–57] and a tutorial overview of a number of stress compensation techniques based on voiced-transition-unvoiced speech tagging as well as neural network and source generator compensation of stress. An additional application domain included advancements in automatic detection of speech under stress using signal processing advancements derived from the Teager energy operator (TEO) and TEO-CB-AutoEnv. More recently, nonlinear TEO-based advancements have been considered for stress detection using sub-band filterbank weighting for various actual speech under stress scenarios [61, 62]. While these have explored a range of stress conditions, specific speech under physical task stress was not addressed. As such, it is believed that alternative features could also be explored for the present study. In this study, the UTSCOPE-Physical Task Stress corpus (see Table 1) is employed for analysis. The corpus consists of speech from 78 subjects, collected in both neutral and physical task stress conditions and balanced across gender (male/female), native/non-native, and read/spontaneous speech conditions.
We have selected three parameters to study the voice quality effects of physical task stress. Fundamental frequency is widely studied and serves as a comparison with prior work. Harmonic richness factor (HRF) and normalized amplitude quotient (NAQ) have been selected because past studies have quantified the relationship of these parameters to specific speaking behaviors including pressed speech and soft speech. This facilitates our investigation into whether the effects of physical stress can be described in terms of these speaking types.
NAQ and HRF can be reliably estimated if an inverse filtered glottal waveform is available. Past studies have shown that care should be exercised in applying vocal tract inverse filtering for glottal source waveform estimation when voice characteristics are under pathology, since determining the exact glottal closure instant (GCI) is not always possible. In general, glottal inverse filtering is a significant area of research interest. Here, the GLOAT toolkit is used for GCI detection, fundamental frequency estimation, and glottal inverse filtering. Kane and Gobl demonstrated that voice quality variation has a significant effect on the accuracy of GCI detection, which is critical for correct glottal inverse filtering, but their data suggested that the SEDREAMS method used here is reliable enough for speech analysis, despite voice quality variation.
3.1 Fundamental frequency
The fundamental frequency (F0) has been the primary object of study of speech under physical task stress. Most studies have concluded that stress results in an increase in F0. However, there is significant speaker variability in the effects of physical stress on F0, as Godin and Hansen noted an increase in F0 for just 61 % of speakers and a decrease for 14 % of speakers. To reduce F0 estimation errors such as doubling or halving, we have set the allowable range to 120–400 Hz.
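The effect of restricting the search range can be illustrated with a simple autocorrelation-based estimator. This is a stand-in chosen for brevity, not the F0 estimator provided by the GLOAT toolkit used in this study, and the function name `estimate_f0` is hypothetical. Only lags corresponding to 120–400 Hz are searched, so a peak at twice the true period (F0 halving) or half of it (F0 doubling) cannot be selected when it falls outside that range.

```python
import numpy as np

def estimate_f0(frame, fs, f0_min=120.0, f0_max=400.0):
    """Pick the autocorrelation peak among lags that map to [f0_min, f0_max]."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    # One-sided autocorrelation, index = lag in samples
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(np.ceil(fs / f0_max))   # shortest allowed period
    lag_max = int(np.floor(fs / f0_min))  # longest allowed period
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return fs / lag
```

The frame must be longer than the longest allowed period (`fs / f0_min` samples) for the search window to be valid.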
3.2 Normalized amplitude quotient
The normalized amplitude quotient (NAQ) is the ratio of the maximum amplitude of the glottal flow to the minimum of the glottal flow derivative, normalized by the fundamental period and the sampling frequency. NAQ is sensitive to variations caused by breathy and pressed phonation and to soft and loud speech. It is known that NAQ increases for breathy phonation, and decreases for pressed phonation, relative to neutral speech.
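As a minimal sketch of this definition, NAQ for one glottal cycle can be written as f_ac / (d_peak · T), where f_ac is the flow amplitude, d_peak is the magnitude of the negative peak of the flow derivative, and T is the fundamental period. The code below assumes a single-cycle glottal flow estimate is already available, takes f_ac as the peak-to-peak flow within the cycle, and approximates the derivative by a first difference; the function name `naq` is illustrative and this is not the GLOAT implementation.

```python
import numpy as np

def naq(flow_cycle, fs, f0):
    """Normalized amplitude quotient of one glottal flow cycle."""
    flow = np.asarray(flow_cycle, dtype=float)
    f_ac = flow.max() - flow.min()   # AC flow amplitude (assumed peak-to-peak)
    d_flow = np.diff(flow) * fs      # first-difference derivative, per second
    d_peak = abs(d_flow.min())       # magnitude of the negative peak
    period = 1.0 / f0                # fundamental period in seconds
    return f_ac / (d_peak * period)
```

For an idealized triangular pulse, this quantity equals the fraction of the period spent in the closing phase, which is consistent with lower values for the abrupt closure of pressed phonation and higher values for breathy phonation.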
3.3 Harmonic richness factor
The harmonic richness factor (HRF) is the ratio of the sum of the amplitudes at the harmonics in the glottal waveform to the amplitude of the component at the fundamental frequency. In Childers and Lee, the HRF of modal voicing was higher than that for breathy voicing by 6.8 dB. In Drugman and Alwan, there were clear shifts in the distribution of HRF between loud, modal, and soft voicing. In our implementation of HRF, we have used only the fundamental and the first nine harmonics. This ensures that, unless the F0 exceeds 800 Hz, all measurements of HRF sum over the same number of harmonics, eliminating the dependence of HRF on F0.
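A minimal sketch of this computation is given below, assuming a glottal flow estimate and its F0 are already available. The Hann window, the nearest-bin amplitude lookup, the conversion to dB as 20·log10 of the amplitude ratio, and the name `hrf_db` are assumptions made for illustration; the study's own implementation may differ in these details. The check against the Nyquist frequency reflects the requirement that every measurement sums over the same nine harmonics.

```python
import numpy as np

def hrf_db(glottal_flow, fs, f0, n_harmonics=9):
    """Harmonic richness factor in dB: harmonics 2..(n_harmonics+1) vs. the fundamental."""
    if (n_harmonics + 1) * f0 > fs / 2:
        raise ValueError("highest requested harmonic lies above the Nyquist frequency")
    x = np.asarray(glottal_flow, dtype=float)
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

    def amplitude_at(freq):
        # Amplitude at the FFT bin nearest to the requested frequency
        return spectrum[np.argmin(np.abs(freqs - freq))]

    fundamental = amplitude_at(f0)
    harmonics = sum(amplitude_at(k * f0) for k in range(2, n_harmonics + 2))
    return 20.0 * np.log10(harmonics / fundamental)
```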
The UT-SCOPE and UT-SCOPE-Phy-II corpora were collected under the same protocols and are used together. Expanding on the protocol from above, both corpora include a segment of 35 prompted TIMIT sentences (presented through headphones) spoken in both neutral and physical task stress. These sentences comprise the analysis data set used in this study. A spontaneous speech portion is also available but was not used in this study. Having the same sentence spoken in both tasks reduces the phonetic variability for analysis of the effects of physical task stress. Sessions from 66 female native speakers of American English are used in this study. We chose to consider the female speakers because we had a larger sample size and did not want to introduce gender as another variable in the study. All participants were at least 18 years of age at the time of the study. A Conversion II Elliptical/Stair Stepper machine (Fig. 3, along with other equipment) was used to induce the physical task stress. Each speaker was asked to maintain an approximately 10-mph pace on the machine (there is a digital readout which indicates speed and allows the subject to maintain the requested pace). Having the same task for each subject resulted in different levels of exertion for each speaker, and therefore, there is a diversity of exertion levels in the corpus. The data was collected inside a 13-by-13 ft ASHA-certified single-walled sound booth, with the subject wearing a Shure Beta 53 close-talking microphone.
Physical task stress is an external factor imposed on the speech production system that competes for limited physical resources when subjects are performing simultaneous tasks (i.e., speaking and physical task). Speaking while exercising increases the actual and perceived difficulty of the task, and exercising while speaking results in significant changes to the fundamental frequency (F0), formant structure (location, etc.), pause placement, open quotient, and other measurable speech parameters. Based on this evidence, this study undertook an investigation of the voice quality changes induced by physical task stress, with particular attention paid to the speaker differences in the measured response. Compared with previous studies in the area of physical task stress [19, 20, 22] and voice quality [28, 36, 39], this study uses a larger corpus of speech, with speakers of varying fitness levels, with significantly more inter-subject variability, while retaining low phonetic variability.
We expected that physical stress would induce a greater variety of phonation behaviors. This might have resulted in a flattening of the NAQ and HRF histograms (i.e., becoming more uniform in distribution). Instead, in the global distributions of voice quality parameters for HRF and NAQ, rather small overall changes were observed, suggesting a correspondingly small overall change in phonation behavior. However, for a subset of speakers, shifts in the mean values of NAQ and HRF were consistent with significant changes in voice quality, with trends toward either breathy or soft voice dimensions. These changes were not correlated with an elevated exertion level but were instead correlated with an increased fundamental frequency (F0). Research on speech and exercise has suggested that exercise results in both an increased vocal fold tension and increased subglottal pressure, relative to neutral speech production. This would suggest that, for those whose voice quality is affected, the voice quality should move toward the pressed or tense voice, rather than the breathy or soft voice observed here in the current study. Further investigation of the relationship between physical changes caused by physical task stress and the voice quality changes is required in order to explain these results. It is in fact a major challenge to exactly measure physical airflow and actual excitation structure during speech production while subjects are performing physical tasks (i.e., without the measurement devices/instruments themselves introducing new variables into the problem).
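The kind of correlation referred to here can be sketched as a Pearson correlation between per-speaker shifts (stress mean minus neutral mean) in two parameters. The function name `shift_correlation` and its argument layout, one mean value per speaker in each array, are assumptions for illustration; the sketch does not reproduce the statistical analysis of this study.

```python
import numpy as np

def shift_correlation(neutral_a, stress_a, neutral_b, stress_b):
    """Pearson r between the stress-induced shifts of two parameters.
    Each argument holds one per-speaker mean, in the same speaker order."""
    shift_a = np.asarray(stress_a, dtype=float) - np.asarray(neutral_a, dtype=float)
    shift_b = np.asarray(stress_b, dtype=float) - np.asarray(neutral_b, dtype=float)
    return np.corrcoef(shift_a, shift_b)[0, 1]
```

Working with shifts rather than raw values removes each speaker's baseline from the comparison, which matters given the large inter-speaker variation in baseline values.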
It has been shown that listeners can perceive physical stress in speech, and therefore, there must be perceptual artifacts that consistently identify stressed speech across speakers. If voice quality is an inconsistent indicator of physical task stress, it is likely that inappropriate pause placement, formant shifts, and increased F0 play a more significant role in the perception of physical task stress than voice quality.
Finally, significant variation was observed in the baseline neutral measurements for NAQ and HRF across speakers. The inter-speaker variation in baseline was significantly greater than the variation induced by physical task stress. As observed in Godin et al., this inter-speaker variation makes it difficult to consistently employ voice quality parameters individually for stress detection, and therefore, probabilistic classifiers may rely more on the correlation between these parameters than on the raw values of individual parameters themselves for detection of voice quality changes.
This project was funded by AFRL under contract FA8750-15-1-0205 and partially by the University of Texas at Dallas from the Distinguished University Chair in Telecommunications Engineering held by J.H.L. Hansen.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- J Deller, JHL Hansen, J Proakis, Discrete-Time Processing of Speech Signals, 2nd edn. (IEEE Press, New York, 2000)
- AT Welford, Stress and Performance. Ergonomics, 2007, pp. 567–580
- E Baker, J Hipp, H Alessio, Ventilation and speech characteristics during submaximal aerobic exercise. J. Speech Lang. Hear. Res. 51, 1203–1214 (2008)
- JH Doust, JM Patrick, The limitation of exercise ventilation during speech. Respir. Physiol. 46, 137–147 (1981)
- Y Meckel, A Rotstein, O Inbar, The effects of speech production on physiologic responses during submaximal exercise. Med. Sci. Sports Exerc. 34(8), 1337–43 (2002)
- EF Bailey, JD Hoit, Speaking and breathing in high respiratory drive. J. Speech Lang. Hear. Res. 45, 89–99 (2002)
- JD Hoit, RW Lansing, KE Perona, Speaking-related dyspnea in healthy adults. J. Speech Lang. Hear. Res. 50, 361–374 (2007)
- JE Luketic, The Effect of Inspiratory Muscle Strength Training on Ventilation and Dyspnea During Simultaneous Exercise and Speech (Master’s thesis, Miami University, Oxford, 2007)
- SA Patil, Alternate Sensor Based Speech Systems for Speaker Assessment and Robust Human Communication. PhD thesis, CRSS: Center for Robust Speech Systems (The University of Texas at Dallas, Richardson, 2009)
- JG Mohler, Quantification of dyspnea confirmed by voice pitch analysis. Bull. Eur. Physiopathol. Respir. 18, 837–50 (1982)
- JA Rodriguez-Marroyo, G Villa, J Garcia-Lopez, C Foster, Relationship between the talk test and ventilatory thresholds in well trained cyclists. J. Strength Cond. Res. 27(7), 1942–1949 (2013)
- A Rotstein, Y Meckel, O Inbar, Perceived speech difficulty during exercise and its relation to exercise intensity and physiological responses. Eur. J. Appl. Physiol. 92, 431–436 (2004)
- JA Rodriguez-Marroyo, J Garcia-Lopez, C-E Juneau, JG Villa, Workload demands in professional multi-stage cycling races of varying duration. Br. J. Sports Med. 43, 180–185 (2007)
- KW Godin, JHL Hansen, Analysis and Perception of Speech Under Physical Task Stress. ISCA INTERSPEECH-2008, 2008, pp. 1674–1677. Brisbane, Australia
- HM Koblick, Effects of Simultaneous Exercise and Speech Tasks on the Perception of Effort and Vocal Measures in Aerobic Instructors (Master’s thesis, Univ. of Central Florida, Orlando, 2004)
- B Johannes, P Wittels, R Enne, G Eisinger, CA Castro, JL Thomas, AB Adler, R Gerzer, Non-linear function model of voice pitch dependency on physical and mental load. Eur. J. Appl. Physiol. 101, 267–276 (2007)
- RF Orlikoff, Voice production during a weightlifting and support task. Folia Phoniatr. Logop. 60, 188–194 (2008)
- RF Orlikoff, RJ Baken, The effect of the heartbeat on vocal fundamental frequency perturbation. J. Speech Hear. Res. 32, 576–582 (1989)
- KW Godin, JHL Hansen, Vowel context and speaker interactions influencing glottal open quotient and formant frequency shifts in physical task stress. ISCA INTERSPEECH-2011, 2011, pp. 2945–2948
- KW Godin, JHL Hansen, Analysis of the effects of physical task stress on the speech signal. J. Acoust. Soc. Am. 130, 3992–3998 (2011)
- LG Olson, KP Strohl, The response of the nasal airway to exercise. Am. Rev. Respir. Dis. 135(2), 356–359 (1987)
- KW Godin, T Hasan, JHL Hansen, Glottal Waveform Analysis of Physical Task Stress Speech. ISCA INTERSPEECH-2012, Wed-SS6-15, 2012, pp. 1–4. Portland, OR
- MHL Hecker, KN Stevens, G von Bismark, CE Williams, Manifestations of task-induced stress in the acoustic speech signal. J. Acoust. Soc. Am. 44(4), 993–1001 (1968)
- JHL Hansen, S Patil, Speech Under Stress: Analysis, Modeling and Recognition. Speaker Classification I: Fundamentals, Features, and Methods, (Springer Publishing, 2007), pp. 108–137
- CT Ishi, A New Acoustic Measure for Aspiration Noise Detection. ISCA INTERSPEECH-2004, 2004. Jeju Island, Korea
- C Gobl, AN Chasaide, Acoustic characteristics of voice quality. Speech Comm. 11, 481–490 (1992)
- N Campbell, Changes in Voice Quality Due to Social Conditions. Proc. Inter. Congress on Phonetic Science, 2007, pp. 2093–2096
- N Campbell, P Mokhtari, Voice Quality: The 4th Prosodic Dimension. Proc. Inter. Congress on Phonetic Science, 2003, pp. 2417–2430
- JHL Hansen, Analysis and Compensation of Stressed and Noisy Speech with Application to Robust Automatic Recognition (PhD thesis, School of Electrical Engineering, Georgia Institute of Technology, Atlanta, 1988)
- C Zhang, JHL Hansen, Analysis and Classification of Speech Mode: Whispered Through Shouted. ISCA INTERSPEECH-2007, 2007, pp. 2289–2292
- C Gobl, AN Chasaide, The role of voice quality in communicating emotion, mood, and attitude. Speech Comm. 40, 182–212 (2003)
- CE Williams, KN Stevens, Emotions and speech: some acoustical correlates. J. Acoust. Soc. Am. 52(4B), 1238–1250 (1972)
- L Gavidia-Ceballos, JHL Hansen, Direct speech feature estimation using an iterative EM algorithm for vocal cancer detection. IEEE Trans. Biomed. Eng. 43(4), 373–383 (1996)
- KE Cummings, MA Clements, Analysis of glottal excitation of emotionally styled and stressed speech. J. Acoust. Soc. Am. 98, 88–98 (1995)
- JHL Hansen, MA Clements, Evaluation of speech under stress and emotional conditions. J. Acoust. Soc. Am. 82, S17 (1987)
- DG Childers, CK Lee, Vocal quality factors: analysis, synthesis, and perception. J. Acoust. Soc. Am. 90(5), 2394–2410 (1991)
- P Alku, E Vilkman, A comparison of glottal voice source quantification parameters in breathy, normal and pressed phonation of female and male speakers. Folia Phoniatr. Logop. 48, 250–254 (1994)
- EB Holmberg, RE Hillman, JS Perkell, Glottal airflow and transglottal air pressure measurements for male and female speaker in soft, normal, and loud voice. J. Acoust. Soc. Am. 84, 511–529 (1988)
- G de Krom, Some spectral correlates of pathological breathy and rough voice quality for different types of vowel fragments. J. Speech Hear. Res. 38, 794–811 (1995)
- T Drugman, B Bozkurtb, T Dutoit, Causal-anticausal decomposition of speech using complex cepstrum for glottal source estimation. Speech Comm. 53, 855–866 (2011)View ArticleGoogle Scholar
- CT Ishi, K-I Sakakibara, H Ishiguro, N Hagita, A method for automatic detection of vocal fry. IEEE Trans. Audio Speech Lang. Process. 16(1), 47–56 (2008)View ArticleGoogle Scholar
- E Moore, J Torres, A performance assessment of objective measures for evaluating the quality of glottal waveform estimates. Speech Comm. 50, 56–66 (2008)View ArticleGoogle Scholar
- M Artkoski, J Tommila, A-M Laukkanen, Changes in voice during a day in normal voices without vocal loading. Logoped. Phoniatr. Vocol. 27, 118–123 (2002)View ArticleGoogle Scholar
- AL Bouhuys, HK Schutte, DGM Beersma, GLJ Nieboer, Relations between depressed mood and vocal parameters before, during and after sleep deprivation: a circadian rhythm study. J. Affect. Disord. 19, 249–258 (1990)View ArticleGoogle Scholar
- KE Cummings, MA Clements, Analysis of Glottal Waveforms Across Stress Styles. IEEE ICASSP-90: Inter. Conf. Acoustics, Speech, and Signal Processing, 1990View ArticleGoogle Scholar
- TF Yap, J Epps, EHC Choi, E Ambikairajah, TX Dallas, Glottal Features for Speech-Based Cognitive Load Classification. IEEE ICASSP-2010: Inter. Conf. Acoustics, Speech, and Signal Processing, 2010, pp. 5234–5237Google Scholar
- M Lugger, B Yang, Cascaded Emotion Classification via Psychological Emotion Dimensions Using a Large Set of Voice Quality Parameters. IEEE ICASSP-2008: Inter. Conf. Acoustics, Speech, and Signal Processing, 2008View ArticleGoogle Scholar
- R Sun, E Moore, Affective Computing and Intelligent Interaction, vol. 6975 of Lecture Notes in Computer Science, chapter Investigating Glottal Parameters and Teager Energy Operators in Emotion Recognition, (Springer, 2011), pp. 425–434Google Scholar
- SE Linville, J Rens, Vocal tract resonance analysis of aging voice using long-term average spectra. J. Voice 15, 323–330 (2001)
- J Gudnason, M Brookes, Voice Source Cepstrum Coefficients for Speaker Identification. IEEE ICASSP-2008: Inter. Conf. Acoustics, Speech, and Signal Processing, 2008
- MD Plumpe, TF Quatieri, DA Reynolds, Modeling of the glottal flow derivative waveform with application to speaker identification. IEEE. Trans. Speech. Audio. Process. 7(5), 569–586 (1999)
- JHL Hansen, Evaluation of Acoustic Correlates of Speech Under Stress for Robust Speech Recognition, 1989, pp. 31–32. Boston, Mass
- JHL Hansen, C Swail, AJ South, RK Moore, H Steeneken, EJ Cupples, T Anderson, CRA Vloeberghs, I Trancoso, P Verlinde, The Impact of Speech Under ‘Stress’ on Military Speech Technology, published by NATO Research & Technology Organization RTO-TR-10, AC/323(IST)TP/5 IST/TG-01, 2000
- JHL Hansen, SE Bou-Ghazale, G Zhou, R Sarikaya, Speech Processing in Noise, Stress, and Lombard Effect, Research Monograph published by DoD, AFRL-IF-RS-TR-1999-208, 1999
- SE Bou-Ghazale, JHL Hansen, A comparative study of traditional and newly proposed features for recognition of speech under stress. IEEE. Trans. Speech. Audio. Process. 8(4), 429–442 (2000)
- JHL Hansen, D Cairns, ICARUS: a source generator based real-time system for speech recognition in noise, stress, and Lombard effect. Speech Comm. 16(4), 391–422 (1995)
- JHL Hansen, M Clements, Source generator equalization and enhancement of spectral properties for robust speech recognition in noise and stress. IEEE. Trans. Speech. Audio. Process. 3(5), 407–415 (1995)
- JHL Hansen, Analysis and compensation of speech under stress and noise for environmental robustness in speech recognition. Speech Comm. Special Issue Speech Under Stress. 20(2), 151–170 (1996)
- D Cairns, JHL Hansen, Nonlinear analysis and detection of speech under stressed conditions. J. Acoust. Soc. Am. 96(6), 3392–3400 (1994)
- G Zhou, JHL Hansen, JF Kaiser, Nonlinear feature based classification of speech under stress. IEEE. Trans. Speech. Audio. Process. 9(2), 201–216 (2001)
- JHL Hansen, W Kim, M Rahurkar, E Ruzanski, J Meyerhoff, Robust emotional stressed speech detection using weighted frequency subbands. EURASIP J. Adv. Signal Process. Article ID 906789, 10 (2011)
- JHL Hansen, E Ruzanski, H Boril, J Meyerhoff, TEO-based speaker stress assessment using hybrid classification and tracking schemes. Int. J. Speech Technol. 15(3), 295–311 (2012)
- T Drugman, M Thomas, J Gudnason, P Naylor, T Dutoit, Detection of glottal closure instants from speech signals: a quantitative review. IEEE Trans. Audio Speech Lang. Process. 20, 994–1006 (2012)
- T Drugman, A Alwan, Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics. ISCA INTERSPEECH-2011, 2011, pp. 1973–1976
- J Kane, C Gobl, Evaluation of glottal closure instant detection in a range of voice qualities. Speech Comm. 55, 295–314 (2013)
- P Alku, T Backstrom, E Vilkman, Normalized amplitude quotient for parametrization of the glottal flow. J. Acoust. Soc. Am. 112, 701–710 (2002)
- A Ikeno, V Varadarajan, S Patil, JHL Hansen, UT-Scope: Speech Under Lombard Effect and Cognitive Stress. IEEE Aerospace Conf.-2007, 2007, pp. 1–7. Big Sky, Montana
- AL Webster, S Aznar-Lain, Intensity of physical activity and the “talk test”. ACSM's Health. Fitness J. 12, 13–17 (2008)
- JA Davis, VA Convertino, A comparison of heart rate methods for predicting endurance training intensity. Med. Sci. Sports. 7, 295–298 (1975)
- H Tanaka, KD Monahan, DR Seals, Age-predicted maximal heart rate revisited. J. Am. Coll. Cardiol. 37, 153–156 (2001)