On a Method for Improving Impulsive Sounds Localization in Hearing Defenders
EURASIP Journal on Audio, Speech, and Music Processing volume 2008, Article number: 274684 (2008)
This paper proposes a new algorithm for a directional aid with hearing defenders. Users of existing hearing defenders experience distorted information, or in worst cases, directional information may not be perceived at all. The users of these hearing defenders may therefore be exposed to serious safety risks. The proposed algorithm improves the directional information for the users of hearing defenders by enhancing impulsive sounds using interaural level difference (ILD). This ILD enhancement is achieved by incorporating a new gain function. Illustrative examples and performance measures are presented to highlight the promising results. By improving the directional information for active hearing defenders, the new method is found to serve as an advanced directional aid.
In many cases, individuals are forced to use hearing defenders for their protection against harmful levels of sound. Hearing defenders are used to enforce a passive attenuation of the external sounds which enter our ears. The use of existing hearing defenders affects natural sound perception. This, in turn, results in a reduction of direction-of-arrival (DOA) capabilities [1, 2]. This impairment of DOA estimation accuracy has been reported as a potential safety risk associated with existing hearing defenders [3].
This paper presents a new method for enhancing the perceived directionality of impulsive sounds, since such sounds may contain useful information for a user. The proposed scheme introduces a directional aid that provides enhanced impulsive types of external sounds to a user, improving the DOA estimation capability of the user for those sounds. Exaggerating this directional information for impulsive sounds will not generally produce a psychoacoustically valid cue. Instead, this method is expected to enhance the user's ability to approximate the direction of an impulsive sound source, and thereby speed up the localization of this source. With the exception of enhanced directionality of impulsive sounds, the proposed method should not alter other classes of sounds (e.g., human speech sounds). Safety is likely to be increased by using our new approach for impulsive sounds.
The spatial information is enhanced without increasing the sound levels (i.e., signals are only attenuated and not amplified). The risk of damaging the user's hearing by the increased sound levels is thereby avoided. However, the proposed directional aid passes the enhanced external sounds directly to the user without any restrictions. It is therefore recommended, in a real implementation, that a postprocessing stage is incorporated after the proposed directional aid for limiting the sound levels passed to the user. Active hearing defenders with such limiting features are commercially available today.
A suitable application of our directional aid is for the active hearing defenders used in hunting, police, or military applications, in which impulsive sounds such as gun or rifle shots are omnipresent. In these applications, the impulsive sounds are likely to accompany danger, and therefore fast localization of impulsive sound sources is vital. A similar idea for enhancing the directional information can be found in [4], wherein the hearing defender is physically redesigned using passive means in order to compensate for the loss in directional information.
A brief introduction to the theory of human directional hearing is provided hereafter followed by our proposed scheme for a directional aid. An initial performance evaluation of the proposed method is given with a summary and conclusions.
2. Theory of Human Directional Hearing
The human estimation of direction of arrival can be modeled by two important binaural auditory cues [5]: interaural time difference (ITD) and interaural level difference (ILD). There are other cues which are also involved in the discrimination of direction of arrival in the elevation angle. For example, the reflections of the impinging signals by the torso and pinna are some important features for the estimation of elevation angle. These reflections are commonly modeled by head related transfer functions (HRTFs) [6, 7]. The focus of this paper is on the use of the binaural cue ILD and estimation of direction of arrival on the horizontal plane.
The spatial characteristics of human hearing are the focus when describing the underlying concept of these two cues, ITD and ILD. It is assumed that the sound is emitted from a monochromatic point source (i.e., a propagating sinusoid specified by its frequency, amplitude, and phase). In direction-of-arrival estimation, the intersensor distance is very important to avoid spatial aliasing, which introduces direction-of-arrival estimation errors. The distance between the two ears of a human individual corresponds roughly to one period (the wavelength) of a sinusoid with fundamental frequency $f_0$. (For an adult person, this fundamental frequency lies in the kilohertz range.) A signal whose frequency exceeds $f_0$ is represented by more than one period for this particular distance. Those signals with frequencies below this threshold, $f_0$, are represented by a fraction of a period. Consequently, for a signal whose frequency falls below $f_0$, the phase information is utilized for direction-of-arrival estimation and this corresponds to the ITD model. However, for a signal with frequencies above $f_0$, the phase information is ambiguous, and the level information of the signal is more reliable for direction-of-arrival estimation; this corresponds to the ILD model. The use of this level information stems from the fact that a signal that travels a further distance has, in general, lower intensity, and this feature is more accentuated at higher frequencies. Consequently, the ear closer to the source receives a higher-intensity sound than the opposite ear. Also, the human head itself obstructs signals passing from one ear to the other ear [8, 9].
The discussion above gives only a general overview and is a simplification of many of the processes involved in human direction-of-arrival estimation. However, this background provides us with the basis for a simplified human direction-of-arrival estimation model, as considered in this paper.
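As a rough worked example of the relation described above, the following sketch computes the frequency whose wavelength equals a given ear spacing and indicates which cue the simplified model would rely on. The speed of sound, the 20 cm ear spacing, and the function names are illustrative assumptions, not values taken from this paper.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: air at roughly room temperature


def crossover_frequency_hz(ear_distance_m, speed_of_sound=SPEED_OF_SOUND_M_PER_S):
    """Frequency whose wavelength equals the given ear spacing (f = c / d)."""
    if ear_distance_m <= 0:
        raise ValueError("ear distance must be positive")
    return speed_of_sound / ear_distance_m


def dominant_cue(frequency_hz, f0_hz):
    """Simplified model: phase (ITD) below f0, level (ILD) at or above f0."""
    return "ITD" if frequency_hz < f0_hz else "ILD"


# Hypothetical ear spacing of 20 cm, chosen only for illustration.
f0 = crossover_frequency_hz(0.20)
print(round(f0), dominant_cue(500.0, f0), dominant_cue(4000.0, f0))
```

The hard switch between the two cues is a simplification; in practice the transition between ITD and ILD dominance is gradual.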
3. Proposed Scheme for a Directional Aid
In our scheme, two external omnidirectional microphones are mounted in the forward direction on each of the two cups of the hearing defender; see Figure 1. Also, two loudspeakers are placed in the interior of each cup. These loudspeakers are employed for the realization of a directional aid.
An overview of the scheme proposed for a directional aid is shown in Figure 2. Note that in this scheme, the low-frequency signal components are simply passed without any processing.
3.1. Signal Model
The microphones spatially sample the acoustical field, providing temporal signals $x_L(n)$ and $x_R(n)$, where L and R represent the left and right sides of the hearing defender, respectively. An orthogonal two-band filter bank is used for each microphone. The low-frequency (LF) band of this filter bank, denoted by $H_{\mathrm{LF}}$, consists of a low pass filter having a cut-off frequency around the fundamental frequency, $f_0$, corresponding to the ITD spectral band. Similarly, the high-frequency (HF) band of the filter bank is denoted by $H_{\mathrm{HF}}$ and corresponds to the ILD spectral band. Since only the ILD localization cue has been employed in our approach, the LF signals (corresponding to the ITD cues) are simply passed through the proposed system, unaltered.
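The left microphone signal, $x_L(n)$, is decomposed by the two-band filter bank into an LF signal, $x_{L,\mathrm{LF}}(n)$, and an HF signal, $x_{L,\mathrm{HF}}(n)$. Similarly, the right microphone signal, $x_R(n)$, is decomposed into LF and HF components, $x_{R,\mathrm{LF}}(n)$ and $x_{R,\mathrm{HF}}(n)$. The HF components are the inputs to the ILD enhancement block (see Figure 3), which provides the enhanced outputs $y_{L,\mathrm{HF}}(n)$ and $y_{R,\mathrm{HF}}(n)$. The left- and right-side output signals, $y_L(n)$ and $y_R(n)$, are the sum of the LF input signal components and the enhanced HF output signal components according to $y_L(n) = x_{L,\mathrm{LF}}(n) + y_{L,\mathrm{HF}}(n)$ and $y_R(n) = x_{R,\mathrm{LF}}(n) + y_{R,\mathrm{HF}}(n)$, respectively.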
These filters, $H_{\mathrm{LF}}$ and $H_{\mathrm{HF}}$, are, for the sake of simplicity, 128-tap finite impulse response (FIR) filters, designed by the window method using a Hamming window. It should be noted that, in a real implementation, it is of utmost importance to match the passive path to the active (digital) path with respect to signal delay in order to avoid a possibly destructive signal skew. The impulse response function of the passive path from the external microphone of a hearing defender to a reference microphone placed close to the ear canal of a user is presented in Figure 4. This estimated impulse response has a low-pass characteristic and a dominant peak at a delay of 7 samples at a sampling frequency of 8 kHz. Thus, the active path should match this 7-sample delay of the passive path. This can be achieved in a real implementation by selecting low-delay (1-sample delay) analog-to-digital and digital-to-analog converters. In addition, the digital filter bank should be selected (or designed) with a pronounced focus on group delay in order to satisfy the matching of the passive and active paths (e.g., by using infinite impulse response (IIR) filter banks). The Haas effect (also known as the precedence effect) [10] underscores the importance of minimizing the temporal skew between the active and passive paths. An overly long delay in combination with a low passive path attenuation causes the directional aid to go unperceived. These practical details are, however, considered outside the scope of this paper and should be subject to further investigation in a later real-time implementation and evaluation of the proposed method.
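The two-band split described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' design: it uses a Hamming-windowed sinc low-pass filter with 127 taps (an odd length, instead of the 128 taps mentioned above, so that the high-pass band can be formed as a delayed unit impulse minus the low-pass filter), a hypothetical 1.5 kHz cut-off, and an assumed 8 kHz sampling frequency. All function names are illustrative.

```python
import math


def lowpass_taps(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass FIR, normalized to unit DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("odd length required for the delta-minus-lowpass complement")
    mid = (num_taps - 1) // 2
    fc = cutoff_hz / fs_hz  # normalized cut-off in cycles per sample
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]


def complementary_highpass(lp_taps):
    """High-pass taps formed as a delayed unit impulse minus the low-pass taps."""
    hp = [-t for t in lp_taps]
    hp[(len(lp_taps) - 1) // 2] += 1.0
    return hp


def fir_filter(taps, x):
    """Causal FIR filtering; the output has the same length as x."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(n + 1, len(taps))):
            acc += taps[k] * x[n - k]
        y.append(acc)
    return y


def split_bands(x, lp_taps):
    """Return the (LF, HF) components of x for one microphone channel."""
    return fir_filter(lp_taps, x), fir_filter(complementary_highpass(lp_taps), x)


FS_HZ = 8000.0      # assumed sampling frequency
CUTOFF_HZ = 1500.0  # hypothetical cut-off near the ITD/ILD boundary
LP = lowpass_taps(127, CUTOFF_HZ, FS_HZ)
```

With this construction the LF and HF components sum to the input delayed by 63 samples, so recombining an unmodified HF band leaves the signal unaltered apart from delay. That delay is far longer than the 7-sample passive path discussed above, which illustrates why a low-group-delay filter bank would be needed in a real implementation.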
3.2. The Proposed ILD Enhancement Scheme
One fundamental consideration regarding our proposed method involves first distinguishing whether a signal onset occurs. (A tutorial on onset detection in music processing can be found in [11], and a method for onset detection for source localization can be found in [12].) Once a signal onset has occurred, any other new onsets are disregarded within a certain time interval, unless a very distinct onset appears. This time interval is used to avoid undesired false onsets which may occur due to highly reverberant environments or acoustical noise. When an onset is detected, the method distinguishes which of the sides (i.e., left or right) has the current attention. For instance, for a signal that arrives at the left microphone before the right microphone, attention will be focused on the left side, and vice versa. Based on the information about the onset and the side which provides the attention, the "unattended" side will be attenuated accordingly. Hence, the directionality of the sound can be improved automatically.
A detailed description of the important stages of the proposed method, involving onset detection, formation of side attention, and the gain function computation method for the desired directionality enhancement, follows.
3.2.1. Onset Detection
The envelopes of each HF input signal are employed in the onset detection. The envelopes are denoted by and . To avoid mismatch due to uneven amplification among the two microphone signals, a floor function is computed for each side. These floor functions, denoted by and , are computed as
Here, represents a factor associated with the integration time of the floor functions. This integration time should be in the order of seconds such that the floor functions track slow changes in the envelopes. The function takes the minimum value of the two real parameters and . The normalized envelopes, and , are now computed according to
The envelope difference function is defined as
A ceiling function, , of the envelope difference function is computed according to
Here, is a real valued parameter that controls the release time of the ceiling function. This release time influences the resetting of some attention functions in (7), and this release time should correspond to the reverberation time of the environment. The function returns the maximum value of the real parameters and .
Now, an onset is detected if the ceiling function exactly equals the envelope difference function. This occurs only when the max function in (4) selects its second parameter, the envelope difference function, which corresponds to an onset.
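The onset rule above can be sketched in a few lines. The sketch assumes that the ceiling function has the common peak-hold form c(n) = max(β·c(n−1), d(n)), where d(n) is the envelope difference and β is the release parameter; this recursion is an assumption made for illustration and is not confirmed by the text, and the names below are hypothetical.

```python
def detect_onsets(envelope_difference, beta):
    """Return (ceiling, onset_flags) for a sequence of envelope differences.

    Assumed recursion: c(n) = max(beta * c(n - 1), d(n)), with c(-1) = 0.
    An onset is flagged when the ceiling equals the current envelope
    difference, i.e. when max() selected its second argument.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    ceiling = []
    onsets = []
    previous = 0.0
    for d in envelope_difference:
        c = max(beta * previous, d)
        ceiling.append(c)
        onsets.append(c == d)
        previous = c
    return ceiling, onsets


# Illustrative input: two rises separated by a decaying tail.
c, flags = detect_onsets([0.1, 0.5, 0.3, 0.2, 0.6, 0.1], beta=0.9)
print(flags)
```

In this form a larger β holds the ceiling longer, so later and weaker rises (for example reverberant reflections) are less likely to be flagged. Note that with an all-zero input and a zero initial state the equality also holds, so a practical implementation would likely add a small threshold.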
3.2.2. Side Attention Decision
In the case of a detected onset, the values of the normalized envelopes determine the current attention. If , the attention is to the left side and the corresponding attention function is updated. If, on the other hand, , the attention will be on the right side, and the attention function for the right side is updated. This attention function mechanism is formulated as two cases:
where the cases and are
and represents a forgetting factor for the attention functions and its integration time should be close to the expected interarrival time between two impulses.
3.2.3. Directional Gain Function
To avoid any false decisions, due to high reverberation environment or acoustical noise, a long-term floor function, , is employed to the ceiling function according to
where the parameter controls the integration time of this long-term average, and this integration time should be in the order of seconds in order to track slow changes in the ceiling function. In order to avoid drift in the attention functions, they are set to if the function of (7) selects the second parameter, . This condition will trigger a time after a recent onset has occurred (this time is determined mainly by and partly by ). Thereafter, the recent impulse is considered absent.
Depending upon the values of attention functions of and and the ceiling and floor functions of and , the two directional gain functions, and , can be calculated. If , the attention will shift towards the left side and consequently the right side will be suppressed. If, on the other hand, the attention is shifted towards the right side, that is, , then the left side is suppressed. The directional gain functions are computed according to
where the cases and are
Here, the mapping function controls the directional gain and should be able to discriminate certain types of sounds. The mapping function used in this paper is inspired by the unipolar sigmoid function that is common in the neural network literature [13]; it is defined here as
where the parameter controls the maximum directional gain imposed by the proposed algorithm. The parameter corresponds to a center-point that lies between the pass-through region () and attenuation region () of the mapping function. The parameter corresponds to the transition rate of the mapping function from the pass-through region to the attenuation region. The reason for using the quotient of the two parameters, and in (10), is to make the mapping function invariant to scales of the input signal. The various parameters in the present mapping function have been selected empirically such that impulsive sounds (which are identified as target sounds) are differentiated from speech (nontarget sounds). A set of parameters that appear to be suitable in the tested scenarios are , , and . The mapping function in (10) is presented in Figure 5. It is stressed that these parameters are found empirically through manual calibration of the algorithm. Optimal parameter values can be found by using some form of neural training.
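To make the role of the three mapping parameters concrete, the following sketch shows one possible sigmoid-shaped mapping of the ceiling-to-floor quotient. The exact form of (10) and the parameter values used in the paper are not reproduced here; the formula, the parameter names (`min_gain`, `center`, `rate`), and their default values are all assumptions chosen for illustration only.

```python
import math


def mapping_gain(ceiling, long_term_floor, min_gain=0.1, center=3.0, rate=4.0):
    """Hypothetical sigmoid mapping from ceiling/floor quotient to a gain.

    Small quotients (no impulse) give a gain near 1 (pass-through);
    large quotients (strong impulse) give a gain near min_gain (attenuation).
    """
    if long_term_floor <= 0.0:
        raise ValueError("long-term floor must be positive")
    ratio = ceiling / long_term_floor  # quotient makes the mapping scale invariant
    attenuation_weight = 1.0 / (1.0 + math.exp(-rate * (ratio - center)))
    return 1.0 - (1.0 - min_gain) * attenuation_weight


for r in (1.0, 3.0, 6.0):
    print(r, round(mapping_gain(r, 1.0), 3))
```

Because only the quotient enters the mapping, scaling both the ceiling and the floor by the same factor leaves the gain unchanged, which is the invariance property described above. The gain never exceeds 1, consistent with the design goal that signals are only attenuated and never amplified.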
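Now, the output signals of the ILD enhancement block can be expressed as the HF input signals scaled by the directional gain functions, $y_{L,\mathrm{HF}}(n) = g_L(n)\,x_{L,\mathrm{HF}}(n)$ and $y_{R,\mathrm{HF}}(n) = g_R(n)\,x_{R,\mathrm{HF}}(n)$, where $g_L(n)$ and $g_R(n)$ denote the left and right directional gain functions and the subscripts LF and HF denote the low- and high-frequency band components. Consequently, the total output of the directional aid can be obtained as $y_L(n) = x_{L,\mathrm{LF}}(n) + y_{L,\mathrm{HF}}(n)$ and $y_R(n) = x_{R,\mathrm{LF}}(n) + y_{R,\mathrm{HF}}(n)$.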
3.3. Illustration of Performance
This section illustrates important output signals of the proposed algorithm. An impulsive sound signal (gun shots) and a speech signal are used as input for the algorithm. To aid the illustration, all signals have a peak magnitude of 1. The sampling frequency and the algorithm's parameter values follow those outlined in Section 4. Four impulses are present; the first two impulses originate from the left side of the hearing defender, and the last two impulses from the right side. After 3.5 seconds, only speech is active. Figure 6 illustrates the input with its corresponding directional aid outputs and other relevant intermediary signals. This illustration highlights the operation of the algorithm and also demonstrates that the directional information for the two test signals is in fact enhanced (according to the magnitude of the outputs for the two test impulses).
4. Performance Evaluation
In the following, the performance and characteristics of the proposed algorithm are demonstrated. Two cases are investigated. First, the directional aid's ability to enhance the directionality of impulsive sounds (gun shots) relative to speech sounds is evaluated. Speech is a type of signal that should be transparent to the algorithm, that is, it should pass through the algorithm unaltered, since the focus of our algorithm is the enhancement of impulsive sounds. Second, the directional aid's sensitivity to interfering white noise is evaluated at various levels of impulsive sound peak energy to interfering noise ratio (ENR). The signals used in this evaluation are delivered through a loudspeaker in an office room (reverberation time milliseconds) and recorded using the microphones on an active hearing defender; see Figure 1. The sampling frequency is kHz, and the parameter values used in the evaluation are selected as seconds, and second, where the actual value of every parameter is computed using , where is the time constant (in seconds) associated with every parameter . This approximation is valid for .
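The conversion from a time constant to a recursion parameter mentioned above can be illustrated as follows. The sketch assumes the usual first-order relation p = 1 − 1/(T·Fs), which approximates exp(−1/(T·Fs)) when T·Fs is much larger than 1; the expression in the text is not reproduced here, so this form, the function names, and the 8 kHz sampling frequency are assumptions.

```python
import math


def coefficient_from_time_constant(time_constant_s, fs_hz):
    """First-order approximation p = 1 - 1 / (T * Fs), assumed for illustration."""
    samples = time_constant_s * fs_hz
    if samples <= 1.0:
        raise ValueError("time constant must span more than one sample")
    return 1.0 - 1.0 / samples


def exact_exponential_coefficient(time_constant_s, fs_hz):
    """Exact per-sample decay factor exp(-1 / (T * Fs)) for comparison."""
    return math.exp(-1.0 / (time_constant_s * fs_hz))


FS_HZ = 8000.0  # assumed sampling frequency
for t in (0.1, 1.0, 5.0):
    approx = coefficient_from_time_constant(t, FS_HZ)
    exact = exact_exponential_coefficient(t, FS_HZ)
    print(t, approx, exact - approx)
```

The difference between the two values shrinks roughly as 1/(2·(T·Fs)²), which is why the approximation is adequate for time constants of a tenth of a second or longer at typical audio sampling rates.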
4.1. Performance Measures
The maximal spectral deviation (MSD) is used as an evaluation measure. The MSD assesses the maximal deviation (in log-scale) of the processed output signal related to the unprocessed input signal, and is defined as
where the spectral deviation is
Here, and represent power spectral density estimates of the processed output signal and the corresponding input signal , where represents the channel index and corresponds to the frequency bin index. In other words, MSD assesses the maximal spectral deviation of the output signal with respect to the input signal over all channels and all frequencies. In general, the MSD is high if the process alters the output signal with respect to the input signal, and MSD is low if the output signal is spectrally close to the input signal.
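A minimal sketch of the MSD computation described above is given below. It assumes that the spectral deviation is the log-ratio 10·log10 of the output and input power spectral densities and that the MSD is the largest absolute deviation over all channels and frequency bins; the exact definitions in (11) and (12) may differ, and the function names are hypothetical. The PSD estimates are taken as given inputs.

```python
import math


def spectral_deviation_db(psd_out, psd_in):
    """Per-bin deviation 10*log10(P_out / P_in) for one channel (assumed form)."""
    if len(psd_out) != len(psd_in):
        raise ValueError("PSD estimates must have the same number of bins")
    return [10.0 * math.log10(po / pi) for po, pi in zip(psd_out, psd_in)]


def maximal_spectral_deviation_db(psd_out_by_channel, psd_in_by_channel):
    """Largest absolute deviation (in dB) over all channels and all bins."""
    worst = 0.0
    for psd_out, psd_in in zip(psd_out_by_channel, psd_in_by_channel):
        for deviation in spectral_deviation_db(psd_out, psd_in):
            worst = max(worst, abs(deviation))
    return worst


# Illustrative PSDs: the right channel is attenuated in a single bin.
psd_in = [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
psd_out = [[2.0, 2.0, 2.0], [2.0, 2.0 * 10.0 ** (-0.43), 2.0]]
print(maximal_spectral_deviation_db(psd_in, psd_in))
print(maximal_spectral_deviation_db(psd_out, psd_in))
```

Identical input and output spectra give an MSD of 0 dB, whereas attenuating any single bin of either channel raises the MSD to the size of that attenuation.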
For the evaluation of the directional aid's sensitivity to interfering noise, a directional gain deviation (DGD) measure is used. This measure compares the directional gains of each channel in an ideal case when no noise is present (ENR = ), denoted by and , with the case when interfering noise is present at a specific ENR level, while the directional gains are denoted as and . The DGD measures for each channel are defined as
Consequently, the desired behavior can be obtained if the directional gains at a specific ENR level exactly follow the directional gains in the ideal case, yielding the DGD measures to be zero. Any deviation from this behavior is considered as nonideal.
4.2. An Impulsive Test Signal
In this first test, an impulsive type of test signal (gun shots) is used to show the objective performance. The MSD for this impulsive test signal is 4.3 dB, which implies that the algorithm spectrally alters this test signal. This is the expected behavior of the algorithm.
4.3. A Nonimpulsive Test Signal
In this second test, a nonimpulsive test signal (a speech signal) is used to demonstrate the performance. It is expected that such a signal should be transparent to the algorithm. The MSD for this speech test signal is 0 dB, which indicates that the algorithm is able to let such nonimpulsive signals remain spectrally undistorted.
4.4. Sensitivity to Interfering Noise
A mixture of white Gaussian noise and impulsive sounds acts as an input to the directional aid. The impulsive sounds are set to have a maximal amplitude of 1. The level of the interfering noise is then set according to a desired ENR level. The DGD measures for each channel are presented in Figure 7. This figure indicates that the directional aid fails to operate for ENR levels below 20 dB.
5. Summary and Conclusions
This paper presents a novel algorithm that serves as a directional aid for hearing defenders. Moreover, this algorithm intends to provide a protection scheme for the users of active hearing defenders. The users of the existing hearing defenders experience distorted directional information, or none at all. This is identified as a serious safety flaw. Therefore, this paper introduces a new algorithm and an initial analysis has been carried out. The algorithm passes nonimpulsive signals unaltered and the directional information of impulsive signals is enhanced as obtained by the use of a directional gain. According to some objective measures, the algorithm performs well and a more detailed analysis including a psychoacoustic study on real listeners will be conducted in future research. Furthermore, the psychoacoustic study should be carried out on a real-time system, where the impact of various design parameter values is evaluated with respect to the psychoacoustic performance with an intended live application.
The work presented herein is an initial work introducing a strategy for a directional aid in hearing defenders, with focus on impulsive sounds. Future research may include enhancing directional information (other than those related to impulsive sound classes) such as directionality of, for example, tonal alarm signals from a reversing truck.
Future research may also involve modifications of this proposed algorithm such as reduction of the sensitivity to interfering noise. The directional aid may be further enhanced with the addition of a control structure that restrains enhancement of the repetitive impulsive sounds, such as those from a pneumatic drill. This would extend the possible application areas of our directional aid.
References
[1] Simpson BD, Bolia RS, McKinley RL, Brungart DS: The impact of hearing protection on sound localization and orienting behavior. Human Factors 2005, 47(1):188-198. 10.1518/0018720053653866
[2] Brungart DS, Kordik AJ, Eades CS, Simpson BD: The effect of microphone placement on localization accuracy with electronic pass-through earplugs. Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA '03), October 2003, New Paltz, NY, USA 149-152.
[3] Hager LD: Hearing protection. Didn't hear it coming: noise and hearing in industrial accidents. Occupational Health & Safety 2002, 71(9):196-200.
[4] Rubak P, Johansen LG: Active hearing protector with improved localization performance. Proceedings of the International Congress and Exposition on Noise Control Engineering (Internoise '99), December 1999, Fort Lauderdale, Fla, USA 627-632.
[5] Blauert J: Spatial Hearing: The Psychophysics of Human Sound Localization. MIT Press, Cambridge, Mass, USA; 1983.
[6] Begault DR: 3-D Sound for Virtual Reality and Multimedia. Academic Press, San Diego, Calif, USA; 1994.
[7] Duda RO: Modeling head related transfer functions. Proceedings of the 27th Asilomar Conference on Signals, Systems and Computers (ACSSC '93), November 1993, Pacific Grove, Calif, USA 2: 996-1000.
[8] Moore BCJ: An Introduction to the Psychology of Hearing. 4th edition. Academic Press, San Diego, Calif, USA; 1997.
[9] Cheng CI, Wakefield GH: Introduction to head-related transfer functions (HRTFs): representations of HRTFs in time, frequency, and space. Journal of the Audio Engineering Society 2001, 49(4):231-249.
[10] Gardner MB: Historical background of the Haas and/or precedence effect. The Journal of the Acoustical Society of America 1968, 43(6):1243-1248. 10.1121/1.1910974
[11] Bello JP, Daudet L, Abdallah S, Duxbury C, Davies M, Sandler MB: A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing 2005, 13(5):1035-1047.
[12] Supper B, Brookes T, Rumsey F: An auditory onset detection algorithm for improved automatic source localization. IEEE Transactions on Audio, Speech and Language Processing 2006, 14(3):1008-1017.
[13] Haykin S: Neural Networks: A Comprehensive Foundation. Prentice Hall, Upper Saddle River, NJ, USA; 1998.
Cite this article
Sällberg, B., Sattar, F. & Claesson, I. On a Method for Improving Impulsive Sounds Localization in Hearing Defenders. J AUDIO SPEECH MUSIC PROC. 2008, 274684 (2008). https://doi.org/10.1155/2008/274684