- Research
- Open Access

# An artificial patient for pure-tone audiometry

*EURASIP Journal on Audio, Speech, and Music Processing*
**volume 2018**, Article number: 8 (2018)

## Abstract

The successful treatment of hearing loss depends on the individual practitioner’s experience and skill. So far, there is no standard available to evaluate the practitioner’s testing skills. To assess every practitioner equally, the paper proposes a first machine, dubbed artificial patient (AP), mimicking a real patient with hearing impairment operating in real time and in a real environment. Following this approach, we develop a multiple-input multiple-output auditory model that synthesizes various types of hearing loss as well as elements from psychoacoustics such as false response and reaction time. The model is then used to realize a hardware implementation, comprising acoustic and vibration sensors, sound cards, and a fanless personal computer. The AP returns a feedback signal to the practitioner upon perceiving a valid test tone at the hearing threshold analogous to a real patient. The AP is derived within a theoretical framework in contrast to many other solutions. The AP handles masked air-conduction and bone-conduction hearing levels in the range from 5 to 80 dB and from −20 to 70 dB, respectively, both at 1 kHz. The frequency range is confined to between 250 and 8000 Hz. The proposed approach sets a new quality standard for evaluating practitioners.

## Introduction

Currently, 328 million adults and 32 million children suffer from disabling hearing loss [1]. Disabling hearing loss refers to hearing loss greater than 40 dB in the better hearing ear in adults and a hearing loss greater than 30 dB in the better hearing ear in children. To identify auditory impairment, the audiologist performs an audiometric exam. If diagnosed accurately, hearing loss can then be managed by technology or corrected by surgery. In combination with counseling of hearing impaired persons and their family, this makes it possible to achieve successful hearing (re)habilitation. However, there is currently no standard available to evaluate the practitioner’s testing skills.

### Background

Pure-tone audiometry is a behavioral test used to identify hearing threshold levels of an individual. This test is performed with an audiometer, comprising a single tone generator, a bone vibrator for measuring the cochlea function, and earphones for air-conduction testing. The result is recorded in an audiogram [2]. To test *air-conduction (AC) hearing*, pure-tone sound pressure is applied to the ipsilateral ear through an earphone. Sound propagates into the ear through air in the auditory ear canal. The procedure is repeated for specific frequencies typically in the range from 250 to 8000 Hz. To test *bone-conduction (BC) hearing*, a pure-tone vibrating force is applied to the ipsilateral mastoid, bypassing the middle ear. An acoustic stimulus presented to the ipsilateral ear does not necessarily stimulate the ipsilateral cochlea only, but also crosses the skull, gets attenuated, and stimulates the contralateral cochlea. An overview of mean value and range of this so-called *interaural attenuation* for AC and BC hearing tests can be found in [3] and [4], respectively. To eliminate its participation from the test, the contralateral cochlea is stimulated by narrow-band masking noise centered around the pure-tone frequency.

The lowest sound level at which the pure tone at a standardized frequency is heard is called the “hearing threshold,” expressed as a hearing level in decibels relative to the quietest sound a standardized young healthy individual ought to be able to hear [5]. Since nerve cell activity is essentially random, the threshold can only be interpreted as the stimulus level where the pure tone is detected with some predefined probability [6]. The psychometric function describes the probability of a positive response as a function of the pure-tone level. The shape of the psychometric function depends on the particular audiometric test procedure used by the audiologist [7]. The most used method in manual audiometry is the modified Hughson-Westlake procedure [8], also described in ANSI S3.21-1978 (R-1992). The measurement procedure can also be complicated when the patient gives false responses due to confusion or pseudohypacusis [9]. A response is a false alarm when the patient responds although there is no test tone present. A response is a missed detection when the patient fails to respond to an audible test tone. Another issue influencing the measurement procedure is the reaction time of the patient. Generally speaking, age slows the reaction time [10]. When a test tone with a level near the hearing threshold is presented, the reaction time increases again [11]. Finally, ambient noise in the test environment causes (additional) auditory masking due to suppression of basilar membrane vibrations in the cochlea [12]. Hence, the permissible ambient noise level has to comply with standardized noise limits [13]. In rural locations, however, audiometric tests are often performed in public places and office rooms due to lack of infrastructure. In such environments, background noise level may be as high as an A-weighted sound pressure level of 51 dB [14].

### Related work

The underlying signal model is crucial for the quality of a patient simulator. When the anatomic abnormality causing the hearing loss is *known*, the auditory system can be modeled as a chain of (complex) electrical signal blocks, each based on a parameterized signal model. The parameters of the model represent the underlying pathological condition. The ratio of output signal to input signal is proportional to the hearing loss. Following this approach, Parent and Allen reproduce in [15] the major characteristics of the tympanic membrane between the middle ear and the external ear as a passive electrical network. Kates and Arehart derive in [16, 17] a signal block representing the acoustic properties of the middle ear and the cochlea. A parameterized signal model of the entire acoustic chain, consisting of external canal, eardrum, bone chain with oval window, auditory nerve, cochlear nucleus, thalamus, brain stem, and cortex, can be found in [18]. Note that all these models are single-input single-output, i.e., they do not consider cross-hearing between the left and right ear.

When the underlying anatomic abnormality is *unknown* or not of interest, model complexity can be reduced significantly by abstracting the hearing threshold. Following this approach, the individual hearing loss near the hearing threshold can be readily extracted from the audiograms in the frequency domain [19]. Along with the mean interaural attenuation for AC and BC hearing [4, 20], one can easily derive a multiple-input multiple-output patient simulator such as that in [21].

To train audiology students, instructors, and audiologists, several patient simulators have been developed during the past years. Among them there are *Otis - The virtual patient* [22], *AudsimFlex* [21], and *Audiology Clinic* [23]. These patient simulators are computer-generated patients, enabling the trainee to develop clinical reasoning skills without causing damage to real patients [24]. Recently, Heitz developed in [25] the Clinical Audiology Simulator for use at the University of Canterbury in conjunction with the HIT Lab New Zealand. In contrast to the previous patient simulators, the latter complies with the test batteries by the New Zealand Audiology Society. All these patient simulators simulate only hearing thresholds in noise-free environments without considering the patient behavior. In other words, they mimic real patients that never make mistakes.

In an attempt to close this gap, the work in [26] realizes a patient simulator that emulates hearing thresholds in real environments. The underlying system model attempts to abstract the hearing threshold. The drawbacks are firstly that the model is incapable of separating the audio sources. Therefore, both the pure tone from the ipsilateral ear and masking noise from the contralateral ear may trigger a feedback signal by the ipsilateral cochlea, resulting in a tremendous number of false positive errors. That limits the practical usability of this patient simulator. Secondly, the system architecture in [26] has been designed by following a heuristic approach. Thus, the performance of their patient simulator is sub-optimal.

To personalize hearing aids, Szopos et al. synthesized in [27] human audiograms based on the Real-Coded Genetic Algorithm. Note that cross-hearing cannot be incorporated in their algorithm, and convergence of their algorithm is not guaranteed either.

### Contribution

In this contribution, we want to add intelligence to existing patient simulators. We develop a multiple-input multiple-output auditory hearing model that is capable of handling jointly ipsilateral, contralateral, and interaural hearing loss, abstracting the AC and BC hearing thresholds. The model outputs an expression for the total hearing level at both cochleas as a function of the ipsilateral (test sound) and the contralateral (masking) sound pressure. Its matrix representation is full rank, implying that only a valid test signal can trigger a feedback signal but not the masking noise. To control the patient’s behavior, the proposed model is able to predict false alarm and missed detection probabilities, as well as an individual reaction time. The development of a psychometric function is outside the scope of the paper and so is the incorporation of the practitioner’s threshold measurement protocol. Yet, our artificial patient does not limit the testing strategy of the practitioner.

Based on our model, we realize an AP within the framework of non-coherent parameter estimation and hypothesis testing. The hardware of the proposed AP comprises an artificial head with microphones for AC hearing and a skull simulator for BC hearing, an environmental microphone to record background noise, sound cards, and a noiseless personal computer, i.e., one with a solid-state drive and no fan. It is worth mentioning that the combined hardware/software implementation has the advantage that our AP can be used not only to evaluate practitioners but also to assess measurement errors due to transducer displacement and environmental noise, which cannot be handled by a software-only realization.

The resulting artificial patient (AP) shall be aware of the environmental noise in the test room, be able to identify the ipsilateral input autonomously, listen to real audiometric test signals, and return a feedback signal to the practitioner, all in (soft) real time. The AP may also include elements from psychoacoustics with the ultimate goal to evaluate the practitioner. The system has been designed with the commercial software LabVIEW. The database is managed by the open-source software MySQL.

### Notation

In the following, bold letters denote vectors and matrices. Unless stated otherwise, time functions are indicated by a lowercase letter and its Fourier transform by the corresponding uppercase letter. The notation col{·} represents a column vector with the elements in the argument as its entries. The floor function ⌊(·)⌋ returns the largest integer less than or equal to the argument. The symbol diag{·} denotes the square matrix with the argument along its main diagonal and ∥·∥ is the 2-norm of the argument. Note that all quantities are measured in linear units unless they are of type level. The latter is the logarithm of the ratio of the value of that quantity to a reference value of the same quantity, expressed in Bel.

## System model

In this section, we derive a digital system model that incorporates audiometric parameters as well as elements of psychoacoustics.

### Hearing level at the cochleas

Starting from the sound pressure at the audio receptors, we develop an analytic expression for the root-mean-square (rms) sound pressure at the cochleas.

Let the *sound pressure* \({p}_{\mathrm {a}}^{(m)}(t) \in \mathbb {R}\) in [Pa], *m*∈{left (l), right (r)}, be a time function at the *m*th audio receptor. Similarly, let \({p}_{\mathrm {b}}^{(m)}(t) \in \mathbb {R}\) serve as the sound pressure proxy for the vibratory stimulus at the *m*th mastoid corresponding to the dynamic force per surface area of the vibrator. Stacked together, the vector \(\boldsymbol {{p}}^{(m)}(t) \in \mathbb {R}^{2}\) has the form

\[\boldsymbol{p}^{(m)}(t) \triangleq \text{col}\left\{ {p}_{\mathrm{a}}^{(m)}(t),\, {p}_{\mathrm{b}}^{(m)}(t) \right\}. \tag{1}\]

Suppose *p*^{(m)}(*t*) is sampled at rate 1/*T* where *T* is the sampling time. The resulting discrete-time signal \(\boldsymbol {p}^{(m)}[\ell ] \in \mathbb {R}^{2 \times 1}\) at sampling instant *ℓT* corresponds to its continuous counterpart *p*^{(m)}(*t*) exactly if the sample rate meets the requirements of the sampling theorem [28]. The *N*-point discrete Fourier transform \(\boldsymbol {P}^{(m)}[k] \in \mathbb {C}^{2 \times 1}\), operated on each row of *p*^{(m)}, has support on *k*=0,…,*N*−1. The discrete frequency index *k* is related to the continuous frequency *f* according to *k*=*fTN*.

The *normalized energy*, \({\mathcal {E}}^{(n,m)}_{0} \in \mathbb {R}\), *n*∈{l,r}, at the *n*th cochlea, caused by the sound pressure in the *m*th audio receptor, can be computed as follows (see, e.g., [29], chapter 3): the input spectrum *P*^{(m)}[*k*] is weighted with the diagonal calibration matrix \(\boldsymbol {C}^{(m)}[{k}] \in \mathbb {R}^{2 \times 2}\), processed by the hearing abstraction vector \(\boldsymbol {H}^{(n,m)}[{k}] \in \mathbb {R}^{1 \times 2}\) and normalized by the BC threshold \({A}^{(n)}_{\mathrm {b}}[{k}]\). The norm of the result, summed over the *B*-octave band around the center frequency index *k*_{0}, for the test tone leads to the desired quantity

The diagonal elements of the calibration matrix account for the sensitivity of the human ear as well as the attenuation in the connected hardware. The hearing abstraction vector

describes non-responsiveness of the hearing system for a stimulus, presented to the *m*th audio receptor, lower than the hearing threshold at the *n*th ear. The scenario is illustrated in Fig. 1. The ratio of the *m*th AC threshold \({A}_{\mathrm {a}}^{(m)}[\!k]\) to the *m*th BC threshold \({A}_{\mathrm {b}}^{(m)}[\!k]\) is referred to as the *m*th air-bone gap *G*^{(m)}[*k*] in (3). Some of the acoustic energy on the way to the inner left cochlea crosses the skull and becomes an interfering bone-conducted signal at the other cochlea. The ratio of acoustic energy at one cochlea to that at the other cochlea is commonly referred to as the *interaural AC attenuation* \({I}_{\mathrm {a}}[\!k] \in \mathbb {R}\) in (3). Analogously, \({I}_{\mathrm {b}}[\!k] \in \mathbb {R}\) is commonly denoted as the *interaural BC attenuation*.

Parseval’s theorem [30] states that the Fourier transform conserves energy. Hence, the *rms sound pressure proxy at cochlea* *n*, caused by audio receptor *m*, equals the square root of the energy at the same cochlea in (2) divided by \(\sqrt {N}\), i.e.,

In matrix notation,

We are now ready to compute the *hearing level vector*\(\boldsymbol {L} \in \mathbb {R}^{2}\) at the cochleas, defined as

in decibel (dB). The signal and noise vectors are given by \(\boldsymbol{S} = \boldsymbol{\Pi}_{0} \boldsymbol{X}\) and \(\boldsymbol{W} = \boldsymbol{\Pi}_{0} (\boldsymbol{1} - \boldsymbol{X})\), respectively, where \(\boldsymbol{\Pi}_{0}\) is defined in (5). The vector \(\boldsymbol {X} \triangleq \text {col} \{ x^{({\mathrm {l}})}, x^{({\mathrm {r}})} \}\) determines the sound class. In particular, its entry *x*^{(m)}, *m*∈{l,r}, reads

Conventionally, the reference sound pressure *p*_{ref} reads 20 *μ*Pa rms, corresponding to the lowest audible sound pressure at 1000 Hz that a young healthy individual ought to be able to perceive.
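As a rough numerical illustration of the level convention, the following sketch converts an rms sound pressure into a level in dB relative to *p*_{ref}. It is a generic pressure-level computation, not the hearing level vector in (6); the names `P_REF` and `level_db` are assumptions made for this example.

```python
import math

P_REF = 20e-6  # reference sound pressure in Pa rms (20 µPa), as stated in the text


def level_db(p_rms: float, p_ref: float = P_REF) -> float:
    """Level in dB of an rms sound pressure relative to p_ref (hypothetical helper)."""
    if p_rms <= 0.0:
        raise ValueError("rms sound pressure must be positive")
    # A pressure ratio maps to decibels via 20*log10(ratio).
    return 20.0 * math.log10(p_rms / p_ref)


if __name__ == "__main__":
    for p in (20e-6, 200e-6, 2e-3):
        print(f"{p:.1e} Pa rms -> {level_db(p):.1f} dB re 20 µPa")
```

A tenfold increase in rms pressure corresponds to a 20 dB increase in level, which is why the reference pressure itself maps to 0 dB.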

### Elements of psychoacoustics

Not only hearing loss but also psychoacoustics impacts the audiometric test procedure. In this contribution, we consider the parameters false alarm and missed detection, and mean response time.

#### False alarm and missed detection

To model errors in the human auditory system, let us point out the analogy to on-off keying (OOK) in digital communications. Suppose that bit one and bit zero correspond to a present waveform with signal energy \(\mathcal {E}_{\text {OOK}}\) and an absent waveform, respectively. When the transmitted waveforms are exposed to additive white Gaussian noise with spectral density \({\mathcal {N}_{0}}\), the optimal non-coherent energy detector computes the energy of the received signal and compares the result with some *OOK threshold* *Θ*. The probabilities of mistaking a logic zero for a one, *ε*_{FA}, and a logic one for a zero, *ε*_{MD}, are given by [31]

respectively. Here, \(Q(a,b) = \int _{b}^{\infty } x I_{0}(ax) \exp \left \{-\left (a^{2}+x^{2}\right)/2 \right \} \mathrm {d} x\) is the Marcum Q-function with *I*_{0}(*x*) denoting the 0th-order modified Bessel-function of the first kind [31].
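The error behavior of a non-coherent detector can be checked by simulation. The sketch below works in normalized units, assuming unit-variance Gaussian noise in each quadrature component, a signal amplitude *a*, and a normalized envelope threshold *b*; under these assumptions the false alarm rate is *Q*(0,*b*)=exp(−*b*²/2) and the missed detection rate is 1−*Q*(*a*,*b*). The function name, the trial count, and the seed are choices made for this illustration only.

```python
import math
import random


def simulate_envelope_detector(a: float, b: float, trials: int = 100_000, seed: int = 0):
    """Monte Carlo estimate of (false alarm rate, missed detection rate).

    a: normalized signal amplitude when the waveform is present
    b: normalized envelope threshold
    """
    rng = random.Random(seed)
    false_alarms = 0
    misses = 0
    for _ in range(trials):
        # Waveform absent: the envelope of pure noise is Rayleigh distributed.
        ni, nq = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if math.hypot(ni, nq) > b:
            false_alarms += 1
        # Waveform present: the envelope of signal plus noise is Rician distributed.
        ni, nq = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if math.hypot(a + ni, nq) <= b:
            misses += 1
    return false_alarms / trials, misses / trials


if __name__ == "__main__":
    b = math.sqrt(-2.0 * math.log(0.1))  # threshold for a 10% false alarm target
    for a in (2.0, 4.0):
        fa, md = simulate_envelope_detector(a, b)
        print(f"a={a}: false alarm ~ {fa:.3f}, missed detection ~ {md:.3f}")
```

The false alarm rate depends only on the threshold, while the missed detection rate falls as the signal amplitude grows relative to the noise.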

Let us move to pure-tone audiometry where appropriate stimuli and pauses are presented in alternating order to the ipsilateral ear. The basilar membrane within the cochlea extracts the frequencies of the stimuli in a non-coherent way [32] as long as their sound pressure is above the hearing threshold. With decreasing signal-to-noise ratio, the patient more likely misses the test tone. Under the hearing threshold, false alarms might occur. Hence, the patient responds to acoustic stimuli similar to what non-coherent OOK energy detection does. Following this approach, we add white Gaussian noise with particular density \({\mathcal {N}_{0}}\) to the hearing level vector in (6) and pass the result to an envelope detector that makes controlled errors *ε*_{FA} and *ε*_{MD}. We start with the spectral noise density. Substituting (8) into (9) with \(a \triangleq \sqrt {2 \mathcal {E}_{\text {OOK}}/{\mathcal {N}_{0}}}\) and \(b^{\star } \triangleq \sqrt {-2 \ln {\varepsilon }_{\text {FA}}}\), it follows that

\[{\varepsilon}_{\text{MD}} = 1 - Q\left(a, b^{\star}\right). \tag{10}\]

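To obtain *a* and hence \({\mathcal {N}_{0}}\), we could invert *Q*(*a*,*b*^{⋆}) in (10). This approach, however, is cumbersome. Instead, we use the iterative Newton-Raphson method to find a fixed point *a*=*a*^{⋆} satisfying *f*(*a*)=*ε*_{MD}−1+*Q*(*a*,*b*^{⋆})=0. Starting from *a*^{(0)}>0, the algorithm computes at iteration *i*+1

\[a^{(i+1)} = a^{(i)} - \frac{f\left(a^{(i)}\right)}{f^{\prime}\left(a^{(i)}\right)} \tag{11}\]

where

\[f^{\prime}(a) = \frac{\partial Q\left(a,b^{\star}\right)}{\partial a} = b^{\star}\, I_{1}\left(a b^{\star}\right) \exp\left\{-\left(a^{2}+\left(b^{\star}\right)^{2}\right)/2\right\} \tag{12}\]

and *I*_{1}(*x*) denotes the first-order modified Bessel function of the first kind.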

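Since *f*^{′}(*a*)>0, *f*^{′′}(*a*)<0 and *a*>0, monotonic convergence to a fixed point *a*^{⋆} is guaranteed. Ergo,

\[{\mathcal{N}_{0}} = \frac{2\, \mathcal{E}_{\text{OOK}}}{\left(a^{\star}\right)^{2}}. \tag{13}\]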

Substituting (13) into (8), it follows for the OOK threshold

We have developed an artificial patient that is capable of generating arbitrary false alarm and missed detection probabilities by self-adapting two parameters, namely the spectral noise density \({\mathcal {N}_{0}}\) in (13) and the OOK threshold in (14).
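The self-adaptation of the noise density can be sketched numerically. The example below is a minimal illustration, not the LabVIEW implementation: it evaluates the Marcum Q-function through its Poisson-mixture series, applies Newton-Raphson to *f*(*a*)=*ε*_{MD}−1+*Q*(*a*,*b*^{⋆}) using the standard identity ∂*Q*/∂*a*=*b* *I*_{1}(*ab*) exp{−(*a*²+*b*²)/2}, and then evaluates \({\mathcal {N}_{0}}=2\mathcal {E}_{\text {OOK}}/(a^{\star })^{2}\). The function names, the starting point *a*^{(0)}=*b*^{⋆} (suitable for small error probabilities), and the tolerances are assumptions of this sketch. The OOK threshold is not computed here.

```python
import math


def marcum_q1(a: float, b: float, tol: float = 1e-16, max_terms: int = 100_000) -> float:
    """First-order Marcum Q-function via its Poisson-mixture series (moderate a only)."""
    x, y = 0.5 * a * a, 0.5 * b * b
    weight = math.exp(-x)   # Poisson weight exp(-x) x^n / n! for n = 0
    term = math.exp(-y)     # exp(-y) y^j / j! for j = 0
    tail = term             # running sum over j = 0..n
    total = weight * tail
    for n in range(1, max_terms):
        weight *= x / n
        term *= y / n
        tail += term
        total += weight * tail
        if n > x and weight < tol:
            break
    return min(total, 1.0)


def bessel_i1(z: float, tol: float = 1e-17) -> float:
    """Modified Bessel function of the first kind, order one, by power series."""
    half = 0.5 * z
    term = half             # k = 0 term: (z/2)^1 / (0! 1!)
    total = term
    k = 0
    while abs(term) > tol * abs(total):
        k += 1
        term *= half * half / (k * (k + 1))
        total += term
    return total


def solve_noise_density(eps_fa: float, eps_md: float, energy_ook: float,
                        max_iter: int = 50, tol: float = 1e-12):
    """Return (a_star, b_star, N0) for target false alarm / missed detection rates."""
    if not (0.0 < eps_fa < 1.0 and 0.0 < eps_md < 1.0 and eps_fa + eps_md < 1.0):
        raise ValueError("error probabilities must be positive and sum to less than one")
    b = math.sqrt(-2.0 * math.log(eps_fa))
    a = b  # assumed starting point a^(0) > 0
    for _ in range(max_iter):
        f = eps_md - 1.0 + marcum_q1(a, b)
        df = b * bessel_i1(a * b) * math.exp(-0.5 * (a * a + b * b))
        step = f / df
        a -= step           # Newton-Raphson update
        if abs(step) < tol:
            break
    return a, b, 2.0 * energy_ook / (a * a)


if __name__ == "__main__":
    a_star, b_star, n0 = solve_noise_density(0.01, 0.01, energy_ook=1.0)
    print(f"a* = {a_star:.4f}, b* = {b_star:.4f}, N0 = {n0:.5f}")
```

The series for the Q-function underflows for very large *a*, so this sketch is intended only for the moderate signal-to-noise ratios that correspond to error rates in the percent range.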

#### Mean reaction time

It has been shown in [11] that the mean reaction time *τ* of a patient can be modeled as the sum of a fixed individual delay *τ*_{0} and a variable component depending on the stimulus level. Based on the experimental results in [11], we propose the linear model

in seconds, where \(\boldsymbol{L}\) is defined in (6). It can be seen that hesitation increases from clearly audible levels towards the threshold. Below the hearing threshold, the reaction time is set constant to the value which would occur at a hearing level of 0 dB.

### Hearing model

We have developed a multiple-input multiple-output system reading the input vector \(\boldsymbol{P}\) from the transducers and writing the output vector \(\boldsymbol{L}\). This first system mimics hearing loss. The subsequent OOK system, operating in the log-domain, considers \(\boldsymbol{L}\) as an input vector that is distorted and delayed to generate the hearing level vector \(\boldsymbol {Y} \triangleq \boldsymbol {Y}[\ell ] \in \mathbb {R}^{2}\) at the basilar membrane, i.e.,

The vector \(\boldsymbol{L}\) has energy \(\boldsymbol {\mathcal {E}}_{\text {OOK}}\). The vector \(\boldsymbol {N} \in \mathbb {R}^{2}\) contains additive white Gaussian samples with spectral density \({\mathcal {N}_{0}}\left (\mathcal {E}_{\text {OOK}}^{(n)}\right)\), *n*∈{l,r}, according to (13) under the assumption that errors occur independently at either cochlea. The second system mimics patient behavior.

## System design

The task is to design an AP that passes the received sound pressure in (1) through a filter and a distortion module to generate the observation vector \(\boldsymbol{Y}\) in (16). The result is then demodulated in a non-coherent fashion and sent to a slicer, deciding “heard” or “not heard.” The AP does not know the center frequency of the presented sound, its class, or its signal energy.

At the beginning of a test, the AP randomly chooses a patient profile from the local patient database, referred to as *DB* in Fig. 2. The profile comprises audiograms, interaural attenuation, missed detection probability *ε*_{MD}, false alarm probability *ε*_{FA}, individual response time *τ*_{0}, and calibration data. The audiograms along with custom interaural attenuation coefficients can be used to compute the vector *H*^{(n,m)}[*k*] in (3). At the end of the preparation phase, the AP loads the calibration matrices *C*^{(m)}[*k*]. Vector *H*^{(n,m)}[*k*] and matrix *C*^{(m)}[*k*] are assumed to be quasi-constant within each of the *K* frequency bands.

### Joint center frequency and signal energy estimation

We start with the computation of the matrix *Π*_{0}. Little is known about the form of the signal, but it is considered to be deterministic and confined to the frequency spectrum of bandwidth *B*, centered around one of the *K* audiometric frequencies. When noise in the received signal is Gaussian and additive with zero mean, the optimum non-coherent receiver [33] consists of *K* paths, each one a cascade of a *B*-octave bandpass filter with center frequency index *k*_{ν}, *ν*=1,…,*K*, followed by a squaring device and an integrator/summer computing the signal energy \(\mathcal {E}_{\nu } \triangleq \mathcal {E}_{\nu }({k}_{\nu })\). The receiver chooses the largest of the energies according to

Let us return to the specific model in (2) evaluated at frequency index *k*_{ν}, *ν*=1,…,*K*. When the corresponding elements \(\pi ^{(m,n)}_{\nu }\) in (5) are packed in a matrix, say \(\boldsymbol {\Pi }_{\nu } \triangleq \boldsymbol {\Pi }_{\nu }({k}_{\nu })\), the optimum energy detector chooses the matrix \(\hat {\boldsymbol {\Pi }}_{0}\) from the set of matrices *Π*_{ν} such that the amount of energy, or equivalently, the matrix two-norm, is maximum. Hence,

The proposed energy estimator is labeled as *maximization* in Fig. 2.
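The maximization over the band energies can be sketched in a few lines. The example below is a simplified illustration, assuming a single real-valued input channel and omitting the calibration matrix, the hearing abstraction vector, and the threshold normalization in (2). The function names, the band edges placed symmetrically in log-frequency, and the list of six center frequencies are assumptions made for this sketch.

```python
import cmath
import math

# Assumed set of audiometric center frequencies in Hz (illustrative only).
CENTER_FREQS_HZ = (250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0)


def band_energy(samples, f0, sample_rate, width_octaves=1.0 / 3.0):
    """Sum of |X[k]|^2 over the DFT bins inside the band centered at f0."""
    n = len(samples)
    k_lo = math.ceil(f0 * 2.0 ** (-width_octaves / 2.0) * n / sample_rate)
    k_hi = math.floor(f0 * 2.0 ** (width_octaves / 2.0) * n / sample_rate)
    energy = 0.0
    for k in range(k_lo, k_hi + 1):
        w = -2j * math.pi * k / n
        # Direct DFT of a single bin; only the bins inside the band are evaluated.
        x_k = sum(s * cmath.exp(w * i) for i, s in enumerate(samples))
        energy += abs(x_k) ** 2
    return energy


def estimate_center_frequency(samples, sample_rate, center_freqs=CENTER_FREQS_HZ):
    """Return the center frequency whose band holds the largest energy, plus all energies."""
    energies = {f0: band_energy(samples, f0, sample_rate) for f0 in center_freqs}
    return max(energies, key=energies.get), energies


if __name__ == "__main__":
    fs, n = 22050.0, 882
    tone = [math.sin(2.0 * math.pi * 1000.0 * i / fs) for i in range(n)]
    f_hat, _ = estimate_center_frequency(tone, fs)
    print(f"estimated center frequency: {f_hat:.0f} Hz")
```

Evaluating only the bins inside each band keeps the cost low compared with a full transform, at the price of ignoring energy that falls between the bands.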

### Sound classification

We continue with estimating the components of the vector \(\boldsymbol{X}\) in (7), classifying the received sound. To find an appropriate estimation algorithm, that also works in real time, we exploit the fact that pure tone and noise mainly differ in their bandwidth. When the input signal has finite energy, we may define its rms bandwidth \(\mathcal {B}^{(m)}_{\text {rms}}\), *m*∈{l,r}, as second normalized moment of the weighted input spectrum ∥*C*^{(m)}[*k*]*P*^{(m)}[*k*]∥^{2}. In matrix-vector notation,

Here, \(\hat {{k}}_{0}\) is the frequency estimate provided by the energy estimator in (17). From (19), we readily obtain the estimate \(\hat {x}^{(m)}\) of the *m*th entry in \(\hat {\boldsymbol {X}}\), according to

By inserting the estimates \(\hat {\boldsymbol {X}}\) in (20) and \(\hat {\boldsymbol {\Pi }}_{0}\) in (18) into (6), we obtain an estimate of the hearing level of the error-free patient, \(\hat {\boldsymbol {L}}\). The corresponding signal block in Fig. 2 is denoted as *hearing level generator*.
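The idea of separating pure tones from narrow-band noise by their rms bandwidth can be illustrated as follows. This is a minimal sketch, assuming the weighted power spectrum is already available per frequency bin. The second moment is taken around the estimated center bin and normalized by the total power, then compared with a cut-off. The function names, the cut-off value, and the returned labels are hypothetical and do not reproduce the exact rule in (20).

```python
import math


def rms_bandwidth(power_by_bin: dict, k0: float) -> float:
    """Square root of the normalized second moment of the power spectrum around bin k0."""
    total = sum(power_by_bin.values())
    if total <= 0.0:
        raise ValueError("spectrum carries no energy")
    second_moment = sum((k - k0) ** 2 * p for k, p in power_by_bin.items())
    return math.sqrt(second_moment / total)


def classify_sound(power_by_bin: dict, k0: float, bandwidth_threshold: float) -> str:
    """Label the input as 'pure tone' or 'noise' (hypothetical cut-off rule)."""
    if rms_bandwidth(power_by_bin, k0) < bandwidth_threshold:
        return "pure tone"
    return "noise"


if __name__ == "__main__":
    tone = {39: 0.0, 40: 1.0, 41: 0.0}                    # energy in a single bin
    noise = {38: 1.0, 39: 1.0, 40: 1.0, 41: 1.0, 42: 1.0}  # energy spread over the band
    for name, spectrum in (("tone", tone), ("noise", noise)):
        bw = rms_bandwidth(spectrum, k0=40)
        print(name, round(bw, 3), classify_sound(spectrum, 40, bandwidth_threshold=0.5))
```

Because only sums over the bins are needed, the statistic is cheap enough to be evaluated on every data frame.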

### Error injection

To control error, the block *noise injection* adds white Gaussian noise with density \({{\mathcal {N}_{0}}}^{(n)}\left (\hat {\mathcal {E}}_{\text {OOK}}^{(n)},{\varepsilon }/2\right)\) to the hearing level estimate \(\hat {L}^{(n)}\) at the *n*th cochlea according to (13). To find a guess \(\hat {\mathcal {E}}_{\text {OOK}}^{(n)}\) of \({\mathcal {E}}_{\text {OOK}}\) in a simple yet efficient way, we follow a recursive approach. At recursion *q*, we have

with initial value \(\hat {\mathcal {E}}^{(n)}_{\text {OOK},0} = 0\). The output signal is passed through a delay line, implementing (15). The delay line is labeled as *delay* in Fig. 2. It outputs the estimate \(\hat {\boldsymbol {Y}}\) of the observation vector \(\boldsymbol{Y}\) in (16).

### Detection of the pure tone

We shall introduce the concept of hypothesis testing in the context of pure-tone audiometry. The assumption that the pure tone is absent at the *n*th cochlea is denoted by the *null hypothesis*\(H^{(n)}_{0}\). The *alternative hypothesis*\(H^{(n)}_{1}\) is the counterpart to \(H^{(n)}_{0}\).

The optimum non-coherent detector forms the decision variable (*Y*^{(n)})^{2} and compares the result with the individual threshold *Θ*^{(n)}, defined in (14), to form the decision rule

\[\left(Y^{(n)}\right)^{2} \underset{H^{(n)}_{0}}{\overset{H^{(n)}_{1}}{\gtrless}} \Theta^{(n)}.\]

The signal detector is sketched as a cascade of (·)^{2}, subtractor and slicer in Fig. 2.

## Performance evaluation

### Experimental equipment

The performance of the proposed artificial patient strongly depends on the selected hardware which has to be chosen with care.

The material of the artificial head determines the maximum AC interaural attenuation between the left and the right audio channels that can be emulated at software level. Table 1 lists mean values for interaural attenuation at the common *K*=7 audiometric frequencies [4, 20]. It can be seen that the transmission loss through our artificial head has to be larger than 46 dB. Physically dense materials provide the best sound attenuation. For example, an artificial head made of 160 mm dense concrete with mass density *ρ*=2300 kg/m^{3} insulates 56 and 87 dB at 250 and 8000 Hz, respectively (see [34], annex), so that the above requirements are met. A prototype of the artificial head is shown in Fig. 3.

To test AC hearing, the setup consists of two mono microphones in place of the ears, a two-channel charge amplifier, and an external sound card. The receptors for AC hearing shall be small so that they fit into the artificial head. Condenser microphones are generally smaller and more sensitive than dynamic (moving coil and ribbon) microphones but have higher self-noise and also require phantom power. A compromise between sensitivity and self-noise is the 1/2-inch all-titanium condenser microphone 4955 by Brüel & Kjær with IEEE 1451.4 Transducer Electronic Data Sheets (TEDS), requiring 200 V polarization voltage and 14 V phantom voltage. The temperature coefficient is ±0.01 dB/°C. The A-weighted sound pressure level of self-noise is specified as 6.5 dB. This low self-noise makes it possible to handle patient profiles with hearing level down to 0 dB. A LEMO 7-pin plug connects the microphones with the 2690 NEXUS charge amplifier by Brüel & Kjær. This amplifier not only enhances the electrical signal but also provides the microphones with phantom power and polarization voltage through the same LEMO 7-pin plug. The frequency response of any microphone depends on altitude *and* temperature [35]. Although the application is indoors, a digital temperature sensor using Maxim’s 1-wire technology is deployed to monitor the ambient temperature.

To test BC hearing, a skull simulator mimics the load characteristic of the human skull. The core piece of the skull simulator is a piezoelectric accelerometer. We deploy the skull simulator SKS10 by Interacoustics A/S. Putner et al. show in [36] that this kind of skull simulator has high linearity of 2% full scale and total harmonic distortion of less than 0.6% in the frequency range of interest. Note that only one ear is BC tested at a time. Hence, only *one* skull simulator is realized, which can be used for either ear. As cross-talk is absent at hardware level, any BC interaural attenuation can be emulated at software level.

The sound card samples the audio signal at rate *R* and passes the result to an A/D converter, operating at *Q* bit-depth. When the quantization error is uniformly distributed over [−0.5, +0.5] LSB, the dynamic range is related to the number of quantization bits as *D*=20 log_{10}(2^{Q}) in dB [29]. Hence, a typical dynamic range of *D*=120 dB leads to a minimum bit-depth of *Q*=20. Based on the above requirements, we have chosen the 24-bit X-Fi sound blaster by Creative. Its standardized mono sampling rate is set to *R*=22050 S/s, supporting a Nyquist frequency of 11025 Hz.
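The bit-depth requirement can be reproduced with a short calculation. The sketch below evaluates *D*=20 log_{10}(2^{Q}) and inverts it for the smallest integer *Q*; the function names are assumptions made for this example.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Dynamic range D = 20*log10(2^Q) in dB for a Q-bit converter."""
    return 20.0 * math.log10(2.0 ** bits)


def min_bit_depth(dynamic_range: float) -> int:
    """Smallest integer bit-depth Q whose dynamic range reaches the given value in dB."""
    return math.ceil(dynamic_range / (20.0 * math.log10(2.0)))


def nyquist_frequency(sample_rate: float) -> float:
    """Highest frequency in Hz representable at the given sampling rate."""
    return sample_rate / 2.0


if __name__ == "__main__":
    print("minimum bit-depth for 120 dB:", min_bit_depth(120.0))
    print("dynamic range at 24 bit: %.1f dB" % dynamic_range_db(24))
    print("Nyquist frequency at 22050 S/s:", nyquist_frequency(22050.0), "Hz")
```

Each additional bit adds about 6.02 dB, so the chosen 24-bit card leaves headroom above the 120 dB requirement, and the Nyquist frequency of 11025 Hz covers the highest audiometric frequency of 8000 Hz.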

Ambient noise might influence quality of pure-tone audiometry [14, 37]. To monitor background noise in the audiometric test room, an environmental microphone is used along with a low-noise pre-amplifier, and the internal sound card of the computer. The relaxed requirements on this type of microphone allow us to deploy the low-cost ECM8000 model by Behringer.

Table 2 summarizes the measurement equipment. A relay gives the patient feedback signal. The system design software LabVIEW allows for real-time implementation of targets and hence is ideally suited for the proposed AP. The database is managed by MySQL.

### Calibration

To ensure reproducible hearing test results, the audiometric test room has to be sufficiently quiet, the test equipment needs to be certified, and our AP needs to be calibrated. Calibration took place in an ISO-certified test room at the University Medical Center of Utrecht, Netherlands. The test equipment is composed of a Decos, AudioNigma audiometer, a pair of Sennheiser HDA 200 supra-aural earphones, and a Radioear B-71 bone vibrator. All of them meet the ISO-389 standard.

To eliminate the influence of the hardware on the received sound pressure spectrum, our AP is calibrated as follows: first, the AP is initialized with \({A}_{\mathrm {a}}^{({\mathrm {l}})}[{k}] = {A}_{\mathrm {a}}^{({\mathrm {r}})}[{k}]={A}_{\mathrm {b}}^{({\mathrm {l}})}[{k}] = {A}_{\mathrm {b}}^{({\mathrm {r}})}[{k}]=1\), *I*_{a}[*k*]=*I*_{b}[*k*]=0, *ε*_{MD}=*ε*_{FA}=0, corresponding to loss-less “hearing,” ideal isolation, and full cooperation, respectively. Then, a pure tone is presented to the *n*th audio channel at frequency *f*_{0}=1 kHz and hearing level of \(L_{0}^{(n)} = 40\) dB, and our AP responds at hearing level \(L^{(n)} \neq L_{0}^{(n)}\) in (6). Hence, the *n*th entry in the weighting matrix *C*^{(n)}[*k*] can be computed as \(\boldsymbol {C}^{(n)}[k]=10^{\left (L_{0}^{(n)} -L^{(n)}\right)/20}\). This procedure is repeated for the audiometric center frequencies 250, 500, 2000, 4000, and 8000 Hz and the other audio channel. All calibration data is stored in a database managed by MySQL.
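The calibration rule above amounts to converting a level difference in dB into a linear gain. A minimal sketch, with a hypothetical function name and made-up example readings, could look as follows.

```python
def calibration_gain(presented_level_db: float, measured_level_db: float) -> float:
    """Linear calibration factor 10^((L0 - L)/20) for one channel and frequency band."""
    return 10.0 ** ((presented_level_db - measured_level_db) / 20.0)


if __name__ == "__main__":
    # Illustrative values only: a 40 dB tone that the uncalibrated system reads as 34 dB.
    gain = calibration_gain(40.0, 34.0)
    print(f"calibration factor: {gain:.4f}")
```

A reading that is too low yields a factor above one, and a reading that matches the presented level leaves the spectrum unchanged.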

### Numerical examples

In the subsequent experiments, we verify the functionality of our artificial patient from Section 3 with the equipment and the calibration procedure described in Section 4.2. The signal bandwidth is set to *B*=1/3, i.e., one third of an octave, and the FFT size is *N*=882, corresponding to a minimal processing delay of *τ*=40 ms. A comparison with other simulators has been made as well: the simple AP in [26], dubbed *AP-S,* and AudsimFlex [21], representing a variety of software-based commercially available patient simulators with similar features. Unless noted otherwise, the test tone is presented to the left microphone.
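The quoted processing delay follows directly from the frame length and the sampling rate given earlier (*R*=22050 S/s). The short sketch below, with hypothetical function names, reproduces the 40 ms figure and the resulting spacing between adjacent frequency bins.

```python
def frame_duration_s(fft_size: int, sample_rate: float) -> float:
    """Time needed to collect one FFT frame, in seconds."""
    return fft_size / sample_rate


def bin_spacing_hz(fft_size: int, sample_rate: float) -> float:
    """Frequency distance between adjacent DFT bins, in Hz."""
    return sample_rate / fft_size


if __name__ == "__main__":
    n, r = 882, 22050.0
    print(f"frame duration: {1e3 * frame_duration_s(n, r):.1f} ms")
    print(f"bin spacing: {bin_spacing_hz(n, r):.1f} Hz")
```

A longer frame would give finer frequency resolution but would also increase the minimal delay before the AP can respond.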

In a first experiment, we re-tested the AP in a silent room at Pisa University, which allowed audiometric measurements down to a hearing level close to 0 dB. The AP emulates a normally hearing person without making errors, i.e., \({A}_{\mathrm {a}}^{({\mathrm {l}})}[{k}] = {A}_{\mathrm {a}}^{({\mathrm {r}})}[{k}]={A}_{\mathrm {b}}^{({\mathrm {l}})}[{k}] = {A}_{\mathrm {b}}^{({\mathrm {r}})}[{k}]=1\) and *ε*_{MD}=*ε*_{FA}=0. The measurement error \(L_{0}^{({\mathrm {l}})} - L^{({\mathrm {l}})}\) in dB as a function of the hearing level \(L_{0}^{({\mathrm {l}})}\) in dB is plotted in Fig. 4. For AC testing, we have tested the audiometric frequencies *f*_{0}=[250, 500, 1000, 2000, 4000, 8000] Hz. It can be seen that the error of our AP is confined to a range of ±2 dB while that of AP-S is twice as large. Furthermore, the sensitivity of our AP is fixed at a hearing level of 10 dB while that in [26] is limited to a hearing level of 30 dB. The improved accuracy is mainly caused by the fact that our receiver architecture is based on a theoretical framework, namely joint maximum likelihood parameter estimation, while that of AP-S is of heuristic nature. The boost in sensitivity, however, is mainly caused by the improved noise figure of our microphones. Specifically, self-noise of the microphone 4955 by Brüel & Kjær, incorporated in our AP, is 18.5 dB lower than that of the Røde Lavalier microphone in the AP of [26]. The competing AudsimFlex does not depend on real signals, and hence, it is omitted from the plot. For BC testing, we have tested the audiometric frequencies *f*_{0}=[500, 1000, 2000, 4000] Hz. It can be seen that the measurement error of our AP is confined to ±2 dB as well. Our AP and the AP-S have very similar sensitivity as both emulators employ the same skull simulator SKS10 by Interacoustics.

Next, we test AC hearing with narrow-band masking. To demonstrate the plateau method [38], we have chosen the parameters *f*_{0}=1 kHz and masked hearing threshold levels of \({A}_{\mathrm {a}}^{({\mathrm {l}})}[100]=65\) dB, \({A}_{\mathrm {a}}^{({\mathrm {r}})}[100]=20\) dB, \({A}_{\mathrm {b}}^{({\mathrm {l}})}[100]=5\) dB, and \({A}_{\mathrm {b}}^{({\mathrm {r}})}[100]=0\) dB. Moreover, *I*_{a}[100]=50 dB, *I*_{b}[100]=0 dB, and *ε*_{FA}=*ε*_{MD}=0. The masking diagram is shown in Fig. 5. With increasing masking intensity, the apparent threshold rises to a hearing loss of 65 dB, as the tone is picked up by the right ear. When the masking noise intensity is increased further, our AP responds with a plateau until the test tone is picked up by the left ear. Beyond this point, masking noise spills into the left ear, raising the hearing threshold again. The width of the plateau is about 15 dB. Neglecting the central masking effect [39], the desired hearing threshold must be located at a hearing level of 65 dB, too. The AP-S, in contrast, is incapable of handling the plateau method, mainly because the underlying sound pressure matrix *Π*_{0} is rank deficient. The ideal patient simulator AudsimFlex suggests a plateau at a hearing level of 70 dB. Note that AudsimFlex accounts for the central masking effect, causing a threshold shift in the test ear by 1 dB.

Figure 6 reports the error rate at the actuator as a function of the individual signal-to-noise ratio \(\gamma \triangleq \mathcal {E}_{\text {OOK}}^{({\mathrm {l}})}\left / \left ({\mathcal {N}_{0}}\left (\mathcal {E}_{\text {OOK}}^{({\mathrm {l}})}\right)\right)\right.\). The test tone is located at *f*_{0}=1 kHz. The AP emulates a normally hearing person but makes errors, i.e., \({A}_{\mathrm {a}}^{({\mathrm {l}})}[{k}] = {A}_{\mathrm {a}}^{({\mathrm {r}})}[{k}]= {A}_{\mathrm {b}}^{({\mathrm {l}})}[{k}] = {A}_{\mathrm {b}}^{({\mathrm {r}})}[{k}]=1\) and *ε*_{MD}=*ε*_{FA}∈{10, 3, 1, 0.3, 0.1} %. To obtain this plot, we measured the individual error rates *ε*^{(l)} and *ε*^{(r)} at the left and right detectors in Fig. 2, each defined as the number of incorrect individual decisions divided by the total number of processed data frames. As the noise in the individual detectors is uncorrelated, the total error rate follows as *ε*=1−(1−*ε*^{(l)})(1−*ε*^{(r)})≈*ε*^{(l)}+*ε*^{(r)}. For comparison purposes, the theoretical bound *P*_{b} in [40] for *D*=0.5,

has also been added to the plot. When the diagnostic technique follows the recommendation by the British Society of Audiology [41], implying a value of *D*=0.5, the measured error rate lies slightly below the bound. For *D*=0.8 (short pauses), the measured curve accurately follows the bound. For *D*=0.2 (long pauses), the AP generates roughly 2.5 times fewer errors than anticipated. Generally speaking, the AP is capable of reproducing all target error rates. The competing AP-S and AudsimFlex patient simulators do not model patient behavior and hence cannot be included in this comparison.
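The combination of the left and right detector error rates used above, *ε*=1−(1−*ε*^{(l)})(1−*ε*^{(r)})≈*ε*^{(l)}+*ε*^{(r)}, can be checked numerically. The sketch below is illustrative only: it assumes, purely for the sake of the example, that both detectors err at the same rate and reuses the target percentages listed above as that per-detector rate. The function name `total_error` is hypothetical.

```python
# Total error rate of two detectors with uncorrelated noise: an error occurs
# unless both the left and the right detector decide correctly.
def total_error(eps_left, eps_right):
    return 1.0 - (1.0 - eps_left) * (1.0 - eps_right)


# Target error rates from the experiment, in percent. Using them as the
# per-detector rate is an assumption made only for this illustration.
TARGETS_PERCENT = [10, 3, 1, 0.3, 0.1]

for pct in TARGETS_PERCENT:
    eps = pct / 100.0
    exact = total_error(eps, eps)
    approx = eps + eps  # small-error approximation
    print(f"eps = {eps:.3f}: exact = {exact:.6f}, "
          f"approx = {approx:.6f}, gap = {approx - exact:.6f}")
```

The gap between the approximation and the exact value equals the product *ε*^{(l)}*ε*^{(r)}, so the approximation tightens quickly as the individual error rates decrease.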

In the last experiment, we measure the reaction time *τ* of ten emulated patients as a function of the hearing level *L*^{(l)}. Each patient represents a normally hearing person with individual delay *τ*_{0}=190 ms [11] and no errors, i.e., *ε*_{MD}=*ε*_{FA}=0. The center frequency *f*_{0} of the test tone has been chosen randomly. The corresponding box-and-whisker plot is shown in Fig. 7. The body of the box represents the interquartile range (IQR). The whisker length is 1.5 times the IQR. The IQR ranges from 6 to 8 ms. Note that the operating system in our PC is soft real-time, implying that the response time of any running task is essentially random. A line graph showing the measured median reaction time \(\bar {\tau }\) overlays the box-and-whisker plot. Starting from \(\bar {\tau } = 270\) ms at a hearing level of 20 dB, the measured curve decreases to \(\bar {\tau } = 210\) ms at a hearing level of 80 dB. This result is in line with the model in (15) and also with the experimental results in [11], except at *f*_{0}=250 Hz, where our AP responds a little faster in the low hearing level regime. Finally, we point out that the individual delay *τ*_{0} corresponds to a vertical shift of the curve in the figure.
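The box-and-whisker statistics described above (median, IQR, and whiskers of 1.5 times the IQR) can be reproduced with a few lines of standard-library Python. The reaction-time samples below are hypothetical values chosen only to illustrate the computation; they are not measurements from the AP. The helper name `box_stats` and the choice of the "inclusive" quartile method are likewise assumptions.

```python
import statistics


# Hypothetical helper: summary statistics behind one box of a
# box-and-whisker plot, with whisker limits at 1.5 times the IQR.
def box_stats(samples_ms):
    q1, median, q3 = statistics.quantiles(samples_ms, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "median": median,
        "iqr": iqr,
        "whisker_low": q1 - 1.5 * iqr,
        "whisker_high": q3 + 1.5 * iqr,
    }


# Ten made-up reaction times in ms for a single hearing level
# (illustrative data, not taken from the experiment).
reaction_times_ms = [262, 265, 267, 269, 270, 271, 273, 275, 277, 280]

stats = box_stats(reaction_times_ms)
for name, value in stats.items():
    print(f"{name}: {value:.2f} ms")
```

Different quartile conventions give slightly different IQR values for small sample sizes such as ten patients, so the method should be stated whenever such figures are compared.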

## Conclusions

In this paper, we proposed a multiple-input multiple-output audiometric system model, comprising ipsilateral and contralateral hearing thresholds, ipsilateral hearing loss, and elements from psychoacoustics such as false alarm, missed detection, and individual response time. This model was then used to realize an artificial patient in hardware within a theoretical framework, operating in real time and in real environments. The application software is based on LabVIEW. The patient profiles are stored in a database managed by MySQL. First measurement results indicate that the proposed artificial patient is able to handle air-conduction and bone-conduction signals with auditory masking over a wide range of stimulus intensities and with controlled false alarm behavior. The hardware implementation of our artificial patient makes it possible to assess a practitioner’s expertise in real time and in real environments. More features will be added to the artificial patient in the future.

## References

1. World Health Organization, Fact sheet 300—deafness and hearing loss. http://www.who.int/mediacentre/factsheets/fs300/en/. Accessed 26 Apr 2017.
2. (HK Walker, WD Hall, JW Hurst, eds.), *Clinical methods: the history, physical, and laboratory examinations, 3rd edn.* (Butterworths, Boston, 1990). Chap. 133.
3. RJ Roeser, M Valente, H Hosford-Dunn, *Audiology diagnosis* (Thieme, Germany, 2007).
4. M Nolan, DJ Lyon, Transcranial attenuation in bone-conduction audiometry. J. Laryngol. Otol. **95**, 597–608 (1981).
5. ISO 389 series, *Acoustics—reference zero for the calibration of audiometric equipment* (International Organization for Standardization, Geneva, Switzerland, 2017). https://www.iso.org/ics/13.140/x/.
6. GT Fechner, *Elemente der Psychophysik [elements of psychophysics]* (Breitkopf und Härtel, Leipzig, 1860).
7. H Levitt, Transformed up-down methods in psychoacoustics. J. Acoust. Soc. Am. **49**, 467–477 (1971).
8. R Carhart, JF Jerger, Preferred method for clinical determination of pure-tone thresholds. J. Speech Hear. Disord. **24**, 330–345 (1959).
9. (JE Peck, ed.), *Pseudohypacusis: false and exaggerated hearing loss* (Plural Publishing Inc., San Diego, 2011).
10. JL Fozard, M Vercryssen, SL Reynolds, PA Hancock, RE Quilter, Age differences and changes in reaction time: the Baltimore longitudinal study of aging. J. Gerontol. Psychol. Sci. **49**(4), 179–189 (1994).
11. L Marshall, JF Brandt, The relationship between loudness and reaction time in normal hearing listeners. Acta Otolaryngol. **90**, 244–249 (1980).
12. A Recio-Spinoso, NP Cooper, Masking of sounds by a background noise—cochlear mechanical correlates. J. Physiol. **591**(10), 2705–2721 (2013). https://doi.org/10.1113/jphysiol.2012.248260.
13. ISO 8253-1, *Basic pure tone air and bone conduction threshold audiometry* (International Organization for Standardization, Geneva, Switzerland, 2010). https://www.iso.org/ics/13.140/x/.
14. ACS Kam, LKC Li, KNK Yeung, W Wu, Z Huang, H Wu, MCF Tong, Automated hearing screening for preschool children. SAGE J. Med. Screen. **21**(2), 71–75 (2014).
15. P Parent, JB Allen, Time-domain “wave” model of the human tympanic membrane. Hear. Res. **263**, 152–167 (2010).
16. JM Kates, KH Arehart, The hearing-aid quality speech index (HASQI). J. Audio Eng. Soc. **58**(5), 363–381 (2010).
17. JM Kates, KH Arehart, The hearing-aid speech quality index (HASQI) version 2. J. Audio Eng. Soc. **62**(3), 99–117 (2014).
18. Y Soeta, Y Ando, *2. Signal processing model of human auditory system* (Springer, Tokyo, 2015).
19. TA Hamill, *Making masking manageable* (CreateSpace Independent Publishing Platform, North Charleston, 2016). mbook.
20. KJ Brännström, J Lantz, Interaural attenuation for Sennheiser HDA 200 circumaural earphones. Int. J. Audiol. **49**, 467–471 (2010).
21. Nova Southeastern University, FL, USA, AudSim Flex (2015). http://audsim.com. Accessed 26 Apr 2017.
22. Innoforce Est., Liechtenstein, Otis—the virtual patient (2013). http://www.innoforce.com. Accessed 26 Apr 2017.
23. Parrot Software, Audiology Clinic (2009). https://www.parrotsoftware.com/shop/audiology.htm. Accessed 06 Dec 2017.
24. J Round, E Conradi, T Poulton, Training staff to create simple interactive virtual patients: the impact on a medical and healthcare institution. Med. Teach. **31**(8), 764–769 (2009).
25. A Heitz, *Improving clinical education through the use of virtual patient-based computer simulations*. PhD thesis, University of Canterbury (New Zealand, 2013). https://ir.canterbury.ac.nz/xmlui/handle/10092/8193.
26. A Kocian, S Chessa, W Grolman, in *Proc. Symposium on Computers and Communication, ISCC 2016*. Development and realization of an artificial patient with hearing impairment (IEEE, Messina, 2016), pp. 760–765. https://doi.org/10.1109/ISCC.2016.7543828.
27. E Szopos, I Saracut, C Farcas, M Neag, M Topa, in *Proc. of Int. Symposium on Signals, Circuits and Systems (ISSCS 2015)*. IIR filter synthesis based on real-coded genetic algorithm for human audiogram emulation (IEEE, Iasi, 2015), pp. 1–4.
28. AJ Jerri, The Shannon sampling theorem—its various extensions and applications: a tutorial review. Proc. IEEE. **65**(11), 1565–1596 (1977).
29. (JG Proakis, ed.), *Digital communications, 3rd edn.* (McGraw-Hill Inc., New York, 1995).
30. Parseval des Chênes, Mémoire sur les séries et sur l’intégration complète d’une équation aux différences partielles linéaire du second ordre, à coefficients constants. Mém. présent. Inst. Sci. Lett. Arts Divers savans lus assem., Sci. Math. Phys. **1**, 638–648 (1806). In French.
31. M Schwartz, WR Bennett, S Stein, *Communication systems and techniques* (IEEE Press, New York, 1996).
32. PC Loizou, Mimicking the human ear. IEEE Signal Process Mag. **15**(5), 101–130 (1998). https://doi.org/10.1109/79.708543.
33. H Urkowitz, Energy detection of unknown deterministic signals. Proc. IEEE. **55**, 523–531 (1967).
34. M Vorländer, *Auralization* (Springer, Berlin, 2007).
35. AJ Campanella, Reference sound source calibration at various temperatures and site altitudes. J. Acoust. Soc. Am. **108**, 2551 (2000).
36. J Putner, PB Grams, H Fastl, in *Tagungsband Fortschritte der Akustik - AIA-DAGA 2013, Meran, Italy*. Nonlinear behavior of piezoelectric accelerometers (DEGA, Merano, 2013), pp. 63–64.
37. AK Abraham, C Jain, L Yashaswini, Effect of ambient noise on pure tone hearing screening test conducted in Indian rural locations. J. Indian Inst. Speech Hear. **35**, 58–65 (2016).
38. JD Hood, Principles and practices of bone-conduction audiometry. Laryngoscope. **70**, 1211–1228 (1960).
39. AL McQueen, JG Terhune, Central masking: fact or artifact? New Sch. Soc. Res. **9**(1), 15–20 (2011).
40. JM Geist, Asymptotic error rate behavior for noncoherent on-off keying. IEEE Trans. Comm. **42**(2/3/4), 225 (1994).
41. British Society of Audiology, Pure-tone air-conduction and bone-conduction threshold audiometry with and without masking. Guidelines, British Society of Audiology (2015). http://www.thebsa.org.uk/resources/pure-tone-air-bone-conduction-threshold-audiometry-without-masking/. Accessed 20 July 2017.

### Funding

This project was a third-party activity (ital. “conto terzi”) funded in part by the University Medical Center Utrecht, The Netherlands.

### Availability of data and materials

Not applicable to this article as no datasets were generated or analyzed during the current study.

## Author information

### Contributions

AK developed the system model and realized the system. AK, GC, and SC conceived and designed the experiments. WG provided conception of the original idea, materials and analysis tools. AK wrote the paper. All authors read and approved the final manuscript.

### Corresponding author

Correspondence to Alexander Kocian.

## Ethics declarations

### Ethics approval and consent to participate

Not applicable.

### Consent for publication

Not applicable.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## About this article

### Cite this article

Kocian, A., Cattani, G., Chessa, S. *et al.* An artificial patient for pure-tone audiometry. *J AUDIO SPEECH MUSIC PROC.* **2018**, 8 (2018). https://doi.org/10.1186/s13636-018-0131-y

### Keywords

- Pure-tone audiometry
- Real time
- Signal processing
- Noise injection
- On-off keying