In this section, we derive a digital system model that incorporates audiometric parameters as well as elements of psychoacoustics.

### 2.1 Hearing level at the cochleas

Starting from the sound pressure at the audio receptors, we develop an analytic expression for the root-mean-square (rms) sound pressure at the cochleas.

Let the *sound pressure* \({p}_{\mathrm{a}}^{(m)}(t) \in \mathbb{R}\) in Pa, \(m \in \{\mathrm{l}, \mathrm{r}\}\) (left, right), be a time function at the \(m\)th audio receptor. Similarly, let \({p}_{\mathrm{b}}^{(m)}(t) \in \mathbb{R}\) serve as the sound pressure proxy for the vibratory stimulus at the \(m\)th mastoid, corresponding to the dynamic force per surface area of the vibrator. Stacked together, the vector \(\boldsymbol{p}^{(m)}(t) \in \mathbb{R}^{2}\) has the form

$$ \boldsymbol{{p}}^{(m)}(t) \triangleq \text{col} \left\{ {p}_{\mathrm{{a}}}^{(m)}(t), {p}_{\mathrm{b}}^{(m)}(t) \right\}. $$

(1)

Suppose \(\boldsymbol{p}^{(m)}(t)\) is sampled at rate \(1/T\), where \(T\) is the sampling period. The resulting discrete-time signal \(\boldsymbol{p}^{(m)}[\ell] \in \mathbb{R}^{2 \times 1}\) at sampling instant \(\ell T\) represents its continuous counterpart \(\boldsymbol{p}^{(m)}(t)\) without loss of information if the sampling rate meets the requirements of the sampling theorem [28], i.e., if \(1/T\) exceeds twice the highest frequency contained in \(\boldsymbol{p}^{(m)}(t)\). The \(N\)-point discrete Fourier transform (DFT) \(\boldsymbol{P}^{(m)}[k] \in \mathbb{C}^{2 \times 1}\), applied to each row of \(\boldsymbol{p}^{(m)}[\ell]\), has support on \(k = 0, \ldots, N-1\). The discrete frequency index \(k\) is related to the continuous frequency \(f\) by \(k = f T N\).

The *normalized energy* \({\mathcal{E}}^{(n,m)}_{0} \in \mathbb{R}\), \(n \in \{\mathrm{l}, \mathrm{r}\}\), at the \(n\)th cochlea caused by the sound pressure at the \(m\)th audio receptor can be computed as follows (see, e.g., [29], chapter 3): the input spectrum \(\boldsymbol{P}^{(m)}[k]\) is weighted with the diagonal calibration matrix \(\boldsymbol{C}^{(m)}[k] \in \mathbb{R}^{2 \times 2}\), processed by the hearing abstraction vector \(\boldsymbol{H}^{(n,m)}[k] \in \mathbb{R}^{1 \times 2}\), and normalized by the BC threshold \({A}^{(n)}_{\mathrm{b}}[k]\). Summing the squared magnitude of the result over the \(B\)-octave band around the center frequency index \(k_{0}\) of the test tone and scaling by \(2/N\) yields the desired quantity

$$ {\mathcal{E}}^{(n,m)}_{0} = \frac{2}{N} \sum_{{k}= \lfloor 2^{-{B}/2} {k}_{0} \rfloor+1}^{\lfloor 2^{+{B}/2} {k}_{0} \rfloor} \left| \frac{\boldsymbol{H}^{(n,m)}[\!k] \boldsymbol{C}^{(m)}[\!k] \boldsymbol{P}^{(m)}[\!k]}{{A}_{\mathrm{b}}^{(n)}[\!k]} \right|^{2}. $$

(2)

The diagonal elements of the calibration matrix account for the sensitivity of the human ear as well as the attenuation in the connected hardware. The hearing abstraction vector

$$ \boldsymbol{H}^{(n,m)}[\!k] \triangleq \left\{ \begin{array}{lccrr} \big[ & {G}^{(m)}[\!k] & 1 &\big], & m = n \\ \big[ & {I}_{\mathrm{a}}[\!k] & {I}_{\mathrm{b}}[\!k] & \big], & m \neq n \end{array}, \right. $$

(3)

describes the non-responsiveness of the hearing system at the \(n\)th ear to a stimulus that is presented to the \(m\)th audio receptor below the hearing threshold. The scenario is illustrated in Fig. 1. The ratio of the \(m\)th AC threshold \({A}_{\mathrm{a}}^{(m)}[k]\) to the \(m\)th BC threshold \({A}_{\mathrm{b}}^{(m)}[k]\) is referred to as the \(m\)th air-bone gap \({G}^{(m)}[k]\) in (3). Some of the acoustic energy on its way to one cochlea crosses the skull and becomes an interfering bone-conducted signal at the other cochlea. The ratio of acoustic energy at one cochlea to that at the other cochlea is commonly referred to as the *interaural AC attenuation* \({I}_{\mathrm{a}}[k] \in \mathbb{R}\) in (3). Analogously, \({I}_{\mathrm{b}}[k] \in \mathbb{R}\) is commonly denoted as the *interaural BC attenuation*.
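The case distinction in (3) can be sketched as a small Python function. It assumes that the air-bone gap \(G^{(m)}[k]\) and the interaural attenuations \(I_{\mathrm{a}}[k]\), \(I_{\mathrm{b}}[k]\) are already available as linear factors at one frequency index \(k\); the function name and argument layout are hypothetical and chosen only for illustration.

```python
def hearing_abstraction(n, m, g, i_a, i_b):
    """Hearing abstraction row vector H^(n,m)[k] of (3) at a single index k.

    n, m -- cochlea and audio receptor, each 'l' or 'r'
    g    -- air-bone gap G^(m)[k] of receptor m (linear factor)
    i_a  -- interaural AC attenuation I_a[k] (linear factor)
    i_b  -- interaural BC attenuation I_b[k] (linear factor)
    Returns [weight of the AC channel, weight of the BC channel].
    """
    if n not in ("l", "r") or m not in ("l", "r"):
        raise ValueError("n and m must be 'l' or 'r'")
    if m == n:
        # Same side: AC channel weighted by the air-bone gap, BC channel weighted by 1
        return [g, 1.0]
    # Opposite side: both channels reach cochlea n via interaural crossover
    return [i_a, i_b]


# Hypothetical factors: G = 2.0, I_a = 0.01, I_b = 0.5
print(hearing_abstraction("l", "l", 2.0, 0.01, 0.5))  # [2.0, 1.0]
print(hearing_abstraction("r", "l", 2.0, 0.01, 0.5))  # [0.01, 0.5]
```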

Parseval's theorem [30] states that the Fourier transform conserves energy. Hence, the *rms sound pressure proxy at cochlea \(n\)* caused by audio receptor \(m\) equals the square root of the energy at the same cochlea in (2) divided by \(\sqrt{N}\), i.e.,

$$ {\pi}^{(n,m)}_{0} = \sqrt{\frac{1}{N}{\mathcal{E}}^{(n,m)}_{0}}. $$

(4)

In matrix notation,

$$ \boldsymbol{\Pi}_{0} \triangleq \left[ \begin{array}{cc} {\pi}^{({\mathrm{l}},{\mathrm{l}})}_{0} & {\pi}^{({\mathrm{l}},{\mathrm{r}})}_{0} \\ {\pi}^{({\mathrm{r}},{\mathrm{l}})}_{0} & {\pi}^{({\mathrm{r}},{\mathrm{r}})}_{0} \end{array}\right]. $$

(5)
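A minimal plain-Python sketch of (2) and (4) follows. The simplifications are our own assumptions, not part of the model: the hearing abstraction vector, the calibration matrix, and the BC threshold are taken as constant across the \(B\)-octave band, and only the DFT bins inside the band are evaluated directly instead of running a full FFT. The names `dft_bin`, `band_limits`, `normalized_energy`, and `rms_proxy` are hypothetical.

```python
import cmath
import math


def dft_bin(x, k):
    """k-th bin of the N-point DFT of the real sequence x (N = len(x))."""
    n = len(x)
    return sum(x[ell] * cmath.exp(-2j * math.pi * k * ell / n) for ell in range(n))


def band_limits(k0, b):
    """Inclusive bin range of the B-octave band around k0, as in the sum of (2)."""
    return math.floor(2 ** (-b / 2) * k0) + 1, math.floor(2 ** (b / 2) * k0)


def normalized_energy(p_a, p_b, k0, b, h, c, a_b):
    """Normalized energy E_0^(n,m) of (2).

    p_a, p_b -- sampled AC pressure and BC pressure proxy of receptor m (Pa)
    h        -- hearing abstraction vector [h_a, h_b], assumed flat over the band
    c        -- diagonal [c_a, c_b] of the calibration matrix, assumed flat over the band
    a_b      -- BC threshold A_b^(n), assumed flat over the band
    """
    n = len(p_a)
    lo, hi = band_limits(k0, b)
    energy = 0.0
    for k in range(lo, hi + 1):
        y = (h[0] * c[0] * dft_bin(p_a, k) + h[1] * c[1] * dft_bin(p_b, k)) / a_b
        energy += abs(y) ** 2
    return 2.0 / n * energy


def rms_proxy(energy, n):
    """rms sound pressure proxy pi_0^(n,m) of (4)."""
    return math.sqrt(energy / n)


# Hypothetical example: 1 kHz tone of amplitude 0.02 Pa, 8 kHz sampling, N = 256
fs, n, f0 = 8000.0, 256, 1000.0
k0 = round(f0 * (1.0 / fs) * n)  # k = f T N, here 32
p_a = [0.02 * math.cos(2 * math.pi * k0 * ell / n) for ell in range(n)]
p_b = [0.0] * n
e0 = normalized_energy(p_a, p_b, k0, 1 / 3, h=[1.0, 1.0], c=[1.0, 1.0], a_b=1.0)
print(rms_proxy(e0, n))  # close to 0.02 / sqrt(2)
```

For a tone that falls exactly on bin \(k_0\) with amplitude \(A\) and unit weights, (2) returns \(A^2 N/2\) and (4) returns the familiar rms value \(A/\sqrt{2}\), which is a quick sanity check of the \(2/N\) and \(1/N\) scaling.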

We are now ready to compute the *hearing level vector* \(\boldsymbol{L} \in \mathbb{R}^{2}\) at the cochleas, defined as

$$ \boldsymbol{L} \triangleq 20 \log_{10} \boldsymbol{S} - 20 \log_{10} \max \left(\boldsymbol{W}, \left[ \begin{array}{c} {p}_{\text{ref}} \\ {p}_{\text{ref}} \end{array}\right]\right) $$

(6)

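in decibels (dB), where the logarithm and the maximum are taken element-wise. The signal and noise vectors are given by \(\boldsymbol{S} = \boldsymbol{\Pi}_{0}\boldsymbol{X}\) and \(\boldsymbol{W} = \boldsymbol{\Pi}_{0}(\boldsymbol{1} - \boldsymbol{X})\), respectively, where \(\boldsymbol{\Pi}_{0}\) is defined in (5) and \(\boldsymbol{1}\) denotes the all-ones vector. The vector \(\boldsymbol{X} \triangleq \text{col}\{x^{(\mathrm{l})}, x^{(\mathrm{r})}\}\) determines the sound class. In particular, its entry \(x^{(m)}\), \(m \in \{\mathrm{l}, \mathrm{r}\}\), reads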

$$ x^{(m)} = \left\{ \begin{array}{lcl} 1 &; & \text{Pure tone} \\ 0 &; & \text{(Masking) noise} \end{array}. \right. $$

(7)

Conventionally, the reference sound pressure \(p_{\text{ref}}\) is 20 μPa rms, which corresponds to the lowest sound pressure at 1000 Hz that a young, healthy individual should be able to perceive.
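The following hypothetical helper evaluates (5) to (7) for one test condition. It reads the logarithm and the maximum in (6) element-wise and assumes that every entry of \(\boldsymbol{S}\) is positive; the proxy values in the usage line are illustrative only.

```python
import math

P_REF = 20e-6  # reference sound pressure p_ref in Pa rms


def hearing_level(pi0, x):
    """Hearing level vector L of (6) in dB, with log10 and max taken element-wise.

    pi0 -- rms proxies of (5) as [[pi_ll, pi_lr], [pi_rl, pi_rr]] in Pa,
           row = cochlea n, column = audio receptor m
    x   -- sound class vector [x_l, x_r] of (7): 1 = pure tone, 0 = masking noise
    Assumes every entry of S = Pi_0 X is positive.
    """
    s = [pi0[n][0] * x[0] + pi0[n][1] * x[1] for n in range(2)]  # S = Pi_0 X
    w = [pi0[n][0] * (1 - x[0]) + pi0[n][1] * (1 - x[1]) for n in range(2)]  # W = Pi_0 (1 - X)
    return [20 * math.log10(s[n]) - 20 * math.log10(max(w[n], P_REF)) for n in range(2)]


# Hypothetical proxies: pure tone on the left receptor, masking noise on the right one
pi0 = [[0.02, 0.0002], [0.0002, 0.002]]
print(hearing_level(pi0, [1, 0]))  # approximately [40.0, -20.0]
```

With \(\boldsymbol{X} = [1, 0]^{\mathsf{T}}\), the left cochlea sees the tone 40 dB above the crossed-over masker, whereas the right cochlea sees the crossed-over tone 20 dB below the masker. If no masker is present, \(\boldsymbol{W} = \boldsymbol{0}\) and the floor \(p_{\text{ref}}\) takes over.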

### 2.2 Elements of psychoacoustics

Not only hearing loss but also psychoacoustics affects the audiometric test procedure. In this contribution, we consider the following psychoacoustic parameters: the probabilities of false alarm and missed detection, and the mean reaction time.

#### 2.2.1 False alarm and missed detection

To model errors in the human auditory system, we exploit an analogy to on-off keying (OOK) in digital communications. Suppose that bit one corresponds to the presence of a waveform with signal energy \(\mathcal{E}_{\text{OOK}}\) and bit zero to its absence. When the transmitted waveforms are exposed to additive white Gaussian noise with spectral density \({\mathcal{N}_{0}}\), the optimal non-coherent energy detector computes the energy of the received signal and compares the result with an *OOK threshold* \(\Theta\). The probabilities of mistaking a logic zero for a one, \(\varepsilon_{\text{FA}}\), and a logic one for a zero, \(\varepsilon_{\text{MD}}\), are given by [31]

$$\begin{array}{@{}rcl@{}} {\varepsilon}_{\text{FA}} & = & \exp \left(- {\Theta}^{2}/{\mathcal{N}_{0}} \right), \end{array} $$

(8)

$$\begin{array}{@{}rcl@{}} {\varepsilon}_{\text{MD}} & = & 1 - Q\left(\sqrt{2 \mathcal{E}_{\text{OOK}}/{\mathcal{N}_{0}}}, \sqrt{2{\Theta}^{2}/{\mathcal{N}_{0}}}\right), \end{array} $$

(9)

respectively. Here, \(Q(a,b) = \int_{b}^{\infty} x I_{0}(ax) \exp\left\{-\left(a^{2}+x^{2}\right)/2\right\} \mathrm{d}x\) is the Marcum Q-function, with \(I_{0}(x)\) denoting the zeroth-order modified Bessel function of the first kind [31].
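To evaluate (8) and (9) numerically without external libraries, the Marcum Q-function can be computed from its integral definition. The sketch below is one possible approach and is not taken from [31]: it uses the power series of the modified Bessel function and composite Simpson's rule on the finite interval \([0, b]\), exploiting that the Rice density integrates to one, so that \(1 - Q(a,b) = \int_0^b x I_0(ax) \exp\{-(a^2+x^2)/2\}\,\mathrm{d}x\). It assumes moderate arguments so that the series does not overflow; all function names are hypothetical.

```python
import math


def bessel_i(order, z):
    """Modified Bessel function I_order(z) of the first kind (integer order >= 0, z >= 0), power series."""
    half = z / 2.0
    term = half ** order / math.factorial(order)  # k = 0 term
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= half * half / (k * (k + order))  # ratio of consecutive series terms
        total += term
    return total


def rice_cdf(a, b, steps=400):
    """1 - Q(a, b): integral of x I0(a x) exp(-(a^2 + x^2)/2) over [0, b] by Simpson's rule."""
    if b <= 0.0:
        return 0.0
    h = b / steps  # steps must be even for Simpson's rule

    def integrand(x):
        return x * bessel_i(0, a * x) * math.exp(-(a * a + x * x) / 2.0)

    acc = integrand(0.0) + integrand(b)
    for i in range(1, steps):
        acc += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return acc * h / 3.0


def marcum_q(a, b):
    """First-order Marcum Q-function Q(a, b) as defined in the text."""
    return 1.0 - rice_cdf(a, b)


def ook_error_probabilities(e_ook, n0, theta):
    """False alarm probability (8) and missed detection probability (9)."""
    eps_fa = math.exp(-theta * theta / n0)
    a = math.sqrt(2.0 * e_ook / n0)
    b = math.sqrt(2.0 * theta * theta / n0)
    eps_md = rice_cdf(a, b)  # equals 1 - Q(a, b), evaluated without cancellation
    return eps_fa, eps_md


# Hypothetical operating point: E_OOK = 8, N0 = 1, Theta = 2
print(ook_error_probabilities(8.0, 1.0, 2.0))
```

Computing \(\varepsilon_{\text{MD}}\) directly as the integral avoids the cancellation in \(1 - Q\) when \(Q\) is close to one. Where SciPy is available, \(Q(a,b)\) can alternatively be obtained as the survival function of a noncentral chi-square distribution with two degrees of freedom, `scipy.stats.ncx2.sf(b**2, 2, a**2)`.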

Now consider pure-tone audiometry, where stimuli and pauses are presented alternately to the ipsilateral ear. The basilar membrane within the cochlea extracts the frequencies of the stimuli in a non-coherent way [32] as long as their sound pressure is above the hearing threshold. As the signal-to-noise ratio decreases, the patient becomes more likely to miss the test tone; below the hearing threshold, false alarms may occur. Hence, the patient responds to acoustic stimuli much like a non-coherent OOK energy detector. Following this approach, we add white Gaussian noise with a suitably chosen spectral density \({\mathcal{N}_{0}}\) to the hearing level vector in (6) and pass the result to an envelope detector that makes errors with the prescribed probabilities \(\varepsilon_{\text{FA}}\) and \(\varepsilon_{\text{MD}}\). We start with the spectral noise density. Solving (8) for \(\Theta^{2}/{\mathcal{N}_{0}} = -\ln \varepsilon_{\text{FA}}\) and substituting the result into (9) with \(a \triangleq \sqrt{2 \mathcal{E}_{\text{OOK}}/{\mathcal{N}_{0}}}\) and \(b^{\star} \triangleq \sqrt{-2 \ln \varepsilon_{\text{FA}}}\), we obtain

$$ {\varepsilon}_{\text{MD}} = 1 - Q\left(a, b^{\star}\right). $$

(10)

To obtain \(a\) and hence \({\mathcal{N}_{0}}\), we could invert \(Q(a,b^{\star})\) in (10). This approach, however, is cumbersome. Instead, we use the iterative Newton-Raphson method to find the root \(a = a^{\star}\) of \(f(a) = \varepsilon_{\text{MD}} - 1 + Q(a,b^{\star})\), i.e., \(f(a^{\star}) = 0\). Starting from \(a^{(0)} > 0\), the algorithm computes at iteration \(i+1\)

$$ a^{(i+1)} = a^{(i)} - \frac{f\left(a^{(i)}\right)}{f^{\prime}\left(a^{(i)}\right)} $$

(11)

where, with \(I_{1}(x)\) denoting the first-order modified Bessel function of the first kind,

$$ f^{\prime}(a) \triangleq \frac{\mathrm{d}}{\mathrm{d} a}f(a) = b^{\star} \exp\left\{-\left(a^{2}+{b^{\star}}^{2}\right)/2\right\} I_{1}\left(a b^{\star}\right). $$

(12)

Since \(f^{\prime}(a) > 0\), \(f^{\prime\prime}(a) < 0\), and \(a > 0\), monotonic convergence to the root \(a^{\star}\) is guaranteed. Consequently,

$$ {\mathcal{N}_{0}}\left(\mathcal{E}_{\text{OOK}}\right) = \frac{2 \mathcal{E}_{\text{OOK}}}{{a^{\star}}^{2}}. $$

(13)

Substituting (13) into (8) yields the OOK threshold

$$ {\Theta}(\mathcal{E}_{\text{OOK}}) = \frac{b^{\star}}{a^{\star}}\sqrt{\mathcal{E}_{\text{OOK}}}. $$

(14)

We have thus developed an artificial patient that can generate arbitrary false alarm and missed detection probabilities by adapting two parameters to the signal energy \(\mathcal{E}_{\text{OOK}}\), namely the spectral noise density \({\mathcal{N}_{0}}\) in (13) and the OOK threshold \(\Theta\) in (14).
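The Newton-Raphson procedure of (10) to (14) can be sketched as follows. The Marcum Q evaluation by series and Simpson's rule is repeated so that the block is self-contained, \(f^{\prime}(a)\) uses the standard identity \(\partial Q(a,b)/\partial a = b \exp\{-(a^2+b^2)/2\} I_1(ab)\), and the starting point \(a^{(0)} = b^{\star}\) as well as the positivity safeguard are our own assumptions, since the text only requires \(a^{(0)} > 0\). Function names and the target probabilities in the usage lines are hypothetical.

```python
import math


def bessel_i(order, z):
    """Modified Bessel function I_order(z) of the first kind (integer order >= 0, z >= 0), power series."""
    half = z / 2.0
    term = half ** order / math.factorial(order)
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= half * half / (k * (k + order))
        total += term
    return total


def rice_cdf(a, b, steps=400):
    """1 - Q(a, b) via composite Simpson's rule on [0, b]."""
    if b <= 0.0:
        return 0.0
    h = b / steps

    def integrand(x):
        return x * bessel_i(0, a * x) * math.exp(-(a * a + x * x) / 2.0)

    acc = integrand(0.0) + integrand(b)
    for i in range(1, steps):
        acc += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return acc * h / 3.0


def dq_da(a, b):
    """Derivative of Q(a, b) with respect to a, i.e. f'(a)."""
    return b * math.exp(-(a * a + b * b) / 2.0) * bessel_i(1, a * b)


def solve_a_star(eps_fa, eps_md, tol=1e-10, max_iter=100):
    """Newton-Raphson root of f(a) = eps_MD - 1 + Q(a, b*); returns (a*, b*)."""
    if not (0.0 < eps_fa < 1.0 and 0.0 < eps_md < 1.0):
        raise ValueError("error probabilities must lie in (0, 1)")
    b = math.sqrt(-2.0 * math.log(eps_fa))
    a = b  # assumed starting point a^(0) = b* (heuristic)
    for _ in range(max_iter):
        f = eps_md - rice_cdf(a, b)  # eps_MD - 1 + Q(a, b*), since Q = 1 - rice_cdf
        a_new = a - f / dq_da(a, b)
        if a_new <= 0.0:
            a_new = a / 2.0  # safeguard: keep the iterate positive
        if abs(a_new - a) <= tol * max(1.0, a):
            return a_new, b
        a = a_new
    raise RuntimeError("Newton-Raphson iteration did not converge")


def ook_parameters(e_ook, eps_fa, eps_md):
    """Spectral noise density (13) and OOK threshold (14) for signal energy e_ook."""
    a_star, b_star = solve_a_star(eps_fa, eps_md)
    n0 = 2.0 * e_ook / a_star ** 2
    theta = b_star / a_star * math.sqrt(e_ook)
    return n0, theta


# Hypothetical targets: eps_FA = 1 %, eps_MD = 5 %, E_OOK = 4
n0, theta = ook_parameters(4.0, 0.01, 0.05)
print(n0, theta)
```

Because \(a^{\star}\) and \(b^{\star}\) depend only on \((\varepsilon_{\text{FA}}, \varepsilon_{\text{MD}})\), while \({\mathcal{N}_{0}}\) scales linearly with \(\mathcal{E}_{\text{OOK}}\) and \(\Theta\) with its square root, the iteration needs to run only once per pair of target probabilities.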

#### 2.2.2 Mean reaction time

It has been shown in [11] that the mean reaction time \(\tau\) of a patient can be modeled as the sum of a fixed individual delay \(\tau_{0}\) and a variable component that depends on the stimulus level. Based on the experimental results in [11], we propose the linear model

$$ \tau = \tau_{0} + \frac{110- \max \{ \boldsymbol{L}, \boldsymbol{0} \} }{1000} $$

(15)

in seconds, where \(\boldsymbol{L}\) is defined in (6). Hence, the hesitation increases by 1 ms per dB as the level decreases from clearly audible values towards the threshold. Below the hearing threshold, the reaction time is held constant at the value that would occur at a hearing level of 0 dB, i.e., \(\tau_{0} + 0.11\) s.
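Equation (15) is simple enough to state directly in code. The sketch applies it to each entry of \(\boldsymbol{L}\), which is our reading of the vector maximum; the function name and the value of \(\tau_0\) in the usage line are made up for illustration.

```python
def reaction_time(levels_db, tau0):
    """Mean reaction time of (15) in seconds, evaluated for each entry of L.

    levels_db -- hearing level vector L of (6) in dB
    tau0      -- fixed individual delay tau_0 in seconds
    """
    # Levels below 0 dB are clamped to 0 dB, so the delay saturates at tau0 + 0.11 s
    return [tau0 + (110.0 - max(level, 0.0)) / 1000.0 for level in levels_db]


# Hypothetical individual delay tau_0 = 0.2 s
print(reaction_time([30.0, -5.0], 0.2))  # approximately [0.28, 0.31]
```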

### 2.3 Hearing model

The model developed so far is a multiple-input multiple-output (MIMO) system that reads the input vector \(\boldsymbol{P}\) from the transducers and outputs the vector \(\boldsymbol{L}\); this first system mimics hearing loss. A subsequent OOK system, operating in the log domain, takes \(\boldsymbol{L}\) as its input vector, distorts it, and delays it to generate the hearing level vector \(\boldsymbol{Y} \triangleq \boldsymbol{Y}[\ell] \in \mathbb{R}^{2}\) at the basilar membrane, i.e.,

$$ \boldsymbol{Y} = \left\{ \begin{array}{lcr} (\boldsymbol{L} + \boldsymbol{N}) & ; & \ell \geq \lfloor \tau/T \rfloor \\ \boldsymbol{0} & ; & \text{otherwise} \end{array} \right.. $$

(16)

The vector \(\boldsymbol{L}\) has energy \(\boldsymbol{\mathcal{E}}_{\text{OOK}}\). The vector \(\boldsymbol{N} \in \mathbb{R}^{2}\) contains additive white Gaussian samples with spectral density \({\mathcal{N}_{0}}\left(\mathcal{E}_{\text{OOK}}^{(n)}\right)\), \(n \in \{\mathrm{l}, \mathrm{r}\}\), according to (13), under the assumption that errors occur independently at each cochlea. This second system mimics patient behavior.
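A minimal simulation of (16) is sketched below. The text specifies white Gaussian noise with spectral density \({\mathcal{N}_{0}}\) but not the per-sample variance, so we assume each noise sample is drawn from \(\mathcal{N}(0, {\mathcal{N}_{0}}/2)\), independently per cochlea, and we apply a separate delay \(\lfloor \tau/T \rfloor\) to each cochlea so that per-ear reaction times from (15) can be used. The function name and all numeric values are hypothetical.

```python
import math
import random


def basilar_membrane_output(levels_db, tau, n0, t, num_samples, rng=None):
    """Hearing level vectors Y[l] of (16) for l = 0, ..., num_samples - 1.

    levels_db -- hearing level vector L of (6), one entry per cochlea (dB)
    tau       -- reaction time per cochlea in seconds, e.g. from (15)
    n0        -- noise spectral density N0 per cochlea, e.g. from (13)
    t         -- sampling period T in seconds
    Assumption: noise samples are N(0, N0/2), drawn independently per cochlea.
    """
    rng = rng or random.Random()
    delays = [math.floor(tau_n / t) for tau_n in tau]  # floor(tau / T) per cochlea
    out = []
    for ell in range(num_samples):
        y = []
        for n in range(len(levels_db)):
            if ell >= delays[n]:
                y.append(levels_db[n] + rng.gauss(0.0, math.sqrt(n0[n] / 2.0)))
            else:
                y.append(0.0)  # no response before the reaction time has elapsed
        out.append(y)
    return out


# Hypothetical values: L = [40, -20] dB, tau = [0.305, 0.315] s, N0 = [2, 2], T = 10 ms
y = basilar_membrane_output([40.0, -20.0], [0.305, 0.315], [2.0, 2.0], 0.01, 50, random.Random(0))
print(y[29], y[30], y[31])
```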