Physics-constrained adaptive kernel interpolation for region-to-region acoustic transfer function: a Bayesian approach
EURASIP Journal on Audio, Speech, and Music Processing volume 2024, Article number: 43 (2024)
Abstract
A kernel interpolation method for the acoustic transfer function (ATF) between regions that is constrained by the physics of sound while being adaptive to the data is proposed. Most ATF interpolation methods aim to model the ATF for a fixed source by using techniques that fit the estimation to the measurements while not taking the physics of the problem into consideration. We aim to interpolate the ATF for a region-to-region estimation, meaning we account for variation of both source and receiver positions. Using a very general formulation for the reproducing kernel function, we construct a kernel function that treats the directed and residual fields as two separate kernels. The directed field kernel captures a sparse selection of reflective field components with large amplitudes and is formulated as a combination of directional kernels. The residual field comprises the remaining densely distributed components with lower amplitudes. Its kernel weight is represented by a universal approximator, a neural network, so that it can freely learn patterns from the data. These kernel parameters are learned using Bayesian inference, both under the assumption of Gaussian priors and with a Markov chain Monte Carlo sampling method that performs inference in a more directed manner. We compare all established kernel formulations with each other in numerical simulations, showing that the proposed kernel model is capable of properly representing the complexities of the ATF.
1 Introduction
Predicting how an environment alters a sound wave propagating within it is a complex problem with no closed-form solution. The effects that scatterers, the boundaries of the environment, and other such environmental factors can have on sound waves are not fully understood. Assuming the environment has time-invariant acoustic characteristics and alters sound linearly, we can model these changes rather effectively by studying the behavior of the room impulse response (RIR) between any source and receiver positions in the environment, and specifically its frequency response, the acoustic transfer function (ATF), which has interesting physical properties we can utilize for interpolation. Our main concern in this paper is the ATF for variable source and receiver positions within regions where it is measured at a discrete set of points.
Most current methods for ATF and RIR interpolation are limited to point-to-point, meaning single source and receiver, and point-to-region, meaning single source with variable receiver positions [1]. Examples of such methods include modeling ATFs as standard linear time-invariant pole/zero systems [2, 3], approximating them using elementary wave functions [4,5,6], embedding physical properties using physics-informed neural networks [7], and reconstructing RIRs from data by exploiting spatial patterns and similarities using data-driven universal approximators [8, 9]. These methods do not account for source position variation and impose restrictions on the environment that are not broadly applicable, such as requiring large data sets with specific sampling point distributions. We aim to create an ATF model that can be employed broadly.
Thus, our objective is to create a region-to-region ATF interpolation approach that is constrained by the laws of physics. Specifically, we want to construct a linear estimator that always satisfies the Helmholtz equation [10]. In [11], such an estimator was proposed by considering the ATF to be the superposition between a direct and a reverberant component, while approximating the latter with a truncated series expansion of wave function solutions to the Helmholtz equation. Previously, we extended this method to the equivalent of a series expansion of infinite order by using kernel ridge regression [12, 13]. We incorporated the physics of the problem into the kernel function by representing the reverberant component as a continuous superposition of plane waves. We would later extend this formulation to target reflective field components of higher amplitude while being insensitive to components of lower amplitude [14].
We propose the use of a fully adaptive kernel function for the ATF that takes into consideration directed and residual sound fields as separate kernel functions added together, creating a complete model for the reverberant field caused by the environment. The directed reverberation field is caused by a sparse set of reflective components with much greater amplitudes than average. This field was modeled based on a point-to-region kernel proposed in [15] to interpolate the sound field using von Mises–Fisher distributions [16]. The residual component is composed of the remaining densely distributed plane wave components of lower amplitude. Because densely distributed higher-order reflections with low amplitude add up together, we employ a neural network (NN) for the weight function. Under our generalized formulation for the ATF, this weight function can fully represent the nuances of the ATF. This kernel function formulation was initially proposed in [17], and is explored further in this work. An analogous formulation was also adapted for the point-to-region case in [18].
Previously, we have optimized the parameters of the kernel functions using robust loss functions [14] and simple least squares [17] combined with other optimization techniques [19]. In this work, we take a probabilistic approach and instead propose two different optimization schemes based on Bayesian inference [20]. One of them is the well-established Gaussian process regression (GPR) [21], or Kriging, which yields the optimal solution for kernel ridge regression under the assumption of Gaussian priors [22], in contrast to the simple least squares used in [17]. The other approach utilizes probabilistic programming [23] to infer the effects the kernel parameters have on the resulting distribution using a Markov chain Hamiltonian Monte Carlo (MCHMC) sampling method [24, 25], specifically the no-U-turn sampler (NUTS) [25], a variant of MCHMC. We complement these more sophisticated techniques by making the method more general: unlike in [17, 18], we do not assume the amplitude or phase of the direct component is known, which makes the method more broadly applicable.
The remainder of this paper is organized as follows: in Sect. 2, we discuss the needed preliminary knowledge on the ATF and the formulation of the central problem from which we derive the kernel functions, as well as briefly discuss previous kernel formulations. Following that, in Sect. 3, we discuss the formulation of the adaptive kernel and justify the choices made in its design. Next, we go over the optimization of the model parameters using Bayesian inference in Sect. 4 in order to derive both GPR- and MCHMC-based estimations. Following that, we go over the numerical experiments and the results of the attempted interpolations in Sect. 5. Finally, we discuss the totality of our findings in the conclusion.
2 Preliminaries and objectives
Suppose that there is a space \(\Omega \subset \mathbb{R}^3\) with stationary acoustic properties. Suppose also that the speed of sound is c and the frequency is f. Consider that \(\Omega\) contains two simply connected regions of interest with arbitrary geometry referred to as the source region \(\Omega_{\text{S}} \subset \Omega\) and the receiver region \(\Omega_{\text{R}}\subset \Omega\). The ATF \(h:\Omega_{\text{R}} \times \Omega_{\text{S}} \times \mathbb{R} \rightarrow \mathbb{C}\) between a monopole source placed at position \(\mathbf{s}\in \Omega_{\text{S}}\) and a pressure microphone placed at receiver position \(\mathbf{r}\in \Omega_{\text{R}}\), for wave number \(k = 2\pi f/c\), is written as \(h(\mathbf{r}|\mathbf{s}, k)\). Henceforth, the wave number k will be omitted from arguments for notational simplicity unless explicitly necessary.
2.1 Physical model of ATF
The behavior of h depends on the properties of \(\Omega\); however, there are properties of the ATF model that we assume to hold regardless of environment. The first of them is that the ATF can be divided into a direct component \(h_{\text{D}}\) and a reverberant component \(h_{\text{R}}\) [11] as
$$h(\mathbf{r}|\mathbf{s}) = h_{\text{D}}(\mathbf{r}|\mathbf{s}) + h_{\text{R}}(\mathbf{r}|\mathbf{s}).$$
The second property is that the form of \(h_{\text{D}}\) is considered to be known:
$$h_{\text{D}}(\mathbf{r}|\mathbf{s}) = \alpha_0 G_0(\mathbf{r}|\mathbf{s}), \qquad (2)$$
where \(G_0\) is the free-field Green’s function [10] and \(\alpha _0\) is a multiplicative coefficient unknown to us. This model is similar to that of the exterior field with a scatterer proposed in [26], and could plausibly be extended to multipole sources using a similar formulation.
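As a concrete illustration of this model, the following is a minimal sketch in Julia (the language of our implementation [47]) of the direct component, assuming the standard free-field Green's function \(G_0(\mathbf{r}|\mathbf{s}) = \mathrm{e}^{\mathrm{i}k\|\mathbf{r}-\mathbf{s}\|}/(4\pi\|\mathbf{r}-\mathbf{s}\|)\); the sign of the exponent depends on the chosen time convention.

```julia
using LinearAlgebra

# Free-field Green's function of the 3D Helmholtz equation (standard form;
# the exponent's sign follows the e^{-iωt} time convention).
G0(r, s, k) = exp(im * k * norm(r - s)) / (4π * norm(r - s))

# Direct component h_D(r|s) = α0 * G0(r|s); α0 is unknown and estimated later.
hD(r, s, k, α0) = α0 * G0(r, s, k)
```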
The third property is that, as \(\Omega_{\text{S}}\) and \(\Omega_{\text{R}}\) are free of scatterers and sources external to those relevant to the measurements, the reverberant component has an unknown form with known physics [13] in the form of the homogeneous Helmholtz equation when applied to both source and receiver coordinates:
$$\left(\nabla^2_{\mathbf{r}} + k^2\right) h_{\text{R}}(\mathbf{r}|\mathbf{s}) = 0, \qquad \left(\nabla^2_{\mathbf{s}} + k^2\right) h_{\text{R}}(\mathbf{r}|\mathbf{s}) = 0,$$
where \(\nabla ^2_{\mathbf{r}}\) is the Laplacian operator applied only to the receiver position coordinates, and \(\nabla ^2_{\mathbf{s}}\) is the Laplacian operator applied only to the source position coordinates.
Finally, as our source is a monopole, the ATF is considered to be reciprocal [27]. This means that if the source and receiver positions are exchanged, the measured pressure field does not change. In other words,
$$h(\mathbf{r}|\mathbf{s}) = h(\mathbf{s}|\mathbf{r}).$$
Given \(h_{\text{D}}\) has a straightforward formulation that can be expressed with a single coefficient, we must define a feature space \(\mathscr {H}\) for the reverberant component that satisfies all of the properties expanded upon here while also allowing for solutions we can feasibly derive.
2.2 Problem statement
Suppose that we distribute a total of L point sources in the source region at known positions \(\{\mathbf{s}_l\}_{l=1}^{L} \subset \Omega_{\text{S}}\) and M pressure microphones in the receiver region at positions \(\{\mathbf{r}_m\}_{m=1}^M \subset \Omega_{\text{R}}\), as shown in Fig. 1. We then record all possible \(N=LM\) ATF samples between each source and microphone position pair \((\mathbf{r}_m, \mathbf{s}_l)\) as
$$y_n = h(\mathbf{r}_m|\mathbf{s}_l) + \epsilon_n,$$
where \(n=m+(l-1)M\) is the index of the pairs. We also define the vectors of measurements \(\mathbf{y} = [y_1,\dots ,y_N]^\textsf{T}\) and noises \(\varvec{\epsilon } = [\epsilon _1,\ \dots ,\ \epsilon _N]^\textsf{T}\), here not assumed to have any specific probabilistic distribution.
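For concreteness, a minimal Julia sketch of assembling the measurement vector under this indexing, assuming a generic ATF evaluator h (e.g., a simulator) and a noise generator, both hypothetical placeholders:

```julia
# y_n = h(r_m | s_l) + ε_n with pair index n = m + (l − 1)M.
function measurements(h, sources, receivers, noise)
    L, M = length(sources), length(receivers)
    y = zeros(ComplexF64, L * M)
    for l in 1:L, m in 1:M
        n = m + (l - 1) * M        # pair index used in the text
        y[n] = h(receivers[m], sources[l]) + noise(n)
    end
    return y
end
```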
Given the properties outlined in Sect. 2.1, our objective is to define a probabilistic model constrained by those physical requirements. We can find the most probable model parameters given our data as
$$p(\alpha_0, h_{\text{R}}|\mathbf{y}) = \frac{p(\mathbf{y}|\alpha_0, h_{\text{R}})\, p(\alpha_0, h_{\text{R}})}{p(\mathbf{y})},$$
where \(p(\alpha_0, h_{\text{R}}|\mathbf{y})\) is the posterior distribution, \(\mathscr{H}\) is the functional space containing the reverberant component, \(p(\mathbf{y}|\alpha_0, h_{\text{R}})\) is the likelihood, \(p(\alpha_0, h_{\text{R}})\) is the prior, and \(p(\mathbf{y})\) is the marginal probability distribution associated with measuring \(\mathbf{y}\), a quantity we cannot know for certain and that we cannot alter. This formula, known as Bayes' theorem [20], indicates that optimizing the posterior is equivalent to optimizing only the numerator, meaning our model parameters are estimated with a maximum a posteriori (MAP) approach. This estimation is determined as
$$\left(\hat{\alpha}_0, \hat{h}_{\text{R}}\right) = \mathop{\mathrm{arg\,max}}_{\alpha_0,\, h_{\text{R}} \in \mathscr{H}}\ p(\mathbf{y}|\alpha_0, h_{\text{R}})\, p(\alpha_0, h_{\text{R}}), \qquad (10)$$
where \(\hat{\alpha }_0\) is the direct component coefficient and \(\hat{h}_{\text{R}}\) is the interpolation function of the reverberant component. The interpolation function of the direct component \(\hat{h}_{\text{D}}\) is obtained by using the value of \(\hat{\alpha }_0\) in (2). Then, our final interpolation function \(\hat{h}\) is the sum of both.
In order to solve (10), we need to properly define \(\mathscr{H}\) as well as determine the associated distributions. It is important that the feature space of the ATF allows for a tractable formula to compute the interpolation function. We opted to define \(\mathscr{H}\) as a reproducing kernel Hilbert space (RKHS), which yields a comprehensive and very general framework.
2.3 Reproducing kernel Hilbert space for reverberant component
Kernel methods have many applications in machine learning and statistical modeling [21, 22] that make the formulation of our feature space as a RKHS especially attractive. A RKHS is a type of functional space that is also a Hilbert space, meaning a complete inner product space [28], equipped with an inner product \(\langle \cdot , \cdot \rangle\) and a kernel function \(\kappa\) called the reproducing kernel of \(\mathscr {H}\), which is unique [29].
The impetus behind defining such a particular space is the representer theorem [29, 30], which states that any empirical optimization criterion can be optimized using a linear estimator based on the reproducing kernel and on the measurement arguments. An empirical optimization criterion in this context means the optimization of some objective function based on empirical measurements, for which (10) qualifies. In other words, the interpolation function of the reverberant component can be guaranteed to be of the form
$$\hat{h}_{\text{R}}(\mathbf{r}|\mathbf{s}) = \varvec{\alpha}^{\mathsf{T}} \varvec{\kappa}(\mathbf{r}, \mathbf{s}) = \sum_{n=1}^{N} \alpha_n\, \kappa\big((\mathbf{r},\mathbf{s}), (\mathbf{r}_n, \mathbf{s}_n)\big),$$
where \(\varvec{\alpha }\in \mathbb {C}^N\) is the coefficient vector and \(\varvec{\kappa }\) is the functional vector. Therefore, as long as we define the space \(\mathscr {H}\) so it has the desirable properties for the reverberant component, our interpolation function will satisfy those conditions as well.
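As a sketch of how such an estimator is evaluated in practice, in Julia, with the kernel κ and the measurement pairs as generic arguments:

```julia
# Kernel interpolant ĥ_R(r|s) = Σ_n α_n κ((r,s), (r_n, s_n)).
# `pairs` collects the N measurement (receiver, source) tuples.
h_R_hat(r, s, α, pairs, κ) =
    sum(α[n] * κ((r, s), pairs[n]) for n in eachindex(pairs))
```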
2.3.1 Formulation of RKHS with directional weighting
Given that the sound field interpolation problem and the ATF interpolation problem are closely related, we look to the sound field estimation problem and the usage of the Herglotz wave function [31, 32] as a form of defining a general RKHS [15, 33]. This allows for the creation of an equivalent formulation for the ATF [14, 17], except defined on two directional components instead of one:
$$h_{\text{R}}(\mathbf{r}|\mathbf{s}) = \int_{\mathbb{S}^2}\int_{\mathbb{S}^2} \tilde{h}_{\text{R}}\left(\hat{\mathbf{r}}, \hat{\mathbf{s}}\right) \mathrm{e}^{\mathrm{i}k\left(\hat{\mathbf{r}}\cdot\mathbf{r} + \hat{\mathbf{s}}\cdot\mathbf{s}\right)}\, \mathrm{d}\hat{\mathbf{r}}\, \mathrm{d}\hat{\mathbf{s}}, \qquad (17)$$
where \(\mathbb {S}^2\subset \mathbb {R}^3\) is the unit sphere, representing the directional space, \(\hat{\mathbf{s}}\in \mathbb {S}^2\) is the directional component relating to the source position, \(\hat{\mathbf{r}}\in \mathbb {S}^2\) for the receiver, and \(\tilde{h}_{\textrm{R}}\) represents both phase and amplitude of this directional pair.
Using this formulation, the inner-product space \((\mathscr{H}, \langle \cdot, \cdot \rangle_{\mathscr{H}})\) can be defined as [14, 17]
$$\langle g_1, g_2 \rangle_{\mathscr{H}} = \int_{\mathbb{S}^2}\int_{\mathbb{S}^2} \frac{\tilde{g}_1\left(\hat{\mathbf{r}}, \hat{\mathbf{s}}\right) \overline{\tilde{g}_2\left(\hat{\mathbf{r}}, \hat{\mathbf{s}}\right)}}{w\left(\hat{\mathbf{r}}, \hat{\mathbf{s}}\right)}\, \mathrm{d}\hat{\mathbf{r}}\, \mathrm{d}\hat{\mathbf{s}},$$
where \(\overline{\cdot}\) is the complex conjugate, \(g_1\), \(g_2 \in \mathscr{H}\) are generic functions in the feature space with directional representations \(\tilde{g}_1\), \(\tilde{g}_2\), \(w:\mathbb{S}^2\times\mathbb{S}^2\rightarrow \mathbb{R}_+\) is the directional weighting function, and \(L^2\left(\mathbb{S}^2\times \mathbb{S}^2, w\right)\) represents the square-integrable functions on the domain \(\mathbb{S}^2\times\mathbb{S}^2\) and for the weight function w.
The relation \(\tilde{h}_{\textrm{R}}\left (\hat{\mathbf{r}}, \hat{\mathbf{s}}\right ) = \tilde{h}_{\textrm{R}}\left (\hat{\mathbf{s}}, \hat{\mathbf{r}}\right )\) guarantees the reciprocity of ATF, meaning \((\mathscr {H}, \langle \cdot , \cdot \rangle _{\mathscr {H}})\) is an inner product space that satisfies our model requirements outlined in Sect. 2.1 regardless of the weight function w. Because \(\mathscr {H}\) inherits the completeness of \(L^2\) spaces, it is a Hilbert space.
We can also define a bivariate kernel function such that
$$\kappa\big((\mathbf{r},\mathbf{s}), (\mathbf{r}^\prime,\mathbf{s}^\prime)\big) = \int_{\mathbb{S}^2}\int_{\mathbb{S}^2} w\left(\hat{\mathbf{r}}, \hat{\mathbf{s}}\right) \mathrm{e}^{\mathrm{i}k\left(\hat{\mathbf{r}}\cdot(\mathbf{r}-\mathbf{r}^\prime) + \hat{\mathbf{s}}\cdot(\mathbf{s}-\mathbf{s}^\prime)\right)}\, \mathrm{d}\hat{\mathbf{r}}\, \mathrm{d}\hat{\mathbf{s}},$$
which we can show is a reproducing kernel by operating with it on a generic function \(g_{\text{R}}\in \mathscr{H}\) and applying the reciprocity of the directional representations, yielding the reproducing property
$$\left\langle g_{\text{R}},\, \kappa\big((\cdot,\cdot), (\mathbf{r},\mathbf{s})\big) \right\rangle_{\mathscr{H}} = g_{\text{R}}(\mathbf{r}|\mathbf{s}),$$
meaning \(\kappa\) is a reproducing kernel for a wide variety of weight functions w and thus our space is indeed a RKHS. Therefore, we have constructed a physical model guaranteed to satisfy the Helmholtz equation and to be reciprocal, while the weight function w can be freely learned using data-driven methods.
2.3.2 Uniform weight
In [12], we proposed a kernel ridge regression (KRR) method for ATF interpolation. In [13], we showed that the proposed model was equivalent to a uniform weight, i.e., a constant weight function \(w_{\text{uni}}(\hat{\mathbf{r}}, \hat{\mathbf{s}}) = 1\), for which the kernel reduces to products of zeroth-order spherical Bessel functions \(j_0(x) = \sin(x)/x\) of the distances between the measurement positions [13].
While this kernel function can be sufficient in many problem configurations, the lack of parameters harms its performance, as it cannot properly adapt to properties of the environment.
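A minimal Julia sketch of such a j0-product kernel follows; the symmetrization over the exchange \((\mathbf{r},\mathbf{s}) \leftrightarrow (\mathbf{s},\mathbf{r})\) and its 1/2 factor are our assumptions about how reciprocity is enforced, not necessarily the exact constants of [13].

```julia
using LinearAlgebra

# Zeroth-order spherical Bessel function; in Julia, sinc(x) = sin(πx)/(πx),
# so j0(x) = sinc(x/π).
j0(x) = sinc(x / π)

# Uniform-weight ATF kernel sketch, symmetrized so that swapping source and
# receiver in either argument leaves the value unchanged (reciprocity).
function κ_uniform((r, s), (r′, s′), k)
    (j0(k * norm(r - r′)) * j0(k * norm(s - s′)) +
     j0(k * norm(r - s′)) * j0(k * norm(s - r′))) / 2
end
```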
2.3.3 Sunken sphere weight
Due to the separation of direct and reverberant fields, the wave components associated with directions similar to that of the direct path should be reduced, as those components are present in the direct field function. For that reason, the authors proposed a weight function in [14] that rejects components in the \(\hat{\varvec{\eta}}_0\) direction connecting the centers of the regions and promotes lateral components. This weight will be referred to as the sunken sphere weight \(w_{\textrm{sk}}\), due to the shape of its gain plot [14].
We guarantee reciprocity by making the weight function \(w_{\textrm{sk}}\) separable, meaning \(w_{\textrm{sk}}\left(\hat{\mathbf{r}}, \hat{\mathbf{s}}\right) =\mathring{w}_{\textrm{sk}}\left(\hat{\mathbf{r}}\right) \mathring{w}_{\textrm{sk}}\left(\hat{\mathbf{s}}\right)\). The auxiliary weight \(\mathring{w}_{\textrm{sk}}\) determines an auxiliary kernel function defined on two positions \(\mathbf{x}\) and \(\mathbf{x}^\prime\) with \(\Delta \mathbf{x}=\mathbf{x}-\mathbf{x}^\prime\); its closed form involves the real part \(\textrm{Re}\) of an expression governed by the parameters \(\gamma_{\textrm{sk}},\ \beta_{\textrm{sk}}\) of the weight [14]. The resulting reverberant component kernel function \(\kappa_{\textrm{sk}}\) is then expressed in terms of the auxiliary kernels evaluated on the source and receiver coordinates [14].
This adaptive kernel is capable of delivering better estimations than the uniform kernel [14], but still represents a rather limited model for the reverberant field.
3 Adaptive kernel function for directed and residual reverberation
Both the uniform and sunken sphere kernels lack the necessary complexity to fully represent the features of the reverberant component for arbitrary environments. The uniform kernel assumes equal gain for all directions, which is unrealistic, as we expect incoming wave components to be stronger or weaker depending on the relative positioning of \(\Omega _{\textrm{R}}\) within \(\Omega\). The sunken sphere kernel allows for greater complexity, but it still assigns the same gain for directions coaxially distributed around the anti-bias direction \(\hat{\varvec{\eta }}_0\).
We propose the use of a kernel function that is fully adaptive to the environment by defining a combined kernel function that addresses both directed and residual reverberant fields as separate models. The directed field component represents a select set of reflective field components with high amplitude expected to result from early reflections on the boundary of \(\Omega\) and possible constructive interference. The residual field represents the remaining components of the reverberant field, an infinite superposition of reflections with weaker directionality.
3.1 Directed kernel function
The directed field kernel function aims to represent wave components with strong directionality, and as such is directed towards a sparse set of bias directions where the weight \(w_{\text{dir}}\) shows higher gains. The auxiliary weight \(\mathring{w}_{\text{dir}}\) and kernel \(\mathring{\kappa}_{\text{dir}}\) can be expressed as
$$\mathring{w}_{\text{dir}}\left(\hat{\mathbf{x}}\right) = \sum_{d=1}^{D} \gamma_d\, \mathrm{e}^{\beta_d \hat{\varvec{\eta}}_d \cdot \hat{\mathbf{x}}}, \qquad \|\varvec{\gamma}\|_1 = 1,\ \gamma_d \geq 0, \qquad (25)$$
$$\mathring{\kappa}_{\text{dir}}\left(\mathbf{x}, \mathbf{x}^\prime\right) = \frac{1}{C(\varvec{\beta}, \varvec{\gamma})} \sum_{d=1}^{D} \gamma_d\, j_0\!\left(\sqrt{k^2\|\Delta\mathbf{x}\|^2 - \beta_d^2 - 2\mathrm{i}\beta_d k\, \hat{\varvec{\eta}}_d \cdot \Delta\mathbf{x}}\right),$$
where \(\left \{\hat{\varvec{\eta }}_{d}\right \}_{d=1}^D\subset \mathbb {S}^2\) are the bias directions, \(\varvec{\beta }\) is the gain coefficient of each direction, \(\varvec{\gamma }\) are the combination coefficients of the directions and C is the normalization function for the kernel. This auxiliary kernel is a combination of von Mises–Fisher distributions, chosen both due to their rapid growth that promotes greater amplitudes, and due to their convenient properties in directional statistics [16], which allows for a closed form solution for the kernel [34]. It was initially introduced in [15]. The \(\ell _1\)-norm criterion for \(\varvec{\gamma }\) also induces sparsity, coherent with our assumptions of the directed field.
We can then obtain the ATF weight \(w_{\text{dir}}\) and kernel \(\kappa_{\text{dir}}\) in a similar manner to the sunken sphere ATF kernel as
$$w_{\text{dir}}\left(\hat{\mathbf{r}}, \hat{\mathbf{s}}\right) = \mathring{w}_{\text{dir}}\left(\hat{\mathbf{r}}\right) \mathring{w}_{\text{dir}}\left(\hat{\mathbf{s}}\right),$$
$$\kappa_{\text{dir}}\big((\mathbf{r},\mathbf{s}), (\mathbf{r}^\prime,\mathbf{s}^\prime)\big) = \frac{1}{2}\left[\mathring{\kappa}_{\text{dir}}\left(\mathbf{r}, \mathbf{r}^\prime\right) \mathring{\kappa}_{\text{dir}}\left(\mathbf{s}, \mathbf{s}^\prime\right) + \mathring{\kappa}_{\text{dir}}\left(\mathbf{r}, \mathbf{s}^\prime\right) \mathring{\kappa}_{\text{dir}}\left(\mathbf{s}, \mathbf{r}^\prime\right)\right],$$
which always guarantees that reciprocity is upheld. Given that \(j_0\) is an entire function [35] and even, compensating for the square root, this kernel is differentiable everywhere with respect to both parameters, making it capable of being optimized using simple descent criteria.
This kernel function is much more nuanced than the sunken sphere kernel, being able to represent more directions with coefficients learned from data. The directed weight has been shown to be very effective in representing reflections with very strong directionality [15]; however, the residual field composed of densely distributed lower-amplitude wave components could not be entirely represented. For that reason, this kernel was combined with a residual kernel.
3.2 Residual kernel function
This kernel was created to address the expected behavior of the residual field [17], composed of numerous lower-amplitude higher-order components that could not be adequately addressed with the simpler directed field model. This field is more challenging to properly characterize than the directed component, as the residual field is comprised of densely distributed wave components generating directional patterns that are harder to represent mathematically. Instead, this kernel approximates the expression in (17) with a more flexible weight function without as many predetermined biases, unlike the directed component. As we have only a shallow understanding of the expected behavior of this field, we opt to use a universal approximator, in this case a NN, as the weight function in order for it to adapt to patterns learned from data.
We can impart into the weight the known properties of the residual field by carefully choosing the architecture of the NN and the field representation. In the interest of easing the numerical load and reducing redundant calculations, we have opted to make this weight separable as well, thus directly enforcing the reciprocity. The auxiliary weight \(\mathring{w}_{\text{res}}\) and auxiliary kernel \(\mathring{\kappa}_{\text{res}}\) can thus be expressed as
$$\mathring{w}_{\text{res}}\left(\hat{\mathbf{x}}\right) = \textrm{NN}\left(\hat{\mathbf{x}};\, \varvec{\theta}\right),$$
$$\mathring{\kappa}_{\text{res}}\left(\mathbf{x}, \mathbf{x}^\prime\right) = \textrm{NI}\!\left[\int_{\mathbb{S}^2} \textrm{NN}\left(\hat{\mathbf{x}};\, \varvec{\theta}\right) \mathrm{e}^{\mathrm{i}k \hat{\mathbf{x}} \cdot \Delta\mathbf{x}}\, \mathrm{d}\hat{\mathbf{x}}\right], \qquad (33)$$
where \(\textrm{NN}\) is a fully connected neural network receiving as argument a direction vector \(\hat{\mathbf{x}} \in \mathbb{S}^2\), \(\textrm{NI}[\cdot]\) indicates the integral is performed numerically, and \(\varvec{\theta}\) are the parameters of the NN. The ATF weight \(w_{\text{res}}\) and kernel \(\kappa_{\text{res}}\) are obtained as a combination of auxiliary kernels, like the previous kernels:
$$w_{\text{res}}\left(\hat{\mathbf{r}}, \hat{\mathbf{s}}\right) = \mathring{w}_{\text{res}}\left(\hat{\mathbf{r}}\right) \mathring{w}_{\text{res}}\left(\hat{\mathbf{s}}\right),$$
$$\kappa_{\text{res}}\big((\mathbf{r},\mathbf{s}), (\mathbf{r}^\prime,\mathbf{s}^\prime)\big) = \frac{1}{2}\left[\mathring{\kappa}_{\text{res}}\left(\mathbf{r}, \mathbf{r}^\prime\right) \mathring{\kappa}_{\text{res}}\left(\mathbf{s}, \mathbf{s}^\prime\right) + \mathring{\kappa}_{\text{res}}\left(\mathbf{r}, \mathbf{s}^\prime\right) \mathring{\kappa}_{\text{res}}\left(\mathbf{s}, \mathbf{r}^\prime\right)\right].$$
This kernel function is not expected to perform properly by itself. NNs are structured so as to take advantage of patterns in data, meaning that representing sudden localized growth requires either a special architecture or a deep network trained on external data that would carry too many parameters to be feasibly trained under the constraints of our problem. In other words, this kernel is not equipped to represent the directed field, but is expected to perform well in conjunction with the directed kernel.
The integration grid is not a trainable parameter of the kernel but it is still malleable, allowing for sampling in order to balance out model complexity and precision in the computation of the integral. As this estimation is essentially a discretization of (17), the defined kernel has an associated inner-product and thus approximates a properly-defined kernel function.
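As a sketch of the numerically integrated auxiliary kernel in (33), assuming a generic spherical quadrature rule with nodes and weights (Lebedev quadrature in our experiments; any rule with this interface works for the sketch):

```julia
using LinearAlgebra

# Discretization of the residual auxiliary kernel: the NN weight is evaluated
# on quadrature nodes x̂_q with weights ω_q, approximating the S² integral.
function κ_res_aux(x, x′, k, nn, nodes, ω)
    Δx = x - x′
    sum(ω[q] * nn(nodes[q]) * exp(im * k * dot(nodes[q], Δx))
        for q in eachindex(nodes))
end
```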
The NN weight employed here is a continuous function capable of estimating an arbitrary number of weight values with a fixed set of parameters. Given that our objective is to create kernel functions for interpolations between source and receiver arrays, the expectation is that data is relatively scarce, as experiments with variable source position are difficult to perform. For that reason, this network should remain compact so as to avoid overfitting. This creates an issue of representational power, which we addressed by employing a technique used primarily for keeping NNs memory-efficient (Fig. 2).
3.2.1 Network architecture
The architecture of the network was meant to remain compact, meaning we did not assign many hidden layers to make it a deep neural network (DNN), but rather employed 2 hidden layers with 12 neurons each. We also employed an extra layer that simply applied a \(\textrm{ReLU}\) function, \(\textrm{ReLU}(x)=\max(x, 0)\), to the output of the previous layer in order to keep the weight nonnegative, one of the few requirements we imposed on it. For the rest of the network, we employed a special type of activation function, the rowdy activation [36].
The rowdy activation function is a special class of activation function that is also adaptive to the data. That is to say, the rowdy activation carries with it network parameters that are few in number but influence the entire network. We define the rowdy activation function \(\varphi\) as
$$\varphi(x) = \varphi_0(x) + \sum_{t=1}^{T} \left[\tau_t \sin\left(\omega_t x\right) + \tau_{T+t} \cos\left(\omega_{T+t} x\right)\right],$$
where \(\varphi_0\) represents a standard activation function, in our case \(\tanh\), t is an iterator, T is the number of sinusoidal perturbations, \(\{\omega_t\}_{t=1}^{2T}\) are the angular frequencies of the sinusoidal functions, and \(\{\tau_t\}_{t=1}^{2T}\) are the amplitudes. Due to \(\tanh\) being analytic on the real line [35] and both \(\sin\) and \(\cos\) being entire functions, this activation function is always differentiable and will pose no problems for the learning of the NN.
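A minimal Julia sketch of this activation, assuming the 2T frequencies and amplitudes split between the sine and cosine terms as in the definition above:

```julia
# Rowdy activation: tanh base plus T trainable sinusoidal perturbations.
# τ and ω each hold 2T entries: the first T drive the sines, the rest the cosines.
function rowdy(x, τ, ω)
    T = length(τ) ÷ 2
    tanh(x) + sum(τ[t] * sin(ω[t] * x) + τ[T+t] * cos(ω[T+t] * x) for t in 1:T)
end
```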
4 Parameter optimization based on Bayesian inference
The introduction of Bayesian inference methods has had a monumental impact in machine learning and inverse problems as a whole [20, 37]. By taking into account uncertainties in our model, we can derive estimations that are less sensitive to unexpected observations.
It is very unlikely for a function arising entirely out of randomness to be continuous, much less differentiable [38]. For that reason, random signals such as noise have a very low probability of showing the same patterns as a signal satisfying a differential equation, which requires differentiability. By imposing physical constraints on our methods, we can guarantee differentiability. For a proper choice of partial differential equation, our data models should serve as strong priors.
By considering the physics of the ATF outlined in Sect. 2, we can create models coherent with the behavior of the ATF, satisfying the Helmholtz equation. This makes the process of Bayesian inference more guided. We experimented with two different forms of Bayesian inference techniques: GPR and MCHMC.
4.1 Gaussian process regression
GPR, also known as Kriging [21], is the solution to our problem statement when we consider the signal and noise distributions to be Gaussian with known noise variance.
Consider that, by evaluating the model on the measurement points, we see
$$\mathbf{y} = \alpha_0 \mathbf{G} + \mathbf{K}\varvec{\alpha} + \varvec{\epsilon},$$
where \(\mathbf{G}\) is the vector of direct components and \(\mathbf{K}\) is the Gram matrix. The optimization of \(\varvec{\alpha}\) can be given using kernel ridge regression [17, 22] and the optimization of \(\alpha_0\) can be given using least squares [19] as
$$\hat{\varvec{\alpha}} = \left(\mathbf{K} + \lambda\mathbf{I}\right)^{-1}\left(\mathbf{y} - \hat{\alpha}_0 \mathbf{G}\right), \qquad (41)$$
$$\hat{\alpha}_0 = \frac{\mathbf{G}^{\mathsf{H}} \left(\mathbf{K} + \lambda\mathbf{I}\right)^{-1} \mathbf{y}}{\mathbf{G}^{\mathsf{H}} \left(\mathbf{K} + \lambda\mathbf{I}\right)^{-1} \mathbf{G}}, \qquad (42)$$
where \(\lambda = \sigma^2\) is the regularization constant, which also corresponds to the variance of the noise. Under these assumptions, we can rewrite (10) as an optimization problem only on the parameters of the model \(\varvec{\chi}\) to find
$$\hat{\varvec{\chi}} = \mathop{\mathrm{arg\,min}}_{\varvec{\chi}}\ \left(\mathbf{y} - \hat{\alpha}_0\mathbf{G}\right)^{\mathsf{H}} \left(\mathbf{K}_{\varvec{\chi}} + \lambda\mathbf{I}\right)^{-1} \left(\mathbf{y} - \hat{\alpha}_0\mathbf{G}\right) + \log\det\left(\mathbf{K}_{\varvec{\chi}} + \lambda\mathbf{I}\right).$$
The parameters \(\varvec{\chi}\) are, of course, internal to the kernels; the Gram matrix \(\mathbf{K}\) and the coefficients \(\varvec{\alpha}\) and \(\alpha_0\) therefore depend on them. The detailed derivation of this estimation is explored in Appendix 1.
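A Julia sketch of the closed-form coefficients in (41) and (42), as reconstructed above:

```julia
using LinearAlgebra

# Generalized-least-squares estimate of the direct amplitude α0, followed by
# kernel ridge regression for the reverberant coefficients α.
function gpr_coefficients(y, G, K, λ)
    A  = K + λ * I
    α0 = (G' * (A \ y)) / (G' * (A \ G))
    α  = A \ (y - α0 * G)
    return α0, α
end
```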
The optimization of \(\varvec{\chi }\) is performed with gradient descent [22] for the parameters of the residual kernel and the reduced gradient method [19] for the directed component, in order to determine the descent direction of \(\varvec{\gamma }\) while keeping the unit norm condition [17].
GPR is a well-established technique that supersedes simple KRR [21]. However, it relies on both data and noise having normal distributions of known variance. Next, we study a method that can quantify uncertainty in the estimations.
4.2 MCHMC regression
Most Bayesian-based models need assumptions made about the distributions involved. One rather common assumption is that of Gaussian priors [38]. GPR is a widely used technique, attractive because its posterior is tractable and straightforward to optimize. However, the assumed posterior might be nonconvex with regard to the kernel parameters and might have saddles and local minima that hinder optimization.
Another option is to estimate the parameters of the problem using probabilistic programming. We use a similar approach for regression as [39] did for the related problem of sound field reconstruction, meaning we impose that our model parameters \(\varvec{\chi}\) are entirely comprised of random variables. We evaluate the posterior based on samples of \(\varvec{\chi}\) taken using a probabilistic programming language [23]. We also assume we have a Markov process, meaning the next sampled state random variable \(\varvec{\chi}_{t+1}\) only depends on the current state random variable \(\varvec{\chi}_{t}\) of values for our parameters, and their probability densities are connected by a transition function \(\tau_t\) such that
$$p\left(\varvec{\chi}_{t+1}\right) = \int_{\mathcal{X}} \tau_t\left(\varvec{\chi}_{t+1} \,\middle|\, \varvec{\chi}_t\right) p\left(\varvec{\chi}_t\right)\, \mathrm{d}\varvec{\chi}_t,$$
where \(\mathcal {X}\) represents the values our random variable can take, and the transition function is a conditional probability density function that informs the sampling strategy we are using.
When the transition function preserves the underlying posterior distribution, our samples will start to tend towards the typical set for a large enough number of randomly generated samples [24]. Hamiltonian Monte Carlo is an example of a sampling strategy that does so, which makes it very potent. Our chosen sampler, NUTS [25], is a variation of Hamiltonian Monte Carlo.
With a sufficiently big number of samples, we can obtain the probability of observing \(\mathbf{y}\) conditioned to our distribution of \(\varvec{\chi }\) as \(p(\mathbf{y}|\varvec{\chi })\), at which point we can perform Bayesian inference on the parameters using Bayes’ theorem. The use of MCHMC thus allows us to quantify the effects of our parameters on the model continuously without needing to sample the model continuously while approaching the typical set conditioned to the observations.
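To make the workflow concrete, the following is a minimal real-valued sketch of such a model in Julia with Turing.jl [23] (e.g., with real and imaginary parts of the data stacked); the actual model places the hierarchical priors of Sect. 5.1 on \((\varvec{\beta}, \varvec{\gamma})\) and Gaussian priors on \(\varvec{\theta}\), while here a single stand-in parameter β represents \(\varvec{\chi}\).

```julia
using Turing, LinearAlgebra

@model function atf_model(y, G, gram)   # `gram` builds K from the kernel parameter
    λ  ~ Exponential(0.1)               # noise variance / regularization constant
    α0 ~ Normal(0.0, 1.0)               # direct-component amplitude
    β  ~ Exponential(1.0)               # stand-in kernel parameter
    K  = gram(β)
    α  = (K + λ * I) \ (y .- α0 .* G)   # deterministic coefficients, as in (41)
    ŷ  = α0 .* G .+ K * α
    for n in eachindex(y)
        y[n] ~ Normal(ŷ[n], sqrt(λ))    # observation likelihood
    end
end

# Draw 1000 posterior samples with the no-U-turn sampler.
chain = sample(atf_model(y, G, gram), NUTS(), 1000)
```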
5 Experiments
We experimented with the methods in numerical simulations within a shoebox-shaped room of size \(7.0~\textrm{m} \times 6.4~\textrm{m} \times 2.7~\textrm{m}\) with the center of the room being the center of the coordinate system. The simulations were conducted using the image source method [40]. The reflection coefficients and maximum order of terms for the calculations were obtained using pyroomacoustics [41]. We also generated noise with a probabilistic wind noise simulator [42, 43]. The choice of wind noise was due to two factors: first, wind noise is a relevant form of unpredictable, probabilistic noise, as it can result from devices such as air conditioning units; second, wind noise has a much less predictable probability distribution than Gaussian noise, so it serves to establish how well each method performs under less-than-ideal conditions.
The source region \(\Omega _{\text{S}}\) was considered to be a sphere of radius \(0.5~\textrm{m}\) centered at \((0,0,0)~\textrm{m}\). The receiver region was also a sphere of radius \(0.5~\textrm{m}\), but centered at \((-2.5,-1.75,-0.2)~\textrm{m}\). The source and receiver positions were both placed in identical dual layer spherical arrays with a disposition determined by t-design [44, 45] for \(t=4\) in both layers, resulting in a total of 50 points and 2500 measurements of the ATF. These configurations can be seen in Fig. 3.
Noise signals were added to the measurements, with their standard deviation adjusted so that each sample would have a signal-to-noise ratio (\(\textrm{SNR}\)) of \(20~\textrm{dB}\), as the ATF interpolation problem is performed under quite controlled conditions. We experimented with three different reflection coefficients for the walls, equivalent to reverberation times \(T_{60} \in \{0.2, 0.4, 0.8\}~\textrm{s}\). Finally, at each frequency, the direct field component was multiplied by a complex number chosen randomly with complex uniform distribution, as this component was also part of the estimation.
Our proposed formulations deriving the model parameters of the mixed kernel using GPR, Proposed (GPR), and NUTS sampling, Proposed (MCHMC), were compared to GPR for the uniform weight kernel, Uniform, and the sunken sphere kernel, Sunken sphere. In order to also validate the choice of a dual kernel model and quantify the effects of only adopting one of the models, we also compared the proposed methods to the directed kernel by itself, Directed only, and the residual kernel by itself, Residual only. The numerical integrator in (33) for Residual only, Proposed (GPR) and Proposed (MCHMC) was based on Lebedev quadrature [46] of order 15. The bias directions \(\{\hat{\varvec{\eta}}_d\}_{d=1}^D\) introduced in (25) for Directed only, Proposed (GPR) and Proposed (MCHMC) were calculated using Lebedev quadrature of order 7. Each kernel model was trained independently of the others and the integration grid orders were chosen empirically.
5.1 Settings for parameter optimization
The parameters for the GPR estimation were obtained by optimizing the posterior \(\log\) likelihood. The regularization constant \(\lambda\) was set to \(10^{-2}\) for all the kernels, as its true value should not be considered a parameter of GPR; doing otherwise can lead to overfitting. The coefficients \(\varvec{\alpha}\) and \(\alpha_0\) were determined as described in (41) and (42). The other kernel parameters were determined using gradient descent, apart from \(\varvec{\gamma}\), which was optimized using the reduced gradient descent algorithm [19] with line searching in order to guarantee the unit norm was preserved. The descent direction associated with \(\varvec{\gamma}\) is \(\varvec{\delta} = [\delta_1,\ \delta_2,\ \dots,\ \delta_D]^\textsf{T}\), where
$$\delta_d = \begin{cases} -\left(\dfrac{\partial \mathcal{L}_{\text{opt}}}{\partial \gamma_d} - \dfrac{\partial \mathcal{L}_{\text{opt}}}{\partial \gamma_{\max}}\right), & d \neq d_{\max},\\[2mm] -\sum\limits_{d^\prime \neq d_{\max}} \delta_{d^\prime}, & d = d_{\max}, \end{cases}$$
where \(\mathcal{L}_{\text{opt}}\) represents (10) when the coefficients \(\alpha_0\) and \(\varvec{\alpha}\) are considered to be (42) and (41), respectively, and \(\gamma_{\max}\), with index \(d_{\max}\), is the current biggest value in \(\varvec{\gamma}\). This constrained optimization was implemented in the Julia programming language [47] using the Optim library [48, 49] under the Flux framework [50].
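A Julia sketch of this descent direction under the reconstruction above; the elimination of the largest entry of \(\varvec{\gamma}\) is our reading of the reduced gradient method on the simplex, with nonnegativity left to the line search:

```julia
# Reduced-gradient descent direction on {γ : γ_d ≥ 0, Σ_d γ_d = 1}.
# ∇L is the gradient of the loss with respect to γ.
function reduced_gradient_direction(∇L, γ)
    b = argmax(γ)              # basic variable: currently largest entry of γ
    δ = -(∇L .- ∇L[b])         # non-basic components of the descent direction
    δ[b] = -sum(δ)             # basic component keeps Σ_d γ_d invariant
    return δ
end
```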
For the MCHMC regression on the mixed kernel, the settings correspond to defining prior distributions for our model parameters and how they will effect change in our posterior. For these priors, we want \(\varvec{\gamma}\) to be composed of only positive numbers and to promote sparsity. As these coefficients always act together with the coefficients \(\varvec{\beta}\), we placed them in a hierarchical model in which each \(\beta_d\) follows an exponential distribution \(\mathcal{E}\) and each \(\gamma_d\), conditioned on \(\beta_d\), follows an inverse gamma distribution \(\Gamma^{-1}\). The logic behind this choice is simple: our hypothesis is that the directed component is composed of a select number of reverberant field components, hence the usage of sparsity-inducing criteria. The inverse gamma distribution skews toward smaller values for smaller \(\beta_d\). Conversely, each \(\beta_d\) was chosen to have an exponential distribution because the only requirement we have for it is that it is not negative. Additionally, if \(\varvec{\gamma}\) does happen to be sparse, our directed and residual models would be given further credence.
Given that our coefficient vector \(\varvec{\alpha}\) has a total of 2500 parameters, we opted to keep it deterministic in order to lessen the computational load; it was calculated using (41). However, for this estimation, the regularization constant \(\lambda\) was considered to be a variable of the model, as MCHMC admits quantified uncertainty. As such, it was assigned an exponential distribution with a minimum value of \(10^{-3}\). Another parameter we introduce for the sampling stage is the amplitude of the direct component \(\alpha_0\), which serves as an alternative to the simple division applied in (42), as those operations can be rather unstable and costly by requiring repeated operations with the matrix inverse. Noise was quantified by assigning a distribution dependent on \(\lambda\). Finally, we have no conditions for our network parameters \(\varvec{\theta}\); given the freedom embedded in these parameters, we assigned them priors from the real-valued normal distribution \(\mathcal{N}\).
Finally, the model was fitted to our observations by calculating the Gram matrix \(\mathbf{K}\), sampling the predictions \(\hat{\mathbf{y}}\), and adding the error to them; the model was set to be sampled 1000 times with the NUTS implementation in [23]. Once we obtained the posterior distribution of the data model, we used Bayes' theorem in order to get the marginal likelihoods of our parameters. Finally, the parameters were estimated by sampling the parameter space around the mean within one standard deviation and picking the simulations that determined the highest likelihood.
5.2 Evaluation metrics
We evaluated all competing kernel methods using two different criteria. The first criterion was the evaluation of each method on aggregate, within the entire region. The second was a spatial visualization of the ATF and an analysis of pointwise performance.
For the first criterion, we sampled the target regions on a regular grid with a step of \(0.2~\textrm{m}\) in the source and receiver regions, resulting in \(L_{\text{e}} = 217\) evaluation source positions \(\{\mathbf{s}_{\textrm{e},l}\}_{l=1}^{L_{\text{e}}} \subset \Omega_{\text{S}}\) and \(M_{\text{e}} = 217\) evaluation receiver positions \(\{\mathbf{r}_{\textrm{e},m}\}_{m=1}^{M_{\text{e}}} \subset \Omega_{\textrm{R}}\), giving us a total of \(N_{\text{e}} = M_{\text{e}} L_{\text{e}} = 47{,}089\) evaluation pairs. For the aggregate evaluation, our chosen metric was the normalized mean square error (\(\textrm{NMSE}\)) calculated for these evaluation points as
$$\textrm{NMSE} = 10\log_{10} \frac{\sum\limits_{m,l} \left|\hat{h}\left(\mathbf{r}_{\textrm{e},m}|\mathbf{s}_{\textrm{e},l}\right) - h\left(\mathbf{r}_{\textrm{e},m}|\mathbf{s}_{\textrm{e},l}\right)\right|^2}{\sum\limits_{m,l} \left|h\left(\mathbf{r}_{\textrm{e},m}|\mathbf{s}_{\textrm{e},l}\right)\right|^2},$$
where \(\sum \limits _{m,l} = \sum \nolimits _{m=1}^{M_{\text{e}}} \sum \nolimits _{l=1}^{L_{\text{e}}}\).
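A direct Julia sketch of this metric, with the true and estimated ATFs as generic functions:

```julia
# NMSE in dB over all evaluation source/receiver pairs.
function nmse_db(ĥ, h, sources_e, receivers_e)
    num = sum(abs2(ĥ(r, s) - h(r, s)) for s in sources_e, r in receivers_e)
    den = sum(abs2(h(r, s)) for s in sources_e, r in receivers_e)
    return 10 * log10(num / den)
end
```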
The second evaluation was performed by simulating the ATF caused by a single source at the center of \(\Omega_{\text{S}}\), \(\mathbf{s}_0 = [0,0,0]^\textsf{T}\), for a reverberation time of \(0.4~\textrm{s}\). We simulated the ATFs in the square area \(\{(x,y,z):-3< x< -2,\ -2.25< y < -1.25,\ z=-0.2\}\) in order to visualize how well the methods can reconstruct ATFs. We evaluated both the ability of each kernel function to reconstruct the real part of the ATF as well as the pointwise normalized square error (\(\textrm{NSE}\)) defined as
$$\textrm{NSE}(\mathbf{r}) = 10\log_{10} \frac{\left|\hat{h}\left(\mathbf{r}|\mathbf{s}_0\right) - h\left(\mathbf{r}|\mathbf{s}_0\right)\right|^2}{\left|h\left(\mathbf{r}|\mathbf{s}_0\right)\right|^2}.$$
Finally, we also mapped out the gain for the final weight function w of the Bayesian regression for the same frequency analyzed for the visualization criterion, \(1.65~\textrm{kHz}\), in order to show how well our hypothesis about the combined kernel held up.
5.3 Experimental results
The results for our aggregate error analysis can be seen in Fig. 4, which shows that both Proposed (GPR) and Proposed (MCHMC) were able to achieve lower error across every frequency. For all reverberation times and frequencies, Uniform was the worst estimation, as was expected due to this kernel having no parameters to learn. For the lowest reverberation time of \(0.2~\textrm{s}\), Directed only and Sunken sphere had very similar performance to Proposed (GPR), as the present reverberant field is expected to consist mostly of wave components that have not been deeply impacted by the environment. Under these conditions, the simpler Directed only model is likely sufficient to perform estimations.
However, for both \(0.4~\textrm{s}\) and \(0.8~\textrm{s}\), the proposed methods stay somewhat stable while the performance of Directed only and Sunken sphere degrades and approaches that of Residual only as the reverberation time increases. As was within expectations, the Residual only kernel weight was not capable of interpolating the ATF properly when alone, likely due to the way it is structured. The inclusion of error estimation, and thus of uncertainty, in Proposed (MCHMC) resulted in the most robust evaluations across all metrics.
The attempted reconstructions of the real part can be seen in Fig. 5, showing that once again Proposed (GPR) and Proposed (MCHMC) performed the closest reconstructions of the ATF on a pointwise basis, likely due to the representative power of the kernel model, which considered a more complete representation of the reverberant field than the other kernels. Perhaps due to the abundance of measurement points from the array present in the region, it might be difficult to tell apart the differences between Proposed (GPR) and Proposed (MCHMC). However, we can see more clearly that Proposed (MCHMC) outperforms Proposed (GPR) in Fig. 6, in which it is clear that Proposed (MCHMC) has higher overall representative power spatially as well.
Uniform was not capable of properly interpolating the ATF pointwise, showing large dark regions and very low gain even where the approximation was somewhat close. Sunken sphere has a very smooth and uniform error estimation, likely due to how simple its weight estimation is. Interestingly, the gain in the direction pointing to the source and in the directions orthogonal to it is rather stable, likely due to how the weight is configured. Residual only shows unstable error behavior, indicating that the residual kernel alone is not suitable for interpolation without an added kernel capable of representing strong directionality.
Finally, we also decided to observe the kernel weights of Proposed (MCHMC) in order to confirm whether or not the Monte Carlo simulation-based estimation was capable of learning the patterns we expected. These gain plots can be seen in Fig. 7. The directed kernel weight has regions of high gain separated by regions of low gain, indicating that the distribution of \(\varvec{\gamma}\) dependent on \(\varvec{\beta}\) induced sparsity. The residual weight also shows somewhat complementary behavior to the directed weight, having some notably darker spots where the directed weight observes its peaks. The combined full weight shows that the gain of the directed kernel has more influence on the overall shape of the weight function, which would explain the directed kernel performing better by itself than the residual kernel. Interestingly, we do see that gains at lower elevations seem to feature more strongly in the directed weight, likely due to the array for the receiver region being placed closer to the ground. This lower gain is compensated for by the residual weight, which has higher gain at higher elevations.
6 Conclusion
We introduced a fully adaptive kernel interpolation model for the acoustic transfer function between regions that considers directed and residual reverberant fields. This comprehensive kernel estimation achieves a complete field representation by considering separate kernel models for the directed and residual fields. The directed kernel aims to represent wave components with strong directionality, and as such is expressed as a combination of functions that bias the estimation toward particular directions. The residual kernel is meant to represent much less predictable reverberant field components with lower amplitudes, and was represented by a neural network weight. This joint kernel had its parameters chosen by a probabilistic criterion using both Gaussian process regression and a Markov chain Monte Carlo sampler, in order to incorporate uncertainty into the estimation as well as explore the behavior of the model as the parameters are sampled. This uncertainty-aware estimation outperformed the competing kernel estimations in numerical simulations using a simulator for wind noise.
References
M. Cobos, J. Ahrens, K. Kowalczyk, A. Politis, An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction. EURASIP J. Audio. Speech. Music. Process. 2022, 10 (2022). https://doi.org/10.1186/s13636-022-00242-x
Y. Haneda, S. Makino, Y. Kaneda, N. Koizumi, ARMA modeling of a room transfer function at low frequencies. J. Acoust. Soc. Japan (E) 15, 353–355 (1994). https://doi.org/10.1250/ast.15.353
Y. Haneda, Y. Kaneda, N. Kitawaki, Common-acoustical-pole and residue model and its application to spatial interpolation and extrapolation of a room transfer function. IEEE Trans. Speech Audio Process. 7(6), 709–717 (1999). https://doi.org/10.1109/89.799696
R. Mignot, G. Chardon, L. Daudet, Low frequency interpolation of room impulse responses using compressed sensing. IEEE/ACM Trans. Audio Speech Lang. Process. 22(1), 205–216 (2014). https://doi.org/10.1109/TASLP.2013.2286922
N. Antonello, E. De Sena, M. Moonen, P.A. Naylor, T. van Waterschoot, Room impulse response interpolation using a sparse spatio-temporal representation of the sound field. IEEE/ACM Trans. Audio Speech Lang. Process. 25(10), 1929–1941 (2017). https://doi.org/10.1109/TASLP.2017.2730284
O. Das, P. Calamia, S.V.A. Gari, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP). Room impulse response interpolation from a sparse set of measurements using a modal architecture (2021), pp. 960–964. https://doi.org/10.1109/ICASSP39728.2021.9414399
Z. Liang, W. Zhang, T.D. Abhayapala, Sound field reconstruction using neural processes with dynamic kernels. EURASIP J. Audio Speech Music Process. 2024 (2024). https://doi.org/10.1186/s13636-024-00333-x
M. Pezzoli, D. Perini, A. Bernardini, F. Borra, F. Antonacci, A. Sarti, Deep prior approach for room impulse response reconstruction. Sensors 22(7, 2710) (2022). https://doi.org/10.3390/s22072710
X. Karakonstantis, D. Caviedes-Nozal, A. Richard, E. Fernandez-Grande, Room impulse response reconstruction with physics-informed deep learning. J. Acoust. Soc. Amer. 155(2), 1048–1059 (2024). https://doi.org/10.1121/10.0024750
E.G. Williams, Fourier Acoustics (Academic Press, London, 1999)
P.N. Samarasinghe, T.D. Abhayapala, M.A. Poletti, T. Betlehem, An efficient parameterization of the room transfer function. IEEE/ACM Trans. Audio Speech Lang. Process. 23(12), 2217–2227 (2015). https://doi.org/10.1109/TASLP.2015.2475173
J.G.C. Ribeiro, N. Ueno, S. Koyama, H. Saruwatari, in Proc. IEEE Sensor Array Multichannel Signal Process. Workshop (SAM). Kernel interpolation of acoustic transfer function between regions considering reciprocity (2020). https://doi.org/10.1109/SAM48682.2020.9104256
J.G.C. Ribeiro, N. Ueno, S. Koyama, H. Saruwatari, Region-to-region kernel interpolation of acoustic transfer functions constrained by physical properties. IEEE/ACM Trans. Audio Speech Lang. Process. 30, 2944–2954 (2022). https://doi.org/10.1109/TASLP.2022.3201368
J.G.C. Ribeiro, S. Koyama, H. Saruwatari, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP). Region-to-region kernel interpolation of acoustic transfer function with directional weighting (Singapore, 2022), pp. 576–580. https://doi.org/10.1109/ICASSP43922.2022.9746842
R. Horiuchi, S. Koyama, J.G.C. Ribeiro, N. Ueno, H. Saruwatari, in Proc. IEEE Int. Workshop Appl. Signal Process. Audio Acoust. (WASPAA). Kernel learning for sound field estimation with l1 and l2 regularizations (2021), pp. 261–265. https://doi.org/10.1109/WASPAA52581.2021.9632731
K.V. Mardia, P.E. Jupp, Directional Statistics (Wiley, Chichester, 2009)
J.G.C. Ribeiro, S. Koyama, H. Saruwatari, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP). Kernel interpolation of acoustic transfer functions with adaptive kernel for directed and residual reverberations (2023), pp. 1–5. https://doi.org/10.1109/ICASSP49357.2023.10095429
J.G.C. Ribeiro, S. Koyama, R. Horiuchi, H. Saruwatari, Sound field estimation based on physics-constrained kernel interpolation adapted to environment. IEEE/ACM Trans. Audio, Speech, Lang. Process. (2023). (Preprint). https://doi.org/10.36227/techrxiv.24455380.v1
D.G. Luenberger, Y. Ye, Linear and Nonlinear Programming (Springer Cham, Gewerbestrasse, 2016)
M.A. Amaral Turkman, C.D. Paulino, P. Müller, Computational Bayesian Statistics: An Introduction. Institute of Mathematical Statistics Textbooks (Cambridge University Press, Cambridge, 2019). https://doi.org/10.1017/9781108646185
C.E. Rasmussen, C.K.I. Williams, Gaussian processes for Machine Learning (MIT Press, Cambridge, 2006)
K.P. Murphy, Probabilistic Machine Learning (MIT Press, Cambridge, 2022)
H. Ge, K. Xu, Z. Ghahramani, in Int. Conf. Artif. Intell. Stat., (AISTATS). Turing: a language for flexible probabilistic inference (Playa Blanca, 2018), pp. 1682–1690. http://proceedings.mlr.press/v84/ge18b.html. Accessed 29 Oct 2023
M. Betancourt, A conceptual introduction to Hamiltonian Monte Carlo (2018). https://doi.org/10.48550/arXiv.1701.02434
M.D. Hoffman, A. Gelman, The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15(1), 1593–1623 (2014)
S. Koyama, M. Nakada, J.G.C. Ribeiro, H. Saruwatari, in Proc. IEEE Int. Workshop Appl. Signal Process. Audio Acoust. (WASPAA). Kernel interpolation of incident sound field in region including scattering objects (2023), pp. 1–5. https://doi.org/10.1109/WASPAA58266.2023.10248156
P.N. Samarasinghe, T.D. Abhayapala, W. Kellermann, Acoustic reciprocity: An extension to spherical harmonics domain. J. Acoust. Soc. Amer. 142(4), EL337–343 (2017). https://doi.org/10.1121/1.5002078
W. Rudin, Functional Analysis (McGraw-Hill, New York City, 1991)
J.H. Manton, P.O. Amblard, A primer on reproducing kernel Hilbert spaces. Found. Trends® Signal Process. 8(1-2), 1–126 (2015). https://doi.org/10.1561/2000000050
B. Schölkopf, R. Herbrich, A.J. Smola, in Comput. Learn. Theory, ed. by D. Helmbold, B. Williamson. A generalized representer theorem (Springer Berlin, Berlin, 2001), pp. 416–426. https://doi.org/10.1007/3-540-44581-1_27
M. Ikehata, The Herglotz wave function, the Vekua transform and the enclosure method. Hiroshima Math. J. 35 (2005). https://doi.org/10.32917/hmj/1150998324
D. Colton, P. Monk, in Topics in Computational Wave Propagation: Direct and Inverse Problems, ed. by M. Ainsworth, P. Davies, D. Duncan, B. Rynne, P. Martin. Herglotz Wave Functions in Inverse Electromagnetic Scattering Theory (Springer, Berlin, 2003), pp. 367–394. https://doi.org/10.1007/978-3-642-55483-4_10
N. Ueno, S. Koyama, H. Saruwatari, Directionally weighted wave field estimation exploiting prior information on source direction. IEEE Trans. Signal Process. 69, 2383–2395 (2021). https://doi.org/10.1109/TSP.2021.3070228
H. Ito, S. Koyama, N. Ueno, H. Saruwatari, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP). Spatial active noise control based on kernel interpolation with directional weighting (IEEE, Barcelona, 2020), pp. 8399–8403
W. Rudin, Real and Complex Analysis (McGraw-Hill, New York City, 1986)
A.D. Jagtap, Y. Shin, K. Kawaguchi, G.E. Karniadakis, Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions. Neurocomput. 468, 165–180 (2022). https://doi.org/10.1016/j.neucom.2021.10.036
A. Mohammad-Djafari, Regularization, Bayesian inference, and machine learning methods for inverse problems. Entropy 23(12) (2021). https://doi.org/10.3390/e23121673
E. Çinlar, Probability and Stochastics (Springer, New York, 2011)
D. Caviedes-Nozal, N.A.B. Riis, F.M. Heuchel, J. Brunskog, P. Gerstoft, E. Fernandez-Grande, Gaussian processes for sound field reconstruction. J. Acoust. Soc. Amer. 149(2), 1107–1119 (2021). https://doi.org/10.1121/10.0003497
J.B. Allen, D.A. Berkley, Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Amer. 65(4), 943–950 (1979). https://doi.org/10.1121/1.382599
R. Scheibler, E. Bezzam, I. Dokmanić, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) 2018. Pyroomacoustics: A python package for audio room simulation and array processing algorithms (2018), pp. 351–355. https://doi.org/10.1109/ICASSP.2018.8461310
C.M. Nelke, P. Vary, in 2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC). Measurement, analysis and simulation of wind noise signals for mobile communication devices (2014), pp. 327–331. https://doi.org/10.1109/IWAENC.2014.6954312
C.M. Nelke, P. Vary. Wind noise database. https://www.iks.rwth-aachen.de/forschung/tools-downloads/databases/wind-noise-database. Accessed 29 Oct 2023
F. Zotter, M. Frank, A. Sontacchi, in Proc. EAA EuroRegio, Congr. Sound Vibr. The virtual t-design ambisonics-rig using VBAP (EAA, Ljubljana, 2010)
X. Chen, R.S. Womersley. Spherical t-design with \(d=(t+1)^{\wedge }2\) points. http://www.polyu.edu.hk/ama/staff/xjchen/sphdesigns.html. Accessed 18 Oct 2023
V.I. Lebedev, D.N. Laikov, A quadrature formula for the sphere of the 131st algebraic order of accuracy. Doklady Math. 59, 477–481 (1999)
J. Bezanson, A. Edelman, S. Karpinski, V.B. Shah, Julia: A fresh approach to numerical computing. SIAM Rev. 59(1), 65–98 (2017). https://doi.org/10.1137/141000671
P.K. Mogensen, A.N. Riseth, Optim: A mathematical optimization package for Julia. J. Open Source Softw. 3(24), 615 (2018). https://doi.org/10.21105/joss.00615
V.K. Dixit, C. Rackauckas. Optimization.jl: A unified optimization package (2023). https://doi.org/10.5281/zenodo.7738525
M. Innes, E. Saba, K. Fischer, D. Gandhi, M.C. Rudilosso, N.M. Joy, T. Karmali, A. Pal, V. Shah, Fashionable modelling with Flux. Comput. Res. Repo. (CoRR) (2018). arXiv:1811.01457. Accessed 29 Oct 2023
Acknowledgements
Not applicable.
Funding
This work was supported by JSPS KAKENHI Grant Number 23K24864 and JST FOREST Grant Number JPMJFR216M.
Author information
Contributions
J. G. C. Ribeiro conducted the experiments and analysis. His supervisors S. Koyama and H. Saruwatari offered guidance and carried out necessary revisions to the manuscript. The authors agree to the contents of this document.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix 1: Derivation of GPR loss
In order to determine the value of the coefficients, we look at the prior distributions:
$$\varvec{\epsilon} \sim \mathcal{N}_{\mathbb{C}}\left(\mathbf{0}, \sigma^2\mathbf{I}\right), \qquad p\left(h_{\text{R}}\right) \propto \exp\left(-\|h_{\text{R}}\|_{\mathscr{H}}^2\right),$$
where \(\mathcal{N}_{\mathbb{C}}\) is the complex-valued normal distribution and \(\sigma\) is the estimated standard deviation of the noise. Then, we can apply these considerations to the prior distributions in (10) in order to obtain the posterior as a product of Gaussian densities.
Since the logarithm is a monotonically increasing function, we can show that (10) is equivalent to minimizing the negative log posterior with respect to \(\hat{\alpha}_0\), the optimal coefficient of the direct component, and \(\varvec{\chi}\), the trainable parameters of the kernel weights (e.g., \(\varvec{\beta}\), \(\varvec{\gamma}\), and \(\varvec{\theta}\) for the proposed kernel weight). Substituting the optimal coefficient vector \(\hat{\varvec{\alpha}}\) of the kernel estimation and eliminating the constants, we arrive at the optimized kernel parameters
$$\hat{\varvec{\chi}} = \mathop{\mathrm{arg\,min}}_{\varvec{\chi}}\ \left(\mathbf{y} - \hat{\alpha}_0\mathbf{G}\right)^{\mathsf{H}}\left(\mathbf{K}_{\varvec{\chi}} + \lambda\mathbf{I}\right)^{-1}\left(\mathbf{y} - \hat{\alpha}_0\mathbf{G}\right) + \log\det\left(\mathbf{K}_{\varvec{\chi}} + \lambda\mathbf{I}\right).$$
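A Julia sketch of evaluating this loss for a given Gram matrix, under the reconstruction above:

```julia
using LinearAlgebra

# Negative log marginal likelihood (up to constants): data-fit term plus a
# log-determinant complexity penalty, minimized over the kernel parameters χ.
function gpr_loss(y, G, K, λ, α0)
    A = Hermitian(K + λ * I)
    e = y - α0 * G
    return real(e' * (A \ e)) + logdet(A)
end
```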
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ribeiro, J.G.C., Koyama, S. & Saruwatari, H. Physics-constrained adaptive kernel interpolation for region-to-region acoustic transfer function: a Bayesian approach. J AUDIO SPEECH MUSIC PROC. 2024, 43 (2024). https://doi.org/10.1186/s13636-024-00362-6
DOI: https://doi.org/10.1186/s13636-024-00362-6