 Research
 Open access
 Published:
Spherical harmonic covariance and magnitude function encodings for beamformer design
EURASIP Journal on Audio, Speech, and Music Processing volume 2021, Article number: 41 (2021)
Abstract
Microphone and speaker array designs have increasingly diverged from simple topologies due to diversity of physical host geometries and use cases. Effective beamformer design must now account for variation in the array’s acoustic radiation pattern, spatial distribution of target and noise sources, and intended beampattern directivity. Relevant tasks such as representing complex pressure fields, specifying spatial priors, and composing beampatterns can be efficiently synthesized using spherical harmonic (SH) basis functions. This paper extends the expansion of common stationary covariance functions onto the SHs and proposes models for encoding magnitude functions on a sphere. Conventional beamformer designs are reformulated in terms of magnitude density functions and beampatterns along SH bases. Applications to speaker farfield response fitting, crosstalk cancelation design, and microphone beampattern fitting are presented.
1 Introduction
Acoustic array designs have undergone many iterations of improvement as physical simulation data becomes more accurate and abundant. Rapid microphone and speaker array prototyping efforts have moved into simulation to work in tandem with both industrial and product design. Beamforming sits at the intersection where upstream device geometry and hardware constrain the acoustics and performance of the array, impacting the audio application qualities downstream.
Microphone beamforming improves the signaltonoise ratio of automatic speechrecognition and wakeword detection [6], estimate source directionofarrival [18], and separate acousticobjects [30]. Speaker beamforming generalizes crosstalk cancelation [34] and can widen acousticimages by exciting wall reflections [37]. Farfield beamformer performance improves when steering vectors incorporate the local geometry in their freefield acoustic responses [5]. Commercial software such as COMSOL [8] can readily import device geometries in acoustic simulations. Therefore, a common basis representation for beamformer design synthesizes different requirements and simulations across applications.
The complex spherical harmonic (SH) functions [19, 24, 27] form one such basis that have found varied uses for data fitting, analysis, and synthesis. Headrelated transfer functions can be efficiently encoded along the SH bases for binaural reproduction [2, 11, 36]. Planewaves decomposition and beamforming on spherical microphone arrays are directly formulated along the SH bases [9, 21, 25, 27]. For nonspherical arrays, speaker and microphone farfield directivity are decomposed along the SHs from acoustic responses at large finite distances [3] and integrated with room impulse response models [22, 29]. Acoustic directivity can also be fitted to magnitude beampatterns along the SH bases [20, 26].
Magnitude function synthesis on a sphere is less wellknown in beamforming but appears in interpolation and Bayesian methods such as Kriging [32] and Gaussian processes [28]. We show how these techniques that specify covariance or kernel functions can relate acoustic directivity with spherical probablistic density functions to better design, fit, and evaluate beampatterns. This improves upon several prototyping stages which we list as the following contributions:

1
We fit nonparametric models to acoustic directivity and show that their basis functions can be expanded along SHs to give superior biasvariance tradeoffs than spatial crosscorrelation models [19] and direct SH fittings [14].

2
We derive the analytic SH expansions of several Matérn covariance functions of chordal distances which in previous works have only been shown to be valid on the sphere [12, 13, 16, 31]. Expansion errors under shapeconstrained design parameters quickly converges to 0 for increasing max truncationorders and ensuring model tractability.

3
We generalize the acoustic directivity index (DI) [19, 27] to support nonuniform spatial weighting or probability density functions (PDFs) in terms of SHs. This reduces bias of previous discrete sampling schemes that must draw from finite measurements and integrates over target and noise sources’ directional priors indicative of realworld performance compared to flat priors.

4
We propose a mixture and product of SH encodings model for synthesizing beampattern targets and accelerate beampattern fitting optimization [20] by analytic derivatives and integration over SH encoded acoustic directivity.

5
We present case studies for fitting transcoded SHs to simulated speaker responses’ anechoic acoustic directivity, evaluating expected crosstalk cancelation performances under varying spatial priors, and ranking optimal crosstalk cancelation beampatterns w.r.t. loss functions.
The paper contents are listed as follows: Section 2 introduces the SH notation and derive several useful operators along the SH bases. Section 3 presents novel encodings of Matérn covariance function expansions of chordal distances along the SH bases and proposes the mixture of encodings (MoE) and product of encodings (PoE) models for data fitting and magnitude function synthesis. Section 4 introduces a new probabilistic treatment of DI and then reformulates conventional minimum variance distortionless response (MVDR) and generalized Rayleigh quotient (GRQ) beamformers in terms of SH encoded spatial priors. Section 5 extends beampattern fitting methods such as least squares (LS), magnitude least squares (MLS), and magnitude squared least squares (MSLS) into the SH bases. Section 6 establishes experiments on anechoic farfield directivity fitting by radial basis function (RBF) transcoding, performance impact of spatial priors on crosstalk cancelation, and tradeoffs between beampattern fitting methods. Sections 7 and 8 discuss the findings and conclude the work.
2 Background and notation
The spherical harmonics are orthogonal basis functions over spherical coordinates [19]:
where for notation, we denote l, m as the order and degree, respectively, (θ, ϕ) the physics convention spherical coordinates for colatitude and azimuth respectively, and \(P_{l}^{m} (\cos \theta)\) the associated Legendre polynomials. A function f(θ,ϕ) defined over spherical coordinates can be expanded along the SH bases:
where \(C_{l}^{m} \in \mathbb {C}\) are the expansion coefficients for basis functions of order l and degree m, and P the maximum order or the number of truncation terms in the SH expansion. The number degrees m for order l totals 2l+1 and number of basis functions upto max truncationorder P is (P+1)^{2}. Therefore, a vector of coefficients \(\boldsymbol {C} \in \mathbb {C}^{(P+1)^{2} \times 1} \) and SH function evaluations \(\boldsymbol {Y}(\theta, \phi) \in \mathbb {C}^{(P+1)^{2} \times 1} \) are denoted in bold and arranged in ascending basis orders [19]. We derive several vector operators along the SH coefficients of f(θ,ϕ) from Eq. 2.
Conjugation: The complex conjugate ()^{∗} of function f(θ,ϕ) flips the order of m and negates odd m harmonics using the conjugate SH property \(Y_{l}^{m*}(\theta, \phi) = (1)^{m} Y_{l}^{m}(\theta, \phi)\) in [27]. The vectorized formulation permutes the SH coefficients
via the exchange matrix \(\boldsymbol {J}_{l} \in \mathbb {R}^{(2 l + 1) \times (2 l + 1) }\) defined in [15]. A realvalued function f(θ,ϕ)=f^{∗}(θ,ϕ) therefore has SH coefficients satisfying \(\boldsymbol {C} = \widetilde {\boldsymbol {C}}\).
Multiplication: The product two SH basis function is closed under addition
where \(M = m + \bar {m}\) and c() the ClebschGordan coefficients defined in [4, 10]. We can therefore expand the product of functions \(f(\theta, \phi) \approx \boldsymbol {C}_{f}^{T} \boldsymbol {Y}(\theta, \, \phi), \, \boldsymbol {C}_{f} \in \mathbb {C}^{(P+1)^{2} \times 1}\) and \(\bar {f}(\theta, \phi) \approx \boldsymbol {C}_{\bar {f}}^{T} \boldsymbol {Y}(\theta, \, \phi), \, \boldsymbol {C}_{\bar {f}} \in \mathbb {C}^{(\bar {P} + 1)^{2} \times 1} \) in terms of a product operator ◇ of SH coefficients
where L, M are indices for the order and degree respectively, \(\bar {m} = M  m\), and \(C_{l}^{m} \in \boldsymbol {C}_{f}, \bar {C}_{\bar {l}}^{\bar {m}} \in \boldsymbol {C}_{\bar {f}}, \boldsymbol {C}_{f\bar {f}} \in \mathbb {C}^{(P + \bar {P} + 1)^{2} \times 1}\).
Squared modulus: A function’s magnitude squared can be computed along SH bases via the product operator of the SH expansion with its conjugate in Eqs. 3 and 5 given by
3 Function encodings and fitting
We define a stationary covariance function K over random variables X(r) that depends only on the chordal distance d(Θ) on a unit sphere:
where \(\boldsymbol {r} \in \mathbb {R}^{3}\) is a cartesian unit vector representing a direction or point on the unit sphere, and Θ= cos−1(r·r^{′}) the central angle between r,r^{′}.
As a motivating example, let the random variable X(r) be the sound pressure P(r,Ω,k) at position r and wave number k due to a unit amplitude wave incident to a direction drawn from random variable Ω with uniform density function f(Ω)=(4π)^{−1} over the unit sphere. This is equivalent to the wellknown spatial crosscorrelation between sound pressure at two points in a diffuse field [19] which after substitution into Eq. 7 yields the unnormalized sinc kernel:
where the positive semidefiniteness follows from expanding P(r,Ω,k)P^{∗}(r^{′},Ω,k) along SHs in [19] to supply the weights λ and orthonormal bases ψ to Mercer’s Theorem \(K(\boldsymbol {r}, \boldsymbol {r'}) = \sum _{l=0}^{\infty } \lambda _{l} \psi _{l} (\boldsymbol {r}) \psi _{l} (\boldsymbol {r'})\) [23]. Moreover, Mercer’s theorem allows a valid covariance function to be defined without explicit specification of its random variables X(r) provided that the former is expressible along an orthonormal bases such as the SHs. We derive the analytic SH expansions of several wellknown stationary covariance functions or RBFs.
3.1 Analytic encodings
Stationary covariance functions can be expanded into the SH domain via the Legendre addition theorem:
where expansion weights b_{l}(g, ℓ) belong to some function g parameterized by ℓ. The Legendre polynomial P_{l}(cosΘ) of the angular distance 0≤Θ≤π is expanded into products of SHs evaluated at free coordinates (θ, ϕ) w.r.t. a center or fixed coordinate (θ_{∗}, ϕ_{∗}). The SH coefficients after substituting Eq. 9 into 2 are given by
where Ξ is the set of parameters and \( \boldsymbol {C}^{T}_{\Xi }\) the vector of coefficients \(C_{l}^{m} (\Xi)\). The function specific weights b_{l}(g, ℓ) must be realvalued and depend only on the order l as the conjugate SH property in Eq. 3 ensures that \( f(\theta, \phi, \, \Xi) \in \mathbb {R}\) iff \(\boldsymbol {C}^{T}_{\Xi } = \widetilde {\boldsymbol {C}}^{T}_{\Xi }\). The exact values of b_{l}(g, ℓ) can be computed via the sine expansion of the Legendre polynomials [16, 31] given by
where expansion coefficients g(n,ℓ) integrate the product of covariance function K and the Chebyshev polynomials cos(nΘ)=T_{n}(cosΘ).
We now consider members of the family of halfinteger order Matérn covariance functions [1] of chordal distances d(Θ) in Eq. 7 given by
where \(p \in \mathbb {N}^{+}\) and Γ() is the Gamma function. These RBFs are products of the Laplace kernel and polynomial functions which can be combined with the Chebyshev polynomials after a change of variables
where F(a,b,z) is the generalized hypergeometric function (see Appendix Eq. 48). The modified integral after substituting Eqs. 12 and 13 into g(n,ℓ) is given by
where \(dt = \cos {\frac {\Theta }{2}} \, d \Theta \). After symbolic integration [35] w.r.t. t for cases \(\nu = \left \{\frac {1}{2}, \frac {3}{2}, \frac {5}{2}, \infty \right \}\), the novel expansion coefficients are presented as follows:
Matérnν=1/2 (Exponential): Nondifferentiable function at Θ=0.
Matérnν=3/2: Oncedifferentiable product of linear polynomial and exponential functions.
Matérnν=5/2: Twicedifferentiable product of quadratic polynomial and exponential function.
Matérnν=∞ (squared exponential): Infinitely differentiable function everywhere.
where I_{n} is the nth order modified Bessel function of the first kind.
Design parameters: The bandwidth term ℓ which parameterizes the RBFs can be chosen to satisfy constraints such as the full width at half magnitude (FWHM) and the minimum. The FWHM for RBFs on the sphere gives twice the radians at which the function is half the maximum amplitude (\(K_{\nu }(\Theta, \ell) = \frac {1}{2}\)) and minimized at Θ=π as shown in Fig. 1. For Matérn covariance functions, the FWHM can be found for a given ℓ and conversely, ℓ for a desired target response:
where T_{dB} is the target decibel response at T_{Θ} radians (See Appendix Eqs. 50–55 for analytic expressions).
For specifying sharp peaks and discontinuities on a sphere, a sharp covariance function is desirable. Given Matérn functions of different classes \(K_{\nu _{1}}, K_{\nu _{2}}\) where ν_{1}<ν_{2} and identical FWHM, the less differentiable function \(K_{\nu _{1}}\) is bounded above for \(0 < \Theta < \frac {\textrm {FWHM}}{2}\) and below for \(\frac {\textrm {FWHM}}{2} < \Theta < \pi \) as shown in Fig. 1. A sharp peak function can therefore be parameterized with small ν,T_{dB} and T_{Θ} but requires higher max truncationorder terms P or SH bases to achieve the same reconstruction error as shown in Fig. 2. The root mean square error (RMSE) curve RMSE(P) is asymptotically steeper for smoother function classes (larger ν) and bounded below by functions with larger bandwidth ℓ within the same class. We now propose new techniques for fitting SH encoded RBFs to farfield directivity responses and composing magnitude functions on the sphere.
3.2 Mixture of encodings
A set of M complex target responses (X_{i}, θ_{i}, ϕ_{i}) can be modeled by a weighted summation of N distinct SH encoded functions f(θ, ϕ, Ξ_{j}) in Eq. 10 given by
where c is a scalar constant, and w_{j} are unknown weights. The system of equations is given by
where \(\boldsymbol {A} \in \mathbb {C}^{M \times N}, \boldsymbol {w} \in \mathbb {C}^{N \times 1}, \boldsymbol {b}, \, \boldsymbol {c} \in \mathbb {C}^{M \times 1}\) are the system matrix, vectorized unknowns, targets, and scalar constant respectively. The SH expansion of a constant over spherical coordinates is given by \(2 c \sqrt {\pi } \, Y_{0}^{0} (\theta, \phi)\). The solution vector w can be found by direct inversion of A when N=M and by leastsquares methods for overdetermined cases N<M or when constraints on w are specified.
For N=M and common RBF parameterizations Ξ_{i}={θ_{i}, ϕ_{i}, g, ℓ} centered over distinct target responses (X_{i}, θ_{i}, ϕ_{i}), MoE converges to RBF interpolation for valid kernel functions as the max truncationorder P increases; SH encodings converge to covariance functions at the limit \({\lim }_{P \to +\infty } f(\theta _{i}, \, \phi _{i}, \, \Xi _{j}) = K_{\nu }(\Theta _{{ij}}, \ell _{j})\). Moreover, SH encodings for arbitrary P are valid kernels as the Legendre polynomials in Eq. 9 are also valid kernels [16]. As a result, the original kernel functions can be initially fitted to a large number of samples using the original kernel functions before expanding into the SH bases. For example, native methods such as Gaussian processes [28] use the logmarginal likelihood to fit bandwidth ℓ and specify the posterior mean weights
computed from a common covariance function where Θ_{ij} is the central angle between (θ_{i},ϕ_{i}) and (θ_{j},ϕ_{j}). We refer to this schema as MoERBF transcoding given that a nonparametric RBF interpolation of N supports is encoded rather than fitted to (P+1)^{2} SH basis functions. The biasvariance tradeoff after transcoding is thus more sensitive to the choice of the kernel function and its smoothness and locality on the sphere rather than the number of SH bases (See Section 6.1 for experiment).
We now consider the case where nonnegative RBFs are fitted to magnitude only responses X≥0. At the limit where P→∞ and f(θ_{i}, ϕ_{i}, Ξ_{j})=K_{ν}(Θ_{ij},ℓ_{j}), the interpolant f(θ,ϕ) can be constrained to be nonnegative by finding the nonnegative leastsquares solution w to the realvalues of Eq. 21 given by
where c=0 and w≥0. For finite order SH encodings however, f(θ_{i}, ϕ_{i}, Ξ_{j}) and f(θ,ϕ) may ripple below 0 due to truncation error. If constant c>0 is raised, then the sign of \(\mathbb {R}[\mathbf {b}  \boldsymbol {c}]\) forces each encoding f(θ, ϕ, Ξ_{i}) to contribute as either a peak or a null (See example in Fig. 3a) and the latter may overshoot and induce negative values in f(θ,ϕ). One solution is to fit SH encodings of a kernel’s squared modulus from Eq. 6 with nonnegative weights w≥0 given by
where \(\bar {f}(\theta, \, \phi, \, \Xi _{j})\) is an SH encoding of a kernel and its squared modulus is strictly nonnegative and does not ripple below 0. We note that a kernel’s squared modulus is a valid kernel due to the general rule that the product of kernels is a kernel [28]. All modulus squared Matérn kernels are thereby suitable candidates for magnitude fitting with slight modifications to parameter selection.
Parameter optimization of ℓ depends on the Matérn class ν; exponential and squared exponential \(K_{\frac {1}{2}}(\Theta, \ell) = \left {K_{\frac {1}{2}}(\Theta, 2 \ell)}\right ^{2}, K_{\infty }(\Theta, \ell) = \left {K_{\infty }(\Theta, \ell \sqrt {2})}\right ^{2}\) are closed under the squared modulus and so bandwidth ℓ is simply scaled. For other halfinteger ν classes, the kernels remain products of exponential and polynomial functions but are not part of the Matérn family; bandwidth ℓ needs rederivation w.r.t. the objective function. For parameter design of ℓ, targeting half the desired dB response T_{dB}/2 at T_{Θ} in Eq. 19 is sufficient.
3.3 Product of encodings
The PoE model or the product of N SH encodings is useful for fitting to loworder target responses given by
where \(w_{j} \in \mathbb {C}\) are unknown weights and the product of SH encodings 1+f(θ, ϕ, Ξ_{j}) are closed under SH multiplication via Eq. 5; SH expansion of the unit constant is \(2 \sqrt {\pi } \, Y_{0}^{0} (\theta, \phi)\). PoEs can model sharper functions than MoEs (See example in Fig. 3b) as the derivative of products are multiplicative whereas the derivative of mixtures is additive. PoEs however do require substantially more bases than MoE as the product of N SH encodings of Pth max truncationorder spans (NP+1)^{2} bases. Data fitting is also not closedform due to nonlinear objective functions.
Consider the nonlinear leastsquares solution that minimizes the surrogate log function
where (X_{i}, θ_{i}, ϕ_{i}) are the M target responses. For gradientbased optimization, the Jacobian given by
can be used to update w.
In the case of magnitude target responses, a nonnegative PoE can be fitted by constraining the weights to be real and bounded w_{j}≥−1. For the case of N=M, the weights can be further constrained such that each encoding contributes a peak or null to f(θ,ϕ) analogous to the MoE formulation in Eq. 23:
where \(f(\theta, \phi, \Xi _{j}) = \frac {\bar {f}(\theta, \phi, \Xi _{j})}{\bar {f}(\theta _{j}, \phi _{j}, \Xi _{j})}\) normalize SH encodings \(\bar {f}(\theta, \phi, \Xi _{j})\) of kernels at their center of expansions (θ_{j},ϕ_{j}) to prevent overshooting. PoEs can therefore model loworder cascades of spatial peak and null magnitude functions. We now show how to extend MoE and PoE models into beamformer designs.
4 Conventional beamformer extensions
An N channel array can be oriented towards directional sources by parameterizing socalled steering vectors in terms of each channel’s directivity or anechoic farfield responses. A steering vector d(θ, ϕ) and the resulting beampattern B(θ,ϕ) can be parameterized along SH bases given by
where \( \boldsymbol {C}_{d} \in \mathbb {C}^{(P+1)^{2} \times N}\) is the column matrix of SH coefficients fitted to N sets of acoustic directivities, and \(\boldsymbol {w} \in \mathbb {C}^{N \times 1}\) are the unknown beamformer’s weights. The beamformer’s DI evaluates the power output gain at (θ_{∗},ϕ_{∗}) w.r.t. the power average over the spherical coordinates [19, 27]:
Maximizing DI is the basis of conventional beamformers such as MVDR which we show can be generalized for nonuniform spatial sampling or density functions in both numerator and denominator.
4.1 Spatial prior encodings
Across applications, target and noise sources may appear in varied locations. An array can be steered to boost or nullify a source if its exact locations are known. If their locations are uncertain, an informative prior can instead be specified. Let the uncertainty in sampling from a source direction and its associated steering vector be modeled as a random variable with probability density function f_{p}(θ, ϕ). We can specify a spatial density function on the unit sphere in terms of SHs by normalizing MoE or PoE functions from Eqs. 20 and 25 to satisfy nonnegative and unity integral constraints given as follows:
where the formulations given in Eqs. 23, 24, and 28 ensure that f_{p}(θ, ϕ) is both nonnegative and realvalued. The unity integral constraint is satisfied by normalizing the SH coefficients \(\boldsymbol {C}_{p} \in \mathbb {C}^{(P+1)^{2} \times 1}\) by the scaled 0th order coefficient \(2 \sqrt {\pi } \, C_{0}^{0}\) following the integration of SH functions property in [27].
The statistical moments of the random variable of steering vectors d(θ,ϕ) are computed from the product of its SH coefficients in Eq. 29 and that of the density function f_{p}(θ,ϕ) in Eq. 31. The weighted steering vectors are closed under SH multiplication via Eq. 5:
where \(\boldsymbol {Q} \in \mathbb {C}^{(2P + 1)^{2} \times N} \) are SH product coefficients. The steering vector’s mean μ_{d} and covariance Σ_{d} taken over the spherical coordinates are therefore analytic following the SH orthogonality property in [19, 27] and multiplicative closure property of Eq. 32:
where \(\boldsymbol {\bar {Q}}\) is truncated to the first (P+1)^{2} rows of Q.
A formulation that avoids truncating Q uses multiplicative associate property to compute the row i and column j elements of covariance \(\mathbb {E} \left [{\boldsymbol {d} \boldsymbol {d}^{H}}\right ]\) in Eq. 33 from SH products of pairwise channel responses (columns i,j of d(θ,ϕ) in Eq. 29) given by
Note that the density function f_{p}(θ,ϕ) must be encoded to have twice the max truncationorder \(\boldsymbol {C}_{p} \in \mathbb {C}^{(2P+1)^{2} \times 1}\) of the steering vectors. The density function is truncated back to maxorder P only when computing the mean μ_{d} in Eq. 33 for consistency with d(θ,ϕ). With the steering vectors’ first and second moment statistics defined, we now propose a novel probablistic beamformer formulation.
4.2 Steerable probabilistic beamforming
In farfield beamforming, let the target and noise source directions be modeled by random variables with density functions f_{A}(θ,ϕ),f_{R}(θ,ϕ) respectively satisfying Eq. 31. Define the probabilistic directivity index (PDI) as the power ratio of the beampattern from Eq. 29 integrated w.r.t. the target and noise density functions given by
PDI generalizes DI from Eq. 30 as the latter can be specified via dirac delta density function f_{A}(θ,ϕ)=δ_{θ,ϕ} for the target and uniform density function \(f_{R}(\theta, \phi) = \frac {1}{4\pi }\) for the noise. Maximizing the PDI is equivalent to finding a maximum SNR filter via the GRQ formulation in [19, 27] given by
where \(\boldsymbol {A}= \mathbb {E}_{f_{A}}\left [{\boldsymbol {d} \boldsymbol {d}^{H}}\right ], \, \boldsymbol {R} = \mathbb {E}_{f_{R}}\left [{\boldsymbol {d} \boldsymbol {d}^{H}}\right ] \in \mathbb {C}^{N \times N}\) are the noncentered acceptance and rejection covariance matrices respectively. Note that the distortionless constraint for some steering vector d(θ,ϕ) is optional but useful for normalizing the beamformer response at a look direction. The solution vector w are the beamformer weights and can be found via the generalized eigenvalue decomposition [33].
GRQ beamformers generalize instances of MVDR just as DI is a special case of PDI. If the acceptance covariance matrix is constructed from a single point source A=d(θ,ϕ)d^{H}(θ,ϕ) and the rejection covariance matrix computed over noise sources distributed (uniform for DI definition in Eq. 30) over spherical coordinates, then the solution to Eq. 36 is the MVDR beamformer given by
where the look direction can be steered by evaluating the SH bases at (θ,ϕ) spherical coordinates. In the more general case, the SH encoded density functions f_{A}(θ,ϕ), f_{R}(θ,ϕ) can be steered under the SO(3) SH rotation operator [17]. We now turn to a related fitting problem that replaces SH encoded density functions with target SH encoded magnitude response functions.
5 Beampattern fitting formulations
Let an N channel beamformer and target beampattern functions along SH bases be given by
where B(θ,ϕ) is the complex beamformer response, \(\boldsymbol {C}_{A} \in \mathbb {C}^{(P+1)^{2} \times N}\) the channel’s SH fitted farfield response weights, \(\boldsymbol {f}_{A}(\theta, \phi) \in \mathbb {C}^{N \times 1}\) the channel farfield responses, \(\boldsymbol {w} \in \mathbb {C}^{N \times 1}\) the unknown weights. The target magnitude beamformer response f_{B}(θ,ϕ) is a nonnegative real function with constant phase specified via MoE or PoE models in Eqs. 23, 24, and 28 along the SH bases. Beampattern fitting thus solves for weights w s.t. B(θ,ϕ)≈f_{B}(θ,ϕ) in terms of SH coefficients C_{A},C_{B}.
5.1 Direct least squares
The direct LS fit of weights w in B(θ,ϕ) to target f_{B}(θ,ϕ) in Eq. 38 minimizes the squared modulus of the complex residuals between target and response beampatterns. Both magnitude and phase responses are fitted in the LS objective function given by
where by SH orthogonality, the residuals are transformed into SH coefficients. The minimizer is therefore the ordinary least squares (OLS) solution to C_{A}w=C_{B} given by
OLS however overconstrains the residuals as the beamformer’s phase responses are unnecessarily fitted to a constant (0 radians), causing the minimizer to underfit the target magnitude responses. This is addressed in the socalled discrete MLS formulation.
5.2 Magnitude least squares
The magnitude response can be fitted by minimizing the squared error between the modulus beampattern B(θ,ϕ) and target f_{B}(θ,ϕ) integrated over the spherical coordinates. The integrated MLS fitting can be approximated by the discrete MLS formulation using quadrature or sampling the beampattern over a dense uniform spaced spherical coordinate grid of M directions given by
The solution w for discrete MLS can be found via the localvariable exchange (LVE) method [20] given by
which introduces a vector of complex unit variables \(\boldsymbol {z} \in \mathbb {C}^{M \times 1}\) and converts Eq. 41 into a LS problem. The solution alternates between phase matching the magnitude target f_{B}(θ_{i},ϕ_{i}) to the corresponding beamformer response f_{Aj}(θ_{i},ϕ_{i}) with the freevariable z_{i} and finding the OLS solution \(\boldsymbol {\hat {w}}\) until convergence. An analogous solution for the integrated MLS formulation is more difficult as z is replaced with the complex unit function \(\phantom {\dot {i}\!}e^{j f_{z}(\theta, \phi) }\) s.t. an unwrapped phase function f_{z}(θ,ϕ)=Y^{T}(θ,ϕ)C_{z} is constrained to be realvalued following Eq. 3 but may also be discontinuous (e.g., spatial nulls may invert the phase). Instead, we introduce a novel MSLS formulation which we extend along SHs to show that it does not require discretization over the sphere to optimize.
5.3 Magnitude squared least squares
The magnitude response can be fitted by minimizing the squared error between the squared modulus beampattern B(θ,ϕ)^{2} and target \(f_{B}^{2}(\theta, \phi)\). The MSLS objective function is given by
where ⊗ is the Kronecker product, \(\boldsymbol {G}_{B} \in \mathbb {C}^{(2P + 1)^{2} \times 1}\) the SH coefficients of the squared magnitude target beampattern, \(\boldsymbol {G}_{A}(i, j) \in \mathbb {C}^{(2P + 1)^{2} \times 1}\) the pairwise SH products of channel i and j’s farfield SH coefficients \(\boldsymbol {C}_{A_{i}}\) and \(\boldsymbol {C}_{A_{j}}\) respectively, and \(\boldsymbol {G}_{A} \in \mathbb {C}^{(2P + 1)^{2} \times N^{2}}\) the column matrix of the pairwise products (column k=N(i−1)+j+1 is G_{A}(i,j)). The derivation follows from expanding the squared magnitude of the beamformer response into weighted SH products via the distributive property of product operator in Eqs. 5, 6 and then rearranging the terms into a matrix Kroneckervector product given by
The MSLS objective function in Eq. 43 is quartic w.r.t. w and the solution can be found via iterative methods such as gradient descent and GaussNewton (see Appendix Eq. 49 for derivatives). Note that the quartic objective function causes the fitted squared magnitude response to overweight larger values resulting in overfitted peaks and underfitted nulls w.r.t. the target beampattern. We now turn to experimental studies to validate our methods in the remainder of the work.
6 Experiments
6.1 Acoustic directivity fitting
In microphone and speaker prototyping, it is necessary to characterize the impact of the physical array’s anechoic farfield acoustic directivity. Fitting the SH bases to acoustic directivity transforms the responses into a spatially continuous and spatially bandlimited representation for subsequent beamform design. We present a case study comparing several SH fitting methods and their biasvariance tradeoff to the anechoic farfield responses of a custom downfiring midrange speaker on a table. In simulation [8], a pressure source at the speaker cone center generates a response that we uniformly sample at R=1 meter radius over a grid of 7.5 degree azimuth and 2 degree elevation increments. Let the dataset be specified by \(\boldsymbol {\mathcal {D}} = \left \{\boldsymbol {X}, \, \boldsymbol {\theta }, \, \boldsymbol {\phi } \right \}\) where \(\boldsymbol {X} \in \mathbb {C}^{N \times 1}\) is the vector of N complex pressure samples at frequency with wavelength smaller than the simulation distance, and \(\boldsymbol {\theta }, \, \boldsymbol {\phi } \in \mathbb {R}^{N \times 1}\) vectors of the sample azimuth and colatitude in radians respectively.
Let a reference function \(\bar {f}(\theta, \phi) = \boldsymbol {C}_{\bar {f}}^{T} \boldsymbol {Y}(\theta, \phi)\) be the SH bases fitted upto maxorder P=30 to dataset \(\boldsymbol {\mathcal {D}}\) via the truncated singular value decomposition (TSVD) [14] as follows:
where matrix \(\boldsymbol {A} \in \mathbb {C}^{N \times (P+1)^{2}}\) is the rowmatrix of SH bases evaluated at θ, ϕ spherical coordinates. The SVD of A is given by left and right singular vectors \(\boldsymbol {U} \in \mathbb {C}^{N \times (P+1)^{2}}, \boldsymbol {V} \in \mathbb {C}^{(P+1)^{2} \times (P+1)^{2}}\) respectively and the singular values \(\boldsymbol {S} \in \mathbb {R}^{(P+1)^{2} \times (P+1)^{2}}\). The inverse matrix \(\widetilde {\boldsymbol {S}}^{1} \in \mathbb {R}^{(P+1)^{2} \times (P+1)^{2}}\) is the diagonal of reciprocal singular values greater than α=0.01 fraction of the largest singular value. The fitted SH weights \(\boldsymbol {C}_{\bar {f}} \in \mathbb {C}^{(P+1)^{2} \times 1}\) follows from inverting the truncated singular values and vectors.
Candidate SH fitting methods are compared to the reference fit under 10 randomized trials of 4fold crossvalidation (40 permutations of pairs of training and test sets). A single trial randomizes the sample indices of dataset \(\boldsymbol {\mathcal {D}}\) before partitioning the samples into 4 mutually exclusive training sets of size \(\frac {N}{4}\); test sets complement each of the training sets as shown in Fig. 4a. The MoERBF transcoding method in Eq. 22 with Matérn kernel functions \(K_{\left \{\frac {1}{2}, \frac {3}{2}, \frac {5}{2}, \infty \right \}}\) in Eq. 12 as well as the sinc kernel in Eq. 8 are then fitted over each training set before inference over the test sets. RBF fitting under a Gaussian process model maximizes the logmarginal likelihood quantity of the kernel function hyperparameters ℓ (or wavenumber k for sinc kernel) under noisy observation assumptions (A is the equivalent gram matrix with noise variance σ=10^{−6} as diagonal loading) [28]. After convergence, the least squares solution w to Eq. 21 are found and the MoE function f(θ,ϕ) in Eq. 20 are expanded into C_{f} weighted SH bases via Eq. 10.
The set of candidate fittings for one of the crossvalidation trials are visualized in Fig. 4. The TSVD candidate of lower max truncationorder (P=6) SH bases underfits the training set resulting in overly smooth magnitude and phase responses shown in Fig. 4b. Increasing the fit’s maxorder SH bases however does not improve generalization as shown by the larger errors in Table 1. Instead, fitting by nonparametric RBF expansions at each of the training samples outperforms parametric fitting of SH bases due to the former’s biasvariance tradeoff. Among the set of RBF kernels to choose from, the sharp but longtailed function \(K_{\frac {1}{2}}\) had superior generalization as shown in Fig. 4d. Conversely smooth but shorttailed functions such as K_{∞} overfits to the training set when the samples are sparse and the underlying function rough as shown in Fig. 4f. This results in poor fits near discontinuities in the farfield response such as nulls; decreasing the kernel’s bandwidth ℓ narrows the function for fitting to local features at the cost of poor generalization at more distant samples. Choosing an unsuitable kernel such as the sinc function when the soundfield is not diffuse also results in poor generalization as the acoustic targets’ phase are regular but the sinc kernel has multiple zerocrossings that cause discontinuities.
The generalization errors of the crossvalidation trials are shown in Table 1 via two metrics. The normalized root mean square error (NRMSE) as defined in Appendix Eq. 56 averages the squared error between the test samples and the model inference f(θ,ϕ)≈Y^{T}(θ,ϕ)C_{f} evaluated at the sample’s spherical coordinates; normalization constant is the standard deviation of the dataset. The normalized root mean squared integrated error (NRMSIE) as defined in Appendix Eq. 57 averages the squared error between the candidate and reference function \(\bar {f}(\theta, \phi)\) over the continuous spherical coordinates; normalization constant is the standard deviation of the reference function. The NRMSE metric is restricted to only test samples whereas NRMSIE incorporates both training and test sample by computing the total distance between any two functions on the sphere. For the TSVD modelorder candidates, both metrics are minimized when the model order is empirically found to be P=6; further increasing the number of basis functions results in severe overfitting as seen in the case of P=12. For the MoERBF kernel candidates, \(K_{\left \{\frac {1}{2}, \frac {3}{2}, \frac {5}{2}\right \}}\) had the best performances with low mean errors within a standard deviation of each other; K_{∞} however is both too smooth and local resulting in larger error mean and variance across trials.
6.2 Beamformer crossdensity priors
MVDR and GRQ beamformers designed with spatial priors for target and noise sources may be mismatched in realworld scenarios. For example, density functions f_{A}(θ,ϕ),f_{R}(θ,ϕ) may be updated a posteriori with target and noise source location data collected from camera data. Spatial priors that depend on the array’s location in the environment such as the center, side, corner of a room may be separately defined. We show how PDI in Eq. 35 may be used to evaluate both the expected beamformer performance when the density functions matches the design priors and the performance degradation when they mismatch.
Consider the combinations of target and noise source types with spatial priors specified in Table 2. A stationary point source has a direction sampled from a Dirac delta density function. Diffuse and reverberant noise has equal energy in all directions and thus sampled from a uniform density function. Moving sources and directional noise have spatial priors that contain modes for varying blur and spread due to jitter motion and early wall reflections respectively. Beamformers designed with spatial priors suited to their application are shown to outperform those that use uninformative or inaccurate priors.
One such application is speaker crosstalk cancelation (XTC) where a beamformer jointly maximizes the response at one receiver while minimizing the responses at all other receivers. If the receiver locations are nonstationary in the case of a moving head with two ears, then a speaker beamformer must avoid overfitting to a narrow region. Instead, we treat the ipsilateral and contralateral ears w.r.t. left and right inputs as target and noise sources respectively with locations sampled from a wide density function. The former has the acceptance XTC PDF f_{A}(θ,ϕ)=Y^{T}(θ,ϕ)C_{A} given by the normalized PoEs of two kernel functions K_{∞} fitted to target responses X={12_{dB}, −12_{dB}} at their center of expansions. The latter has the rejection XTC* PDF \(f_{R}(\theta, \phi) = \boldsymbol {Y}^{T}(\theta, \phi) \boldsymbol {C}_{A}^{*}\) which simply reflects the acceptance PDF over the median plane due to symmetry of the ears by conjugating the SH weights. The PoE design parameters are presented as follows:
The acceptance PDF shown in Fig. 4g oversamples the elevated ipsilateral quartersphere region and undersampling near the contralaternal ear. The rejection PDF shown in Fig. 4j oversamples the elevated contralateral quartersphere region and undersampling near the ipsilateral ear.
Farfield anechoic responses belonging to an N=4 channel planar array are simulated at 1 meter distances and then fitted to P=12 maxorder SH functions. Farfield MVDR and GRQ beamformers are then optimized for PDIs with Dirac, uniform, XTC and XTC* density functions specified in Table 3 and their beampatterns shown in Fig. 5. The cases are indexed by the PDI’s accept and reject PDFs (f_{A},f_{R}) and their optimal beamformers specified:

(Dirac, Uniform): MVDR maximizes the response at one ipsilateral ear location (\(\theta = \frac {\pi }{3}, \phi = \frac {\pi }{12}\)), minimizes overall leakage.

(Dirac, XTC*): MVDR maximizes the response at one ipsilateral ear location (\(\theta = \frac {\pi }{3}, \phi = \frac {\pi }{12}\)), minimizes leakage near contralateral ears.

(XTC, Uniform): GRQ maximizes the responses near ipsilateral ears, minimizes overall leakage.

(XTC, XTC*): GRQ maximizes the responses and minimizes the leakage near ipsilateral and contralateral ears respectively.
Beamforming PDIs are largest when f_{A} is the Dirac function; conventional two channel XTC designs set f_{R} to the Dirac function which places a null at the secondary receiver location to further increase the array cancelation gain but may mismatch if the receiver moves. Setting f_{R} to the uniform function minimizes the leakage in all directions whereas the XTC* density presents a compromise. For crossdensity PDIs, beamforming performance maximally degrades when both f_{A},f_{R} mismatch in the case of (Dirac, Uniform) and (XTC, XTC*). Beamformer GRQ (XTC, XTC*) is 6 dB lower than MVDR (Dirac, Uniform) if receiver locations align with PDI (Dirac, Uniform) conditions. GRQ (XTC, XTC*) is more than 2.5 dB higher than MVDR (Dirac, Uniform) for more realistic PDI (XTC, XTC*) conditions.
We can also evaluate conventional XTC designs under PDI priors. For example, the constant parameter regularized XTC^{Footnote 1} [7] can be generalized for multiple channels in terms of Eq. 35 by setting the accept and reject densities to the dirac delta functions at the accept and reject center angles (θ_{1},ϕ_{1})∈Ξ_{1} and (θ_{2},ϕ_{2})∈Ξ_{2} in Eq. 46 respectively and adding a diagonal loading parameter λ to the rejection covariance matrix:
GRQ beamformers that maximize the PDI under Eq. 47 for varying λ are then evaluated w.r.t. the crossdensity priors in Table 3 and their expected cancelation gains shown in Fig. 6. The XTC gain at the precise target (θ_{1},ϕ_{1}) and cancelation (θ_{2},ϕ_{2}) angles grows to infinity as λ→0. In practice, we prefer to choose the λ that maximizes a PDI with suitable priors. For example, λ=6.5e−7 maximizes the PDI with (XTC, XTC*) priors at 3 dB expected cancelation gain and sharply declines for larger λ. If the accept region is restricted to the exact target angle in the case of the (Dirac, XTC*) prior, then λ=1.5e−6 maximizes the PDI.
6.3 Beampattern fitting
In applications that require an array to archive a specific beampattern, conventional beamformers such as MVDR and GRQ are illsuited. The former penalizes a beampattern’s mismatch w.r.t. a target whereas the latter maximizes the expected beampattern’s array gain between two spatial regions. An idealized target beampattern may be specified in some applications such as stereo crosstalk cancelation that places a peak in one region, a null at another, and has sufficient leakage control or sidelobe attenuation everywhere else (see Fig. 3). Given an acoustic simulation of a physical array, we evaluate how close a beamformer can fit to a beampattern.
Farfield anechoic responses belonging to an N=8 channel array on both sides of a wedge (2×4 rectangle array) are simulated at 1 meter distances and then fitted to max truncationorder P=14 SH functions. Two target beampatterns are specified along SH bases of the same max truncationorder using the MoE and PoE models as shown in Fig. 3. The PoE target response has a more directive peak but smoother null than the MoE target. We fit separate beamformer weights to the target beampatterns by minimizing the LS, MLS, and MSLS loss functions from Eqs. 39, 41, and 43 via the OLS, LVE, and the GaussNewton methods, respectively. Last, we compare beampatterns by evaluating them across loss functions.
The fitting errors for each set of beamformer weights are shown in Table 4. Rows represent the set of loss function evaluated over beampatterns generated from the optimization method specified in the column. Diagonal entries correspond to the optimizer’s fitting errors and are the minima along each row. Offdiagonal entries correspond to the row’s loss function evaluated on beampatterns produced by other fitting methods where we expect larger errors. Similarity scores in parentheses normalize the errors on each row by dividing by the diagonal entry. Beampatterns generated by MLS and MSLS methods induce large errors and low similarity scores w.r.t. the LS beampattern due to having variable phase responses; MLS and MSLS beampatterns are not unique under arbitrary phase shifts of the beamformer weights. Conversely, LS beampatterns are fitted to both magnitude and phase responses, thereby underfitting the magnitude and inducing larger MLS and MSLS errors. Last, MLS and MSLS produce beampatterns that are more closely matched in magnitude than their phase; MLS and MSLS generated beampatterns have higher similarity scores than that of LS beampatterns.
The beampatterns visualized in Fig. 7 illustrate the tradeoffs made by minimizing each loss function. LS fitted beampatterns that are constrained to match the complex target function overfits to the 0 radian phase component at the expense of the magnitude component. MLS fitted beampatterns more closely match the target magnitude responses as the phase responses can vary freely. MSLS beampatterns overfit to regions where the target magnitude response are large due to squaring the magnitude before finding the least squares solution. For the MoE target response, all the beampatterns ignored the sharp null, fitting instead to the wide main lobe. For the PoE target response, the beampatterns fitted both the sharp main lobe and the wide null.
7 Results and discussion
We derived the SH expansion of Matérn covariance functions of chordal distance over a unit sphere in terms of generalized hypergeometric functions. Simulated anechoic farfield acoustic responses are fitted to RBFs and then transcoded into SH bases, resulting in large data compression. Crossvalidation experiments show that the MoERBF models have superior biasvariance tradeoff compared to direct SH fitting via the TSVD method. Choosing a covariance function with smoothness and locality characteristics that match the spatial acoustic responses improve generalization along unseen areas on the sphere.
Spatial density functions and beampattern targets are then designed using MoE and PoE methods. This enabled conventional beamforming to be reformulated in terms of PDI and probablistic steering vectors sampled from the SH encoded farfield response and spatial density functions respectively. Experiments show how PDI incorporates uncertainty of ear locations in crosstalk cancelation designs and provide a means to evaluate cancelation performance when the priors change. PDI also provides a useful criteria for controlling the amount of diagonal loading in regularized crosstalk cancelation designs.
Beampattern fitting methods such as LS, MLS, and MSLS are at last reformulated along SH bases. For integrated MLS, it remains an open question whether realvalued phase functions can be fitted along SHs due to spatial phase wrapping and discontinuities. For MSLS, we give closedform gradients and derivatives along the SH bases. Several metrics for comparing beampattern similarity and errors are presented. Experiments demonstrate how effective each method fits a target crosstalk cancelation beampattern from a simulated acoustic array. LS method underfits the magnitude beampattern and should only be used if a flat phase response over the sphere is desirable. MLS equally fits to both peaks and nulls whereas MSLS overfits to peaks in the target beampattern.
8 Conclusions
Datadependent farfield beamforming requires an efficient representation for synthesizing simulated acoustic data, spatial density priors, and beampattern designs. The SH basis functions are one such representation where data fitting, kernel function expansions, and magnitude function designs can be encoded or transcoded from other methods commonly used in practice. This allows conventional beamforming and beampattern fitting methods to be reformulated in terms of SH bases yielding solutions that are both closedform and parametric in the spherical coordinates. As a result, we efficiently fitted and stored the acoustic simulations of an array in a continuous and generalizable format. We evaluated beamforming performance under varying use cases via spatial probabilistic priors and beampattern targets. Last, new device geometries, microphone / speaker topologies, and array designs are iterated upon and compared in rapid prototyping offline.
9 Appendix
The generalized hypergeometric function is expressed in terms of the Porchhammer symbol given by
where Γ() is the gamma function.
The MSLS residual, Jacobian, Gradient, and GaussNewton Hessian are given by
9.1 RBF design parameters
The exponential and squared exponential RBFs in Eqs. 15, 18 have FWHM equal to twice the solution of \(e^{\frac {d^{p}(\Theta) }{p \ell ^{p}}} = \frac {1}{2}\) where \(d(\Theta) = \pm \ell (p \ln 2)^{\frac {1}{p}}\). The FWHM in terms of Θ radians is given by
The design parameter ℓ for a target response pair T_{dB} decibels at T_{Θ} radians is given by
For Matérn ν=3/2, the FWHM and design parameter ℓ are found by reducing the RBF in Eq. 16 into the Lambert W function given by
where W_{−1}(z) branch is used to ensure \(\frac {d}{\ell } \geq 0\).
For Matérn ν=5/2, the FWHM and design parameter ℓ are found using either bisection or Newton’s method as the RBF in Eq. 17 has a negative first derivative for all positive w given by
9.2 Error metrics
The NRMSE is computed over N_{∗} test samples \( \tilde {X}\) and scaled by the full samples X’s standard deviation σ:
The NRMSIE between function f(θ,ϕ) and target function \(\bar {f}(\theta, \phi)\) over the spherical coordinates reduces to the squared SH residual error and normalized by the target function’s standard deviation:
Availability of data and materials
The datasets generated and/or analyzed in the current study are not publicly available as they are synthesized from proprietary designs of Amazon.com Inc.
Notes
The Tikhonov regularized solution in [7] give identical PDI and XTC gain in Fig. 6.
Abbreviations
 BP:

Beampattern
 DI:

Directivity index
 FWHM:

Full width at half magnitude
 GRQ:

Generalized Rayleigh quotient
 LS:

Least squares
 LVE:

Localvariable exchange
 MoE:

Mixture of encodings
 MLS:

Magnitude least squares
 MSLS:

Magnitude squared least squares
 MVDR:

Minimum variance distortionless response
 NRMSE:

Normalized root mean square error
 NRMSIE:

Normalized root mean square integrated error
 OLS:

Ordinary least squares
 PDF:

Probability density function
 PDI:

Probabilistic directivity index
 PoE:

Product of encodings
 RBF:

Radial basis function
 RMSE:

Root mean square error
 SH:

Spherical harmonic
 SNR:

Signaltonoise ratio
 TSVD:

Truncated singular value decomposition
 XTC:

Crosstalk cancelation
References
M. Abramowitz, I. A. Stegun (eds.), Handbook of mathematical functions with formulas, graphs, and mathematical tables. Vol. 55 (US Government printing office, 1972).
J. Ahrens, M. R. P. Thomas, I. Tashev, in Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Hollywood, California, USA. HRTF magnitude modeling using a nonregularized leastsquares fit of spherical harmonics coefficients on incomplete data, (2012).
J. Ahrens, S. Bilbao, Computation of spherical harmonic representations of source directivity based on the finitedistance signature. IEEE/ACM Trans. Audio Speech Lang. Process.29:, 83–92 (2020).
G. Arfken, “Spherical Harmonics” and “Integrals of the Products of Three Spherical Harmonics.” §12.6 and 12.9 in Mathematical Methods for Physicists, 3rd ed (Academic Press, Orlando, 1985).
A. Chhetri, M. Mansour, W. Kim, G. Pan, in 2019 27th European Signal Processing Conference (EUSIPCO). On acoustic modeling for broadband beamforming (IEEE, 2019), pp. 1–5.
A. Chhetri, et al., in 26th European Signal Processing Conference (EUSIPCO). Multichannel audio frontend for farfield automatic speech recognition (IEEE, 2018).
E. Y. Choueiri, Optimal crosstalk cancellation for binaural audio with two loudspeakers. Vol. 28 (Princeton University, 2008).
COMSOL MultiphysicsⓇ v. 5.6, www.comsol.com (COMSOL AB, Stockholm. https://www.comsol.com/support/knowledgebase/1223.
R. Duraiswami, et al., in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. Planewave decomposition analysis for spherical microphone arrays, (2005).
A. R. Edmonds, Angular Momentum in Quantum Mechanics (Princeton University Press, Princeton, 1957).
M. J. Evans, J. A. S. Angus, A. I. Tew, Analyzing headrelated transfer function measurements using surface spherical harmonics. J. Acoust. Soc. Am.104(4), 2400–2411 (1998).
T. Gneiting, Strictly and nonstrictly positive definite functions on spheres. Bernoulli. 19(4), 1327–1349 (2013).
J. Guinness, M. Fuentes, Isotropic covariance functions on spheres: Some properties and modeling considerations. J. Multivar. Anal.143:, 143–152 (2016).
P. C. Hansen, The truncated SVD as a method for regularization. BIT Numer. Math.27(4), 534–553 (1987).
R. A. Horn, C. R. Johnson, Matrix Analysis (2nd ed.) (Cambridge University Press, Cambridge, 2012).
C. Huang, H. Zhang, S. M. Robeson, On the validity of commonly used covariance and variogram functions on the sphere. Math. Geosci.43(6), 721–733 (2011).
J. Ivanic, K. Ruedenberg, Rotation matrices for real spherical harmonics. Direct determination by recursion. J. Phys. Chem.100(15), 6342–6347 (1996).
D. P. Jarrett, E. A. P. Habets, P. A. Naylor, in 2010 18th European Signal Processing Conference. 3D source localization in the spherical harmonic domain using a pseudointensity vector, (2010), pp. 442–446.
D. P. Jarrett, E. A. P. Habets, P. A. Naylor, Theory and applications of spherical microphone array processing, vol. 9 (Springer, New York, 2017).
P. W. Kassakian, Convex approximation and optimization with applications in magnitude filter design and radiation pattern synthesis. Diss (University of California, Berkeley, 2006).
Z. Li, R. Duraiswami, Flexible and optimal design of spherical microphone arrays for beamforming. IEEE Trans. Audio Speech Lang. Process.15(2), 702–714 (2007).
Y. Luo, W. Kim, in 2020 28th European Signal Processing Conference (EUSIPCO). Fast SourceRoomReceiver Acoustics Modeling, (2021), pp. 51–55.
J. Mercer, Xvi. functions of positive and negative type, and their connection the theory of integral equations. Philosophical transactions of the royal society of London. Ser. A Containing Pap. Math. Phys. Character. 209(441458), 415–446 (1909).
C. Müller, Spherical harmonics. Vol. 17 (Springer, 2006).
M. Park, B. Rafaely, Soundfield analysis by planewave decomposition using spherical microphone array. J. Acoust. Soc. Am.118(5), 3094–3103 (2005).
A. M. Pasquala, J. R. Arrudaa, P. Herzogb, Optimal array pattern synthesis with desired magnitude response. J. Acoust. Soc. Am.123(5), 3653–3653 (2008).
B. Rafaely, Fundamentals of spherical array processing, Vol. 16 (Springer International Publishing, 2018).
C. E. Rasmussen, C. K. I. Williams, Gaussian processes for machine learning (MIT Press, Cambridge, 2005).
P. N. Samarasinghe, et al., Spherical harmonics based generalized image source method for simulating room acoustics. J. Acoust. Soc. Am.144(3), 1381–1391 (2018).
H. Saruwatari, et al., Blind source separation combining independent component analysis and beamforming. EURASIP J. Adv. Signal Process.11(2003), 1–12 (2003).
I. Schoenberg, Positive definite functions on spheres. Duke Math. J.1:, 172 (1988).
M. L. Stein, Interpolation of Spatial Data: Some Theory for Kriging (Springer Series in Statistics, New York, 1999).
V. Loan, F. Charles, Generalizing the singular value decomposition. SIAM J. Numer. Anal.13(1), 76–83 (1976).
D. B. Ward, G. W. Elko, in Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP ’98 (Cat. No.98CH36181), 6. Optimum loudspeaker spacing for robust crosstalk cancellation, (1998), pp. 3541–3544.
Wolfram Research, Inc., Mathematica, Version 12.3.1 (Wolfram Research, Inc., Champaign, 2021). https://www.wolfram.com/mathematica.
D. N. Zotkin, R. Duraiswami, N. A. Gumerov, in 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. Regularized HRTF fitting using spherical harmonics, (2009), pp. 257–260.
F. Zotter, et al., A beamformer to play with wall reflections: The icosahedral loudspeaker. Comput. Music J.41(3), 50–68 (2017).
Acknowledgements
We thank the audio technology team at Amazon for providing acoustic simulation data (Jim Sun, Sanjay Yengul) and management (Wontak Kim).
Funding
The work is funded by Amazon.com Inc.
Author information
Authors and Affiliations
Contributions
Authors’ contributions
Yuancheng Luo is the sole author of the work. The author read and approved the final manuscript.
Authors’ information
Yuancheng Luo received his PhD. from the Department of Computer Science at the University of Maryland, College Park in 2014. He is currently a research scientist in the audio hardwaretechnology team at Amazon Inc.
Corresponding author
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Luo, Y. Spherical harmonic covariance and magnitude function encodings for beamformer design. J AUDIO SPEECH MUSIC PROC. 2021, 41 (2021). https://doi.org/10.1186/s13636021002307
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13636021002307