Spherical harmonic covariance and magnitude function encodings for beamformer design

Microphone and speaker array designs have increasingly diverged from simple topologies due to diversity of physical host geometries and use cases. Effective beamformer design must now account for variation in the array’s acoustic radiation pattern, spatial distribution of target and noise sources, and intended beampattern directivity. Relevant tasks such as representing complex pressure fields, specifying spatial priors, and composing beampatterns can be efficiently synthesized using spherical harmonic (SH) basis functions. This paper extends the expansion of common stationary covariance functions onto the SHs and proposes models for encoding magnitude functions on a sphere. Conventional beamformer designs are reformulated in terms of magnitude density functions and beampatterns along SH bases. Applications to speaker far-field response fitting, cross-talk cancelation design, and microphone beampattern fitting are presented.


Introduction
Acoustic array designs have undergone many iterations of improvement as physical simulation data becomes more accurate and abundant. Rapid microphone and speaker array prototyping efforts have moved into simulation to work in tandem with both industrial and product design. Beamforming sits at the intersection where upstream device geometry and hardware constrain the acoustics and performance of the array, impacting the audio application qualities downstream.
Microphone beamforming improves the signal-to-noise ratio of automatic speech-recognition and wake-word detection [6], estimate source direction-of-arrival [18], and separate acoustic-objects [30]. Speaker beamforming generalizes cross-talk cancelation [34] and can widen acoustic-images by exciting wall reflections [37]. Far-field beamformer performance improves when steering vectors Correspondence: luoyuancheng@gmail.com; mikeluo@amazon.com Audio Hardware-Technology, Amazon Inc., Cambridge, MA, USA incorporate the local geometry in their free-field acoustic responses [5]. Commercial software such as COM-SOL [8] can readily import device geometries in acoustic simulations. Therefore, a common basis representation for beamformer design synthesizes different requirements and simulations across applications.
The complex spherical harmonic (SH) functions [19,24,27] form one such basis that have found varied uses for data fitting, analysis, and synthesis. Head-related transfer functions can be efficiently encoded along the SH bases for binaural reproduction [2,11,36]. Plane-waves decomposition and beamforming on spherical microphone arrays are directly formulated along the SH bases [9,21,25,27]. For non-spherical arrays, speaker and microphone far-field directivity are decomposed along the SHs from acoustic responses at large finite distances [3] and integrated with room impulse response models [22,29]. Acoustic directivity can also be fitted to magnitude beampatterns along the SH bases [20,26].
Magnitude function synthesis on a sphere is less wellknown in beamforming but appears in interpolation and Bayesian methods such as Kriging [32] and Gaussian processes [28]. We show how these techniques that specify covariance or kernel functions can relate acoustic directivity with spherical probablistic density functions to better design, fit, and evaluate beampatterns. This improves upon several prototyping stages which we list as the following contributions: 1 We fit non-parametric models to acoustic directivity and show that their basis functions can be expanded along SHs to give superior bias-variance trade-offs than spatial cross-correlation models [19] and direct SH fittings [14]. 2 We derive the analytic SH expansions of several Matérn covariance functions of chordal distances which in previous works have only been shown to be valid on the sphere [12,13,16,31]. Expansion errors under shape-constrained design parameters quickly converges to 0 for increasing max truncation-orders and ensuring model tractability. 3 We generalize the acoustic directivity index (DI) [19,27] to support non-uniform spatial weighting or probability density functions (PDFs) in terms of SHs. This reduces bias of previous discrete sampling schemes that must draw from finite measurements and integrates over target and noise sources' directional priors indicative of real-world performance compared to flat priors. 4 We propose a mixture and product of SH encodings model for synthesizing beampattern targets and accelerate beampattern fitting optimization [20] by analytic derivatives and integration over SH encoded acoustic directivity. 5 We present case studies for fitting transcoded SHs to simulated speaker responses' anechoic acoustic directivity, evaluating expected cross-talk cancelation performances under varying spatial priors, and ranking optimal cross-talk cancelation beampatterns w.r.t. loss functions.
The paper contents are listed as follows: Section 2 introduces the SH notation and derive several useful operators along the SH bases. Section 3 presents novel encodings of Matérn covariance function expansions of chordal distances along the SH bases and proposes the mixture of encodings (MoE) and product of encodings (PoE) models for data fitting and magnitude function synthesis. Section 4 introduces a new probabilistic treatment of DI and then reformulates conventional minimum variance distortionless response (MVDR) and generalized Rayleigh quotient (GRQ) beamformers in terms of SH encoded spatial priors. Section 5 extends beampattern fitting methods such as least squares (LS), magnitude least squares (MLS), and magnitude squared least squares (MSLS) into the SH bases. Section 6 establishes experiments on anechoic far-field directivity fitting by radial basis function (RBF) transcoding, performance impact of spatial priors on cross-talk cancelation, and trade-offs between beampattern fitting methods. Sections 7 and 8 discuss the findings and conclude the work.

Background and notation
The spherical harmonics are orthogonal basis functions over spherical coordinates [19]: where for notation, we denote l, m as the order and degree, respectively, (θ, φ) the physics convention spherical coordinates for colatitude and azimuth respectively, and P m l (cos θ) the associated Legendre polynomials. A function f (θ, φ) defined over spherical coordinates can be expanded along the SH bases: where C m l ∈ C are the expansion coefficients for basis functions of order l and degree m, and P the maximum order or the number of truncation terms in the SH expansion. The number degrees m for order l totals 2l + 1 and number of basis functions upto max truncation-order P is (P + 1) 2 . Therefore, a vector of coefficients C ∈ C (P+1) 2 ×1 and SH function evaluations Y (θ, φ) ∈ C (P+1) 2 ×1 are denoted in bold and arranged in ascending basis orders [19]. We derive several vector operators along the SH coefficients of f (θ, φ) from Eq. 2.
Conjugation: The complex conjugate () * of function f (θ, φ) flips the order of m and negates odd m harmonics using the conjugate SH property [27]. The vectorized formulation permutes the SH coefficients via the exchange matrix J l ∈ R (2l+1)×(2l+1) defined in [15]. A real-valued function f (θ, φ) = f * (θ, φ) therefore has SH coefficients satisfying C = C. Luo where M = m +m and c() the Clebsch-Gordan coefficients defined in [4,10]. We can therefore expand the product of functions in terms of a product operator of SH coefficients where L, M are indices for the order and degree respectively,m = M − m, and C m l ∈ C f ,Cm Squared modulus: A function's magnitude squared can be computed along SH bases via the product operator of the SH expansion with its conjugate in Eqs. 3 and 5 given by

Function encodings and fitting
We define a stationary covariance function K over random variables X(r) that depends only on the chordal distance d( ) on a unit sphere: where r ∈ R 3 is a cartesian unit vector representing a direction or point on the unit sphere, and = cos −1 r · r the central angle between r, r .
As a motivating example, let the random variable X(r) be the sound pressure P(r, , k) at position r and wave number k due to a unit amplitude wave incident to a direction drawn from random variable with uniform density function f ( ) = (4π) −1 over the unit sphere. This is equivalent to the well-known spatial cross-correlation between sound pressure at two points in a diffuse field [19] which after substitution into Eq. 7 yields the unnormalized sinc kernel: where the positive semi-definiteness follows from expanding P(r, , k)P * (r , , k) along SHs in [19] to supply the weights λ and orthonormal bases ψ to Mercer's Theorem K(r, r ) = ∞ l=0 λ l ψ l (r)ψ l (r ) [23]. Moreover, Mercer's theorem allows a valid covariance function to be defined without explicit specification of its random variables X(r) provided that the former is expressible along an orthonormal bases such as the SHs. We derive the analytic SH expansions of several well-known stationary covariance functions or RBFs.

Analytic encodings
Stationary covariance functions can be expanded into the SH domain via the Legendre addition theorem: where expansion weights b l (g, ) belong to some function g parameterized by . The Legendre polynomial P l (cos ) of the angular distance 0 ≤ ≤ π is expanded into products of SHs evaluated at free coordinates (θ, φ) w.r.t. a center or fixed coordinate (θ * , φ * ). The SH coefficients after substituting Eq. 9 into 2 are given by where is the set of parameters and C T the vector of coefficients C m l ( ). The function specific weights b l (g, ) must be real-valued and depend only on the order l as the conjugate SH property in Eq. 3 ensures that f (θ, φ, ) ∈ R iff C T = C T . The exact values of b l (g, ) can be computed via the sine expansion of the Legendre polynomials [16,31] given by where expansion coefficients g(n, ) integrate the product of covariance function K and the Chebyshev polynomials cos(n ) = T n (cos ).
We now consider members of the family of half-integer order Matérn covariance functions [1] of chordal distances d( ) in Eq. 7 given by where p ∈ N + and () is the Gamma function. These RBFs are products of the Laplace kernel and polynomial functions which can be combined with the Chebyshev polynomials after a change of variables cos(n ) = cos 2n 2 = T 2n cos 2 where F(a, b, z) is the generalized hypergeometric function (see Appendix Eq. 48). The modified integral after substituting Eqs. 12 and 13 into g(n, ) is given by where dt = cos 2 d . After symbolic integration [35] w.r.t. t for cases ν = 1 2 , 3 2 , 5 2 , ∞ , the novel expansion coefficients are presented as follows: Matérn ν = 1/2 (Exponential): Non-differentiable function at = 0.
where I n is the nth order modified Bessel function of the first kind.
Design parameters: The bandwidth term which parameterizes the RBFs can be chosen to satisfy constraints such as the full width at half magnitude (FWHM) and the minimum. The FWHM for RBFs on the sphere gives twice the radians at which the function is half the maximum amplitude (K ν ( , ) = 1 2 ) and minimized at = π as shown in Fig. 1. For Matérn covariance functions, the FWHM can be found for a given and conversely, for a desired target response: where T dB is the target decibel response at T radians (See Appendix Eqs. 50-55 for analytic expressions). For specifying sharp peaks and discontinuities on a sphere, a sharp covariance function is desirable. Given Matérn functions of different classes K ν 1 , K ν 2 where ν 1 < ν 2 and identical FWHM, the less differentiable function K ν 1 is bounded above for 0 < < FWHM 2 and below for FWHM 2 < < π as shown in Fig. 1. A sharp peak function can therefore be parameterized with small ν, T dB and T but requires higher max truncation-order terms P or SH bases to achieve the same reconstruction error as shown in Fig. 2. The root mean square error (RMSE) curve RMSE(P) is asymptotically steeper for smoother function classes (larger ν) and bounded below by functions with larger bandwidth within the same class. We now propose new techniques for fitting SH encoded RBFs to far-field directivity responses and composing magnitude functions on the sphere.

Mixture of encodings
where c is a scalar constant, and w j are unknown weights.
The system of equations is given by where A ∈ C M×N , w ∈ C N×1 , b, c ∈ C M×1 are the system matrix, vectorized unknowns, targets, and scalar constant respectively. The SH expansion of a constant over spherical coordinates is given by 2c √ π Y 0 0 (θ, φ). The solution vector w can be found by direct inversion of A when N = M and by least-squares methods for overdetermined cases N < M or when constraints on w are specified.
For N = M and common RBF parameterizations MoE converges to RBF interpolation for valid kernel functions as the max truncation-order P increases; SH encodings converge to covariance functions at the . Moreover, SH encodings for arbitrary P are valid kernels as the Legendre polynomials in Eq. 9 are also valid kernels [16]. As a result, the original kernel functions can be initially fitted to a large number of samples using the original kernel functions before expanding into the SH bases. For example, native methods such as Gaussian processes [28] use the log-marginal likelihood to fit bandwidth and specify the posterior mean weights computed from a common covariance function where ij is the central angle between (θ i , φ i ) and (θ j , φ j ). We refer to this schema as MoE-RBF transcoding given that a nonparametric RBF interpolation of N supports is encoded rather than fitted to (P + 1) 2 SH basis functions. The biasvariance trade-off after transcoding is thus more sensitive to the choice of the kernel function and its smoothness and locality on the sphere rather than the number of SH bases (See Section 6.1 for experiment).
We now consider the case where non-negative RBFs are fitted to magnitude only responses X ≥ 0. At the limit can be constrained to be non-negative by finding the non-negative least-squares solution w to the real-values of Eq. 21 given by where c = 0 and w ≥ 0. For finite order SH encodings however, f (θ i , φ i , j ) and f (θ, φ) may ripple below 0 due to truncation error. If constant c > 0 is raised, then the sign of R[ b − c] forces each encoding f (θ, φ, i ) to contribute as either a peak or a null (See example in Fig. 3a) and the latter may overshoot and induce negative values in f (θ, φ). One solution is to fit SH encodings of a kernel's squared modulus from Eq. 6 with non-negative weights w ≥ 0 given by wheref (θ, φ, j ) is an SH encoding of a kernel and its squared modulus is strictly non-negative and does not ripple below 0. We note that a kernel's squared modulus is a valid kernel due to the general rule that the product of kernels is a kernel [28]. All modulus squared Matérn kernels are thereby suitable candidates for magnitude fitting with slight modifications to parameter selection. Parameter optimization of depends on the Matérn class ν; exponential and squared exponential K 1 are closed under the squared modulus and so bandwidth is simply scaled. For other half-integer ν classes, the kernels remain products of exponential and polynomial functions but are not part of the Matérn family; bandwidth needs re-derivation w.r.t. the objective function. For parameter design of , targeting half the desired dB response T dB /2 at T in Eq. 19 is sufficient.

Product of encodings
The PoE model or the product of N SH encodings is useful for fitting to low-order target responses given by where w j ∈ C are unknown weights and the product of SH encodings 1 + f (θ, φ, j ) are closed under SH multiplication via Eq. 5; SH expansion of the unit constant  MoEs (See example in Fig. 3b) as the derivative of products are multiplicative whereas the derivative of mixtures is additive. PoEs however do require substantially more bases than MoE as the product of N SH encodings of Pth max truncation-order spans (NP + 1) 2 bases. Data fitting is also not closed-form due to non-linear objective functions.
Consider the non-linear least-squares solution that minimizes the surrogate log function where (X i , θ i , φ i ) are the M target responses. For gradientbased optimization, the Jacobian given by can be used to update w.
In the case of magnitude target responses, a nonnegative PoE can be fitted by constraining the weights to be real and bounded w j ≥ −1. For the case of N = M, the weights can be further constrained such that each encoding contributes a peak or null to f (θ, φ) analogous to the MoE formulation in Eq. 23: normalize SH encodings f (θ, φ, j ) of kernels at their center of expansions (θ j , φ j ) to prevent over-shooting. PoEs can therefore model loworder cascades of spatial peak and null magnitude functions. We now show how to extend MoE and PoE models into beamformer designs.

Conventional beamformer extensions
An N channel array can be oriented towards directional sources by parameterizing so-called steering vectors in terms of each channel's directivity or anechoic far-field responses. A steering vector d(θ, φ) and the resulting beampattern B(θ, φ) can be parameterized along SH bases given by where C d ∈ C (P+1) 2 ×N is the column matrix of SH coefficients fitted to N sets of acoustic directivities, and w ∈ C N×1 are the unknown beamformer's weights. The beamformer's DI evaluates the power output gain at (θ * , φ * ) w.r.t. the power average over the spherical coordinates [19,27]: Maximizing DI is the basis of conventional beamformers such as MVDR which we show can be generalized for non-uniform spatial sampling or density functions in both numerator and denominator.

Spatial prior encodings
Across applications, target and noise sources may appear in varied locations. An array can be steered to boost or nullify a source if its exact locations are known. If their locations are uncertain, an informative prior can instead be specified. Let the uncertainty in sampling from a source direction and its associated steering vector be modeled as a random variable with probability density function f p (θ, φ). We can specify a spatial density function on the unit sphere in terms of SHs by normalizing MoE or PoE functions from Eqs. 20 and 25 to satisfy non-negative and unity integral constraints given as follows: where the formulations given in Eqs. 23, 24, and 28 ensure that f p (θ, φ) is both non-negative and real-valued. The unity integral constraint is satisfied by normalizing the SH coefficients C p ∈ C (P+1) 2 ×1 by the scaled 0th order coefficient 2 √ π C 0 0 following the integration of SH functions property in [27].
The statistical moments of the random variable of steer- ing vectors d(θ, φ) are computed from the product of its SH coefficients in Eq. 29 and that of the density function f p (θ, φ) in Eq. 31. The weighted steering vectors are closed under SH multiplication via Eq. 5: where Q ∈ C (2P+1) 2 ×N are SH product coefficients. The steering vector's mean μ d and covariance d taken over the spherical coordinates are therefore analytic following the SH orthogonality property in [19,27] and multiplicative closure property of Eq. 32: whereQ is truncated to the first (P + 1) 2 rows of Q.
A formulation that avoids truncating Q uses multiplicative associate property to compute the row i and column j elements of covariance E dd H in Eq. 33 from SH products of pair-wise channel responses (columns i, j of d(θ, φ) in Eq. 29) given by Note that the density function f p (θ, φ) must be encoded to have twice the max truncation-order C p ∈ C (2P+1) 2 ×1 of the steering vectors. The density function is truncated back to max-order P only when computing the mean μ d in Eq. 33 for consistency with d(θ, φ). With the steering vectors' first and second moment statistics defined, we now propose a novel probablistic beamformer formulation.

Steerable probabilistic beamforming
In far-field beamforming, let the target and noise source directions be modeled by random variables with density functions f A (θ, φ), f R (θ, φ) respectively satisfying Eq. 31. Define the probabilistic directivity index (PDI) as the power ratio of the beampattern from Eq. 29 integrated w.r.t. the target and noise density functions given by PDI generalizes DI from Eq. 30 as the latter can be specified via dirac delta density function f A (θ, φ) = δ θ,φ for the target and uniform density function f R (θ, φ) = 1 4π for the noise. Maximizing the PDI is equivalent to finding a maximum SNR filter via the GRQ formulation in [19,27] given by where are the non-centered acceptance and rejection covariance matrices respectively. Note that the distortionless constraint for some steering vector d(θ, φ) is optional but useful for normalizing the beamformer response at a look direction. The solution vector w are the beamformer weights and can be found via the generalized eigenvalue decomposition [33]. GRQ beamformers generalize instances of MVDR just as DI is a special case of PDI. If the acceptance covariance matrix is constructed from a single point source A = d(θ, φ)d H (θ, φ) and the rejection covariance matrix computed over noise sources distributed (uniform for DI definition in Eq. 30) over spherical coordinates, then the solution to Eq. 36 is the MVDR beamformer given by where the look direction can be steered by evaluating the SH bases at (θ, φ) spherical coordinates. In the more general case, the SH encoded density functions f A (θ, φ), f R (θ, φ) can be steered under the SO(3) SH rotation operator [17]. We now turn to a related fitting problem that replaces SH encoded density functions with target SH encoded magnitude response functions.

Beampattern fitting formulations
Let an N channel beamformer and target beampattern functions along SH bases be given by where B(θ, φ) is the complex beamformer response, C A ∈ C (P+1) 2 ×N the channel's SH fitted far-field response weights, f A (θ, φ) ∈ C N×1 the channel far-field responses, w ∈ C N×1 the unknown weights. The target magnitude beamformer response f B (θ, φ) is a non-negative real function with constant phase specified via MoE or PoE models in Eqs. 23, 24, and 28 along the SH bases. Beampattern fitting thus solves for weights w s.t. B(θ, φ) ≈ f B (θ, φ) in terms of SH coefficients C A , C B .

Direct least squares
The direct LS fit of weights w in B(θ, φ) to target f B (θ, φ) in Eq. 38 minimizes the squared modulus of the complex residuals between target and response beampatterns. Both magnitude and phase responses are fitted in the LS objective function given by where by SH orthogonality, the residuals are transformed into SH coefficients. The minimizer is therefore the ordinary least squares (OLS) solution to C A w = C B given by arg min OLS however over-constrains the residuals as the beamformer's phase responses are unnecessarily fitted to a constant (0 radians), causing the minimizer to under-fit the target magnitude responses. This is addressed in the so-called discrete MLS formulation.

Magnitude least squares
The magnitude response can be fitted by minimizing the squared error between the modulus beampattern |B(θ, φ)| and target f B (θ, φ) integrated over the spherical coordinates. The integrated MLS fitting can be approximated by the discrete MLS formulation using quadrature or sampling the beampattern over a dense uniform spaced spherical coordinate grid of M directions given by The solution w for discrete MLS can be found via the local-variable exchange (LVE) method [20] given bỹ which introduces a vector of complex unit variables z ∈ C M×1 and converts Eq. 41 into a LS problem. The solution alternates between phase matching the magnitude target f B (θ i , φ i ) to the corresponding beamformer response f Aj (θ i , φ i ) with the free-variable z i and finding the OLS solutionŵ until convergence. An analogous solution for the integrated MLS formulation is more difficult as z is replaced with the complex unit function e jf z (θ,φ) s.t. an unwrapped phase function f z (θ, φ) = Y T (θ, φ)C z is constrained to be real-valued following Eq. 3 but may also be discontinuous (e.g., spatial nulls may invert the phase). Instead, we introduce a novel MSLS formulation which we extend along SHs to show that it does not require discretization over the sphere to optimize.

Magnitude squared least squares
The magnitude response can be fitted by minimizing the squared error between the squared modulus beampattern |B(θ, φ)| 2 and target f 2 B (θ, φ). The MSLS objective function is given by where ⊗ is the Kronecker product, G B ∈ C (2P+1) 2 ×1 the SH coefficients of the squared magnitude target beampattern, G A (i, j) ∈ C (2P+1) 2 ×1 the pair-wise SH products of channel i and j's far-field SH coefficients C A i and C A j respectively, and G A ∈ C (2P+1) 2 ×N 2 the column matrix of the pair-wise products (column k = N(i − 1) + j + 1 is G A (i, j)). The derivation follows from expanding the squared magnitude of the beamformer response into weighted SH products via the distributive property of product operator in Eqs. 5, 6 and then rearranging the terms into a matrix Kronecker-vector product given by The MSLS objective function in Eq. 43 is quartic w.r.t. w and the solution can be found via iterative methods such as gradient descent and Gauss-Newton (see Appendix Eq. 49 for derivatives). Note that the quartic objective function causes the fitted squared magnitude response to over-weight larger values resulting in over-fitted peaks and under-fitted nulls w.r.t. the target beampattern. We now turn to experimental studies to validate our methods in the remainder of the work.

Acoustic directivity fitting
In microphone and speaker prototyping, it is necessary to characterize the impact of the physical array's anechoic far-field acoustic directivity. Fitting the SH bases to acoustic directivity transforms the responses into a spatially continuous and spatially band-limited representation for subsequent beamform design. We present a case study comparing several SH fitting methods and their bias-variance trade-off to the anechoic far-field responses of a custom down-firing mid-range speaker on a table. In simulation [8], a pressure source at the speaker cone center generates a response that we uniformly sample at R = 1 meter radius over a grid of 7.5 degree azimuth and 2 degree elevation increments. Let the dataset be specified by D = {X, θ , φ} where X ∈ C N×1 is the vector of N complex pressure samples at frequency with wavelength smaller than the simulation distance, and θ , φ ∈ R N×1 vectors of the sample azimuth and colatitude in radians respectively.
Let a reference functionf (θ, φ) = C T f Y (θ, φ) be the SH bases fitted upto max-order P = 30 to dataset D via the truncated singular value decomposition (TSVD) [14] as follows: where matrix A ∈ C N×(P+1) 2 is the row-matrix of SH bases evaluated at θ , φ spherical coordinates. The SVD of A is given by left and right singular vectors U ∈ C N×(P+1) 2 , V ∈ C (P+1) 2 ×(P+1) 2 respectively and the sin- 2 ,∞ in Eq. 12 as well as the sinc kernel in Eq. 8 are then fitted over each training set before inference over the test sets. RBF fitting under a Gaussian process model maximizes the log-marginal likelihood quantity of the kernel function hyperparameters (or wavenumber k for sinc kernel) under noisy observation assumptions (A is the equivalent gram matrix with noise variance σ = 10 −6 as diagonal loading) [28]. After convergence, the least squares solution w to Eq. 21 are found and the MoE function f (θ, φ) in Eq. 20 are expanded into C f weighted SH bases via Eq. 10.
The set of candidate fittings for one of the crossvalidation trials are visualized in Fig. 4. The TSVD candidate of lower max truncation-order (P = 6) SH bases under-fits the training set resulting in overly smooth magnitude and phase responses shown in Fig. 4b. Increasing the fit's max-order SH bases however does not improve generalization as shown by the larger errors in Table 1. Instead, fitting by non-parametric RBF expansions at each TSVD bias-variance is sensitive to max-order P whereas RBFs are sensitive to the kernel function sharpness; K 1 2 , 5 2 generalizes well, K ∞ is too smooth, sinc too sharp of the training samples outperforms parametric fitting of SH bases due to the former's bias-variance trade-off. Among the set of RBF kernels to choose from, the sharp but long-tailed function K 1 2 had superior generalization as shown in Fig. 4d. Conversely smooth but short-tailed functions such as K ∞ over-fits to the training set when the samples are sparse and the underlying function rough as shown in Fig. 4f. This results in poor fits near discontinuities in the far-field response such as nulls; decreasing the kernel's bandwidth narrows the function for fitting to local features at the cost of poor generalization at more distant samples. Choosing an unsuitable kernel such as the sinc function when the sound-field is not diffuse also results in poor generalization as the acoustic targets' phase are regular but the sinc kernel has multiple zero-crossings that cause discontinuities. The generalization errors of the cross-validation trials are shown in Table 1 via two metrics. The normalized root mean square error (NRMSE) as defined in Appendix Eq. 56 averages the squared error between the test samples and the model inference f (θ, φ) ≈ Y T (θ, φ)C f evaluated at the sample's spherical coordinates; normalization constant is the standard deviation of the dataset. The normalized root mean squared integrated error (NRMSIE) as defined in Appendix Eq. 57 averages the squared error between the candidate and reference functionf (θ, φ) over the continuous spherical coordinates; normalization constant is the standard deviation of the reference function. The NRMSE metric is restricted to only test samples whereas NRMSIE incorporates both training and test sample by computing the total distance between any two functions on the sphere. For the TSVD model-order candidates, both metrics are minimized when the model order is empirically found to be P = 6; further increasing had the best performances with low mean errors within a standard deviation of each other; K ∞ however is both too smooth and local resulting in larger error mean and variance across trials.

Beamformer cross-density priors
MVDR and GRQ beamformers designed with spatial priors for target and noise sources may be mismatched in real-world scenarios. For example, density functions f A (θ, φ), f R (θ, φ) may be updated a posteriori with target and noise source location data collected from camera data. Spatial priors that depend on the array's location in the environment such as the center, side, corner of a room may be separately defined. We show how PDI in Eq. 35 may be used to evaluate both the expected beamformer performance when the density functions matches the design priors and the performance degradation when they mismatch.
Consider the combinations of target and noise source types with spatial priors specified in Table 2. A stationary point source has a direction sampled from a Dirac delta density function. Diffuse and reverberant noise has equal energy in all directions and thus sampled from a uniform density function. Moving sources and directional noise have spatial priors that contain modes for varying blur and spread due to jitter motion and early wall reflections respectively. Beamformers designed with spatial priors suited to their application are shown to outperform those that use uninformative or inaccurate priors.
One such application is speaker cross-talk cancelation (XTC) where a beamformer jointly maximizes the response at one receiver while minimizing the responses at all other receivers. If the receiver locations are nonstationary in the case of a moving head with two ears, then a speaker beamformer must avoid over-fitting to a narrow region. Instead, we treat the ipsilateral and contralateral ears w.r.t. left and right inputs as target and noise sources respectively with locations sampled from a wide density function. The former has the acceptance XTC PDF f A (θ, φ) = Y T (θ, φ)C A given by the normalized PoEs of two kernel functions K ∞ fitted to target responses X = {12 dB , −12 dB } at their center of expansions. The latter has the rejection XTC* PDF f R (θ, φ) = Y T (θ, φ)C * A which simply reflects the acceptance PDF over the median plane due to symmetry of the ears by conjugating the The acceptance PDF shown in Fig. 4g oversamples the elevated ipsilateral quarter-sphere region and undersampling near the contralaternal ear. The rejection PDF shown in Fig. 4j oversamples the elevated contralateral quarter-sphere region and undersampling near the ipsilateral ear.
Far-field anechoic responses belonging to an N = 4 channel planar array are simulated at 1 meter distances and then fitted to P = 12 max-order SH functions. Farfield MVDR and GRQ beamformers are then optimized for PDIs with Dirac, uniform, XTC and XTC* density functions specified in Table 3 and their beampatterns shown in Fig. 5. The cases are indexed by the PDI's accept and reject PDFs (f A , f R ) and their optimal beamformers specified: • (Dirac, Uniform): MVDR maximizes the response at one ipsilateral ear location (θ = π 3 , φ = − π 12 ), minimizes overall leakage. Beamforming PDIs are largest when f A is the Dirac function; conventional two channel XTC designs set f R to the Dirac function which places a null at the secondary receiver location to further increase the array cancelation gain but may mismatch if the receiver moves. Setting f R to the uniform function minimizes the leakage in all directions whereas the XTC* density presents a compromise.  [7] can be generalized for multiple channels in terms of Eq. 35 by setting the accept and reject densities to the dirac delta functions at the accept and reject center angles (θ 1 , φ 1 ) ∈ 1 and (θ 2 , φ 2 ) ∈ 2 in Eq. 46 respectively and adding a diagonal loading parameter λ to the rejection covariance matrix: GRQ beamformers that maximize the PDI under Eq. 47 for varying λ are then evaluated w.r.t. the cross-density priors in Table 3 and their expected cancelation gains shown in Fig. 6. The XTC gain at the precise target (θ 1 , φ 1 ) and cancelation (θ 2 , φ 2 ) angles grows to infinity as λ → 0. In practice, we prefer to choose the λ that maximizes a PDI with suitable priors. For example, λ = 6.5e−7 maximizes the PDI with (XTC, XTC*) priors at 3 dB expected cancelation gain and sharply declines for larger λ. If the accept region is restricted to the exact target angle in the case of the (Dirac, XTC*) prior, then λ = 1.5e−6 maximizes the PDI. 1 The Tikhonov regularized solution in [7] give identical PDI and XTC gain in Fig. 6.

Beampattern fitting
In applications that require an array to archive a specific beampattern, conventional beamformers such as MVDR and GRQ are ill-suited. The former penalizes a beampattern's mismatch w.r.t. a target whereas the latter maximizes the expected beampattern's array gain between two spatial regions. An idealized target beampattern may be specified in some applications such as stereo cross-talk cancelation that places a peak in one region, a null at another, and has sufficient leakage control or side-lobe attenuation everywhere else (see Fig. 3). Given an acoustic simulation of a physical array, we evaluate how close a beamformer can fit to a beampattern.
Far-field anechoic responses belonging to an N = 8 channel array on both sides of a wedge (2 × 4 rectangle array) are simulated at 1 meter distances and then fitted to max truncation-order P = 14 SH functions. Two target beampatterns are specified along SH bases of the same max truncation-order using the MoE and PoE models as shown in Fig. 3. The PoE target response has a more directive peak but smoother null than the MoE target. We fit separate beamformer weights to the target beampatterns by minimizing the LS, MLS, and MSLS loss functions from Eqs. 39, 41, and 43 via the OLS, LVE, and the Gauss-Newton methods, respectively. Last, we compare beampatterns by evaluating them across loss functions.
The fitting errors for each set of beamformer weights are shown in Table 4. Rows represent the set of loss function evaluated over beampatterns generated from the optimization method specified in the column. Diagonal entries correspond to the optimizer's fitting errors and are the minima along each row. Off-diagonal entries correspond to the row's loss function evaluated on beampatterns produced by other fitting methods where we expect larger errors. Similarity scores in parentheses Fig. 6 Constant parameter λ regularized beamformers [7] parameterized by Eq. 47 achieve unbounded XTC gains. Their beampatterns evaluated w.r.t. PDI give the expected XTC gain over different accept and reject density functions normalize the errors on each row by dividing by the diagonal entry. Beampatterns generated by MLS and MSLS methods induce large errors and low similarity scores w.r.t. the LS beampattern due to having variable phase responses; MLS and MSLS beampatterns are not unique under arbitrary phase shifts of the beamformer weights. Conversely, LS beampatterns are fitted to both magnitude and phase responses, thereby under-fitting the magnitude and inducing larger MLS and MSLS errors. Last, MLS and MSLS produce beampatterns that are more closely matched in magnitude than their phase; MLS and MSLS generated beampatterns have higher similarity scores than that of LS beampatterns.
The beampatterns visualized in Fig. 7 illustrate the trade-offs made by minimizing each loss function. LS fitted beampatterns that are constrained to match the

Results and discussion
We derived the SH expansion of Matérn covariance functions of chordal distance over a unit sphere in terms of generalized hypergeometric functions. Simulated anechoic far-field acoustic responses are fitted to RBFs and then transcoded into SH bases, resulting in large data compression. Cross-validation experiments show that the MoE-RBF models have superior bias-variance trade-off compared to direct SH fitting via the TSVD method. Choosing a covariance function with smoothness and locality characteristics that match the spatial acoustic responses improve generalization along unseen areas on the sphere. Spatial density functions and beampattern targets are then designed using MoE and PoE methods. This enabled conventional beamforming to be reformulated in terms of PDI and probablistic steering vectors sampled from the SH encoded far-field response and spatial density Fig. 7 Beampatterns are shown for far-field beamformers fitted to target MoE and PoE targets specified in Fig. 3 using LS, MLS, and MSLS methods. LS over-fits to a 0 radian phase response. MLS achieves a closer fit to the magnitude target. MSLS over-fits to areas where the magnitude target is large functions respectively. Experiments show how PDI incorporates uncertainty of ear locations in cross-talk cancelation designs and provide a means to evaluate cancelation performance when the priors change. PDI also provides a useful criteria for controlling the amount of diagonal loading in regularized cross-talk cancelation designs.
Beampattern fitting methods such as LS, MLS, and MSLS are at last reformulated along SH bases. For integrated MLS, it remains an open question whether realvalued phase functions can be fitted along SHs due to spatial phase wrapping and discontinuities. For MSLS, we give closed-form gradients and derivatives along the SH bases. Several metrics for comparing beampattern similarity and errors are presented. Experiments demonstrate how effective each method fits a target cross-talk cancelation beampattern from a simulated acoustic array. LS method underfits the magnitude beampattern and should only be used if a flat phase response over the sphere is desirable. MLS equally fits to both peaks and nulls whereas MSLS overfits to peaks in the target beampattern.

Conclusions
Data-dependent far-field beamforming requires an efficient representation for synthesizing simulated acoustic data, spatial density priors, and beampattern designs. The SH basis functions are one such representation where data fitting, kernel function expansions, and magnitude function designs can be encoded or transcoded from other methods commonly used in practice. This allows conventional beamforming and beampattern fitting methods to be reformulated in terms of SH bases yielding solutions that are both closed-form and parametric in the spherical coordinates. As a result, we efficiently fitted and stored the acoustic simulations of an array in a continuous and generalizable format. We evaluated beamforming performance under varying use cases via spatial probabilistic priors and beampattern targets. Last, new device geometries, microphone / speaker topologies, and array designs are iterated upon and compared in rapid prototyping offline.