### 3.1 Natural-gradient sign algorithm (NGSA)

The natural gradient is obtained by multiplying the gradient of the cost function by the inverse *F*^{−1} of the Fisher information matrix [18]. The matrix *F* is the covariance of the gradient of the log-likelihood function (Eq. (6)) and is calculated as follows:

$$\begin{array}{*{20}l} \boldsymbol{F} &:= \mathrm{E}\left[{\left\{ \frac{\partial\log L(\boldsymbol{h})}{\partial\boldsymbol{h}} \right\} \left\{ \frac{\partial\log L(\boldsymbol{h})}{\partial\boldsymbol{h}} \right\}^{\mathsf{T}}}\right] \end{array} $$

(11)

$$\begin{array}{*{20}l} &= \mathrm{E}\left[{\left\{ \frac{\text{sgn}(\varepsilon[n])}{\sigma} \right\}^{2} \boldsymbol{x}[n]\boldsymbol{x}[n]^{\mathsf{T}}}\right] \end{array} $$

(12)

$$\begin{array}{*{20}l} &= \frac{1}{\sigma^{2}} \mathrm{E}\left[{\boldsymbol{x}[n]\boldsymbol{x}[n]^{\mathsf{T}}}\right] \quad (\mathrm{a.s.}) \end{array} $$

(13)

$$\begin{array}{*{20}l} &= \frac{1}{\sigma^{2}} \boldsymbol{R}, \end{array} $$

(14)

where *R* is the autocorrelation matrix of the input signal. Note that Eq. (13) holds because {sgn(*x*)}^{2}=1 is satisfied if *x*≠0. Using Eq. (14), we obtain the NGSA as follows:

$$\begin{array}{*{20}l} \boldsymbol{h}[n+1] = \boldsymbol{h}[n] + \mu_{\text{NGSA}} \text{sgn}(\varepsilon[n]) \boldsymbol{R}^{-1} \boldsymbol{x}[n], \end{array} $$

(15)

where *μ*_{NGSA} denotes the step-size parameter and *R* is assumed to be nonsingular. In addition, the NGSA can be derived by replacing *ε*[*n*] with sgn(*ε*[*n*]) in the LMS/Newton algorithm [21], which approximates Newton's method for the LMS algorithm.
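To make the update concrete, the following is a minimal C sketch of one NGSA step (Eq. (15)), under the assumption that the inverse autocorrelation matrix is available as a precomputed row-major array `Rinv`; the function and variable names are illustrative, not taken from the NARU implementation.

```c
#include <stddef.h>

/* One NGSA step (Eq. (15)): h += mu * sgn(eps) * Rinv * x.
 * Rinv is the precomputed N x N inverse autocorrelation matrix
 * (row-major); eps = d - h^T x is the prior residual. */
static void ngsa_step(double *h, const double *Rinv, const double *x,
                      double d, double mu, size_t N)
{
    double eps = d;
    for (size_t i = 0; i < N; i++)
        eps -= h[i] * x[i];                       /* eps[n] = d[n] - h^T x */
    const double s = (eps > 0.0) - (eps < 0.0);   /* sgn(eps[n]) */
    for (size_t i = 0; i < N; i++) {
        double g = 0.0;                   /* (Rinv x)_i: natural gradient */
        for (size_t j = 0; j < N; j++)
            g += Rinv[i * N + j] * x[j];
        h[i] += mu * s * g;               /* Eq. (15) */
    }
}
```

Section 3.4 removes the \(\mathcal{O}(N^{2})\) matrix-vector product shown here.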

The NGSA adaptation rule (Eq. (15)) satisfies the following inequality:

$$\begin{array}{*{20}l} {\lim}_{n\to \infty} \frac{1}{n} \sum_{k=1}^{n} \mathrm{E}\left[{|\varepsilon[k]|}\right] \leq \varepsilon_{\text{min}} + \mu_{\text{NGSA}} \frac{h}{\lambda_{\text{min}}}, \end{array} $$

(16)

where \(\varepsilon _{\text {min}}=\mathrm {E}\left [{|v[n]|}\right ], h=(1/2)\mathrm {E}\left [{\|{\boldsymbol {x}[n]}\|_{2}^{2}}\right ]\), and *λ*_{min} denotes the minimum eigenvalue of *R*. The proof of Eq. (16) follows that provided in [14] (see Appendix 1: “NGSA inequality”).

### 3.2 Normalized natural-gradient sign algorithm (NNGSA)

A practical difficulty with the NGSA is the choice of *μ*_{NGSA}, because its optimal value varies with the input signal. To overcome this difficulty, we introduce a variable step-size adaptation method that minimizes the posterior-residual criterion; this approach is identical to that of the NLMS [22].

Let *μ*[*n*] and *ε*^{+}[*n*] be the adaptive step size and the posterior residual at time *n*, respectively. Then, *ε*^{+}[*n*] is calculated as

$$\begin{array}{*{20}l} & \varepsilon^{+}[n] := d[n] - \boldsymbol{h}[n+1]^{\mathsf{T}}\boldsymbol{x}[n] \end{array} $$

(17)

$$\begin{array}{*{20}l} &= d[n] - \left\{ \boldsymbol{h}[n] + \mu[n]\text{sgn}(\varepsilon[n]) \boldsymbol{R}^{-1} \boldsymbol{x}[n] \right\}^{\mathsf{T}} \boldsymbol{x}[n] \end{array} $$

(18)

$$\begin{array}{*{20}l} &= \varepsilon[n] - \mu[n]\text{sgn}(\varepsilon[n])\boldsymbol{x}[n]^{\mathsf{T}}\boldsymbol{R}^{-1}\boldsymbol{x}[n]. \end{array} $$

(19)

We let *ε*^{+}[*n*]=0; then, solving Eq. (19) for *μ*[*n*], we obtain

$$\begin{array}{*{20}l} \mu[n] = \frac{|\varepsilon[n]|}{\boldsymbol{x}[n]^{\mathsf{T}} \boldsymbol{R}^{-1} \boldsymbol{x}[n]}. \end{array} $$

(20)

Substituting Eq. (20) into Eq. (15), we obtain the NNGSA as follows:

$$\begin{array}{*{20}l} \boldsymbol{h}[n+1] = \boldsymbol{h}[n] + {\frac{\mu_{\text{NNGSA}} \varepsilon[n]}{\boldsymbol{x}[n]^{\mathsf{T}} \boldsymbol{R}^{-1} \boldsymbol{x}[n]} }\boldsymbol{R}^{-1} \boldsymbol{x}[n], \end{array} $$

(21)

where *μ*_{NNGSA}>0 denotes the scale parameter. If *μ*_{NNGSA}<2 holds and *h*[*n*] and *x*[*n*] are statistically independent, this adaptation rule achieves a first-order convergence rate. The proof of this proposition follows that of the NLMS provided in [22] (see Appendix 1: “NNGSA convergence condition”).

The NNGSA can be interpreted as a variable step-size modification of the LMS/Newton algorithm [23]. In [24], the authors state that [23] is a generalization of the recursive least squares (RLS) algorithm. Furthermore, Eq. (21) reduces to the NLMS when *R*=*I*, where *I* denotes the identity matrix.
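A corresponding C sketch of one NNGSA step (Eq. (21)) follows, again assuming a precomputed `Rinv` and a caller-provided scratch buffer `m`; names are illustrative. With `Rinv` equal to the identity, it performs exactly the NLMS update.

```c
#include <stddef.h>

/* One NNGSA step (Eq. (21)); m is a caller-provided scratch buffer of
 * length N. The text requires 0 < mu_nngsa < 2 for convergence. */
static void nngsa_step(double *h, const double *Rinv, const double *x,
                       double d, double mu_nngsa, double *m, size_t N)
{
    double eps = d, norm = 0.0;
    for (size_t i = 0; i < N; i++)
        eps -= h[i] * x[i];               /* prior residual eps[n] */
    for (size_t i = 0; i < N; i++) {
        m[i] = 0.0;                       /* m = Rinv * x (natural gradient) */
        for (size_t j = 0; j < N; j++)
            m[i] += Rinv[i * N + j] * x[j];
        norm += x[i] * m[i];              /* Mahalanobis norm x^T Rinv x */
    }
    if (norm > 0.0) {                     /* guard against a zero input */
        const double step = mu_nngsa * eps / norm;
        for (size_t i = 0; i < N; i++)
            h[i] += step * m[i];          /* Eq. (21) */
    }
}
```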

### 3.3 Geometric interpretation of NNGSA

The adaptation rule in Eq. (21) solves the following optimization problem:

$$ \begin{aligned} & \underset{\boldsymbol{h}}{\text{argmin}}\ (\boldsymbol{h} - \boldsymbol{h}[n])^{\mathsf{T}} \boldsymbol{R} (\boldsymbol{h} - \boldsymbol{h}[n]), \\ & \text{subject to}\ d[n] = \boldsymbol{h}^{\mathsf{T}} \boldsymbol{x}[n]. \end{aligned} $$

(22)

This problem can be solved with a Lagrange multiplier (a worked sketch is given at the end of this subsection). Therefore, Eq. (21) projects *h*[*n*] onto the hyperplane *W*={*h* | *d*[*n*]=*h*^{T}*x*[*n*]} under the metric defined by *R* (see Fig. 3). Moreover, according to information geometry [25], the Kullback–Leibler divergence KL[·∥·] for models associated with the neighborhoods of parameter *h*[*n*] can be calculated as

$$\begin{array}{*{20}l} & \text{KL}[{p(\varepsilon[n] \mid \boldsymbol{h}[n])}\|{p(\varepsilon[n] \mid \boldsymbol{h})}] \\ & \approx \frac{1}{2}(\boldsymbol{h} - \boldsymbol{h}[n])^{\mathsf{T}} \boldsymbol{F} (\boldsymbol{h} - \boldsymbol{h}[n]) \end{array} $$

(23)

$$\begin{array}{*{20}l} &= \frac{1}{2\sigma^{2}} (\boldsymbol{h} - \boldsymbol{h}[n])^{\mathsf{T}} \boldsymbol{R} (\boldsymbol{h} - \boldsymbol{h}[n]). \end{array} $$

(24)

Thus, Eq. (21) can be considered the m-projection from model *p*(*ε*[*n*]∣*h*[*n*]) to the statistical manifold *S*={*p*(*ε*[*n*]∣*h*)∣*d*[*n*]=*h*^{T}*x*[*n*]}, the elements of which have the minimum posterior residual.
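For completeness, here is a standard Lagrange-multiplier sketch (our restatement, not part of the cited derivation) showing that the solution of Eq. (22) is Eq. (21) with *μ*_{NNGSA}=1:

$$\begin{aligned} J(\boldsymbol{h}, \lambda) &= (\boldsymbol{h} - \boldsymbol{h}[n])^{\mathsf{T}} \boldsymbol{R} (\boldsymbol{h} - \boldsymbol{h}[n]) + \lambda \left(d[n] - \boldsymbol{h}^{\mathsf{T}}\boldsymbol{x}[n]\right), \\ \frac{\partial J}{\partial \boldsymbol{h}} = \boldsymbol{0} \;&\Longrightarrow\; \boldsymbol{h} = \boldsymbol{h}[n] + \frac{\lambda}{2}\boldsymbol{R}^{-1}\boldsymbol{x}[n], \\ d[n] = \boldsymbol{h}^{\mathsf{T}}\boldsymbol{x}[n] \;&\Longrightarrow\; \frac{\lambda}{2} = \frac{\varepsilon[n]}{\boldsymbol{x}[n]^{\mathsf{T}}\boldsymbol{R}^{-1}\boldsymbol{x}[n]}. \end{aligned}$$

Substituting the multiplier back into the stationarity condition yields Eq. (21) with *μ*_{NNGSA}=1.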

### 3.4 Efficient natural-gradient update method

The natural gradient *R*^{−1}*x*[*n*] must be calculated at every step. The Sherman–Morrison formula is typically used to reduce the complexity of the RLS; however, it still involves \(\mathcal {O}(N^{2})\) operations per step, which is costly in practical applications [26]. Therefore, we propose an efficient method to solve this problem.

We assume that the input signals follow an AR(*p*) process. The natural gradient at time *n*, i.e., \(\boldsymbol {m}[n] = [m_{1}[n],..., m_{N}[n]]^{\mathsf {T}} := \boldsymbol {K}_{p}^{-1} \boldsymbol {x}[n]\), can be updated as

$$ \begin{aligned} \boldsymbol{K}_{p}^{-1}\boldsymbol{x}[n+1] &= \left[ \begin{array}{c} m_{2}[n] \\ m_{3}[n] \\ \vdots \\ m_{N}[n] \\ 0 \end{array} \right] + m_{1}[n] \left[ \begin{array}{c} \psi_{1} \\ \psi_{2} \\ \vdots \\ \psi_{p} \\ \boldsymbol{0}_{N-p} \end{array} \right] \\ &\quad - m_{N}[n+1] \left[ \begin{array}{c} \boldsymbol{0}_{N-p-1} \\ \psi_{p} \\ \vdots \\ \psi_{1} \\ -1 \end{array} \right], \\ m_{N}[n+1] &= x[n+1] - \sum_{i=1}^{p} \psi_{i} x[n+1-i], \end{aligned} $$

(25)

where *0*_{N} is an *N*×1 zero vector. Equation (25) follows from a direct calculation (see Appendix 1: “Derivation of efficient natural-gradient update method”). Furthermore, the Mahalanobis norm \(\boldsymbol {x}[n]^{\mathsf {T}} \boldsymbol {K}_{p}^{-1} \boldsymbol {x}[n]\) can be updated as follows:

$$\begin{array}{*{20}l} & \boldsymbol{x}[n+1]^{\mathsf{T}} \boldsymbol{K}_{p}^{-1} \boldsymbol{x}[n+1] \\ & = \boldsymbol{x}[n]^{\mathsf{T}} \boldsymbol{K}_{p}^{-1} \boldsymbol{x}[n] - m_{1}[n]^{2} + m_{N}[n+1]^{2}. \end{array} $$

(26)

Equation (25) requires 3*p* multiply–add (subtract) operations, and Eq. (26) requires two. Hence, we can update the natural gradient in \(\mathcal {O}(p)\) operations. In addition, Eq. (25) requires \(\mathcal {O}(N)\) space, since it refers to the previous-step gradient *m*[*n*]. Equation (25) is essentially the same as that of [27], in which a lattice filter (with partial autocorrelation coefficients) is used for gradient updating; the present method is also convenient for updating the norm (Eq. (26)).
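A minimal C sketch of this update follows, assuming the AR coefficients *ψ*_{1},…,*ψ*_{p} are stored in `psi[0..p-1]` and the *p* most recent inputs *x*[*n*],…,*x*[*n*−*p*+1] in `xhist[0..p-1]`; all names are illustrative. The element shift is pure data movement (no multiply–adds) and would be replaced by a ring buffer in practice.

```c
#include <stddef.h>

/* O(p) update of the natural gradient m = Kp^{-1} x and of the
 * Mahalanobis norm x^T Kp^{-1} x (Eqs. (25) and (26)).
 * psi[0..p-1]   : AR coefficients psi_1, ..., psi_p
 * xhist[0..p-1] : x[n], x[n-1], ..., x[n-p+1] (most recent first)
 * m[0..N-1]     : m[n] on entry, m[n+1] on return; norm updated in place */
static void ng_update(double *m, double *norm, const double *psi,
                      const double *xhist, double xnew, size_t N, size_t p)
{
    /* m_N[n+1] = x[n+1] - sum_i psi_i x[n+1-i]: AR(p) prediction residual */
    double mN = xnew;
    for (size_t i = 0; i < p; i++)
        mN -= psi[i] * xhist[i];

    const double m1 = m[0];
    *norm += mN * mN - m1 * m1;           /* Eq. (26) */

    /* First term of Eq. (25): shift to [m_2, ..., m_N, 0] */
    for (size_t i = 0; i + 1 < N; i++)
        m[i] = m[i + 1];
    m[N - 1] = 0.0;

    /* Second term: + m_1[n] * [psi_1, ..., psi_p, 0_{N-p}] */
    for (size_t i = 0; i < p; i++)
        m[i] += m1 * psi[i];

    /* Third term: - m_N[n+1] * [0_{N-p-1}, psi_p, ..., psi_1, -1] */
    for (size_t i = 0; i < p; i++)
        m[N - 1 - p + i] -= mN * psi[p - 1 - i];
    m[N - 1] += mN;
}
```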

Algorithm 1 describes the NNGSA coding procedure under the AR(*p*) assumption.

### 3.5 Application to LMS/Newton algorithm

We can apply the proposed procedure to the LMS/Newton algorithm:

$$\begin{array}{*{20}l} \boldsymbol{h}[n+1] &= \boldsymbol{h}[n] + \mu_{\text{LMSN}}\, \varepsilon[n]\, \boldsymbol{R}_{p}^{-1} \boldsymbol{x}[n], \end{array} $$

(27)

$$\begin{array}{*{20}l} \boldsymbol{R}_{p}^{-1} &:= \sigma_{p}^{-1} \boldsymbol{K}_{p}^{-1}, \end{array} $$

(28)

where *μ*_{LMSN}>0 denotes the step-size parameter and *σ*_{p} is a constant that depends on *p*. For *p*=1, Eq. (27) achieves first-order convergence if

$$\begin{array}{*{20}l} \mu_{\text{LMSN}} < \frac{2(1 - \psi_{1})}{N(1 + \psi_{1})}. \end{array} $$

(29)

The proof of this proposition follows that for the LMS provided in [21], and employs the eigenvalue range of *R*_{1} [29] (see Appendix 1: “Convergence condition for LMS/Newton algorithm”).
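As an illustrative numerical check (with values we assume for the example, not taken from the paper): for *N*=64 and *ψ*_{1}=0.9, i.e., a strongly correlated AR(1) input, Eq. (29) requires *μ*_{LMSN} < 2(1−0.9)/{64(1+0.9)} ≈ 1.6×10^{−3}.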

### 3.6 Codec structure

This section describes the NARU encoder and decoder.

#### 3.6.1 Encoder

The NARU encoding procedure is illustrated in Fig. 4. Below, we describe each component of the procedure.

**Mid-side conversion.** The mid-side conversion eliminates the inter-channel correlation from the stereo signal. This conversion is expressed as follows:

$$\begin{array}{*{20}l} M &= \frac{L + R}{2}, \end{array} $$

(30)

$$\begin{array}{*{20}l} S &= L - R, \end{array} $$

(31)

where *L*,*R*,*M*, and *S* are the signals of the left, right, mid, and side channels, respectively.

**Pre-emphasis.** The pre-emphasis is a first-order FIR filter with a fixed coefficient, expressed as follows:

$$\begin{array}{*{20}l} y[n] = x[n] - \eta\ x[n-1], \end{array} $$

(32)

where *η* denotes a constant that satisfies *η*≈1, and *x*[*n*] and *y*[*n*] are the filter input and output at time *n*, respectively. This filter reduces the static offset of the input signal; hence, we can prevent *R* from becoming ill-conditioned [28]. Here, we choose *η*=31/32=0.96875 because the division by 32 can be implemented as a 5-bit arithmetic right shift.

**NGSA filter.** The NGSA filter is the core predictive model of this codec and has the highest filter order in the codec (*N*≤64). Its adaptation follows Algorithm 1, with *d*[*n*]:=*x*[*n*+1] in Eq. (2) so that the filter predicts the input signal one sample ahead.

**SA filter.** We cascade the SA filter after the NGSA filter, as this cascaded filter scheme [30] exhibits superior compression performance. This filter has a lower order than the NGSA filter (*N*≤8) and follows the same adaptation rule as the SA (Eq. (8)).

**Recursive Golomb coding.** This stage converts the residual signal into a compressed bitstream. We employ recursive Golomb coding [31] as the entropy coder; it is a refinement of the Golomb–Rice code and has exhibited acceptable performance in WavPack and TTA.
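As an illustration, the following fixed-point C sketch implements the mid-side conversion (Eqs. (30) and (31)) and the pre-emphasis (Eq. (32)) with *η*=31/32; the function names are ours, not from the published implementation. Note that Eqs. (30), (34), and (35) are stated in exact arithmetic; an integer implementation drops the LSB of (*L*+*R*)/2, which is recoverable from the parity of *S*, so the pair (*M*,*S*) stays losslessly invertible (arithmetic right shift is assumed for negative values).

```c
#include <stdint.h>

/* Mid-side conversion (Eqs. (30) and (31)) in integer arithmetic.
 * M = floor((L + R) / 2) drops one bit, but that bit equals the parity
 * of S = L - R, so the pair (M, S) remains losslessly invertible. */
static void mid_side(int32_t L, int32_t R, int32_t *M, int32_t *S)
{
    *M = (L + R) >> 1;                      /* arithmetic shift = floor /2 */
    *S = L - R;
}

/* Pre-emphasis (Eq. (32)) with eta = 31/32: the multiplication by eta
 * becomes x[n-1] minus a 5-bit arithmetic right shift of x[n-1].
 * *prev carries x[n-1] across calls. */
static int32_t pre_emphasis(int32_t x, int32_t *prev)
{
    int32_t y = x - (*prev - (*prev >> 5)); /* y[n] = x[n] - (31/32) x[n-1] */
    *prev = x;
    return y;
}
```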

#### 3.6.2 Decoder

The decoder structure is shown in Fig. 5. As the figure shows, the decoding procedure is simply the inverse of the encoding procedure: at each time step, the SA filter and the NGSA filter produce the same predictions as in encoding; hence, the input signal is reconstructed perfectly. Additionally, the de-emphasis follows

$$\begin{array}{*{20}l} x[n] &= y[n] + \eta\ x[n-1], \end{array} $$

(33)

and the left–right conversion is expressed as

$$\begin{array}{*{20}l} L &= M + \frac{S}{2}, \end{array} $$

(34)

$$\begin{array}{*{20}l} R &= M - \frac{S}{2}. \end{array} $$

(35)
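The inverse stages admit an equally small sketch, under the same assumptions and naming conventions as the encoder sketch above:

```c
#include <stdint.h>

/* De-emphasis (Eq. (33)): adds back exactly the term the encoder
 * subtracted, so the inversion is bit-exact. *prev carries x[n-1]. */
static int32_t de_emphasis(int32_t y, int32_t *prev)
{
    int32_t x = y + (*prev - (*prev >> 5)); /* x[n] = y[n] + (31/32) x[n-1] */
    *prev = x;                              /* the recursion feeds back x[n] */
    return x;
}

/* Left-right conversion (Eqs. (34) and (35)): the LSB dropped by
 * M = floor((L + R) / 2) is restored from the parity of S. */
static void left_right(int32_t M, int32_t S, int32_t *L, int32_t *R)
{
    *L = M + ((S + (S & 1)) >> 1);          /* L = M + S/2, parity-corrected */
    *R = *L - S;                            /* R = L - S = M - S/2 */
}
```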

### 3.7 Codec implementation

We implemented the proposed codec as part of this study. To ensure speed and portability, the codec is written in the C programming language [32]. All encoding/decoding procedures were implemented with fixed-point operations so that the decoder reconstructs the input signal perfectly. We have published this implementation under the MIT license.

The fixed-point numbers are represented by 32-bit signed integers with 15 fractional bits. Note that, at present, the codec supports only 16-bit linear pulse-code modulation (PCM) Wav (Waveform Audio File Format) files, to prevent multiplication overflow and to maintain implementation simplicity. We expect that 24-bit Wav files can be supported with appropriate bit-width rounding.
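For illustration, here is a hedged C sketch of this fixed-point format (Q17.15-style); the 64-bit intermediate product is a standard way to avoid the multiplication overflow mentioned above, and the macro and function names are ours, not from the published code.

```c
#include <stdint.h>

#define FIXED_SHIFT 15                      /* 15 fractional bits */
#define FLOAT_TO_FIXED(f) ((int32_t)((f) * (double)(1 << FIXED_SHIFT)))

/* Multiply two 32-bit fixed-point numbers with 15 fractional bits.
 * The 64-bit intermediate holds the full product; adding half an ulp
 * before the shift rounds to nearest instead of truncating. */
static int32_t fixed_mul(int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;
    return (int32_t)((prod + (1 << (FIXED_SHIFT - 1))) >> FIXED_SHIFT);
}
```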