3.1 Natural-gradient sign algorithm (NGSA)
The natural gradient is derived from the multiplication of the inverse of a Fisher information matrix F−1 and a gradient of the cost function [18]. The matrix F is calculated using the covariance of the gradient for the log-likelihood function (Eq. (6)), as follows:
$$\begin{array}{*{20}l} \boldsymbol{F} &:= \mathrm{E}\left[{\left\{ \frac{\partial\log L(\boldsymbol{h})}{\partial\boldsymbol{h}} \right\} \left\{ \frac{\partial\log L(\boldsymbol{h})}{\partial\boldsymbol{h}} \right\}^{\mathsf{T}}}\right] \end{array} $$
(11)
$$\begin{array}{*{20}l} &= \mathrm{E}\left[{\left\{ \frac{\text{sgn}(\varepsilon[n])}{\sigma} \right\}^{2} \boldsymbol{x}[n]\boldsymbol{x}[n]^{\mathsf{T}}}\right] \end{array} $$
(12)
$$\begin{array}{*{20}l} &= \frac{1}{\sigma^{2}} \mathrm{E}\left[{\boldsymbol{x}[n]\boldsymbol{x}[n]^{\mathsf{T}}}\right] \quad (\mathrm{a.s.}) \end{array} $$
(13)
$$\begin{array}{*{20}l} &= \frac{1}{\sigma^{2}} \boldsymbol{R}, \end{array} $$
(14)
where R is the autocorrelation matrix of the input signal. Note that Eq. (13) holds because {sgn(x)}2=1 is satisfied if x≠0. Using Eq. (14), we obtain the NGSA as follows:
$$\begin{array}{*{20}l} \boldsymbol{h}[n+1] = \boldsymbol{h}[n] + \mu_{\text{NGSA}} \text{sgn}(\varepsilon[n]) \boldsymbol{R}^{-1} \boldsymbol{x}[n], \end{array} $$
(15)
where μNGSA denotes the step-size parameter and R is assumed to be a regular matrix. In addition, the NGSA can be derived by replacing ε[n] with sgn(ε[n]) in the LMS/Newton algorithm [21], which is an approximation of the Newton method for the LMS algorithm.
The NGSA adaptation rule (Eq. (15)) satisfies the following inequality:
$$\begin{array}{*{20}l} {\lim}_{n\to \infty} \frac{1}{n} \sum_{k=1}^{n} \mathrm{E}\left[{|\varepsilon[k]|}\right] \leq \varepsilon_{\text{min}} + \mu_{\text{NGSA}} \frac{h}{\lambda_{\text{min}}}, \end{array} $$
(16)
where \(\varepsilon _{\text {min}}=\mathrm {E}\left [{|v[n]|}\right ], h=(1/2)\mathrm {E}\left [{\|{\boldsymbol {x}[n]}\|_{2}^{2}}\right ]\), and λmin denotes the minimum eigenvalue of R. The proof of Eq. (16) follows that provided in [14] (see Appendix 1: “NGSA inequality”).
3.2 Normalized natural-gradient sign algorithm (NNGSA)
The NGSA encounters difficulties in determining μNGSA because its optimal settings vary according to the input signal. To overcome this difficulty, we introduce a variable step-size adaptation method that minimizes the posterior residual criterion; this approach is identical to that of the NLMS [22].
Let μ[n] and ε+[n] be the adaptive step size and the posterior residual at time n, respectively. Then, ε+[n] is calculated as
$$\begin{array}{*{20}l} & \varepsilon^{+}[n] := d[n] - \boldsymbol{h}[n+1]^{\mathsf{T}}\boldsymbol{x}[n] \end{array} $$
(17)
$$\begin{array}{*{20}l} &= d[n] - \left\{ \boldsymbol{h}[n] + \mu[n]\text{sgn}(\varepsilon[n]) \boldsymbol{R}^{-1} \boldsymbol{x}[n] \right\}^{\mathsf{T}} \boldsymbol{x}[n] \end{array} $$
(18)
$$\begin{array}{*{20}l} &= \varepsilon[n] - \mu[n]\text{sgn}(\varepsilon[n])\boldsymbol{x}[n]^{\mathsf{T}}\boldsymbol{R}^{-1}\boldsymbol{x}[n]. \end{array} $$
(19)
We let ε+[n]=0; then, solving Eq. (19) for μ[n], we obtain
$$\begin{array}{*{20}l} \mu[n] = \frac{|\varepsilon[n]|}{\boldsymbol{x}[n]^{\mathsf{T}} \boldsymbol{R}^{-1} \boldsymbol{x}[n]}. \end{array} $$
(20)
Substituting Eq. (20) into Eq. (15), we obtain the NNGSA as follows:
$$\begin{array}{*{20}l} \boldsymbol{h}[n+1] = \boldsymbol{h}[n] + {\frac{\mu_{\text{NNGSA}} \varepsilon[n]}{\boldsymbol{x}[n]^{\mathsf{T}} \boldsymbol{R}^{-1} \boldsymbol{x}[n]} }\boldsymbol{R}^{-1} \boldsymbol{x}[n], \end{array} $$
(21)
where μNNGSA>0 denotes the scale parameter. If μNNGSA<2 holds and h[n] and x[n] are statistically independent, this adaptation rule achieves a first-order convergence rate. The proof of this proposition follows that of the NLMS provided in [22] (see Appendix 1: “NNGSA convergence condition”).
The NNGSA can be interpreted as a variable step-size modification of the LMS/Newton algorithm [23]. In [24], the authors state that [23] is a generalization of the recursive least squares (RLS) algorithm. Furthermore, it is evident that Eq. (21) is identical to the NLMS if R=I, where I denotes the identity matrix.
3.3 Geometric interpretation of NNGSA
The adaptation rule in Eq. (21) is used to solve the following optimization problem:
$$ \begin{aligned} & \underset{\boldsymbol{h}}{\text{argmin}}\ (\boldsymbol{h} - \boldsymbol{h}[n])^{\mathsf{T}} \boldsymbol{R} (\boldsymbol{h} - \boldsymbol{h}[n]), \\ & \text{subject to}\ d[n] = \boldsymbol{h}^{\mathsf{T}} \boldsymbol{x}[n]. \end{aligned} $$
(22)
The Lagrange multiplier can be used to solve the aforementioned problem. Therefore, Eq. (21) projects h[n] onto the hyperplane W={h | d[n]=hTx[n]}, the metric of which is defined as R (see Fig. 3). Moreover, according to information geometry [25], the Kullback–Leibler divergence KL[·∥·] for models associated with the neighborhoods of parameter h[n] can be calculated as
$$\begin{array}{*{20}l} & \text{KL}[{p(\varepsilon[n] \mid \boldsymbol{h}[n])}\|{p(\varepsilon[n] \mid \boldsymbol{h})}] \\ & \approx \frac{1}{2}(\boldsymbol{h} - \boldsymbol{h}[n])^{\mathsf{T}} \boldsymbol{F} (\boldsymbol{h} - \boldsymbol{h}[n]) \end{array} $$
(23)
$$\begin{array}{*{20}l} &= \frac{1}{2\sigma^{2}} (\boldsymbol{h} - \boldsymbol{h}[n])^{\mathsf{T}} \boldsymbol{R} (\boldsymbol{h} - \boldsymbol{h}[n]). \end{array} $$
(24)
Thus, Eq. (21) can be considered the m-projection from model p(ε[n]∣h[n]) to the statistical manifold S={p(ε[n]∣h)∣d[n]=hTx[n]}, the elements of which have the minimum posterior residual.
3.4 Efficient natural-gradient update method
The natural gradient R−1x[n] must be calculated at every step. The Sherman–Morrison formula is typically used to reduce RLS complexity; however, this algorithm involves \(\mathcal {O}(N^{2})\) operations, which generate high cost in practical applications [26]. Therefore, we propose an efficient method to solve this problem.
We assume that the input signals follow the AR(p) process. The natural gradient at time n, i.e., \(\boldsymbol {m}[n] = [m_{1}[n],..., m_{N}[n]]^{\mathsf {T}} := \boldsymbol {K}_{p}^{-1} \boldsymbol {x}[n]\), can be updated as
$$ \begin{aligned} \boldsymbol{K}_{p}^{-1}\boldsymbol{x}[n+1] &= \left[ \begin{array}{c} m_{2}[n] \\ m_{3}[n] \\ \vdots \\ m_{N}[n] \\ 0 \end{array} \right] + m_{1}[n] \left[ \begin{array}{c} \psi_{1} \\ \psi_{2} \\ \vdots \\ \psi_{p} \\ \boldsymbol{0}_{N-p} \end{array} \right] \\ &\quad - m_{N}[n+1] \left[ \begin{array}{c} \boldsymbol{0}_{N-p-1} \\ \psi_{p} \\ \vdots \\ \psi_{1} \\ -1 \end{array} \right], \\ m_{N}[n+1] &= x[n+1] - \sum_{i=1}^{p} \psi_{i} x[n+1-i], \end{aligned} $$
(25)
where 0N is an N×1 zero vector. Equation (25) is followed by a direct calculation (see Appendix 1: “Derivation of efficient natural-gradient update method”). Furthermore, the Mahalanobis norm \(\boldsymbol {x}[n]^{\mathsf {T}} \boldsymbol {K}_{p}^{-1} \boldsymbol {x}[n]\) can be updated as follows:
$$\begin{array}{*{20}l} & \boldsymbol{x}[n+1]^{\mathsf{T}} \boldsymbol{K}_{p}^{-1} \boldsymbol{x}[n+1] \\ & = \boldsymbol{x}[n]^{\mathsf{T}} \boldsymbol{K}_{p}^{-1} \boldsymbol{x}[n] - m_{1}[n]^{2} + m_{N}[n+1]^{2}. \end{array} $$
(26)
Equation (25) requires 3p multiply–add (subtract) calculations, and Eq. (26) requires 2. Hence, we can update the natural gradient in \(\mathcal {O}(p)\) operations. Besides, Eq. (25) requires \(\mathcal {O}(N)\) space complexity since its referring to previous step gradient m[n]. Equation (25) is essentially the same as that of [27], in which a lattice filter (with partial autocorrelation coefficients) is used for gradient updating. The present method is suitable for norm updating.
Algorithm 1 describes the NNGSA coding procedure under the AR(p) assumption.
3.5 Application to LMS/Newton algorithm
We can apply the proposed procedure to the LMS/Newton algorithm:
$$\begin{array}{*{20}l} \boldsymbol{h}[n+1] &= \boldsymbol{h}[n] + \mu_{\text{LMSN}} \boldsymbol{R}_{p}^{-1} \boldsymbol{x}[n], \end{array} $$
(27)
$$\begin{array}{*{20}l} \boldsymbol{R}_{p}^{-1} &:= \sigma_{p}^{-1} \boldsymbol{K}_{p}^{-1}, \end{array} $$
(28)
where μLMSN>0 denotes the step-size parameter and σp is a constant that depends on p. For p=1, Eq. (27) achieves first-order convergence if
$$\begin{array}{*{20}l} \mu_{\text{LMSN}} < \frac{2(1 - \psi_{1})}{N(1 + \psi_{1})}. \end{array} $$
(29)
The proof of this proposition follows that for the LMS provided in [21], and employs the eigenvalue range of R1 [29] (see Appendix 1: “Convergence condition for LMS/Newton algorithm”).
3.6 Codec structure
This section describes the NARU encoder and decoder.
3.6.1 Encoder
The NARU encoding procedure is illustrated in Fig. 4. Below, we describe each component of the NARU encoding procedure. Mid-side conversion The mid-side conversion eliminates the inter-channel correlation from the stereo signal. This conversion is expressed as follows:
$$\begin{array}{*{20}l} M &= \frac{L + R}{2}, \end{array} $$
(30)
$$\begin{array}{*{20}l} S &= L - R, \end{array} $$
(31)
where L,R,M, and S are the signals of the left, right, mid, and side channels, respectively. Pre-emphasis The pre-emphasis is the first-order FIR filter with a fixed coefficient, which is expressed as follows:
$$\begin{array}{*{20}l} y[n] = x[n] - \eta\ x[n-1], \end{array} $$
(32)
where η denotes a constant that satisfies η≈1, and x[n] and y[n] are the filter input and output at time n, respectively. This filter reduces the static offset of the input signal. Hence, we can prevent R from being ill-conditioned [28]. Here, we choose η=31/32=0.96875 because its division is implemented by a 5-bit arithmetic right shift. NGSA filter The NGSA filter is the core predictive model of this codec and is the highest-order (N≤64) FIR filter. Here, we adopt a rule that follows Algorithm 1 and set d[n]:=x[n+1] in Eq. (2) so that the filter equalizes to become the input signal. SA filter We cascade the SA filter after the NGSA filter, as this cascaded filter scheme [30] exhibits superior compression performance. This filter has a lower filter order than the NGSA (N≤8) and follows the same rule as the SA (Eq. (8)). Recursive Golomb coding This stage converts the residual signal to a compressed bitstream. We employ recursive Golomb coding [31] as the entropy coder; this is a refinement of the Golomb–Rice code and has exhibited acceptable performance in WavPack and TTA.
3.6.2 Decoder
The decoder structure is shown in Fig. 5. As apparent from the figure, the decoding procedure is simply the inverse of the encoding procedure: the SA filter and NGSA filter produce the same predictions as for encoding at each instance and, hence, the input signal is perfectly reconstructed. Additionally, the de-emphasis follows
$$\begin{array}{*{20}l} x[n] &= y[n] + \eta\ x[n-1], \end{array} $$
(33)
and the left–right conversion is expressed as
$$\begin{array}{*{20}l} L &= M + \frac{S}{2}, \end{array} $$
(34)
$$\begin{array}{*{20}l} R &= M - \frac{S}{2}. \end{array} $$
(35)
3.7 Codec implementation
As part of this study, the developed codec was implemented. To ensure speed and portability, we implemented the codec in the C programming language [32]. All encoding/decoding procedures were implemented via fixed-point operations so that the decoder reconstructed the input signal perfectly. We published this implementation under the MIT license.
The fixed-point numbers were represented by 32-bit signed integers with 15 fractional bits. Note that, at present, the codec supports 16-bit linear pulse code modulation (PCM) Wav (Waveform Audio File Format) files only, to prevent multiplication overflow and to maintain implementation simplicity. We assume that the appropriate bit-width rounding is available for 24-bit Wav.