Improving sign-algorithm convergence rate using natural gradient for lossless audio compression
EURASIP Journal on Audio, Speech, and Music Processing volume 2022, Article number: 12 (2022)
Abstract
In lossless audio compression, the predictive residuals must remain sparse when entropy coding is applied. The sign algorithm (SA) is a conventional method for minimizing the magnitudes of residuals; however, this approach yields poor convergence performance compared with the least mean square algorithm. To overcome this convergence performance degradation, we propose novel adaptive algorithms based on a natural gradient: the natural-gradient sign algorithm (NGSA) and normalized NGSA. We also propose an efficient natural-gradient update method based on the AR(p) model, which requires \(\mathcal {O}(p)\) multiply–add operations at every adaptation step. In experiments conducted using toy and real music data, the proposed algorithms achieve superior convergence performance to the SA. Furthermore, we propose a novel lossless audio codec based on the NGSA, called the natural-gradient autoregressive unlossy audio compressor (NARU), which is open-source and implemented in C. In a comparative experiment with existing, well-known codecs, NARU exhibits superior compression performance. These results suggest that the proposed methods are appropriate for practical applications.
1 Introduction
Greater storage capacity is required to further enrich digital audio content [1]. Therefore, lossless audio coding, which allows audio data compression without information loss, is vital for various applications, such as lossless music delivery, editing, and recording [2]. Figure 1 depicts the general structure of a lossless audio codec [3]. First, the codec converts the audio signal to a residual via prediction using a mathematical model. Second, it compresses the residual through entropy coding. If the model provides an accurate prediction, the residual signal is sparse and, thus, high compression performance is achieved. The Shorten lossless codec [4] was one of the first codecs with the structure shown in Fig. 1, and several codecs that follow the same structure have been implemented since. For example, MPEG-4 ALS [5], ALAC [6], and FLAC [7] use linear predictive coding (LPC) as the predictive model, whereas WavPack [8], TTA [9], and Monkey’s Audio [10] use adaptive filters. LPC is generally formulated based on the assumption that the residual follows a Gaussian distribution; hence, FLAC and MPEG-4 ALS implicitly assume a Gaussian residual. In contrast, WavPack, TTA, and Monkey’s Audio use adaptive algorithms that assume a Laplacian residual.
In entropy coding, the Golomb–Rice code [11] is generally employed, as this code is optimal when the residual follows a Laplace distribution. The Gaussian residual assumption underlying LPC is therefore mismatched with the entropy coder. To overcome this problem, Kameoka et al. [12] improved the compression rate by formulating an LPC under a Laplace distribution. The sign algorithm (SA) [13] is a practical choice for the adaptive algorithm when the residual follows a Laplace distribution; however, the SA converges considerably more slowly than the least mean square (LMS) algorithm [14].
To overcome this performance gap, several SA variants such as the convex combination [15] and the logarithmic cost function [16] have been proposed. However, these attempts have not yielded superior convergence performance to the normalized LMS (NLMS). Notably, the algorithm proposed by Gay and Douglas [17] outperforms the NLMS through the use of a natural gradient [18].
In this study, we improve the SA convergence performance using a natural gradient. We propose two novel adaptive algorithms: the natural-gradient sign algorithm (NGSA) and normalized NGSA (NNGSA) [19]. These algorithms employ \(\mathcal {O}(p)\) multiply–add operations to calculate the natural gradient at every step based on the pth-order autoregressive model assumption for the input data. The proposed algorithms achieve superior convergence performance to the SA. Furthermore, we propose a novel lossless audio codec based on the NGSA, called the natural-gradient autoregressive unlossy audio compressor (NARU) (Taiyo Mineo, Hayaru Shouno: NARU: Natural-gradient AutoRegressive Unlossy Audio Compressor, submitted), which is implemented and published under the MIT license. NARU exhibits superior compression performance to existing codecs such as FLAC, WavPack, TTA, and MPEG-4 ALS. Moreover, its decoding speed is faster than that of Monkey’s Audio without strict optimization.
The remainder of this paper is organized as follows: Section 2 provides an overview of the relevant mathematical theories; Section 3 presents the proposed methods and the NARU codec structure; Section 4 reports computer-based experiments to demonstrate the performance of the proposed algorithms and codec; and Sections 5 and 6 present the discussion and conclusion, respectively.
2 Theoretical background
2.1 Adaptive filter
An overview of an adaptive filter is shown in Fig. 2. The input signal x[n] and observation noise v[n] are discrete-time signal sequences, where v[n] is noise added to the output of the unknown system. In this study, x[n] is assumed to have weak stationarity and to be an ergodic process. Let h[n]=[h_{1}[n],...,h_{N}[n]]^{T} be the adaptive filter coefficients, where T represents the matrix transposition. This study employs a finite impulse response (FIR) filter. Hence, the filter output is denoted as h[n]^{T}x[n], where x[n]=[x[n−N+1],...,x[n]]^{T} represents the input vector. We denote the coefficient vector for an unknown system as h^{∗}. Filter adaptation is performed by updating the h[n] coefficients based on the observed signal
\(d[n] = \boldsymbol {h}^{\ast \mathsf {T}}\boldsymbol {x}[n] + v[n]\) (1)
and the residual
\(\varepsilon [n] = d[n] - \boldsymbol {h}[n]^{\mathsf {T}}\boldsymbol {x}[n]\). (2)
2.2 Sign algorithm (SA)
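The SA is derived using the maximum likelihood method under the assumption that ε[n] follows a Laplace distribution. The probability density function of the Laplace distribution is
\(p(\varepsilon ) = \frac {1}{2\sigma }\exp \left (-\frac {|\varepsilon |}{\sigma }\right )\), (3)
where σ>0 represents the deviation. The likelihood L(h) and log-likelihood log L(h) functions for M independent and identically distributed (i.i.d.) samples ε_{1},...,ε_{M} are expressed as
\(L(\boldsymbol {h}) = \prod _{i=1}^{M}\frac {1}{2\sigma }\exp \left (-\frac {|\varepsilon _{i}|}{\sigma }\right )\), (4)
\(\log L(\boldsymbol {h}) = -M\log (2\sigma ) - \frac {1}{\sigma }\sum _{i=1}^{M}|\varepsilon _{i}|\). (5)
We let M=1 because the SA adapts at each step. To maximize the likelihood, we partially differentiate log L(h) with respect to h:
\(\frac {\partial \log L(\boldsymbol {h})}{\partial \boldsymbol {h}} = \frac {1}{\sigma }\,\text {sgn}(\varepsilon [n])\,\boldsymbol {x}[n]\), (6)
where sgn(·) denotes the sign function, which is defined as
\(\text {sgn}(x) = 1\ (x>0),\quad 0\ (x=0),\quad -1\ (x<0)\). (7)
The SA adaptation rule is expressed as
\(\boldsymbol {h}[n+1] = \boldsymbol {h}[n] + \mu \,\text {sgn}(\varepsilon [n])\,\boldsymbol {x}[n]\), (8)
where μ>0 denotes the step-size parameter, which absorbs the constant factor 1/σ.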
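As an illustration of the rule above, the following is a minimal sketch of a single SA step, assuming the standard form h[n+1] = h[n] + μ sgn(ε[n]) x[n] with ε[n] = d[n] − h[n]^{T}x[n]. The function names `sign` and `sa_step` are hypothetical and are not taken from the NARU source code.

```python
def sign(value):
    """Sign function: returns 1, 0, or -1."""
    return (value > 0) - (value < 0)


def sa_step(h, x, d, mu):
    """One sign-algorithm update; returns (new coefficients, prior residual)."""
    # Residual between the desired sample d and the FIR filter output h^T x.
    residual = d - sum(hi * xi for hi, xi in zip(h, x))
    s = sign(residual)
    # Each coefficient moves by a fixed step mu along sgn(residual) * x.
    h_new = [hi + mu * s * xi for hi, xi in zip(h, x)]
    return h_new, residual


if __name__ == "__main__":
    h, eps = sa_step([0.0, 0.0], [1.0, 2.0], d=1.0, mu=0.1)
    print(h, eps)
```

Because only the sign of the residual enters the update, the step length does not shrink as the residual becomes small, which is one way to see why the SA converges more slowly than the LMS algorithm.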
2.3 Autoregressive model
To simplify the inverse calculation for an autocorrelation matrix for the input signal, we introduce an autoregressive model. Here, AR(p) indicates the autoregressive model with order p that satisfies the following equation for signal s:
where ν[n] is a sample from an independent standard normal distribution. The ith row and jth column element of the inverse autocovariance matrix for the AR(p) process \(\boldsymbol {K}_{p}^{-1}\) is calculated explicitly as [20]
where i≥j,ψ_{0}=1, and L is the matrix size satisfying L>2p.
3 Proposed methods
3.1 Natural-gradient sign algorithm (NGSA)
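The natural gradient is obtained by multiplying the gradient of the cost function by the inverse F^{−1} of the Fisher information matrix [18]. The matrix F is calculated using the covariance of the gradient for the log-likelihood function (Eq. (6)), as follows:
\(\boldsymbol {F} = \mathrm {E}\left [\left (\frac {\partial \log L(\boldsymbol {h})}{\partial \boldsymbol {h}}\right )\left (\frac {\partial \log L(\boldsymbol {h})}{\partial \boldsymbol {h}}\right )^{\mathsf {T}}\right ]\) (11)
\(= \frac {1}{\sigma ^{2}}\mathrm {E}\left [\{\text {sgn}(\varepsilon [n])\}^{2}\boldsymbol {x}[n]\boldsymbol {x}[n]^{\mathsf {T}}\right ]\) (12)
\(= \frac {1}{\sigma ^{2}}\mathrm {E}\left [\boldsymbol {x}[n]\boldsymbol {x}[n]^{\mathsf {T}}\right ]\) (13)
\(= \frac {1}{\sigma ^{2}}\boldsymbol {R}\), (14)
where R is the autocorrelation matrix of the input signal. Note that Eq. (13) holds because {sgn(x)}^{2}=1 is satisfied if x≠0. Using Eq. (14), we obtain the NGSA as follows:
\(\boldsymbol {h}[n+1] = \boldsymbol {h}[n] + \mu _{\text {NGSA}}\boldsymbol {R}^{-1}\text {sgn}(\varepsilon [n])\boldsymbol {x}[n]\), (15)
where μ_{NGSA} denotes the step-size parameter and R is assumed to be a regular (nonsingular) matrix. In addition, the NGSA can be derived by replacing ε[n] with sgn(ε[n]) in the LMS/Newton algorithm [21], which is an approximation of the Newton method for the LMS algorithm.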
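The NGSA step can be sketched as follows, assuming the update has the form h[n+1] = h[n] + μ_{NGSA} R^{−1} sgn(ε[n]) x[n] and that R^{−1} is supplied as a precomputed matrix. The names `mat_vec` and `ngsa_step` are hypothetical; this is a direct O(N^{2}) illustration and not the efficient update of Section 3.4.

```python
def mat_vec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def ngsa_step(h, x, d, r_inv, mu):
    """One natural-gradient sign-algorithm update with a given R^{-1}."""
    residual = d - sum(hi * xi for hi, xi in zip(h, x))
    s = (residual > 0) - (residual < 0)
    # Natural gradient direction: the input vector premultiplied by R^{-1}.
    natural_grad = mat_vec(r_inv, x)
    h_new = [hi + mu * s * gi for hi, gi in zip(h, natural_grad)]
    return h_new, residual


if __name__ == "__main__":
    r_inv = [[2.0, 0.0], [0.0, 0.5]]  # illustrative inverse autocorrelation
    print(ngsa_step([0.0, 0.0], [1.0, 2.0], d=-1.0, r_inv=r_inv, mu=0.1))
```

With R^{−1} set to the identity matrix, the step reduces to the plain SA update.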
The NGSA adaptation rule (Eq. (15)) satisfies the following inequality:
where \(\varepsilon _{\text {min}}=\mathrm {E}\left [|v[n]|\right ]\), \(h=(1/2)\mathrm {E}\left [\|\boldsymbol {x}[n]\|_{2}^{2}\right ]\), and λ_{min} denotes the minimum eigenvalue of R. The proof of Eq. (16) follows that provided in [14] (see Appendix 1: “NGSA inequality”).
3.2 Normalized natural-gradient sign algorithm (NNGSA)
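The NGSA encounters difficulties in determining μ_{NGSA} because its optimal setting varies according to the input signal. To overcome this difficulty, we introduce a variable step-size adaptation method that minimizes the posterior residual criterion; this approach is identical to that of the NLMS [22].
Let μ[n] and ε^{+}[n] be the adaptive step size and the posterior residual at time n, respectively. Then, ε^{+}[n] is calculated as
\(\varepsilon ^{+}[n] = d[n] - \boldsymbol {h}[n+1]^{\mathsf {T}}\boldsymbol {x}[n] = \varepsilon [n] - \mu [n]\,\text {sgn}(\varepsilon [n])\,\boldsymbol {x}[n]^{\mathsf {T}}\boldsymbol {R}^{-1}\boldsymbol {x}[n]\). (19)
We let ε^{+}[n]=0; then, solving Eq. (19) for μ[n] with ε[n]≠0, we obtain
\(\mu [n] = \frac {\varepsilon [n]}{\text {sgn}(\varepsilon [n])\,\boldsymbol {x}[n]^{\mathsf {T}}\boldsymbol {R}^{-1}\boldsymbol {x}[n]} = \frac {|\varepsilon [n]|}{\boldsymbol {x}[n]^{\mathsf {T}}\boldsymbol {R}^{-1}\boldsymbol {x}[n]}\). (20)
Substituting Eq. (20) into Eq. (15), we obtain the NNGSA as follows:
\(\boldsymbol {h}[n+1] = \boldsymbol {h}[n] + \mu _{\text {NNGSA}}\frac {\boldsymbol {R}^{-1}\boldsymbol {x}[n]}{\boldsymbol {x}[n]^{\mathsf {T}}\boldsymbol {R}^{-1}\boldsymbol {x}[n]}\varepsilon [n]\), (21)
where μ_{NNGSA}>0 denotes the scale parameter. If μ_{NNGSA}<2 holds and h[n] and x[n] are statistically independent, this adaptation rule achieves a first-order convergence rate. The proof of this proposition follows that of the NLMS provided in [22] (see Appendix 1: “NNGSA convergence condition”).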
The NNGSA can be interpreted as a variable step-size modification of the LMS/Newton algorithm [23]. In [24], the authors state that [23] is a generalization of the recursive least squares (RLS) algorithm. Furthermore, it is evident that Eq. (21) is identical to the NLMS if R=I, where I denotes the identity matrix.
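The normalized step can be sketched in the same way, assuming the update h[n+1] = h[n] + μ_{NNGSA} ε[n] R^{−1}x[n] / (x[n]^{T}R^{−1}x[n]), which is the form implied by the matrix P[n] used in Appendix 1. The names `dot`, `mat_vec`, and `nngsa_step` are hypothetical, and skipping the update for a zero Mahalanobis norm is an assumption of this sketch rather than a rule stated in the paper.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def mat_vec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, vector) for row in matrix]


def nngsa_step(h, x, d, r_inv, mu=1.0):
    """One normalized natural-gradient update with a given R^{-1}."""
    residual = d - dot(h, x)
    m = mat_vec(r_inv, x)   # natural gradient R^{-1} x
    norm = dot(x, m)        # Mahalanobis norm x^T R^{-1} x
    if norm == 0.0:
        # Assumption of this sketch: leave the coefficients unchanged.
        return list(h), residual
    scale = mu * residual / norm
    return [hi + scale * mi for hi, mi in zip(h, m)], residual


if __name__ == "__main__":
    x, d = [1.0, 2.0], 2.0
    h_new, _ = nngsa_step([0.0, 0.0], x, d, [[2.0, 0.0], [0.0, 0.5]])
    print(h_new, d - dot(h_new, x))  # posterior residual after the step
```

With mu=1.0 the posterior residual d[n] − h[n+1]^{T}x[n] vanishes, and with R^{−1}=I the step coincides with the NLMS update.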
3.3 Geometric interpretation of NNGSA
The adaptation rule in Eq. (21) is used to solve the following optimization problem:
The method of Lagrange multipliers can be used to solve the aforementioned problem. Therefore, Eq. (21) projects h[n] onto the hyperplane W={h ∣ d[n]=h^{T}x[n]}, the metric of which is defined as R (see Fig. 3). Moreover, according to information geometry [25], the Kullback–Leibler divergence KL[·∥·] for models associated with the neighborhoods of parameter h[n] can be calculated as
Thus, Eq. (21) can be considered the m-projection from model p(ε[n]∣h[n]) to the statistical manifold S={p(ε[n]∣h)∣d[n]=h^{T}x[n]}, the elements of which have the minimum posterior residual.
3.4 Efficient natural-gradient update method
The natural gradient R^{−1}x[n] must be calculated at every step. The Sherman–Morrison formula is typically used to reduce RLS complexity; however, it still involves \(\mathcal {O}(N^{2})\) operations per step, which is costly in practical applications [26]. Therefore, we propose an efficient method to solve this problem.
We assume that the input signals follow the AR(p) process. The natural gradient at time n, i.e., \(\boldsymbol {m}[n] = [m_{1}[n],..., m_{N}[n]]^{\mathsf {T}} := \boldsymbol {K}_{p}^{-1} \boldsymbol {x}[n]\), can be updated as
where 0_{N} is an N×1 zero vector. Equation (25) follows from a direct calculation (see Appendix 1: “Derivation of efficient natural-gradient update method”). Furthermore, the Mahalanobis norm \(\boldsymbol {x}[n]^{\mathsf {T}} \boldsymbol {K}_{p}^{-1} \boldsymbol {x}[n]\) can be updated as follows:
Equation (25) requires 3p multiply–add (subtract) calculations, and Eq. (26) requires 2. Hence, we can update the natural gradient in \(\mathcal {O}(p)\) operations. In addition, Eq. (25) has \(\mathcal {O}(N)\) space complexity because it refers to the gradient m[n] of the previous step. Equation (25) is essentially the same as that of [27], in which a lattice filter (with partial autocorrelation coefficients) is used for gradient updating. The present method is also suitable for norm updating.
Algorithm 1 describes the NNGSA coding procedure under the AR(p) assumption.
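To illustrate why the AR assumption makes the natural gradient cheap, the sketch below uses the well-known closed form for AR(1): if x[n] = φx[n−1] + ν[n] with unit-variance ν[n], the inverse autocovariance matrix is tridiagonal, with diagonal entries 1, 1+φ^{2}, ..., 1+φ^{2}, 1 and off-diagonal entries −φ. The parameterization by φ and the function names are assumptions of this sketch (the sign convention of ψ_{1} in the paper may differ), and the code recomputes m = K_{1}^{−1}x from scratch in O(N) operations; it is not the recursive O(p) update of Eq. (25).

```python
def ar1_covariance(phi, size):
    """Autocovariance matrix of x[n] = phi * x[n-1] + nu[n], var(nu) = 1."""
    scale = 1.0 / (1.0 - phi * phi)
    return [[scale * phi ** abs(i - j) for j in range(size)]
            for i in range(size)]


def ar1_natural_gradient(phi, x):
    """Compute m = K^{-1} x using the tridiagonal closed form of K^{-1}."""
    n = len(x)
    if n == 1:
        return [(1.0 - phi * phi) * x[0]]
    m = []
    for i in range(n):
        # Corner entries of the diagonal are 1; interior ones are 1 + phi^2.
        diag = 1.0 if i in (0, n - 1) else 1.0 + phi * phi
        value = diag * x[i]
        if i > 0:
            value -= phi * x[i - 1]
        if i < n - 1:
            value -= phi * x[i + 1]
        m.append(value)
    return m


if __name__ == "__main__":
    print(ar1_natural_gradient(0.5, [1.0, 2.0, 3.0]))
```

Each element of m depends only on a sample and its immediate neighbours, which is the structural property that a shift-based recursive update can exploit when x[n] advances by one sample.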
3.5 Application to LMS/Newton algorithm
We can apply the proposed procedure to the LMS/Newton algorithm:
where μ_{LMSN}>0 denotes the step-size parameter and σ_{p} is a constant that depends on p. For p=1, Eq. (27) achieves first-order convergence if
The proof of this proposition follows that for the LMS provided in [21], and employs the eigenvalue range of R_{1} [29] (see Appendix 1: “Convergence condition for LMS/Newton algorithm”).
3.6 Codec structure
This section describes the NARU encoder and decoder.
3.6.1 Encoder
The NARU encoding procedure is illustrated in Fig. 4. Below, we describe each component of the procedure.
Mid-side conversion. The mid-side conversion eliminates the inter-channel correlation from the stereo signal. This conversion is expressed as follows:
where L, R, M, and S are the signals of the left, right, mid, and side channels, respectively.
Pre-emphasis. The pre-emphasis is a first-order FIR filter with a fixed coefficient, which is expressed as follows:
\(y[n] = x[n] - \eta x[n-1]\),
where η denotes a constant that satisfies η≈1, and x[n] and y[n] are the filter input and output at time n, respectively. This filter reduces the static offset of the input signal. Hence, we can prevent R from being ill-conditioned [28]. Here, we choose η=31/32=0.96875 because the division by 32 is implemented by a 5-bit arithmetic right shift.
NGSA filter. The NGSA filter is the core predictive model of this codec and is the highest-order (N≤64) FIR filter. Here, we adopt a rule that follows Algorithm 1 and set d[n]:=x[n+1] in Eq. (2) so that the filter predicts the next input sample.
SA filter. We cascade the SA filter after the NGSA filter, as this cascaded filter scheme [30] exhibits superior compression performance. This filter has a lower filter order than the NGSA (N≤8) and follows the same rule as the SA (Eq. (8)).
Recursive Golomb coding. This stage converts the residual signal to a compressed bitstream. We employ recursive Golomb coding [31] as the entropy coder; this is a refinement of the Golomb–Rice code and has exhibited acceptable performance in WavPack and TTA.
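For orientation, the entropy-coding stage can be illustrated with the plain Golomb–Rice code that recursive Golomb coding refines. The sketch below is not the recursive variant used in NARU: the zigzag mapping of signed residuals, the fixed parameter k, and the function names are all assumptions made for illustration, and bits are represented as a text string for readability.

```python
def zigzag(value):
    """Map a signed integer to a non-negative one: 0,-1,1,-2,... -> 0,1,2,3,..."""
    return 2 * value if value >= 0 else -2 * value - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(value, k):
    """Rice code: unary quotient, a terminating 0, then a k-bit remainder."""
    u = zigzag(value)
    quotient, remainder = u >> k, u & ((1 << k) - 1)
    suffix = format(remainder, "b").zfill(k) if k else ""
    return "1" * quotient + "0" + suffix


def rice_decode(bits, k, pos=0):
    """Decode one value starting at pos; returns (value, next position)."""
    quotient = 0
    while bits[pos] == "1":
        quotient += 1
        pos += 1
    pos += 1  # skip the terminating 0 of the unary part
    remainder = int(bits[pos:pos + k], 2) if k else 0
    return unzigzag((quotient << k) | remainder), pos + k


if __name__ == "__main__":
    stream = "".join(rice_encode(v, 2) for v in [3, -1, 0])
    print(stream)
```

Small residuals produce short codewords, so a sparser residual from the predictive filters translates directly into a shorter bitstream.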
3.6.2 Decoder
The decoder structure is shown in Fig. 5. As is apparent from the figure, the decoding procedure is simply the inverse of the encoding procedure: the SA filter and NGSA filter produce the same predictions as for encoding at each instant and, hence, the input signal is perfectly reconstructed. Additionally, the de-emphasis follows
\(x[n] = y[n] + \eta x[n-1]\),
and the left–right conversion is expressed as
3.7 Codec implementation
As part of this study, the developed codec was implemented. To ensure speed and portability, we implemented the codec in the C programming language [32]. All encoding/decoding procedures were implemented via fixed-point operations so that the decoder reconstructed the input signal perfectly. We published this implementation under the MIT license.
The fixed-point numbers were represented by 32-bit signed integers with 15 fractional bits. Note that, at present, the codec supports 16-bit linear pulse code modulation (PCM) Wav (Waveform Audio File Format) files only, to prevent multiplication overflow and to maintain implementation simplicity. We assume that the appropriate bit-width rounding is available for 24-bit Wav.
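The role of integer arithmetic in exact reconstruction can be illustrated with the pre-emphasis stage. The sketch below assumes the filter y[n] = x[n] − ηx[n−1] with η = 31/32, computes ηx[n−1] as (31·x[n−1]) >> 5, and assumes a zero initial state; the function names and the exact rounding are hypothetical and may differ from the NARU source code.

```python
ETA_NUMERATOR = 31
ETA_SHIFT = 5  # eta = 31 / 2**5 = 0.96875


def scaled(sample):
    """Integer approximation of eta * sample using an arithmetic right shift."""
    return (sample * ETA_NUMERATOR) >> ETA_SHIFT


def pre_emphasis(samples):
    """Encoder side: subtract the scaled previous input sample."""
    out, prev = [], 0
    for x in samples:
        out.append(x - scaled(prev))
        prev = x
    return out


def de_emphasis(filtered):
    """Decoder side: add back the scaled previous reconstructed sample."""
    out, prev = [], 0
    for y in filtered:
        x = y + scaled(prev)
        out.append(x)
        prev = x
    return out


if __name__ == "__main__":
    pcm = [1000, 1000, 1000, -32768, 32767, 0]
    print(pre_emphasis(pcm))
    print(de_emphasis(pre_emphasis(pcm)) == pcm)
```

Because the encoder and decoder evaluate the same integer expression on the same previous sample, any rounding introduced by the shift cancels exactly, which would not be guaranteed with floating-point arithmetic across platforms.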
4 Experiment results
This section reports the evaluation results for the proposed algorithms and codec.
4.1 Adaptive algorithm comparison
4.1.1 Toy-data experiments
We observed the convergence performance under the following artificial settings. The elements of the unknown parameter h^{∗} were drawn from a uniform distribution over [−1,1], the filter order N was set to 5, and the observation noise v[n] was white Gaussian noise with −20, −40, and −60 dB variances. These settings were adopted from [16]. We calculated the mean square deviation (MSD) criterion ∥h^{∗}−h∥_{2} over 200 independent trials. In addition, we set p=1 and the following step sizes for the proposed algorithms: μ_{NGSA}=0.01, μ_{NNGSA}=0.1, and μ_{LMSN}=0.01. We implemented the algorithms in Python 3.8.1 and performed simulations using an Intel® Core i7 2.8 GHz Dual Core CPU with 16 GB RAM.
First, we tracked the MSD learning curves for x[n] with a variance of 0 dB. Figure 6 shows a comparison between the results obtained for the proposed algorithms and the SA, NLMS, and RLS (see Fig. 9 in Appendix 2 for −20 and −60 dB results). We set various step sizes for the SA and NLMS and employed various forgetting factors λ for the RLS. Figure 6 shows that the NGSA and NNGSA achieved almost the same performance as the SA and NLMS, respectively. This is because \(\boldsymbol {R}^{-1}_{1} \approx \boldsymbol {I}\) holds for i.i.d. noise input.
Second, we observed the case in which the Gaussian noise input is correlated via x[n]←x[n]+0.8x[n−1]. Figure 7 shows the results for the correlated input (see Fig. 10 in Appendix 2 for −20 and −60 dB results). The SA and NLMS exhibited poorer convergence performance than for the non-correlated noise input (Fig. 6). Moreover, the steady-state errors for the proposed algorithms also deteriorated. This is because R was close to being ill-conditioned, and the right-hand side of Eq. (16) was large.
4.1.2 Real-data experiments
We observed the absolute error (AE) for filter prediction using real music data from the Real World Computing (RWC) music dataset [33]. In this experiment, we assumed that the input data was composed of an audio data signal only and that the reference output and observation noise were zero (silence). We set the same configurations for the proposed algorithms as in the toy-data experiments. Figure 8 shows the AE curves obtained for the first second (at a 44100 Hz sampling rate) for the left channel of the tune “When the Saints Go Marching In.” From Fig. 8, the NNGSA and LMS/Newton exhibited superior performance to the NLMS and approximately the same performance as the RLS. However, the NGSA with AR(1) exhibited considerably poorer performance. We assume that this poor performance stemmed from a greater steady-state error for the NGSA, which arose from long-term (≈ 10000 samples) signal stationarity.
4.2 Codec evaluation
We observed the compression performance under the following settings, treating the following existing codecs as competitors:
FLAC: version 1.3.2 with the “highest compression” option (-8).
WavPack: version 5.4.0 with the “very high quality” option (-hh).
TTA: version 2.3 with the default setting.
Monkey’s Audio: version 6.14 with the “extra high” option (-c4000).
MPEG-4 ALS: RM23 with the default setting. We did not use the optimum compression option (-7) as the required encoding time was unrealistic.
For NARU, the NGSA filter order was set to 64, the AR order was 1, and the SA filter order was 8.
There were two evaluation criteria:
We employed the RWC music dataset [33] detailed in Table 1 and measured the root mean square (RMS) amplitude for each music data element. All the music data elements were formatted as Wav files, with 16 bits/sample, a stereo-channel setting, and a 44100 Hz sampling rate. The experiments were conducted on a Windows 10 OS PC having an Intel® Core™ i7-9750H 2.6 GHz CPU with 32 GB RAM.
The compression ratio and decoding speed results are presented in Tables 2 and 3, respectively.
5 Discussion
The proposed algorithms clearly achieved superior convergence performance to the SA and NLMS for correlated signal inputs. Furthermore, the NNGSA and LMS/Newton algorithms exhibited similar performance to the RLS, as indicated in [24]. The NNGSA does not outperform the NLMS and RLS in both convergence speed and steady-state error at the same time. However, the NNGSA outperformed the NLMS for highly correlated signals (Fig. 7). In general, digital audio signals exhibit high autocorrelation at low orders. Hence, we suggest that the NNGSA converges faster than the NLMS for empirical data. Furthermore, the NNGSA requires O(p) operations per adaptation to update the gradient; hence, its complexity is lower than that of the RLS, which employs the Sherman–Morrison formula (O(N^{2})). Therefore, we conclude that the NNGSA is a more accurate predictive algorithm than the SA and is suitable for practical application to a lossless audio codec.
However, the proposed algorithms suffer from two major problems with regard to practical applications. First, the matrix R must be nonsingular, but whether this holds depends on the input signal. For example, after pre-emphasis, a static offset yields a signal with zero mean, variance, and autocorrelation. One approach to resolving this problem is to introduce regularization, which would involve calculation of the inverse matrix for R+γI (γ>0) instead of R. Second, the AR coefficients ψ_{i} (i=1,...,p) must be calculated before the adaptation process, which can generate difficulties for streaming data processing.
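The effect of the regularization R+γI can be seen on a small example. The sketch below uses a hypothetical singular 2×2 autocorrelation matrix and a hand-written 2×2 inverse; the function names and the value of γ are illustrative assumptions and are not part of the proposed method.

```python
def regularize(matrix, gamma):
    """Return matrix + gamma * I for a square list-of-rows matrix."""
    return [[v + (gamma if i == j else 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]


def inverse_2x2(matrix):
    """Closed-form inverse of a 2x2 matrix; raises if it is singular."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


if __name__ == "__main__":
    r = [[1.0, 1.0], [1.0, 1.0]]  # rank-deficient: both samples identical
    try:
        inverse_2x2(r)
    except ZeroDivisionError as error:
        print("R:", error)
    print("R + 0.5 I:", inverse_2x2(regularize(r, 0.5)))
```

Adding γ to the diagonal shifts every eigenvalue of R upward by γ, so the regularized matrix is invertible for any γ>0 when R is positive semidefinite, at the cost of biasing the natural gradient toward the ordinary gradient.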
As is apparent from Tables 2 and 3, although Monkey’s Audio yielded the best average compression performance, it also exhibited the lowest decoding speed. This is because Monkey’s Audio uses a rich prediction/coding scheme, with a convolutional neural network for prediction and arithmetic coding. In contrast, FLAC showed the opposite trend, i.e., it exhibited the highest decoding speed and poorest compression performance.
NARU exhibited superior compression performance to FLAC, WavPack, TTA, and MPEG-4 ALS. This method showed strength in the classical and jazz categories, whereas WavPack exhibited superior performance for popular music. We believe that NARU excels for quieter music, as classical and jazz music tends to have lower signal amplitudes than popular music (see Table 1).
6 Conclusions
We proposed two novel adaptive algorithms that introduce a natural gradient to the SA. The adaptive step-size algorithm, NNGSA, exhibits certain similarities with well-known algorithms such as NLMS and RLS. Furthermore, we demonstrated the superior performance of the proposed algorithms compared with the SA via toy-data and real-music-data experiments. In a future study, we will introduce an iterative method for estimation of the AR coefficients and expansion methods for affine projection algorithms [22].
We also proposed a novel lossless audio codec scheme based on the NGSA, namely NARU, which exhibited superior compression performance to existing codecs such as FLAC, WavPack, TTA, and MPEG-4 ALS. The NARU decoding speed was lower than those of the other codecs, excluding Monkey’s Audio. We found that the filter prediction and coefficient updating processes occupied the majority of the CPU time. Thus, we expect an acceleration of this process through optimization, e.g., through loop unrolling and explicit use of SIMD instructions. Finally, it is remarkable that NARU achieves competitive performance compared to other state-of-the-art codecs despite its simple implementation.
In future work, we will add support for high-resolution (24-bit or higher) Wav files and perform further optimization for practical applications, including hardware support. We also plan to employ multichannel decorrelation methods [34] to improve the compression rate for multichannel audio.
We believe that the proposed methods are applicable to other signal processing tasks, e.g., noise cancellation, audio enhancement, and system identification.
7 Appendix 1: Proposition proofs
For convenience in the following proofs, we employ the residual vector θ[n] between an unknown parameter h^{∗} and a current parameter h[n], as
and we define an exponent for the autocorrelation matrix R as
where Q is an orthogonal matrix and Λ is a diagonal matrix for which the diagonal elements are eigenvalues of R.
7.1 NGSA inequality
When Eq. (15) is employed,
where μ:=μ_{NGSA}. Multiplying both sides by \(\boldsymbol {R}^{\frac {1}{2}}\) from the left, and taking the square of the L2 norm \(\|\cdot \|_{2}^{2}\), we obtain
Taking the mean of Eq. (42) yields
where \(r = \mathrm {E}\left [\|\boldsymbol {R}^{\frac {1}{2}} \boldsymbol {\theta }[1]\|_{2}^{2}\right ]\). Dividing both sides by 2nμ and rearranging, we obtain
Hence, we obtain Eq. (16) by letting n→∞.
7.2 NNGSA convergence condition
In the case where Eq. (21) is employed,
where \(\varepsilon ^{\ast }[n] = d[n] - \boldsymbol {h}^{\ast \mathsf {T}}\boldsymbol {x}[n]\), \(\boldsymbol {P}[n] = \frac {\boldsymbol {R}^{-1}\boldsymbol {x}[n]\boldsymbol {x}[n]^{\mathsf {T}}}{\boldsymbol {x}[n]^{\mathsf {T}} \boldsymbol {R}^{-1} \boldsymbol {x}[n]}\), and μ:=μ_{NNGSA}. Taking the mean of Eq. (47), we obtain
as the mean gradient for the unknown parameter is 0. Furthermore, h[n] and x[n] are statistically independent, such that
We can denote E[P[n]] as
where \(\boldsymbol {q}[n] = \boldsymbol {\Lambda }^{-\frac {1}{2}} \boldsymbol {Q}^{\mathsf {T}} \boldsymbol {x}[n]\) and \(\boldsymbol {R}_{\boldsymbol {q}} = \mathrm {E}\left [{\frac {\boldsymbol {q}[n]\boldsymbol {q}[n]^{\mathsf {T}}}{\boldsymbol {q}[n]^{\mathsf {T}}\boldsymbol {q}[n]}}\right ]\). Furthermore,
holds as R_{q} is symmetric, where Q_{q} is an orthogonal matrix and Λ_{q} is the diagonal matrix in which the elements are eigenvalues of R_{q}. Hence, Eq. (49) is rewritten as
Therefore, to satisfy \({\lim }_{n\to \infty }\mathrm {E}\left [{\boldsymbol {\theta }[n]}\right ] = \boldsymbol {0}\),
is required, where λ_{qi} is the eigenvalue of R_{q}. Here, R_{q} is a positive semidefinite matrix and
holds. Hence, the eigenvalue range is
Therefore, the convergence condition is obtained when \(\max _{i\in \{1,...,N\}}\lambda _{qi}=1\).
7.3 Derivation of efficient natural-gradient update method
Employing Eq. (10), the elements of m[n] can be calculated as
Hence, we can denote m[n+1] as follows:
and the Mahalanobis norm \(\boldsymbol {x}[n]^{\mathsf {T}} \boldsymbol {K}_{p}^{-1} \boldsymbol {x}[n] = \boldsymbol {m}[n]^{\mathsf {T}} \boldsymbol {x}[n]\) can be updated as follows:
7.4 Convergence condition for LMS/Newton algorithm
In the case that Eq. (27) is used,
where μ:=μ_{LMSN}. Taking the mean of both sides, we obtain
Here, Eq. (64) exploits the statistical independence between x[n] and h[n], and Eq. (65) utilizes the Wiener–Hopf solution. Subtracting h^{∗} from both sides, we have
Hence, for h[n] to converge to h^{∗},
is required [21], where η_{max} is the maximum eigenvalue of \(\boldsymbol {R}_{1}^{-1}\boldsymbol {R}\). Furthermore, the eigenvalue range of R_{1} satisfies [29] the following:
More roughly, eigenvalues λ_{k} (k=1,...,N) satisfy
Therefore, employing the Rayleigh quotient,
Here, Eq. (73) exploits the fact that the maximum eigenvalue of R is smaller than tr[R]=Nσ^{2}.
8 Appendix 2: Toy-data experiment results for other configurations
Figures 9 and 10 show learning curves for toy-data experiments for −20 and −60 dB variance configurations.
Availability of data and materials
The NARU codec implementation is available at https://github.com/aikiriao/NARU.
Abbreviations
SA: Sign algorithm
NGSA: Natural-gradient sign algorithm
LMS: Least mean square
NLMS: Normalized least mean square
NNGSA: Normalized natural-gradient sign algorithm
FIR: Finite impulse response
RLS: Recursive least squares
PCM: Pulse code modulation
MSD: Mean square deviation
AE: Absolute error
RWC: Real World Computing
RMS: Root mean square
References
[1] K. Konstantinides, An introduction to super audio CD and DVD-audio. IEEE Signal Proc. Mag. 20(4), 71–82 (2003).
[2] T. Moriya, N. Harada, Y. Kamamoto, H. Sekigawa, MPEG-4 ALS international standard for lossless audio coding. NTT Tech. Rev. 4(8), 40–45 (2006).
[3] M. Hans, R. W. Schafer, Lossless compression of digital audio. IEEE Signal Proc. Mag. 18(4), 21–32 (2001).
[4] T. Robinson, Shorten: simple lossless and near-lossless waveform compression. Technical Report, Cambridge Univ., Eng. Dept. (1994).
[5] T. Liebchen, MPEG-4 ALS: the standard for lossless audio coding. J. Acoust. Soc. Korea 28(7), 618–629 (2009).
[6] Apple Lossless Audio Codec (2011). https://macosforge.github.io/alac/. Accessed 23 Apr 2022.
[7] FLAC - free lossless audio codec (2011). https://xiph.org/flac/. Accessed 23 Apr 2022.
[8] WavPack audio compression (2004). http://www.wavpack.com. Accessed 23 Apr 2022.
[9] TTA lossless audio codec - true audio compressor algorithms (2005). http://tausoft.org/wiki/True_Audio_Codec_Overview. Accessed 23 Apr 2022.
[10] Monkey’s Audio - a fast and powerful lossless audio compressor (2000). https://monkeysaudio.com. Accessed 23 Apr 2022.
[11] R. F. Rice, in Appl. Digit. Image Process. III, 207. Practical universal noiseless coding (1979), pp. 247–267.
[12] H. Kameoka, Y. Kamamoto, N. Harada, T. Moriya, A linear predictive coding algorithm minimizing the Golomb-Rice code length of the residual signal. Trans. Inst. Electron. Inf. Commun. Eng. A 91, 1017–1025 (2008).
[13] P. S. Diniz, et al., Adaptive Filtering, vol. 4 (Springer, Massachusetts, 1997).
[14] A. Gersho, Adaptive filtering with binary reinforcement. IEEE Trans. Inf. Theory 30(2), 191–199 (1984).
[15] L. Lu, H. Zhao, K. Li, B. Chen, A novel normalized sign algorithm for system identification under impulsive noise interference. Circ. Syst. Signal Proc. 35(9), 3244–3265 (2016).
[16] M. O. Sayin, N. D. Vanli, S. S. Kozat, A novel family of adaptive filtering algorithms based on the logarithmic cost. IEEE Trans. Signal Process. 62(17), 4411–4424 (2014).
[17] S. L. Gay, S. C. Douglas, in 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2. Normalized natural gradient adaptive filtering for sparse and nonsparse systems (IEEE, New York, 2002), p. 1405.
[18] S. I. Amari, Natural gradient works efficiently in learning. Neural Comput. 10(2), 251–276 (1998).
[19] T. Mineo, H. Shouno, in 2021 29th European Signal Processing Conference (EUSIPCO). Improving convergence rate of sign algorithm using natural gradient method (IEEE, New York, 2021), pp. 51–55.
[20] M. Siddiqui, On the inversion of the sample covariance matrix in a stationary autoregressive process. Ann. Math. Stat. 29(2), 585–588 (1958).
[21] W. Bernard, D. S. Samuel, Adaptive signal processing (Prentice Hall, Englewood Cliffs, 1985).
[22] S. S. Haykin, Adaptive Filter Theory (Pearson Education India, 2005).
[23] P. S. Diniz, L. W. Biscainho, Optimal variable step size for the LMS/Newton algorithm with application to subband adaptive filtering. IEEE Trans. Signal Process. 40(11), 2825–2829 (1992).
[24] P. S. Diniz, M. L. de Campos, A. Antoniou, Analysis of LMS-Newton adaptive filtering algorithms with variable convergence factor. IEEE Trans. Signal Process. 43(3), 617–627 (1995).
[25] S. I. Amari, Differential-Geometrical Methods in Statistics, vol. 28 (Springer, New York, 2012).
[26] T. Petillon, A. Gilloire, S. Theodoridis, The fast Newton transversal filter: an efficient scheme for acoustic echo cancellation in mobile radio. IEEE Trans. Signal Process. 42(3), 509–518 (1994).
[27] B. Farhang-Boroujeny, Fast LMS/Newton algorithms based on autoregressive modeling and their application to acoustic echo cancellation. IEEE Trans. Signal Process. 45(8), 1987–2000 (1997).
[28] J. E. Markel, A. H. Gray, Linear Prediction of Speech (Springer, Berlin, 1982).
[29] U. Grenander, G. Szegö, Toeplitz Forms and Their Applications (Univ of California Press, California, 1958).
[30] H. Huang, P. Franti, D. Huang, S. Rahardja, Cascaded RLS–LMS prediction in MPEG-4 lossless audio coding. IEEE Trans. Audio Speech Lang. Process. 16(3), 554–562 (2008).
[31] D. Salomon, Data compression: the complete reference (Springer Science & Business Media, Berlin/Heidelberg, 2007).
[32] (International Organization for Standardization, Geneva, 1990).
[33] M. Goto, H. Hashiguchi, T. Nishimura, R. Oka, RWC music database: popular, classical and jazz music databases. Ismir. 2, 287–288 (2002).
[34] Y. Kamamoto, N. Harada, T. Moriya, N. Ito, N. Ono, T. Nishimoto, S. Sagayama, in 2009 IEEE 13th International Symposium on Consumer Electronics. An efficient lossless compression of multichannel time-series signals by MPEG-4 ALS (IEEE, New York, 2009), pp. 159–163.
Acknowledgements
The authors thank the associate editor and the anonymous reviewers for their constructive comments and useful suggestions.
Funding
Not applicable.
Author information
Authors’ contributions
Taiyo Mineo: software and writing—original draft. Hayaru Shouno: writing, review, and editing. Both authors read and approved the final manuscript.
Authors’ information
Taiyo Mineo received a B. Eng. from the University of Electro-Communications, Tokyo, in 2014, and received an M. Eng. from the Tokyo Institute of Technology in 2016. He was employed by CRI Middleware Co., Ltd., from 2016 to 2020, and is currently pursuing a Ph.D. in information engineering at the University of Electro-Communications. His research interests include signal processing and machine learning.
Hayaru Shouno received a Ph.D. in Engineering from Osaka University, Osaka, in 1999. He is currently a Professor at the Graduate School of Informatics and Engineering, the University of Electro-Communications, Tokyo. His research interests include computer vision and machine learning involving neural networks. He is an Action Editor of Neural Networks and an elected governor of the Asia Pacific Neural Network Society (APNNS).
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mineo, T., Shouno, H. Improving sign-algorithm convergence rate using natural gradient for lossless audio compression. J AUDIO SPEECH MUSIC PROC. 2022, 12 (2022). https://doi.org/10.1186/s13636-022-00243-w
DOI: https://doi.org/10.1186/s13636-022-00243-w