Estimating the first and second derivatives of discrete audio data

A new method for estimating the first and second derivatives of discrete audio signals intended to achieve higher computational precision in analyzing the performance and characteristics of digital audio systems is presented. The method could find numerous applications in modeling nonlinear audio circuit systems, e.g., for audio synthesis and creating audio effects, music recognition and classification, time-frequency analysis based on nonstationary audio signal decomposition, audio steganalysis and digital audio authentication or audio feature extraction meth-ods. The proposed algorithm employs the ordinary 7 point-stencil central-difference formulas with improvements that minimize the round-off and truncation errors. This is achieved by treating the step size of numerical differentiation as a regularization parameter, which acts as a decision threshold in all calculations. This approach requires shifting discrete audio data by fractions of the initial sample rate, which was obtained by fractional delay FIR filters designed with modified 11-term cosine-sum windows for interpolation and shifting of audio signals. The maximum relative error in estimating first and second derivatives of discrete audio signals are respectively in order of 10 − 13 and 10 − 10 over the entire audio band, which is close to double-precision floating-point accuracy for the first and better than sin-gle-precision floating-point accuracy for the second derivative estimation. Numerical testing showed that this performance of the proposed method is not influenced by the type of signal being differentiated (either stationary or non-stationary), and provides better results than other known differentiation methods, in the audio band up to 21 kHz.


Introduction
The infinitesimal differential calculus with h → 0 can be obtained only mathematically, for continuous-time and analytically obtainable derivatives.In the case of data measured by digital equipment, infinitesimal differential calculus is no longer valid because of the intrinsic discretization of the data being processed, i.e., sampling the data in time and quantizing its amplitude values.Discrete audio signals are sampled at equal-spaced time intervals, which usually span from T s = 1 8000 to T s = 1 384000 seconds with fixed-point precision varying from 8 to 24 bits or generated with 32-or 64-bit floating-point precision.Thus, the derivative obtained for a discrete signal is only an estimation of its true value at some point in time.
In this paper, a numerical differentiation method is proposed to estimate the first and second derivatives of discrete audio data with no assumptions about the data to be analyzed.The method is intended to minimize numerical errors down to the truncation and round-off errors.Details of the method are presented in Sect.2, and the method is evaluated and tested in Sect.3.

Numerical differentiation of discrete audio data
Numerical differentiation of discrete audio data finds numerous applications in solving ordinary and partial differential equations (ODEs and PDEs) as a numerical framework for modeling nonlinear audio circuit systems.It is used, for example, in audio synthesis and creating audio effects [75][76][77][78][79] and music recognition and classification [34,[80][81][82].It is specifically used for time-frequency analysis based on nonstationary audio signal decomposition (methods derived from empirical mode decomposition [83][84][85][86][87]), enhancement of spectral precision in Fourierbased methods [85,[88][89][90][91], audio steganalysis [92], digital audio authentication [93][94][95], acoustic event detection [96][97][98][99], feature extraction based on Mel-frequency cepstral coefficients (MFCC) [100][101][102][103][104][105][106][107][108], speaker and speech identification and recognition, and sound source tracking [107,[109][110][111][112][113][114][115][116][117][118][119][120].Audio data, such as music or speech, which are nonstationary over time, cannot be described by a mathematical expression.This is particularly a problem with the use of numerical differentiation which employs approximation to the analyzed data.Approximation may be treated as a smoothing operation or searching for a trend line in the data.Selecting a trend line is achieved by specifying a regularization parameter.The selection of algorithms and methods for finding the regularization parameter depends on the given requirements, but finally every regularization procedure compromises between "smoothness" and "roughness" of data estimate [27,42,43,45,52,59,73,[121][122][123][124][125].Averaging approximation of audio signal in the time domain translates directly into the frequency domain in the form of the attenuation of higher frequencies in the signal.Averaging corresponds to changes in the shape, cutoff frequency, and cutoff slope of the low-pass filter resulting from the chosen regularization method and selected parameter.A similar process of removing high-frequency content in the signal occurs in the group of numerical differentiation methods based on finite-difference calculation, which can be regarded as a special case of FIR filters known as differentiators.All these approximation methods try to find the best compromise between cut-off frequency, frequency transition region, and stop-band attenuation with an equivalent filter approach.Although averaging is appropriate in applications in which high-frequency content is regarded as noise, it is not acceptable for many kinds of digital audio signals which hold essential information in rapid changes in the amplitude over time.Consequently, there have been some attempts to make a regularization parameter variable based on the actual form of the signal [124,126].Nevertheless, no method can successfully separate audio signal from noise without distorting the signal.The numerical differentiation method proposed in this paper estimates the first and second derivatives of discrete audio data using central-difference formulas for calculations and makes no assumptions about the data being analyzed.The method does not incorporate smoothing of input data and employs additional procedures to minimize numerical errors.The main contribution of the proposed method, presented in detail in Sect.2, can be summarized as follows: • Instead of smoothing or filtering the high-frequency content of the input signal, the step size h (see Sect. 2 and 3) is used as the regularization parameter to be optimized.
• Additional procedures to minimize numerical errors employ fractional delay FIR filters designed with modified cosine-sum windows [127] for shifting and interpolation of audio signals and enhance numerical accuracy in estimation of the derivative.• As it will be shown, the maximum relative errors in the estimation of the first and second derivatives are small for discrete audio signals, respectively, of order 10 −13 and 10 −10 in the entire audio band, which is close either to double-precision or single-precision floating-point accuracy of calculations.
Section 3 shows the experiments designed to verify the performance of the proposed method and gives a comparison to other known methods of numerical differentiation, both by analysis of random input samples and through the differences in resulting transfer function.

Proposed method
The derivative of a discrete signal is only an estimation of its true value at some point in time because all calculations are performed with non-exact finite-precision arithmetic.Possible errors may result from simplified assumptions in the mathematical model, discretization error, convergence error, and round-off error (due to the finite-precision of numerical calculations).
The first-order derivative of a general signal f being a function of a variable x calculated at x 0 point can be expressed as the discrete finite central-difference formula [128]: where the step size h should be kept sufficiently small for the accurate approximation (i.e., preserve a small truncation error).However, a decrease in the step size h leads to subtractive cancelation which increases round-off errors.The challenge is to identify the optimal step size h 0 to avoid conditions in which the decreasing truncation error is dominated by the round-off error (Fig. 1).The optimal h 0 can be found by estimating the round-off and trunca- tion errors associated with (1).According to [9] and truncating all terms of Taylor series expansion greater than 3, these errors can be estimated as: where true values of f (x 0 ± h) from ( 1) are represented as a sum of approximations f (x 0 ± h) and round-off errors defined as err x 0 ±h .
An absolute value of the upper bound of total error can be represented as: Assuming that the eps (precision value of floatingpoint numbers) in ( 3) is set as the upper bound of ( 1) round-off error, and maximum value of f (3) (ξ ) in ( 2) is set to P.Then, the optimal step size h 0 can be determined by differentiating (3) with respect to h and imposing the resulting derivative equal to zero, and then solving for h: Figure 1 shows that, for the step size h < h 0 , the rela- tive error of finite central-difference approximation is determined by round-off errors.Otherwise, for h > h 0 , the truncation error dominates.

Derivation and analysis of proposed method
For estimation of the first and second derivatives, the 7-point stencil central-differences approximation following [5] was used.The approximation of the first derivative is given by the formula: for N = 7 and coefficients c k = [45, −9, 1] .The approxi- mation of the second derivative is given by: (4) h 0 ≤ 3 3 • eps P . (5) for N = 7 , c k = [270, −27, 2] , c 0 = −490 , where h is the step size, N is the order, and c k for (k = 0, ..., (N − 1)/2) are the coefficients of central-difference approximation ( c 0 occurs only in the case of the second-order derivative).
As depicted in Fig. 1, the choice of the step size h is critical for the accuracy of the derivative estimation.For this reason, the derivative estimation defined as F was computed for several h values as: Since discrete audio data are sampled at equalspaced intervals h = T s = 1/f s , the value of h is fixed over time.Calculations performed for h = 0.1 • T s , h = 0.01 • T s , and h = 0.001 • T s require components in ( 5) and ( 6) to be shifted by corresponding fractions of the initial sample rate.The required shifting operation was performed by fractional delay FIR filters designed with the modified 11-term cosine-sum window as proposed in [127] as in (8) below: (7) Fig. 1 Relative error of first derivative approximation of function f (x) = sin(π t) at point x 0 = π 3 on the machine with 52 bits of mantissa as a function of step size h for a different order of approximation.Vertical dashed line represents optimal step size h 0 = 5.964 • 10 −6 calculated with (4) for j = 0, ..., N − 1 , where N is filter's order (number of coefficients), K = 11 is number of cosine-sum terms from [127], and δ is the fractional shift of the filter.Fil- ter coefficients are calculated by multiplying this window function with the sinc function given in (9) below: To increase the accuracy in derivative estimation, input data were 64 times oversampled.Maximum relative errors in the estimation of the first and second derivatives by F (1) (x) and ) and 300 sinu- soidal input signals of frequencies randomly varying from 1 to 21000 Hz (sampling rate f s = 44100 Hz) are shown in Figs. 3 and 4. Maximum relative errors were calculated as max F (1,2) (x) − f (1,2) (x) /max f (1,2) (x) , where f (1,2) (x) were exact derivatives computed analytically.
The results shown in Figs. 3 and 4 reveal that there is no optimal step size h for estimating the derivatives over the whole audio frequency band.The step size h should be increased at lower frequencies and decreased at higher frequencies to achieve the lowest relative error of calculations.There are, however, (9) characteristic points where the maximum relative errors for different step sizes intersect with each other.Knowing the data-dependent optimum points (marked as circles in Figs. 3 and 4), it is possible to derive formulas for estimating the derivatives with the highest possible accuracy (minimum error).
Considering the step size h as a regularization parameter, the optimum step sizes h 07 and h 08 (where 07 and 08 indicate the order of central-difference approximation for step sizes from 10 2 • T s to T s needed to get the optimum step size) calculated for 9th order derivative approximation were obtained and used as the threshold values to select ranges in derivative estimations by (7) which provide the lowest maximum relative error.For the first derivative estimation, threshold values using the 9-point stencil central-differences approximation were obtained as: , where for m = [0, 1, 2] , N = 9 , c k = [−14, 14, −6, 1] and for the second derivative estimation as: (10)  Finally, the first and second derivative estimations F (1) (x) and F (2) (x) given in (7) were modified and derived as vectors which are comprised of derivatives (13) f (1) (x) and f (2) (x) using ( 5) and ( 6) at specific sam- ple indexes [ x 1 , ..., x 6 ] to provide the lowest maxi- mum relative error.These vectors correspond to the derivatives estimated for step sizes of h = 10 m • T s ∀ m = [2, 1, 0, −1, −2, −3] and occur in the ranges speci- fied by the thresholds calculated through (10) and (12).Estimations of the first and the second derivatives are respectively formulated by ( 14) and ( 15) and described with an Algorithm 1.  4 Maximum relative errors (dot marks) of second derivative estimation using (7).Other details as in Fig. 3 Algorithm 1 The proposed method for the first and second derivatives estimation Threshold values of the h 07 , h 08 and the div parameter used in ( 14) and (15) (empirically determined constant dependent on sampling frequency) allow for switching between derivatives calculated with different step sizes h which results in 10 −13 and 10 −10 accuracy (maximum relative errors) of the approximation in the whole audio band.The maximum relative error of estimating the first and second derivatives using the proposed method through (14) and ( 15) is shown by 'x' markers in Figs. 5 and  6 (which at higher frequencies form a thick bottom line).( 14)

Comparison with other numerical differentiation methods
Since the exact derivatives of real-world audio signals cannot be calculated, experiments based on synthetic data were conducted.Two kinds of experiments with the use of, respectively, stationary and nonstationary signals, were performed.The testing was intended to compare the proposed method with other numerical differentiation methods.

Experiments with stationary synthetic data
Stationary synthetic data have been generated as a sum of four harmonic signals with randomly selected frequencies and arbitrarily chosen amplitudes as defined by the following formula: (16) f n (x) = 0.5 • sin(ω 1,n x) + 0.15 • cos(ω 2,n x) + 0.2 • sin(ω 3,n x) + 0.15 • cos(ω 4,n x) Fig. 5 Maximum relative error of first derivative estimation obtained using proposed method ("x" marks).Input signals were sampled at 44100 Hz and 64 times oversampled.Grayed-out markers are the maximum relative errors as obtained in Fig. 3 Fig.6 Maximum relative error of second derivative estimation obtained using proposed method ("x" marks).Input signals were sampled at 44100 Hz and 64 times oversampled.Grayed-out markers are the maximum relative errors as obtained in Fig. 4 for n = [1, ..., 300] , where ω 1,n , ..., ω 4,n are randomly selected frequencies respectively from the range , , [3501-10000], and [10001-22050] Hz.The test data consisted of n = 300 sets of such gen- erated synthetic data.To simulate actual signal conditions more realistically, a noise was added to the harmonic signal (16).Two ( M = 1, 2 ) normally distrib- uted noise M realizations with a standard deviation of σ M = [2.2204• 10 −16 , 0.001] were used.The noise was added to the harmonic signal f n (x) as shows ( 17): where M denotes actual noise used, and n represents the four-sine frequencies set.In all calculations, the sampling frequency f s = 44100 Hz was used.The differ- entiation methods were compared as the error ratio of SNR 1,2,M,n /SNR 0,M,n (where SNR is the signal-to-noise ratio) for n = 300 data sets and M = 1, 2 noise distribu- tions where SNR 1,2,M,n are signal-to-noise ratios of the estimated first and second derivatives, and SNR 0,M,n is the signal-to-noise ratio of the input signal.The level of errors SNR 0,M,n in the input data has been characterized by the M signal-to-noise ratios, defined as: where L = 6000 is the length of the input data sequence, f n is the input harmonic data sequence, and fM,n is the input harmonic data sequence with M = 1, 2 noise dis- tributions.Signal-to-noise ratios SNR 1,2,M,n of the esti- mated first and second derivatives have been derived as follows: where M,n are estimates of the first and second derivatives and f (1,2)  M,n are true derivatives calculated analytically.

Experiments with nonstationary synthetic data
Nonstationary synthetic data were generated by adding the FM signal v(x) to the previously defined by ( 16) signal f n (x) in the following way: where ω A and ω B were set so that v(x) changes its fre- quency from 1 to 21000 Hz during one sinusoidal cycle in a data sequence of length L = 6000 .As in the previ- ous experiment, the noise was added to input data in the same way as shown in (17).The differentiation methods were compared as the ratio of SNR 1,2,M,n /SNR 0,M,n for n = 300 input harmonic data with FM modulation sets and M = 1, 2 noise distributions.

Comparison material
The method proposed in this paper and described in Sect. 2 was compared with the following numerical differentiation methods: • Algorithm for numerical differentiation of discrete functions with an arbitrary degree and order of accuracy with the use of the closed explicit formula presented by H. Z. Hassan et al. in [13].The algorithm is based on the method of undetermined coefficients and the closed form of the Vandermonde inverse matrix (the method labeled later as "Hassan").The "Hassan" numerical differentiation has been performed with the order of 8. • MaxPol package written in MATLAB (labeled later as "MaxPol") which is a comprehensive tool for numerical differentiation.The MaxPol is based on the method of undetermined coefficients to render a variety of FIR kernels in a closed form that can be used to approximate the full-band or low-pass derivatives of discrete single or multidimensional signals (images) [14,15].Numerical differentiation was performed with a centralized FIR derivative kernel for the full-band operation.• Ordinary 9-point stencil central-difference formulas (hereinafter referred to as "Central-Diff ").

Results of experiments
The dependence of error ratio SNR 1,2,M,n /SNR 0,M,n for n = 300 stationary and nonstationary sets of experi- mental data, and M = 1, 2 noise distributions, obtained for each of the compared methods are presented in Figs. 7, 8, 9 and 10.Figures 7 and 8 show the results for the stationary signals, for the first and second derivatives, respectively.Correspondingly, Figs. 9 and 10 show the results for the nonstationary signals.The error ratio for the method proposed in this work is shown by the solid line.The "Hassan, " "MaxPol, " and "Central-Diff " methods used for the comparison are shown with the dash-dotted, dotted, and dashed lines, respectively.The sets of experimental data are shown along the abscissa and represent the random selection of input data.The error ratio is shown on the linear scale on the ordinate.The noise level shown for each condition represents the noise added to the input data.The results presented in Figs. 7, 8, 9 and 10 reveal that the proposed method gives stable results which are not prone to changes in the input data sequence, both for stationary and nonstationary cases.Among other methods, only the "MaxPol" method performance is close to that of the proposed method in the estimation of the second derivative for the stationary data.In all other cases, the proposed method results are consistently better in performing the numerical differentiation of either stationary or nonstationary data.The presented results also reveal some general characteristics of the compared methods in the estimation of the first and second derivatives in discrete audio signals: • The "Central-Diff " difference formulas and the "Hassan" method are more computationally efficient than "MaxPol" and the proposed method.Estimation of the first derivative for stationary data gives comparable results for the "Central-Diff " formula and "Hassan" method both for no noise and − 80 dB noise conditions (Fig. 7).For the estimation of the second derivative, the "Hassan" method provides slightly better results than the 'Central-Diff ' formulas (Fig. 8).Similar differences in the performance of these two methods are apparent in the results for nonstationary data (Figs.8 and 10).• Full-band FIR kernel performance in MaxPol package is very prone to random selection of samples (Figs. 7 and 8) thus to the frequency content of the input data.It was found that when the stationary input data contained frequencies above 15000 Hz (see ( 16)), the performance of the method drastically decreased.The experiments with nonstationary data revealed that the "MaxPol" method produces lower error ratios than the "Central-Diff " and "Hassan" methods.The performance of the "MaxPol" method is also lower as compared to the proposed method, which is especially evident for nonstationary signals (Figs. 9 and 10).• The noise added to the input data improves to some extent the performance of numerical differentiation of all compared methods.The likely reason is that the random error introduced by the added noise to the input harmonic signal decorrelates consecutive samples.This in turn decreases the computational round-off error during the numerical differentiation process.

Transfer functions
The advantage of the proposed method is seen in differences between transfer functions showing the impact of the frequency content of input signal on the performance of the numerical differentiation.Transfer functions for the proposed method and the three other methods were calculated as a relationship between Fourier transforms of the estimated derivatives and the input signals, for 300 frequencies varying from 20 to 20000 Hz (the resolution was set to 8192 with 512-point Hann window).The left panels in Figs.11 and 12 show the transfer functions of selected methods, respectively for the first and second Fig. 9 First derivative error ratio SNR 1,M,n /SNR 0,M,n for nonstationary synthetic data (18).Other details as in Fig. 7 Fig. 10 Second derivative error ratio of SNR 2,M,n /SNR 0,M,n for nonstationary synthetic data.Other details as in Fig. 7 derivatives.The performance of the proposed method (shown with the thick solid line) is nearly identical with the ideal differentiator response over the whole audio frequency band for the first and second-order numerical differentiation, which is not the case for other methods.The right panels in Figs.11 and 12 show the relative error between the transfer function of the ideal differentiator, proposed method, and "MaxPol" method (as the best out of the other methods).In nearly all cases the performance of the proposed method is better than that of the "MaxPol" method.The only exception is the estimation of the second derivative with input signals below 15000 Hz (Fig. 12, right panel) but as it is seen in Fig. 8 only for stationary input signals.

Conclusions
The paper addresses the problem of estimating the first and second derivatives of discrete audio data.Numerical differentiation of discrete audio data has several applications.For example, it is particularly important in the development of numerical solvers for ordinary and partial differential equations PDEs and ODEs.These are fundamental in modeling audio circuit systems for digital audio effects and synthesizers.The audio signal is always a complex combination of the components ranging four decades in frequency, from a few Hz to tens of thousands Hz, which are processed by both linear and nonlinear systems with parameters varying over time.Thus, it is not possible Fig. 11 Transfer functions of numerical differentiation methods for estimation of the first derivative in the audio band (left panel) and relative error for proposed method and the "MaxPol" method (right panel) Fig. 12 Transfer functions of numerical differentiation methods for estimation of the second derivative in the audio band (left panel) and relative error for proposed method and the "MaxPol" method (right panel) to derive an analytical mathematical expression for music or speech signals to be applied in the evaluation numerical methods used for differentiation.Discrete audio data consist of a sequence of samples that occur at a specific fixed time order therefore every preprocessing operation (like smoothing or approximation) disrupts audio data in some way.For this reason, it is important that the proposed method for estimating the first and second derivatives makes no assumptions about the data being analyzed and does not incorporate smoothing or filtering while preprocessing.To achieve the best possible numerical accuracy in the whole audio band, the step size h should be treated as a regularization parameter and is made variable based on the input signal frequency range.This was achieved with very precise fractional-delay FIR filters designed for interpolation and shifting of the processed audio data.
The comparison with three existing numerical differentiation methods showed that the performance of the proposed method is consistently better than of the other methods, especially in the case of nonstationary discrete audio data.Future research in employing the proposed method for time-domain analysis and modeling of digital audio systems should consider further investigations on increasing the numerical accuracy in estimating the second-order derivative.

Figure 2
Figure 2 shows frequency and phase response for one of the designed filters.It shifts input signal by δ = 0.06 fraction of T s , has a number of 8001 coefficients, and works with f s = 44100 • 64Hz and cutoff frequency f c = 22050 Hz .Phase-delay plot on the left panel shows the shift of 0.06 fraction of sampling time and passband ripples of the 10 −14 order on the right panel.To increase the accuracy in derivative estimation, input data were 64 times oversampled.Maximum relative errors in the estimation of the first and second derivatives by F (1) (x) and F (2) (x) in (7) with different step sizes( h = 10 m • T s ∀ m = [2, 1, 0, −1, −2, −3]) and 300 sinu- soidal input signals of frequencies randomly varying from 1 to 21000 Hz (sampling rate f s = 44100 Hz) are shown in Figs.3 and 4. Maximum relative errors were calculated as max F (1,2) (x) − f (1,2) (x) /max f (1,2) (x) , where

Fig. 2
Fig. 2 Example filter design with modified 11-term cosine-sum window [127] for F s = 64 • 44100 Hz, N = 8001 taps, f c = 22050 Hz, and delay of 0.06 • T s .Frequency response and phase delay are shown on the left and filter's passband ripples are shown on the right panel

Fig. 7 Fig. 8
Fig.7 First derivative error ratio SNR 1,M,n /SNR 0,M,n for stationary synthetic data generated using(17).Method proposed in this work is shown by the solid line, methods used for the comparison are shown with the dash-dotted, dotted, and dashed lines, respectively.Left and right panels show calculations for added noise levels of − 313 dB ( M = 1 ) and − 80 dB ( M = 2 ), respectively