Reversible audio data hiding algorithm using noncausal prediction of alterable orders
EURASIP Journal on Audio, Speech, and Music Processing volume 2017, Article number: 4 (2017)
Abstract
This paper presents a reversible data hiding scheme for digital audio that uses noncausal prediction of alterable orders. First, the samples in a host signal are divided into a cross set and a dot set. Then, each sample in a set is estimated by using the past P samples and the future Q samples as prediction context. The order P + Q and the prediction coefficients are computed by the minimum error power method. With the proposed predictor, the prediction errors can be efficiently reduced for different types of audio files. Compared with several existing state-of-the-art schemes, the proposed prediction model combined with the expansion embedding technique introduces less embedding distortion for the same embedding capacity. Experiments on standard audio files verify the effectiveness of the proposed method.
Introduction
Reversible data hiding embeds data in a host signal in such a way that the host signal can be completely recovered [1]. It is used where the host signal, such as a medical image or an audio file, must be kept lossless. There are two significant criteria for reversible data hiding techniques: the embedding capacity should be large while the distortion should be low. These two criteria conflict with each other: usually, a higher embedding capacity is accompanied by a higher distortion.
Early reversible data hiding algorithms mainly focused on lossless compression. To embed data into a host signal, vacant space was made by compressing a part or even the whole of the host signal. Fridrich et al. proposed reversible data hiding algorithms using compression of bit-planes [2] and vector states [3] for better performance. In [4], Celik et al. proposed a lossless generalized-LSB data hiding method which compressed a set of selected features from an image and embedded the payload in the space made by the compression. This type of method usually achieved a low capacity with severe distortion.
To improve data hiding performance, Tian [5] introduced a difference expansion (DE)-based method, in which every two pixels were grouped together to produce one high-pass coefficient and one low-pass coefficient. The high-pass coefficient was then expanded to carry 1 bit; that is, two pixels were used to embed 1 bit. To solve the overflow and underflow problems, a location map was used to mark the out-of-range pixels and was embedded together with the payload. Therefore, the embedding capacity is at best 0.5 bit/pixel. Tian's method is a fundamental work in reversible data hiding and has been extended in many respects, such as Alattar's technique, which embedded two data bits in every three pixels [6], the reduction of the size of the location map [7], and the generalization of DE into an integer transform [8, 9].
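As an illustration of the difference expansion idea described above, the following minimal sketch embeds one bit into a pixel pair and recovers both the bit and the pair. It follows the commonly stated form of Tian's integer transform (floor average as the low-pass coefficient, difference as the high-pass coefficient). The function names `de_embed` and `de_extract` are hypothetical, and overflow/underflow handling with a location map is deliberately omitted.

```python
def de_embed(x, y, bit):
    """Embed one bit into the pixel pair (x, y) by difference expansion."""
    low = (x + y) // 2           # low-pass coefficient (floor of the average)
    high = x - y                 # high-pass coefficient (difference)
    expanded = 2 * high + bit    # expand the difference and append the bit
    return low + (expanded + 1) // 2, low - expanded // 2


def de_extract(xw, yw):
    """Recover the original pair and the hidden bit from a marked pair."""
    low = (xw + yw) // 2         # the floor average is unchanged by embedding
    expanded = xw - yw
    bit = expanded % 2           # Python's % returns 0 or 1 for negatives too
    high = expanded // 2         # floor division undoes the expansion
    return low + (high + 1) // 2, low - high // 2, bit


marked = de_embed(206, 201, 1)
restored = de_extract(*marked)
```

Because the floor average of the pair is preserved, the decoder needs no side information other than the location map for pairs that would leave the valid pixel range.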
Another type of improvement, prediction error expansion (PEE), has outperformed the DE-based methods. In these schemes, pixels are first predicted from their contexts, and the prediction error is used for data embedding through expansion. The advantage of PEE is that it can better exploit the correlation among pixels to improve the prediction performance and reduce the embedding distortion. In [10], Thodi and Rodriguez proposed a histogram shifting method for embedding data in prediction errors, which established the foundation of PEE. The same authors later proposed an improved method based on the difference expansion technique [11]. Many different predictors have been used for PEE, such as the partial difference expanding (PDE) predictor [12], the edge-detection mechanism (MED) predictor [13], the Gaussian weight predictor [14], and the accurate predictor [15].
On the basis of DE and PEE, the histogram shifting (HS) technique has been developed. The HS-based scheme was first proposed by Ni et al. [16]. The key step of the scheme is to shift the bins to the right and left of the peak frequency bin to make room for data embedding. Thus, the count of the peak frequency bin determines the embedding capacity. These schemes may include blocking or area selection methods, as in the approach shown in [17]. Their embedding capacity was usually small and the embedding distortion was unstable. For larger capacity and lower distortion, some works have combined PE with HS, such as [18]. A sharper prediction-error histogram can be obtained from PE, while HS can reduce the embedding distortion.
For better prediction performance, Yan and Wang proposed a prediction-error expansion method using linear prediction [19], which used the past eight samples for prediction, with integer prediction coefficients and a fixed order. In [20], Nishimura combined linear prediction with the error expansion technique, using the past eight samples to compute the prediction coefficients. To exploit the correlation of neighboring pixels/samples adequately, a non-integer prediction error expansion embedding method was proposed in [21], in which the prediction value of the current sample was the mean of its two closest samples. Sachnev et al. [22] proposed a double-embedding scheme, which separated an image into two sets so that each pixel could be predicted from its four immediate neighbors. Hu et al. [23] presented an image data hiding scheme using minimum rate prediction and an optimized histogram modification method.
There is still room for improvement in these PEE-based works by using a better prediction method whose order differs from clip to clip. In this paper, the PEE technique is further explored and a new reversible audio data hiding scheme is presented with two improvements to PEE:
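The histogram shifting principle can be sketched as follows. This is a simplified one-sided variant, assuming that an empty bin `zero` exists to the right of the peak bin `peak`; it is not the exact procedure of [16]. The names `hs_embed` and `hs_extract` are hypothetical. If the payload is shorter than the capacity, the remaining peak samples carry zeros.

```python
def hs_embed(values, bits, peak, zero):
    """Shift bins strictly between peak and zero right by one, embed at the peak."""
    payload = iter(bits)
    marked = []
    for v in values:
        if peak < v < zero:
            marked.append(v + 1)                  # make room next to the peak
        elif v == peak:
            marked.append(v + next(payload, 0))   # bit 1 moves the sample to peak + 1
        else:
            marked.append(v)
    return marked


def hs_extract(marked, peak, zero):
    """Return (bits, original values) from a histogram-shifted sequence."""
    bits, restored = [], []
    for v in marked:
        if v == peak:
            bits.append(0)
            restored.append(peak)
        elif v == peak + 1:
            bits.append(1)
            restored.append(peak)
        elif peak + 1 < v <= zero:
            restored.append(v - 1)                # undo the shift
        else:
            restored.append(v)
    return bits, restored


host = [3, 4, 4, 5, 4, 6, 2, 4]                   # peak bin 4 occurs four times, bin 7 is empty
marked = hs_embed(host, [1, 0, 1, 1], peak=4, zero=7)
bits, restored = hs_extract(marked, peak=4, zero=7)
```

The capacity equals the number of samples in the peak bin (four in this example), which is why a sharper histogram gives a larger payload for the same amount of shifting.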

1)
Noncausal predictor. In conventional PEE predictors, the prediction coefficients are fixed [19, 21, 22] or only past samples (or pixels) are used as prediction context [19, 20], so the redundancy cannot be exploited effectively. To address this problem, we propose a new noncausal predictor that combines the advantages of the linear predictor and the conventional noncausal predictor. This predictor is designed for the double-embedding scheme, and its prediction coefficients are calculated adaptively by the minimum error power method.

2)
Alterable orders. In conventional PEE predictors, the prediction order is fixed [19–23], so the prediction errors cannot be effectively reduced for different audio files. In contrast, the noncausal linear predictor proposed in this paper has an alterable order: the optimal prediction order is chosen according to the complexity of an audio file by using the minimum error power method.
Owing to these improvements, a sharper prediction-error histogram can be obtained, which reduces the embedding distortion. Experimental results on several standard clips show that the prediction orders differ from clip to clip and that the best prediction performance can be achieved for a given file. Compared with existing reversible audio data hiding methods, the proposed one has lower distortion at the same embedding rate.
The rest of the paper is organized as follows: the proposed scheme is described in Section II, and the experimental results in comparison with several existing methods are reported in Section III. Conclusions are given in the last section.
The proposed scheme
This section presents the proposed noncausal prediction model in detail, which can provide satisfactory prediction accuracy for different clips. The double-embedding strategy [22] is combined with the proposed prediction model to form the proposed high-capacity reversible data hiding scheme.
Double-embedding strategy
The double-embedding strategy was proposed for reversible image data hiding in [22]. It divides an image into two sets like a chessboard, so that each pixel in one set can be predicted from its four immediate neighbors in the other set. In the encoder, the first set is marked first. In the decoder, the second set is recovered first.
In this paper, an audio sequence is divided into two sets, the cross set and the dot set, as shown in Fig. 1. The samples in the cross set are predicted for expansion embedding first. The detailed embedding and extraction operations are described in parts E and F of Section II.
Noncausal prediction model
In the proposed prediction model, a sample is estimated by a linear combination of its P past samples and Q future samples, which serve as prediction context. This reduces the prediction error more effectively than using only past samples as prediction context. The prediction value \( \overline{x}_i \) of the current sample \( x_i \) is given by:
where P and Q are integers, and \( a_k \) (k = 1, 2, …, P + Q) are the prediction coefficients. K = P + Q is defined as the order of the prediction model in this paper.
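A minimal sketch of such a noncausal linear predictor is given below. It assumes that the prediction is a plain weighted sum of the selected context samples, with the P past coefficients listed first and the Q future coefficients after them. The function name `predict` and the explicit offset lists are hypothetical and only serve the illustration.

```python
def predict(x, i, past_offsets, future_offsets, coeffs):
    """Predict x[i] from selected past samples x[i - p] and future samples x[i + q]."""
    context = [x[i - p] for p in past_offsets] + [x[i + q] for q in future_offsets]
    if len(context) != len(coeffs):
        raise ValueError("the order K = P + Q must match the number of coefficients")
    return sum(a * s for a, s in zip(coeffs, context))


signal = [10, 12, 15, 14, 13, 11]
# Order K = 3 with P = 2 past samples (x[i-1], x[i-3]) and Q = 1 future sample (x[i+1]).
estimate = predict(signal, 3, past_offsets=[1, 3], future_offsets=[1],
                   coeffs=[0.5, 0.1, 0.4])
error = signal[3] - estimate
```

With fixed offsets and coefficients this reduces to a conventional predictor; the scheme in this paper instead chooses the context and the coefficients per clip, as described in the following subsections.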
Estimate of prediction coefficients
Before the prediction step, we sort the neighboring samples (the past 40 samples and the future 20 samples) by their distances to the current sample. First, we calculate the distance between the current sample and each neighboring sample as
where \( dP_p \) (p = 1, 2, …, P) is the distance between the current sample and the past sample \( x_{i-p} \), while \( dQ_q \) (q = 1, 2, …, Q) is the distance between the current sample and the future sample \( x_{i+2q-1} \), and L is the number of samples in the cross or the dot set, with \( L=\left\lfloor \frac{N}{2}\right\rfloor \), where N is the length of the audio file.
After the distances have been calculated, we sort them. For example, if \( dP_1 < dQ_1 < dP_3 < dP_2 < dQ_2 \) and the optimal K is 3, we let P = 2 and Q = 1. For each i, we use \( x_{i-1} \), \( x_{i-3} \), and \( x_{i+1} \) to calculate the prediction coefficients. For ease of notation, we denote \( x_{i-1} \) as \( {x}_i^{P_1} \), \( x_{i-3} \) as \( {x}_i^{P_2} \), \( x_{i-2} \) as \( {x}_i^{P_3} \), \( x_{i+1} \) as \( {x}_i^{Q_1} \), and \( x_{i+3} \) as \( {x}_i^{Q_2} \). In other words, we use \( {x}_i^{P_1} \), \( {x}_i^{P_2} \), and \( {x}_i^{Q_1} \) for prediction.
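The selection in this example can be sketched as follows. The distance values are hypothetical numbers chosen only to reproduce the ordering in the example, and `select_context` is an illustrative name. Future candidates are taken at the odd offsets 2q − 1, in line with the definition of \( dQ_q \).

```python
def select_context(past_dist, future_dist, order):
    """Pick the `order` nearest neighbors; return (past offsets, future offsets).

    past_dist maps p -> dP_p (sample x[i - p]);
    future_dist maps q -> dQ_q (sample x[i + 2q - 1]).
    """
    candidates = [(d, -p) for p, d in past_dist.items()]
    candidates += [(d, 2 * q - 1) for q, d in future_dist.items()]
    chosen = sorted(candidates)[:order]            # smallest distances first
    past = [-offset for _, offset in chosen if offset < 0]
    future = [offset for _, offset in chosen if offset > 0]
    return past, future


# Hypothetical distances with dP_1 < dQ_1 < dP_3 < dP_2 < dQ_2.
past_dist = {1: 0.8, 2: 2.1, 3: 1.7}
future_dist = {1: 1.2, 2: 2.6}
past, future = select_context(past_dist, future_dist, order=3)
```

For K = 3 this yields P = 2 (offsets 1 and 3) and Q = 1 (offset 1), that is, the context \( x_{i-1} \), \( x_{i-3} \), and \( x_{i+1} \) used in the example.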
For the sorting method, we modify Eq. (1) as
For the current sample \( x_i \), we denote the sample set \( {U}_i^K \) as its prediction context and the set \( A^K \) as its prediction coefficients, formulated as follows:
and
where T denotes the transposition operation on a matrix.
In this paper, we use the minimum error power method [24] to estimate the prediction coefficients. For a given K value, the error power value \( \rho^K \) in the cross or dot set can be computed as
Referring to (3), there are \( 2\left\lceil \frac{P}{2}\right\rceil + Q \) samples not predicted for the computation.
From (6), we can compute the prediction coefficients \( A^K \) by minimizing \( \rho^K \). This can be done by the following formulation,
From (6) and (7), we have the following deduction,
From (8), the prediction coefficient set \( A^K \) can be computed by the following expression,
After the prediction coefficients \( A^K \) are estimated, the minimum error power value for the order K can be computed by referring to (6).
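The coefficient estimation can be illustrated with an ordinary least-squares fit. The sketch assumes that the error power is the sum of squared prediction errors over the predicted samples (any normalization factor is omitted) and that minimizing it leads to the usual normal equations \( (U^T U) A = U^T X \). The helper names `solve_linear` and `fit_coefficients` are hypothetical, and the context is passed as signed offsets (negative for past samples, positive for future samples).

```python
def solve_linear(matrix, vector):
    """Solve matrix * a = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[r]] for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_coefficients(x, indices, offsets):
    """Return (coefficients, error power) minimizing the squared prediction error."""
    rows = [[x[i + o] for o in offsets] for i in indices]    # one context per sample
    targets = [x[i] for i in indices]
    k = len(offsets)
    gram = [[sum(r[u] * r[v] for r in rows) for v in range(k)] for u in range(k)]
    rhs = [sum(r[u] * t for r, t in zip(rows, targets)) for u in range(k)]
    coeffs = solve_linear(gram, rhs)
    power = sum((t - sum(a * s for a, s in zip(coeffs, r))) ** 2
                for r, t in zip(rows, targets))
    return coeffs, power


ramp = [3 * n + 2 for n in range(20)]              # a linear ramp is predicted exactly
coeffs, power = fit_coefficients(ramp, range(1, 19, 2), offsets=[-1, 1])
```

Repeating the fit for each candidate order and keeping the order with the smallest error power would give the order selection described in the next subsection.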
The prediction order
Computing the prediction order K is a crucial step, since the order plays an important role in reducing the prediction errors. Too small an order cannot effectively exploit the correlation among samples, and too large an order has a negative effect, since a sample far from the current sample is less correlated with it. The order K may therefore differ between audio files in order to achieve the best prediction accuracy. Section II-C showed that, for a given order K, the minimum error power \( \rho^K \) and the corresponding coefficient set \( A^K \) can be computed. Different order values give different minimum error power values. The smallest of these minimum error power values determines the order and the prediction coefficients used for reversible data hiding.
In this work, we denote \( K_1 \) and \( K_2 \) as the orders of the prediction coefficients in the cross set and the dot set, respectively. Let \( {A}^{K_1} \) be the prediction coefficients for the cross set and \( {A}^{K_2} \) those for the dot set.
Data embedding and extraction methods
After the prediction, expansion embedding combined with the histogram shifting technique proposed in [10] is applied to hide information bits reversibly. A threshold value T is chosen according to the required embedding capacity. The prediction errors in the range [−T, T] are expanded to carry the data bits, while those outside [−T, T] are shifted to make room for the expansion.
In the encoder, the samples in the cross set are predicted and watermarked first. Suppose the prediction value of the current sample \( x_i \) is \( \overline{x_i} \); the prediction error \( e_i \) is calculated as
Then, the information bits can be inserted by the following rules:
where \( D_i \) is the prediction error after expansion embedding and b is a bit to be hidden. After the embedding, the sample \( x_i \) is watermarked as
Once the watermark embedding operations on the samples in the cross set have been finished, a similar embedding process is applied to the samples in the dot set. Figure 2 shows the prediction, watermark embedding, and watermark extraction processes for the cross set and the dot set. In Fig. 2a, the original samples (unshadowed) are used to predict the cross set, and then watermark bits are embedded into the cross set in Step 1.1. After that, the dot set samples are predicted from the original dot samples and the watermarked cross samples, and the watermark bits are inserted into the dot set in Step 1.2.
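The expansion and shifting rules for a single prediction error can be sketched as below. The sketch assumes the form that is common in the PEE literature: errors in [−T, T] are doubled and carry one bit, larger errors are shifted by T + 1, and smaller errors by −T. It also assumes an integer (rounded) prediction value. The function names are hypothetical, and overflow handling is omitted.

```python
def expand_error(e, bit, t):
    """Map a prediction error to its marked value D."""
    if -t <= e <= t:
        return 2 * e + bit        # expandable error carries one bit
    if e > t:
        return e + t + 1          # shift right, beyond the expanded range
    return e - t                  # shift left, beyond the expanded range


def restore_error(d, t):
    """Invert expand_error; the bit is None for shifted (non-carrying) errors."""
    if -2 * t <= d <= 2 * t + 1:
        return d // 2, d % 2
    if d > 2 * t + 1:
        return d - t - 1, None
    return d + t, None


def mark_sample(x, prediction, bit, t):
    """Watermark one sample given its integer prediction value."""
    return prediction + expand_error(x - prediction, bit, t)


marked = mark_sample(x=102, prediction=100, bit=1, t=2)    # error 2 becomes 2*2 + 1
error, bit = restore_error(marked - 100, t=2)
```

Since the expanded range [−2T, 2T + 1] and the two shifted ranges do not overlap, the decoder can tell from the marked error alone whether a sample carries a bit.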
In the decoder, we first extract the hidden bits from the dot set and recover the samples in this set. For the sample \( x_i \), the sample prediction operation can be used to obtain \( \overline{x_i} \). Then we have
The hidden information bit is extracted and the original sample is recovered as
and
and
Once the decoding operations on the samples in the dot set have been finished, a similar extraction process is applied to the samples in the cross set. As shown in Step 2.1 in Fig. 2, we first recover the original samples of the dot set and extract their part of the payload. Then the original samples of the cross set are recovered and the payload is extracted completely in Step 2.2. A sketch of the proposed watermarking scheme is shown in Fig. 3.
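The two-pass order (cross set first in the encoder, dot set first in the decoder) can be sketched as follows. To keep the example short, the proposed predictor is replaced by a simple stand-in, the floor of the mean of the two immediate neighbors; odd indices are treated as the cross set and even indices as the dot set; the first and last samples are left unmarked. All names are hypothetical, and overflow handling and auxiliary information are omitted.

```python
def expand(e, bit, t):
    # Same expansion/shifting rule as assumed for the embedding step.
    return 2 * e + bit if -t <= e <= t else (e + t + 1 if e > t else e - t)


def restore(d, t):
    if -2 * t <= d <= 2 * t + 1:
        return d // 2, d % 2
    return (d - t - 1, None) if d > 2 * t + 1 else (d + t, None)


def neighbor_mean(x, i):
    return (x[i - 1] + x[i + 1]) // 2      # stand-in predictor using the other set


def embed_set(x, start, bits, t):
    """Mark samples start, start + 2, ... in place; neighbors are in the other set."""
    for i in range(start, len(x) - 1, 2):
        pred = neighbor_mean(x, i)
        e = x[i] - pred
        bit = next(bits, 0) if -t <= e <= t else 0
        x[i] = pred + expand(e, bit, t)


def extract_set(y, start, t):
    """Recover samples start, start + 2, ... in place and return their bits."""
    bits = []
    for i in range(start, len(y) - 1, 2):
        pred = neighbor_mean(y, i)
        e, bit = restore(y[i] - pred, t)
        if bit is not None:
            bits.append(bit)
        y[i] = pred + e
    return bits


def hide(x, payload, t):
    y, bits = list(x), iter(payload)
    embed_set(y, 1, bits, t)               # cross set, predicted from original dots
    embed_set(y, 2, bits, t)               # dot set, predicted from marked crosses
    return y


def recover(y, t):
    x = list(y)
    dot_bits = extract_set(x, 2, t)        # dot set first: its context is still marked
    cross_bits = extract_set(x, 1, t)      # then cross set, from the restored dots
    return x, cross_bits + dot_bits


host = [100, 102, 101, 105, 110, 108, 107, 109, 111, 110]
payload = [1, 0, 1, 1]
marked = hide(host, payload, t=2)
restored, bits = recover(marked, t=2)
```

The decoder sees exactly the context that the encoder used for each set, which is what makes the reverse order necessary. Unused capacity carries zeros, so only the first `len(payload)` extracted bits are meaningful here.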
Auxiliary information
In the proposed scheme, the auxiliary information includes the threshold values (\( T_1 \) for the cross set and \( T_2 \) for the dot set), the prediction orders (\( K_1 = P_1 + Q_1 \) and \( K_2 = P_2 + Q_2 \)), and the prediction coefficients (\( {A}^{K_1} \) and \( {A}^{K_2} \)). The auxiliary information should be inserted into the cover signal for blind extraction.
Based on our experiments, the size of the auxiliary information is assigned as follows:

1.
In our tests, all the samples can be used for reversible data hiding when the threshold value is larger than 800. We therefore use 20 bits to store the threshold values \( T_1 \) (10 bits) and \( T_2 \) (10 bits), since 10 bits can represent values up to 1023.

2.
We use 12 bits to store the values of \( K_1 \) (6 bits) and \( K_2 \) (6 bits), since in our tests the prediction order is always smaller than 64 for all the clips.

3.
In our tests, the magnitudes of the prediction coefficients are always smaller than 10. As a tradeoff between prediction accuracy and embedding efficiency, all the coefficients are rounded to two decimal places. For example, when the prediction coefficient \( a_1 \) is 1.4433, it is rounded to 1.44; when \( a_2 \) is −0.3852, it is rounded to −0.39. After multiplying by one hundred, we can use 11 bits to represent a coefficient (10 bits for the magnitude, 1 bit for the sign).

4.
In the embedding, the underflow and overflow problems are handled by using a location map. For a sample with the underflow or overflow problem, we use 25 bits to mark its position, since most of the clips (sampled at 44.1 kHz) are not longer than 12 min. Because the proposed prediction model has higher accuracy, few samples had underflow or overflow problems in our tests on all example clips.

5.
Considering the auxiliary information above, we use 12 bits to store the length of the auxiliary information, which can indicate up to 4095 bits of auxiliary information.
In the encoder, the LSB values of the first M + 12 samples are saved as part of the payload and reversibly embedded into the cover signal. Here, M is the length of the auxiliary information. We use the LSB positions of the first 12 samples to record the length. The LSB positions of the next M samples are then used to store the auxiliary information. In the decoder, the auxiliary information is first extracted from the LSB values of the first M + 12 samples and is then used for the extraction of the hidden bits and the recovery of the cover signal.
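The bit budget in the list above can be checked with a short sketch. The 11-bit layout (sign bit first, then a 10-bit magnitude) and the function names are assumptions made for illustration; the text only fixes the number of bits per field. The total length assumes that the auxiliary information consists of exactly the fields listed.

```python
def encode_coefficient(a):
    """Quantize a coefficient to two decimals and pack it into 11 bits."""
    magnitude = round(abs(a) * 100)              # e.g. 1.4433 -> 144
    if magnitude >= 1024:
        raise ValueError("magnitude does not fit into 10 bits")
    sign = "1" if a < 0 else "0"
    return sign + format(magnitude, "010b")


def decode_coefficient(bits):
    """Recover the quantized coefficient from its 11-bit code."""
    sign = -1 if bits[0] == "1" else 1
    return sign * int(bits[1:], 2) / 100


def auxiliary_length(k1, k2, overflow_count):
    """Bits for thresholds, orders, coefficients, and the location map entries."""
    return 20 + 12 + 11 * (k1 + k2) + 25 * overflow_count


code = encode_coefficient(1.4433)
value = decode_coefficient(code)
length = auxiliary_length(k1=10, k2=8, overflow_count=0)
```

For instance, orders of 10 and 8 with no overflow samples would need 230 bits, well within the limit imposed by the 12-bit length field.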
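The LSB bookkeeping described above can be sketched as follows. The bit order of the 12-bit length field (most significant bit first) and the function names are assumptions. The saved LSBs returned by `write_header` are the values that would be appended to the payload so that the first M + 12 samples can be restored.

```python
def write_header(samples, aux_bits):
    """Store the 12-bit length and the auxiliary bits in the leading LSBs."""
    m = len(aux_bits)
    if m >= 4096:
        raise ValueError("auxiliary information is too long for a 12-bit length")
    header = [int(c) for c in format(m, "012b")] + list(aux_bits)
    saved = [s & 1 for s in samples[:len(header)]]          # original LSBs
    marked = [(s & ~1) | b for s, b in zip(samples, header)]
    return marked + list(samples[len(header):]), saved


def read_header(samples):
    """Read the length from the first 12 LSBs, then the auxiliary bits."""
    m = int("".join(str(s & 1) for s in samples[:12]), 2)
    return [s & 1 for s in samples[12:12 + m]]


def restore_lsbs(samples, saved):
    """Put the saved LSBs back into the leading samples."""
    head = [(s & ~1) | b for s, b in zip(samples, saved)]
    return head + list(samples[len(saved):])


cover = [(-1) ** n * (n * 7 + 3) for n in range(40)]        # includes negative samples
aux = [1, 0, 1, 1, 0]
marked, saved = write_header(cover, aux)
recovered_aux = read_header(marked)
recovered_cover = restore_lsbs(marked, saved)
```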
Experimental results
In the reversible data hiding community, embedding rate and distortion are two significant criteria. In our tests, we use the signal-to-noise ratio (SNR) and the objective difference grade (ODG), computed with PEAQ software, to measure the watermark distortion of reversible data hiding schemes. The embedding rate is measured in bits per sample (bps). The test data set includes 70 standard audio files (WAVE format with a sampling rate of 44.1 kHz) [25]. Here, four clips, numbered 39, 49, 64, and 66, are randomly selected as example clips for the report.
Figure 4 shows the different \( K_1 \) (the cross set prediction order) and error power values for the 4 example clips. We can see that different audio clips have different \( K_1 \). Figures 5, 6, 7, and 8 show the different \( K_2 \) (the dot set prediction order) for 6 audio clips, with two thresholds (5 and 15) in the low-capacity case and two thresholds (50 and 80) in the high-capacity case. We also observe that, for the same clip, the order \( K_2 \) is often different from \( K_1 \). The \( K_2 \) value mainly depends on \( K_1 \) and the low capacity thresholds (5 and 15): the larger \( K_1 \) and the lower the capacity threshold, the larger \( K_2 \). The basic reason is that \( K_1 \) is estimated from the original samples but \( K_2 \) is not: after the cross set is watermarked, the embedding distortion affects the computation of \( K_2 \).
Table 1 shows the four different types of example clips. For each clip, the order of the prediction coefficients for the cross set, \( K_1 \), is computed and listed. We can see that different types of audio files have different optimal prediction orders. For all the 70 audio clips, we show their optimal \( K_1 \) values in Fig. 9.
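For reference, the two objective measures that do not require PEAQ software can be computed as in the sketch below. It uses the standard definition of SNR as the ratio of host signal energy to embedding noise energy in decibels; the function names are hypothetical.

```python
import math


def snr_db(original, marked):
    """Signal-to-noise ratio of the marked signal in dB."""
    signal = sum(s * s for s in original)
    noise = sum((s - m) ** 2 for s, m in zip(original, marked))
    if noise == 0:
        return math.inf                     # identical signals: no distortion
    return 10 * math.log10(signal / noise)


def embedding_rate(num_bits, num_samples):
    """Embedding rate in bits per sample (bps)."""
    return num_bits / num_samples


host = [100, -100, 100, -100]
marked = [101, -100, 100, -101]
snr = snr_db(host, marked)                  # 10 * log10(40000 / 2)
rate = embedding_rate(2, len(host))
```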
Figures 10, 11, 12, and 13 plot the prediction-error histograms of the proposed predictor, the DE predictor [5], the linear predictor [20], and the noncausal predictor [21] for the four clips numbered 39, 49, 64, and 66, respectively. We can see that the proposed predictor provides the smallest error power and that its prediction errors are closer to zero. The error power of the other schemes can be estimated by (17), where N is the length of the audio file. For the other clips, the simulation results are similar. This means that the proposed predictor can better reduce prediction errors. The main reason is that the proposed prediction model can better exploit the correlation of the samples for different types of audio files.
For the four example clips, we test the performance of the proposed scheme against three existing state-of-the-art works [20–22]. The method in [22], proposed for two-dimensional image files, can be adapted to one-dimensional audio clips by rounding the average of the two neighboring samples as the predicted value. Figures 14, 15, 16, and 17 plot the experimental results on the four clips. We can see that, for the same embedding capacity, the proposed scheme obtains higher SNR values than the other three schemes. The basic reason is that we can use predictors of different orders to reduce the prediction error in a noncausal way for different types of clips. In the previous schemes, the order of the predictor is fixed for different clips or only past samples are used as prediction context. Similarly, Figs. 18, 19, 20, and 21 show that our method has higher ODG values than the other three schemes on the four example clips.
Figure 22 shows the SNR results on the 70 audio clips at 1 bit per sample. We can see that, for most audio files, our method has the highest SNR; that is, our prediction model introduces the least distortion for most of the clips. Figure 23 shows the ODG results on the 70 audio clips at 1 bit per sample. As we can see, a better ODG (less perceptual degradation) is achieved for most of the clips.
Table 2 shows the average ODG value, the average SNR value, the percentage of the best SNR values, and the percentage of the best ODG values over all the 70 audio clips. We can see that the proposed method performs best for most of the clips.
For four different types of clips, Table 3 lists their durations and the computational costs of embedding and extraction for the four reversible data hiding schemes. The test software is Matlab R2012a running on a computer with an i5-4690K processor and a CPU speed of 4.4 GHz. In the proposed scheme, the computational cost in the embedding phase is somewhat higher, since the prediction orders and the prediction coefficients need to be estimated before data hiding. For clip 39, with a duration of 2 min and 17 s, the embedding cost is 49 min and 5 s. The computational cost is related to the duration. From the perspective of applications, a higher computational cost in the embedding phase is acceptable, since the authentication process is implemented in the extraction phase. In the proposed scheme, the cost of the decoding process is satisfactory, since the auxiliary information has been stored for blind extraction.
Conclusions
This paper presents a reversible audio data hiding scheme that uses noncausal prediction with alterable order. For an audio clip, the optimum order and the prediction coefficients can be obtained by using the minimum error power method. As a result, the proposed prediction model can better exploit the correlation of the samples. Experimental results have shown that the proposed prediction model provides a satisfactory prediction precision for different types of clips. Moreover, the proposed scheme, which combines the double-embedding strategy and the proposed prediction model, has lower embedding distortion for the same embedding rate in comparison with several existing works.
References
 1.
YQ Shi, Z Ni, D Zou, C Liang, G Xuan, Lossless data hiding: fundamentals, algorithms and applications, in Proc. IEEE ISCAS, vol. 2, 2004, pp. 313–336
 2.
J Fridrich, M Goljan, R Du, Invertible authentication, in Proc. SPIE Security Watermarking Multimedia Contents, San Jose, CA, 2001, pp. 197–208
 3.
J Fridrich, M Goljan, R Du, Lossless data embedding - new paradigm in digital watermarking. EURASIP J. Appl. Signal Process. 2002(2), 185–196 (2002)
 4.
MU Celik, G Sharma, AM Tekalp, E Saber, Lossless generalized-LSB data embedding. IEEE Trans. Image Process. 14(2), 253–266 (2005)
 5.
J Tian, Reversible data embedding using a difference expansion. IEEE Trans. Circuits Syst. Video Technol. 13(8), 890–896 (2003)
 6.
AM Alattar, Reversible watermark using difference expansion of triplets, in Proc. Int. Conf. Image Process., vol. 1. Barcelona, Spain, 2003, pp. 501–504
 7.
HJ Kim, V Sachnev, YQ Shi, J Nam, HG Choo, A novel difference expansion transform for reversible data embedding. IEEE Trans. Inf. Forensics Secur. 4(3), 465 (2008)
 8.
X Wang, X Li, B Yang, Z Guo, Efficient generalized integer transform for reversible watermarking. IEEE Signal Process. Lett. 17(6), 567–570 (2010)
 9.
F Peng, X Li, B Yang, Adaptive reversible data hiding scheme based on integer transform. Signal Process. 92(1), 54–62 (2012)
 10.
DM Thodi, JJ Rodriguez, Reversible watermarking by prediction-error expansion, in Proc. IEEE Southwest Symp. Image Anal. Interpretation, Lake Tahoe, CA, 2004, pp. 21–25
 11.
DM Thodi, JJ Rodriguez, Expansion embedding techniques for reversible watermarking. IEEE Trans. Image Process. 16(3), 721–730 (2007)
 12.
B Ou, X Li, Y Zhao, R Ni, Reversible data hiding scheme based on PDE predictor. J. Syst. Softw. 86(10), 54–62 (2012)
 13.
Y Hu, HK Lee, J Li, DE-based reversible data hiding with improved overflow location map. IEEE Trans. Circuits Syst. Video Technol. 19(2), 250–260 (2009)
 14.
C Panyindee, C Pintavirooj, Optimizations using the genetic algorithm for reversible watermarking, in Proc. ECTI-CON, 2013, pp. 1–5
 15.
S Kang, HJ Hwang, HJ Kim, Reversible watermark using an accurate predictor and sorter based on payload balancing. ETRI J. 34(3), 410–420 (2012)
 16.
Z Ni, YQ Shi, N Ansari, S Wei, Reversible data hiding. IEEE Trans. Circuits Syst. Video Technol. 16(3), 354–362 (2006)
 17.
M Kamran, A Khan, SA Malik, A high capacity reversible watermarking approach for authenticating images: exploiting downsampling, histogram processing, and block selection. Inf. Sci. (2013). doi:10.1016/j.ins.2013.07.035
 18.
WL Tai, CM Yeh, CC Chang, Reversible data hiding based on histogram modification of pixel differences. IEEE Trans. Circuits Syst. Video Technol. 19(6), 906–910 (2009)
 19.
D Yan, R Wang, Reversible data hiding for audio based on prediction error expansion, in Proc. of IIH-MSP 2008, 2008, pp. 249–252
 20.
A Nishimura, Reversible audio data hiding using linear prediction and error expansion, in Proc. of IIH-MSP 2011, 2011, pp. 318–321
 21.
S Xiang, Non-integer expansion embedding for prediction-based reversible watermarking, in Proc. 14th Int. Conf., 2012, pp. 224–239
 22.
V Sachnev, HJ Kim, J Nam, S Suresh, YQ Shi, Reversible data embedding using sorting and prediction. IEEE Trans. Circuits Syst. Video Technol. 19(7), 989–999 (2009)
 23.
X Hu, W Zhang, X Li, N Yu, Minimum rate prediction and optimized histograms modification for reversible data hiding. IEEE Trans. Inf. Forensics Secur. 10(3), 653–664 (2015)
 24.
AH Nuttall, Spectral analysis of a univariate process with bad data points, via maximum entropy and linear predictive techniques, in Tech. Rep. TR-5303, Naval Underwater Systems Center, New London, Conn, 1976
 25.
EBU Committee: sound quality assessment material recordings for subjective tests [online]. Available: https://tech.ebu.ch/publications/sqamcd
Funding
This work was partially supported by the NSFC project (No. 61272414) and co-funded by the State Key Laboratory of Information Security (No. 2016MS07).
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 Reversible data hiding
 Audio
 Noncausal prediction
 Minimum error power
 Alterable orders