A memory efficient finite-state source coding algorithm for audio MDCT coefficients
 Sumxin Jiang^{1},
 Rendong Yin^{1} and
 Peilin Liu^{1}
https://doi.org/10.1186/16874722201422
© Jiang et al.; licensee Springer. 2014
Received: 18 December 2013
Accepted: 16 April 2014
Published: 12 May 2014
Abstract
To achieve a better trade-off between the vector dimension and the memory requirements of a vector quantizer (VQ), an entropy-constrained VQ (ECVQ) scheme with finite memory, called finite-state ECVQ (FSECVQ), is presented in this paper. The scheme consists of a finite-state VQ (FSVQ) and multiple component ECVQs. By utilizing the FSVQ, the inter-frame dependencies within the source sequence can be effectively exploited, and no side information needs to be transmitted. By employing the ECVQs, the total memory requirements of the FSECVQ can be efficiently decreased while the coding performance is improved. An FSECVQ, designed for coding the modified discrete cosine transform (MDCT) coefficients, was implemented and evaluated based on the Unified Speech and Audio Coding (USAC) scheme. Results showed that the FSECVQ reduced the total memory requirements by about 11.3% compared with the encoder in the USAC final version (FINAL), while maintaining a similar coding performance.
1 Introduction
It is well known that a memoryless vector quantizer (VQ) can achieve performance arbitrarily close to the rate-distortion (R/D) function of the source if the codevector dimension is large enough [1]. However, as the codevector dimension increases, the memory requirements and the computational complexity of the VQ grow exponentially. Furthermore, it is difficult to design a practical high-performance VQ in a high-dimensional space. Consequently, various product code vector quantization methods [2–5] have been proposed as alternative solutions. These methods cut down the memory requirements and reduce the computational complexity at the cost of a moderate loss of quantization performance. Among the widely reported product code techniques, the split vector quantizer (SVQ), first proposed by Paliwal and Atal [6] for the quantization of linear predictive coding (LPC) parameters, has received extensive attention. In an SVQ, the input vector is first split into multiple subvectors [7], and the resulting subvectors are then quantized independently [8, 9]. Although the SVQ cuts down the memory requirements and reduces the computational complexity of a memoryless VQ, it ignores the correlations between the subvectors and hence incurs a coding loss, referred to as ‘split loss’ [10].
Many techniques have been developed to recover the split loss. So and Paliwal [2, 11] proposed a switched SVQ (SSVQ) method, which applies multiple different SVQs to the input vector space so as to exploit the global dependencies. Based on the SSVQ, a Gaussian mixture model (GMM)-based SSVQ (GMM-SSVQ) was proposed by Chatterjee et al. [12], where the distribution of the source is modeled by a GMM. Furthermore, a GMM-based Karhunen-Loève transform (KLT) domain SSVQ was proposed by Lee et al. [13], which was constructed by adding a region-clustering algorithm to the GMM-SSVQ. To better exploit the probability density function (pdf) of the source, Chatterjee and Sreenivas [14] developed a switched conditional pdf-based SVQ, where the vector space is partitioned into non-overlapping Voronoi regions and the source pdf of each switching Voronoi region is modeled by a multivariate Gaussian. Although these methods efficiently recover the split loss, most of them focus only on removing intra-frame redundancies and fail to exploit inter-frame redundancies.
In addition, ordinary VQs can generally be divided into two groups: entropy-constrained VQ (ECVQ) [15] and resolution-constrained VQ (RCVQ) [16]. The above-mentioned methods were mainly proposed for the RCVQ and can hardly be applied to the ECVQ [17]. On the other hand, an ECVQ usually achieves better R/D performance than an RCVQ does [18]. This is mainly owing to the length function of the ECVQ, which allocates a different number of bits to each vector index according to its probability of occurrence. Therefore, an ECVQ with recovered split loss would achieve a higher R/D performance than an RCVQ.
To better recover the split loss of an SVQ, one can resort to the finite-state VQ (FSVQ), which efficiently takes advantage of the inter-frame dependencies. The FSVQ [19, 20], which incorporates memory into a memoryless VQ, is intrinsically a prediction-based technique. An FSVQ can be regarded as a finite-state machine [21] that contains multiple states, each corresponding to a certain state codebook. The state transition is determined by a next-state function based on the information obtained from the previously encoded vectors. Thus, the FSVQ uses the previously encoded vectors to predict the current input [22] and therefore efficiently exploits the redundancies among the input vectors, achieving a considerable increase in R/D performance over a memoryless VQ.
In this paper, a composite quantizer, called FSECVQ, is introduced, in which multiple ECVQs are combined with an FSVQ. In the FSECVQ [23], this FSVQ serves as a classifier that splits the source sequence into multiple clusters. To achieve better classification performance, the FSVQ makes the current decision based on information obtained from a number of previous adjacent vectors, including those in previous frames, and thus exploits the inter-frame redundancies better than an ordinary SVQ does. After that, a specially designed ECVQ is applied to each cluster derived from the FSVQ. Among the resulting clusters, the more frequently a cluster occurs, the higher the vector dimension it is assigned. In this way, the total memory requirements can be significantly reduced and the coding performance improved. Moreover, within each component ECVQ, multiple length functions are devised for coding the indices of input vectors, each corresponding to a certain pdf model. To select the optimal length function for each vector index, another FSVQ is introduced. This FSVQ predicts the source pdf of the current vector index based on the information obtained from its previous adjacent indices, and the length function with the highest matching probability is then chosen. In this way, the ‘mismatch’ between the designed pdf and the source pdf can be efficiently decreased. Thus, the FSECVQ is more robust than an ordinary SVQ.
The organization of this paper is as follows. In Section 2, some fundamentals of VQ, FSVQ, and ECVQ are introduced. Section 3 deals with the design of the FSECVQ. Then, in Section 4, a practical FSECVQ aimed at coding the audio modified discrete cosine transform (MDCT) coefficients in the MPEG Unified Speech and Audio Coding (USAC) [24] is implemented and tested. Finally, conclusions are presented in Section 5.
2 Preliminaries
Since the FSECVQ is based on FSVQ and ECVQ, this section reviews the classical results of these vector quantization theories under the high-rate assumption.
2.1 Vector quantization
Generally, a VQ, q, consists of four elements: encoder ϕ, decoder ψ, index coder ζ, and codebook $\mathcal{C}$. Suppose that a random vector, x, with pdf, f, is quantized by quantizer q and that the corresponding reconstructed vector is $\widehat{\mathbf{x}}$. Then, for a given measurable space $(\Omega ,\mathcal{F})$ consisting of a k-dimensional Euclidean space Ω and its Borel subsets, the mappings of quantizer q can be described as follows:

Encoder ϕ: $\Omega \to \mathcal{I}$, where $\mathcal{I}$ is a countable index set. Each element in $\mathcal{I}$ corresponds to a different codevector contained in codebook $\mathcal{C}$. The aim of encoder ϕ is to find the index of the best matching vector in codebook $\mathcal{C}$ for input vector x according to a given distortion criterion

Decoder ψ: $\mathcal{I}\to \Omega $, which is used to reconstruct the vector in space Ω according to the received vector index

Index coder ζ: $\mathcal{I}\to \left\{\text{bitstream}\right\}$, which transforms the index sequence generated from encoder ϕ to a bitstream

Codebook $\mathcal{C}$, which is used by both encoder ϕ and decoder ψ to generate the optimal codevector indices or to find the corresponding codevectors
In our work, the squared Euclidean distance, $d(\mathbf{x},\widehat{\mathbf{x}})={\parallel \mathbf{x}-\widehat{\mathbf{x}}\parallel}^{2}$, is used as the distortion measure, where ∥·∥ denotes the l_{2} norm.
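The encoder and decoder mappings described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `squared_distance`, `encode`, `decode`, and `toy_codebook` are hypothetical, the codebook values are made up for illustration, and the index coder ζ is omitted.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(x: Vector, y: Vector) -> float:
    """Distortion measure d(x, y) = ||x - y||^2 (squared l2 norm)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def encode(x: Vector, codebook: List[Vector]) -> int:
    """Encoder: map an input vector to the index of its closest codevector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(x, codebook[i]))


def decode(index: int, codebook: List[Vector]) -> Vector:
    """Decoder: map a received index back to its codevector."""
    return codebook[index]


# Toy two-dimensional codebook (illustrative values only).
toy_codebook = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0)]
index = encode((0.9, 1.2), toy_codebook)
reconstruction = decode(index, toy_codebook)
```

Both sides share the same codebook, so only the index has to be conveyed to the decoder.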
2.2 Finite-state vector quantization
An FSVQ is a VQ with a time-varying encoder and decoder pair [21], which is realized by means of a finite-state machine. Assume that an FSVQ contains M distinct states, S_{1},…,S_{ M }, whose corresponding state codebooks are ${\mathcal{C}}_{1},\dots ,{\mathcal{C}}_{M}$, respectively. Suppose that x_{ n } is the input vector, whose current state is s_{ n }∈{S_{1},…,S_{ M }}. Then, the input vector x_{ n } is quantized by searching the codebook ${\mathcal{C}}_{m}$, corresponding to the current state s_{ n }, for the best matching codevector ${\widehat{\mathbf{x}}}_{n}$, whose vector index is denoted as i_{ n }.
which implies that the input vector x_{ n } is quantized in the codebook ${\mathcal{C}}_{m}$ corresponding to the current state s_{ n }.
which implies that the received vector index, i_{ n }, is decoded in the codebook ${\mathcal{C}}_{m}$ corresponding to the current state s_{ n }.
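The state-dependent encoding and decoding can be sketched as follows. This is a one-dimensional toy under stated assumptions: the two state codebooks in `STATE_CODEBOOKS` and the rule in `next_state` are hypothetical and chosen only to show that the decoder can track the encoder's state from previously decoded values, without side information.

```python
# Hypothetical state codebooks: state 0 is fine-grained, state 1 is coarse.
STATE_CODEBOOKS = [
    [-1.0, 0.0, 1.0],
    [-8.0, -4.0, 0.0, 4.0, 8.0],
]


def nearest(x, codebook):
    """Index of the codeword closest to x under squared error."""
    return min(range(len(codebook)), key=lambda i: (x - codebook[i]) ** 2)


def next_state(reconstruction):
    """Hypothetical next-state rule based only on the last reconstruction."""
    return 0 if abs(reconstruction) < 1.0 else 1


def fsvq_encode(samples, initial_state=0):
    state, indices = initial_state, []
    for x in samples:
        i = nearest(x, STATE_CODEBOOKS[state])   # search the current state codebook
        indices.append(i)
        state = next_state(STATE_CODEBOOKS[state][i])
    return indices


def fsvq_decode(indices, initial_state=0):
    state, output = initial_state, []
    for i in indices:
        x_hat = STATE_CODEBOOKS[state][i]        # same codebook as the encoder used
        output.append(x_hat)
        state = next_state(x_hat)                # same state update as the encoder
    return output


samples = [0.2, 0.9, 5.0, 7.5, 0.1]
indices = fsvq_encode(samples)
decoded = fsvq_decode(indices)
```

Because the next state depends only on quantities available to both sides, the index sequence alone is enough to reproduce the state sequence at the decoder.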
2.3 Entropy-constrained vector quantization
The design of an ECVQ aims to find a set of reconstruction vectors that minimizes the average distortion between the source and its reconstruction, subject to a constraint on the index entropy [15]. To obtain a general result, Gray et al. [25, 27] investigated the variable-rate ECVQ using a Lagrangian formulation in which a Lagrangian multiplier λ>0 is defined for each rate.
where D_{ f }(q) and R_{ f }(q), obtained from (5) and (1), are the average distortion and average rate of quantizer q, respectively.
This result guarantees that if a pdf f satisfies the conditions of (15), then there exists an optimal quantizer q for f in the sense that for any decreasing λ converging to 0, its optimal performance is ξ_{ k }.
Compared with (15), it can be seen that the mismatch resulting from applying an asymptotically optimal quantizer designed for pdf g to a source sequence with pdf f is exactly the relative entropy of the source pdf f with respect to the design pdf g, I(f∥g).
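As a rough numerical illustration of this mismatch, the sketch below computes the relative entropy for a discrete analogue (probability mass functions rather than pdfs). The function name `relative_entropy_bits` and the two example distributions are assumptions made for illustration; they are not taken from the paper.

```python
import math


def relative_entropy_bits(f, g):
    """Relative entropy I(f||g) in bits; requires g[i] > 0 wherever f[i] > 0."""
    return sum(p * math.log2(p / q) for p, q in zip(f, g) if p > 0)


source_pmf = [0.5, 0.25, 0.25]   # actual source statistics (illustrative)
design_pmf = [0.25, 0.25, 0.5]   # statistics assumed at design time (illustrative)

mismatch = relative_entropy_bits(source_pmf, design_pmf)

# Cross-check: ideal code lengths -log2(g) averaged under f, minus the entropy of f.
cross_entropy = -sum(p * math.log2(q) for p, q in zip(source_pmf, design_pmf))
entropy = -sum(p * math.log2(p) for p in source_pmf)
```

In this toy case the penalty is a quarter of a bit per symbol, which is the extra rate paid for coding with the wrong model.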
3 Quantizer design
3.1 Main FSVQ
The major function of the main FSVQ is to partition the input space into four non-overlapping clusters according to its four states. For each resulting cluster, a component ECVQ is constructed with a different vector dimension and different memory requirements. By this means, the total memory requirements can be efficiently decreased. The state transition is determined by a next-state function, which is the key component of the main FSVQ. The remainder of this section mainly discusses the construction of the next-state function.
Once a source sequence is split into a series of blocks, the value of V_{ x } is calculated for each block. Thus, a mapping can be established between the V_{ x } set, composed of all the possible values of V_{ x }, and the input space Ω. Then, by splitting the possible values of V_{ x } into two segments, we can partition the input space Ω into two clusters, Ω_{ k } and ${\Omega}_{k}^{\text{C}}$. Here, k denotes the dimension of cluster Ω_{ k }. To implement the split, a threshold V_{T} is employed, whose value is obtained from the training data by maximizing the coding gain of the FSECVQ under the constraint of the total memory requirements. As for the two resulting clusters, Ω_{ k } is supposed to contain the blocks that occur relatively frequently, whereas ${\Omega}_{k}^{\text{C}}$ is assumed to hold those that occur relatively rarely.
which denotes an estimation of the cluster to which the current block is most likely to be classified.
3.2 ECVQ
Based on the research of Gray et al. [25], the Z_{ n } lattice quantizer and an arithmetic coder are selected in our work as the lattice quantizer and the length function of each component ECVQ, respectively. Unlike the conventional ECVQ [15, 17], where all the vector indices generated by the lattice quantizer share the same length function regardless of their possible differences, our scheme provides multiple length functions, and the optimal one is selected by another FSVQ (sub-FSVQ) for each generated vector index. Moreover, to improve the robustness and, at the same time, decrease the memory requirements of each component ECVQ, the design of the sub-FSVQ is optimized and an iterative method for merging similar length functions is proposed.
The length functions are implemented by an arithmetic coder and are based on the pdf model of the input index. Hence, the main task of the sub-FSVQ is to search for the optimal model among a predesigned collection of pdf models based on the information obtained from previous indices.
3.2.1 Lattice quantizer
where X denotes an input vector, and t_{0} and p_{0} are two thresholds that constrain the norm and the probability of input vector X, respectively.
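For orientation, quantization to the integer lattice Z^n amounts to rounding each component to the nearest integer. The sketch below shows this together with a simple range check. The check uses the largest absolute component and a made-up limit; both are assumptions for illustration only, since the exact constraint involving t_{0} and p_{0} is not reproduced here.

```python
import math


def quantize_zn(x):
    """Nearest point of the integer lattice Z^n (ties rounded upwards)."""
    return [int(math.floor(v + 0.5)) for v in x]


def within_range(point, max_component):
    """Hypothetical range check: every component magnitude is at most max_component."""
    return max(abs(c) for c in point) <= max_component


block = [0.4, -1.6, 2.5, 7.9]        # illustrative scaled coefficients
lattice_point = quantize_zn(block)
covered = within_range(lattice_point, 8)
```

A lattice point that fails such a check would need separate handling, which is the role the LSB mechanism plays later in the paper.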
3.2.2 Sub-FSVQ
This FSVQ is used to search for the optimum in a predesigned collection of length functions, which are used to encode the current vector index generated by the lattice quantizer. The next-state function of the sub-FSVQ, ${\gamma}_{{s}_{i}}$, is built on the four previous indices I_{ A }, I_{ B }, I_{ C }, and I_{ D }, adjacent to the current input, I_{ x }. Since the ECVQ holds a finite number of codevectors, the simplest way to construct the next-state function is to enumerate all the possible combinations of the four neighbors, each denoting a certain state. However, as the number of codevectors increases, the number of possible current states becomes extremely large, and thus the memory requirements and the computation cost grow rapidly.
To reduce the number of possible current states, the different dependencies between the current index and its four previous neighbors must be taken into account. In practice, less emphasis is placed on indices I_{ A } and I_{ C } than on indices I_{ B } and I_{ D }. This is because, among the four neighbors, the current vector x is less relevant to vectors A and C than to vectors B and D. Thus, we apply the operation ∥·∥_{2} to vectors A and C, so as to reduce the number of their possible values.
where i denotes that the sub-FSVQ belongs to the ith ECVQ, and t_{0}, t_{1}, and t_{2} are three constants that make each combination of the four indices correspond to a different current state. This is feasible since, for an audio MDCT coefficient sequence, the values of the four variables, I_{ B }, I_{ D }, ${I}_{{\parallel A\parallel}_{2}}$, and ${I}_{{\parallel C\parallel}_{2}}$, are all finite, and then, according to their maximum possible values, it is easy to find suitable values of the three constants.
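One way to pick three constants so that every combination of four bounded indices maps to a distinct state is a mixed-radix encoding. The sketch below assumes the state is a weighted sum of the four indices, which is an assumption made for illustration and is not confirmed by the text above; the function names and the index ranges are hypothetical.

```python
from itertools import product


def make_state_function(max_b, max_d, max_a, max_c):
    """Build a state mapping from the maximum value of each of the four indices."""
    t0 = max_b + 1                 # number of distinct values of the first index
    t1 = t0 * (max_d + 1)
    t2 = t1 * (max_a + 1)
    num_states = t2 * (max_c + 1)

    def state_of(i_b, i_d, i_a, i_c):
        # Mixed-radix combination: distinct index tuples give distinct states.
        return i_b + t0 * i_d + t1 * i_a + t2 * i_c

    return state_of, (t0, t1, t2), num_states


# Illustrative ranges: I_B, I_D in 0..3, and the reduced indices of A and C in 0..1.
state_of, constants, num_states = make_state_function(3, 3, 1, 1)
all_states = {
    state_of(b, d, a, c)
    for b, d, a, c in product(range(4), range(4), range(2), range(2))
}
```

Reducing the range of the indices derived from A and C directly shrinks `num_states`, which is the motivation for treating them more coarsely than B and D.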
3.2.3 Length function
The length functions are realized by an arithmetic coder holding multiple pdf models. There are two difficulties in building an optimal arithmetic coder for an optimal ECVQ. First, the memory requirements for saving the predesigned pdf models will become infeasible as the number of states derived from (25) increases. Second, as the volumes of the partitions split by the subFSVQ shrink, the available data may not provide credible pdf estimation. Popat and Picard [33] proposed a solution to the second problem using a Gaussian mixture model (GMM) for describing the source pdf. Thus, this work mainly focuses on reducing the memory requirements for saving the pdf models necessary for the arithmetic coder.
where ρ_{ m }, which equals the probability P_{ g }(S_{ m }), is the weight of model g_{ m }. Thus, the mismatch d_{mis} can be seen as a distance measure of a pdf model pair. The more similar the two models are, the smaller the mismatch. Therefore, we can efficiently decrease the memory requirements for saving the pdf models by merging the model pairs that have sufficiently small mismatches into a new pdf model, with a negligible loss of coding performance.
For a pdf model collection, once we have obtained the d_{mis} values of each model pair, we can merge the pair with the minimal d_{mis} value into a new pdf model so as to reduce the memory requirements. If the memory size is still above the requirements, the merging of similar pdf models should be continued. However, once a new pdf model is generated, the mismatches among the pdf models must be updated before a new merge is executed. The whole procedure is carried out iteratively until the memory size meets the requirements. Once the final pdf models are obtained, a remapping between these models and their corresponding states is needed.
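The iterative merging can be sketched as a greedy loop. The sketch assumes discrete models and uses, as a stand-in for d_{mis}, the weighted relative entropy of the two models to their weighted mixture; this cost, the function names, and the example models are assumptions for illustration, not the paper's exact definition.

```python
import math
from itertools import combinations


def kl_bits(p, q):
    """Relative entropy of p to q in bits."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def merge(model_a, model_b):
    """Merge two (weight, pmf) models into their weighted mixture."""
    (wa, pa), (wb, pb) = model_a, model_b
    w = wa + wb
    return w, [(wa * x + wb * y) / w for x, y in zip(pa, pb)]


def merge_cost(model_a, model_b):
    """Stand-in mismatch: rate penalty of coding both models with the merged one."""
    _, merged = merge(model_a, model_b)
    return (model_a[0] * kl_bits(model_a[1], merged)
            + model_b[0] * kl_bits(model_b[1], merged))


def reduce_models(models, target_count):
    """Greedily merge the closest pair until only target_count models remain."""
    models = list(models)
    groups = [[s] for s in range(len(models))]   # original states behind each model
    while len(models) > target_count:
        # Costs are recomputed in every pass, so merged models are accounted for.
        a, b = min(combinations(range(len(models)), 2),
                   key=lambda ab: merge_cost(models[ab[0]], models[ab[1]]))
        models[a] = merge(models[a], models[b])
        groups[a] += groups[b]
        del models[b], groups[b]
    # Remap every original state to the index of its final model.
    state_to_model = {s: m for m, g in enumerate(groups) for s in g}
    return models, state_to_model


example = [(0.4, [0.70, 0.20, 0.10]),
           (0.3, [0.68, 0.22, 0.10]),
           (0.3, [0.10, 0.20, 0.70])]
reduced, state_to_model = reduce_models(example, 2)
```

Here the two nearly identical models are merged first, and the returned dictionary plays the role of the remapping between states and final models.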
4 Results
In USAC [34], a recent MPEG standard, the MDCT plays an important role [35]. In the USAC encoder, the MDCT coefficients are first companded with a power-law function before scalar quantization, which in effect yields a non-uniform scalar quantization. The residuals are then further entropy coded. To improve the performance of the quantization and coding of the MDCT coefficients, a novel scheme [29], which combined scalar quantization with context-based entropy coding, was developed in the USAC. In this scheme, the input tuples (blocks) were first quantized by a scalar quantizer (SQ), and the generated tuple indices were then encoded by a context-based arithmetic encoder. In the USAC final version (FINAL), the tuple length of this scheme was set to 2 in order to decrease the total memory requirements.
To further reduce the memory requirements and improve the R/D performance of the quantization and coding of the MDCT coefficients, an FSECVQ was implemented and tested based on the USAC final version. The implemented FSECVQ consisted of three component ECVQs, ECVQ_CB4, ECVQ_CB2, and ECVQ_CB1, whose vector dimensions were 4, 2, and 1, respectively.
To allow an easy comparison with the FINAL, the FSECVQ was divided into two parts: the SQ, which was formed by merging the scaling steps contained in the three component ECVQs and was constructed exactly like the one in the FINAL, and the core module of the FSECVQ, which is referred to as FSECVQ for simplicity. Thus, the FSECVQ and the FINAL shared the same source sequence and the same quantization error and differed only in their coding performance. Therefore, the remainder of this section focuses on evaluating the coding performance of the FINAL and the FSECVQ.
4.1 Memory requirements
Table 1 Memory requirements for the two methods: FINAL and FSECVQ

| Table name | Description | FINAL (32-bit words) | FSECVQ (32-bit words) |
| --- | --- | --- | --- |
| Model decision | For selecting the optimal cdf model | 927.5 | 192 |
| Cdf models | Required for saving the cdf models | 512 | 1,075 |
| Others | Other requirements | 1.5 | 11.5 |
| Total | | 1,441.0 | 1,278.5 |

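Table 2 Number of codevectors, models, and memory requirements for FINAL and FSECVQ

| Codebook | FINAL vectors | FINAL models | FINAL ROM | FSECVQ vectors | FSECVQ models | FSECVQ ROM |
| --- | --- | --- | --- | --- | --- | --- |
| Dimension 1 | | | | 10 | 27 | 108 |
| Dimension 2 | 17 | 64 | 512 | 26 | 31 | 264 |
| Dimension 4 | | | | 49 | 27 | 703 |
| Total | 17 | 64 | 512 | 85 | 85 | 1,075 |
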
Compared with the FINAL, the FSECVQ required less memory for the cdf model decision. This was mainly due to the two FSVQs (main FSVQ and sub-FSVQ), which adaptively reshaped the input blocks and merged the states with similar cdf models into a new one, while no side information needed to be transmitted. Thus, the number of states that had to be stored in the sub-FSVQ was much smaller than the number stored in the context model of the FINAL. As a result, the FSECVQ reduced the total memory requirements of the FINAL by about 11.3%.
The number of codevectors (codebook size) and the memory requirements for saving the cdf models of the FINAL and the FSECVQ are listed in Table 2. It can be seen that the FSECVQ employed three different codebooks, whose dimensions were 4, 2, and 1, respectively. Among these codebooks, the 4-dimensional codebook was assigned the largest number of codevectors, whereas the 1-dimensional one was assigned the smallest. By this means, the equivalent vector dimension of the FSECVQ was reduced, and therefore its memory requirements were efficiently decreased.
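The reported reduction follows directly from the per-table memory figures given in Table 1. The short check below recomputes the totals and the relative saving; the dictionary and variable names are only illustrative.

```python
# Memory per table in 32-bit words, as reported in Table 1.
final_words = {"model decision": 927.5, "cdf models": 512, "others": 1.5}
fsecvq_words = {"model decision": 192, "cdf models": 1075, "others": 11.5}

final_total = sum(final_words.values())
fsecvq_total = sum(fsecvq_words.values())

# Relative saving of the FSECVQ with respect to the FINAL, in percent.
reduction_percent = 100 * (final_total - fsecvq_total) / final_total
```

The saving in the model-decision table outweighs the larger storage needed for the cdf models, which is why the total still decreases.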
4.2 Average computational complexity
Table 3 Average complexity numbers for decoding 32 kbps stereo reference quality bitstreams for quantizers FINAL and FSECVQ

| Operating mode | USAC-FINAL | FSECVQ |
| --- | --- | --- |
| PCU (MHz) | 0.607 | 0.605 |

As the cubic terms usually led to heavy computation, a lookup table was employed in the FSECVQ to reduce the computational complexity, so that the FSECVQ had a computational complexity similar to that of the FINAL. In practice, the size of the lookup table depended on the selection of the threshold of the main FSVQ. In our work, four previous neighbors were employed to calculate the threshold of the current block. Since the current block and its four neighbors were highly correlated and usually had a similar envelope shape, the largest element of all the codevectors could be constrained to a small value, such as 8. Thus, the lookup table for storing the cubic terms would be very small, about two words.
4.3 Rate performance
Table 4 Bitrates of quantizers FINAL and FSECVQ for nine audio items

| Operating mode | FINAL (kbps) | FSECVQ (kbps) | Gain (%) |
| --- | --- | --- | --- |
| Test 1, 64 kbps stereo | 48.59 | 48.77 | −0.33 |
| Test 2, 32 kbps stereo | 24.56 | 24.55 | 0.04 |
| Test 3, 24 kbps stereo | 17.59 | 17.57 | 0.11 |
| Test 4, 20 kbps stereo | 14.87 | 14.86 | 0.09 |
| Test 5, 16 kbps stereo | 11.71 | 11.69 | 0.17 |
| Test 6, 24 kbps mono | 18.97 | 18.98 | −0.05 |
| Test 7, 20 kbps mono | 15.41 | 15.42 | −0.06 |
| Test 8, 16 kbps mono | 12.18 | 12.17 | 0.08 |
| Test 9, 12 kbps mono | 8.78 | 8.76 | 0.23 |
| Average | 19.18 | 19.18 | 0.03 |

The table shows that the FINAL and the FSECVQ achieved a similar coding performance on all nine items. This indicates that both the FINAL and the FSECVQ could efficiently remove the redundancies within audio MDCT coefficient sequences. Moreover, both the FINAL and the FSECVQ obtained larger coding gains on the low-bitrate items than on the high-bitrate items. These phenomena were mainly due to the fact that the nine items had different pdfs of MDCT coefficients. In the FSECVQ, a different source distribution would lead to a different calling ratio of its three component ECVQs.
4.4 Main FSVQ
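Table 5 The effects on the three component ECVQs and coding gains

| Pyramidal decomposition | Threshold T_{ b d } | Threshold T_{ a c } | ECVQ_D4 ratio (%) | ECVQ_D4 LSB (%) | ECVQ_D2 ratio (%) | ECVQ_D2 LSB (%) | ECVQ_D1 ratio (%) | ECVQ_D1 LSB (%) | Gains (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ω_{4}^{a} | 9 | 30 | 77.395 | 1.403 | 16.769 | 1.873 | 5.837 | 9.883 | 3.0626 |
| | 10 | | 79.276 | 1.623 | 14.895 | 2.075 | 5.830 | 9.895 | 2.9917 |
| | 12 | | 82.300 | 2.104 | 11.884 | 2.501 | 5.816 | 9.914 | 2.7091 |
| | 17 | | 82.517 | 2.159 | 11.669 | 2.529 | 5.814 | 9.916 | 2.6779 |
| | 24 | | 83.410 | 2.403 | 10.785 | 2.673 | 5.805 | 9.925 | 2.5094 |
| | ∞ | | 87.733 | 4.947 | 7.728 | 2.672 | 4.539 | 10.129 | 0.9096 |
| | 10 | 24 | 78.733 | 1.579 | 15.436 | 2.014 | 5.832 | 9.892 | 3.0123 |
| | | 27 | 78.836 | 1.588 | 15.333 | 2.025 | 5.831 | 9.893 | 3.0094 |
| | | 33 | 79.520 | 1.645 | 14.651 | 2.105 | 5.829 | 9.896 | 2.9834 |
| | | 36 | 79.606 | 1.654 | 14.566 | 2.117 | 5.829 | 9.896 | 2.9812 |
| | | ∞ | 81.651 | 1.917 | 12.624 | 2.363 | 5.725 | 10.062 | 2.7537 |
| Ω_{2}^{b} | 32 | 243 | 79.276 | 1.623 | 13.849 | 1.647 | 6.875 | 8.455 | 2.9771 |
| | 45 | | | | 14.542 | 1.880 | 6.183 | 9.363 | 3.0022 |
| | 67 | | | | 15.219 | 2.241 | 5.505 | 10.443 | 2.9449 |
| | 75 | | | | 15.496 | 2.436 | 5.228 | 10.950 | 2.8997 |
| | ∞ | | | | 17.364 | 6.231 | 3.360 | 12.663 | 1.4433 |
| | 65 | 189 | 79.276 | 1.623 | 14.727 | 2.021 | 5.997 | 9.634 | 2.9563 |
| | | 216 | | | 14.776 | 2.036 | 5.948 | 9.707 | 2.9535 |
| | | 249 | | | 14.922 | 2.085 | 5.802 | 9.941 | 2.9929 |
| | | 257 | | | 14.942 | 2.098 | 5.782 | 9.972 | 2.9916 |
| | | ∞ | | | 15.537 | 2.548 | 5.817 | 10.975 | 2.8963 |
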
First, the thresholds of cluster Ω_{4} had a larger impact on the coding gain than those of cluster Ω_{2}, as shown by the fact that the variation range of the coding gains on Ω_{4} was much wider than that on Ω_{2}. Furthermore, within a level, threshold T_{ b d } had a larger impact on the coding gain than threshold T_{ a c } did. Since T_{ b d } and T_{ a c } were obtained from adjacent blocks B, D and A, C, respectively, this supported the assumption that B and D were more significant than A and C.
Second, the component ECVQ_D4 had a larger impact on the coding gains than the other two. From Table 5, it can be observed that most of the MDCT coefficients were encoded by ECVQ_D4. Therefore, to obtain the optimal performance, improving the performance of ECVQ_D4 should be given the highest priority.
4.5 ECVQ
As each component ECVQ contained two stages, lattice quantization and entropy coding, we would first assess the quantization stage and then, the entropy coding stage.
4.5.1 Quantization stage
To assess the quantization stage, we took the LSB as the major indicator, for three reasons. First, an LSB appeared if and only if an input vector fell outside the range constrained by the lattice quantizer, and thus the LSB could be seen as a sign of an error in the quantization stage. Therefore, a lower occurrence frequency of LSBs usually indicated fewer quantization errors in the quantization stage and, as a result, a higher coding gain achieved by the component ECVQ. Second, by adjusting the thresholds T_{ b d } and T_{ a c }, we could obtain different occurrence frequencies of LSBs and thus make different trade-offs between the coding gain and the memory requirements. Third, the ratio among the three LSB occurrence frequencies was correlated with the distribution of quantization errors among the three component ECVQs. A higher LSB occurrence frequency indicated that more quantization errors were distributed to the corresponding component ECVQ.
The LSB occurrence in each component ECVQ significantly influenced the final coding gain of the FSECVQ, as can be seen from Table 5. If an LSB appeared for an input vector, the ECVQ would consume many more bits than were needed to encode the vector directly. There were two methods for reducing the appearance of LSBs: enlarging the range of the corresponding codebook or shrinking the range constrained by the threshold. However, the first method would increase the memory requirements, while the second would degrade the coding gain. Therefore, a trade-off had to be made between the memory requirements and the coding gain. Among the three ECVQs, ECVQ_D4 had the lowest percentage of LSBs while ECVQ_D1 had the highest. By this means, the FSECVQ could save memory while keeping the coding gain as high as possible.
4.5.2 The length functions
Although the FINAL contained fewer cdf models than the FSECVQ did, it obtained a coding performance similar to that of the FSECVQ. This was mainly owing to the cdf model selection method used in the FINAL, which accurately selected the optimal cdf model for each input vector index. However, this method was more complicated than the one used in the FSECVQ, as can be seen from the fact that the memory requirements for the cdf model selection in the FINAL were much larger than those in the FSECVQ, as shown in Table 1.
5 Conclusions
In this paper, an ECVQ with finite memory, called FSECVQ, is proposed. In the FSECVQ, an FSVQ, namely the main FSVQ, is used to partition the source sequence into multiple non-overlapping clusters. An ECVQ is then applied to each cluster. Within each ECVQ, the length function is implemented by an arithmetic coder holding multiple predesigned cdf models. To select the optimal cdf model for each input vector, another FSVQ, namely the sub-FSVQ, is employed.
Owing to the main FSVQ, which effectively exploits the inter-frame dependencies, the source sequence is split into multiple clusters and no side information needs to be transmitted. Moreover, the main FSVQ assigns different vector dimensions to the resulting clusters. The more frequently a cluster appears, the higher the vector dimension it is allocated. This helps the FSECVQ to efficiently reduce its total memory requirements while maintaining a relatively high coding performance. Finally, for each input vector, the sub-FSVQ selects the best matching cdf model, which adds robustness to the FSECVQ.
There are multiple ways to realize the proposed FSECVQ. First of all, if the quantization errors generated by the lattice quantizer are directly discarded, then the FSECVQ is equivalent to an ordinary ECVQ. However, if the quantization errors are taken as the LSBs and encoded by an additional length function, the FSECVQ is equivalent to a uniform quantizer. In addition, if the quantization steps of all the component ECVQs are separated from the FSECVQ, then the FSECVQ becomes an entropy encoder. The FSECVQ can also be used for coding speech, image, and video signals, and indeed any other source sequence with a non-uniform distribution.
Declarations
Acknowledgements
The authors wish to thank the anonymous reviewers for their detailed comments and suggestions, which have been extremely helpful in improving the clarity and quality of this paper. This work was supported by the National Natural Science Foundation of China under Grant No. 61171171.
References
 Gersho A, Gray RM: Vector Quantization and Signal Compression. New York: Wiley; 1994.Google Scholar
 So S, Paliwal KK: Efficient product code vector quantisation using the switched split vector quantiser. Digit Signal Process 2007, 17: 138171. 10.1016/j.dsp.2005.08.005View ArticleGoogle Scholar
 Gray R, Neuhoff D: Quantization. Inform. Theory, IEEE Trans 1998, 44(6):23252383. 10.1109/18.720541MathSciNetView ArticleGoogle Scholar
 Subramaniam A, Rao B: PDF optimized parametric vector quantization of speech line spectral frequencies. Speech Audio Process IEEE Trans 2003, 11(2):130142. 10.1109/TSA.2003.809192View ArticleGoogle Scholar
 So S, Paliwal K: Multiframe GMMbased block quantisation of line spectral frequencies for wideband speech coding. In Proceedings in IEEE International Conference on Acoustics, Speech, and Signal Processing, (ICASSP ’05), vol. 1. Philadelphia; March 2005:121124.Google Scholar
 Paliwal K, Atal B: Efficient vector quantization of LPC parameters at 24 bits/frame. Speech Audio Process IEEE Trans 1993, 1: 314. 10.1109/89.221363View ArticleGoogle Scholar
 Bouzid M, Cheraitia S, Hireche M: Switched split vector quantizer applied for encoding the LPC parameters of the 2.4 Kbits/s MELP speech coder. In 7th International MultiConference on Systems Signals and Devices. Amman, Jordan; June 2010:15.Google Scholar
 Leis J, Sridharan S: Adaptive vector quantization for speech spectrum coding. Digit Signal Process 1999, 9(2):89106. 10.1006/dspr.1999.0335View ArticleGoogle Scholar
 Chatterjee S, Sreenivas T: Optimum switched split vector quantization of LSF parameters.Signal Process.. 2008, 88(6):15281538.Google Scholar
 Nordin F, Eriksson T: On split quantization of LSF parameters. In Proceedings on IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’04), vol. 1. Montreal; May 2004:I157–60.View ArticleGoogle Scholar
 So S, Paliwal KK: A comparative study of LPC parameter representations and quantisation schemes for wideband speech coding. Digit Signal Process 2007, 17: 114137. 10.1016/j.dsp.2005.10.002View ArticleGoogle Scholar
 Chatterjee S, Sreenivas T: Gaussian mixture model based switched split vector quantization of LSF parameters. In IEEE International Symposium on Signal Processing and Information Technology. Giza; December 2007:10541059.Google Scholar
 Lee Y, Jung W, Kim MY: GMMbased KLTdomain switchedsplit vector quantization for LSF coding. Signal Process Lett. IEEE 2011, 18(7):415418.View ArticleGoogle Scholar
 Chatterjee S, Sreenivas T: Switched conditional PDFbased split VQ using gaussian mixture model. Signal Process Lett. IEEE 2008, 15: 9194.View ArticleGoogle Scholar
 Chou P, Lookabaugh T, Gray R, Entropyconstrained vector quantization: Acoustics Speech Signal Process. IEEE Trans. 1989, 37: 3142.View ArticleGoogle Scholar
 Lookabaugh T, Gray R: High-resolution quantization theory and the vector quantizer advantage. Inform Theory IEEE Trans 1989, 35(5):1020–1033. 10.1109/18.42217
 Zhao D, Samuelsson J, Nilsson M: On entropy-constrained vector quantization using Gaussian mixture models. Commun IEEE Trans 2008, 56(12):2094–2104.
 Vasilache A: Rate-distortion models for entropy constrained lattice quantization. In IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP ’10). Dallas; March 2010:4698–4701.
 Foster J, Gray R, Dunham M: Finite-state vector quantization for waveform coding. Inform Theory IEEE Trans 1985, 31(3):348–359. 10.1109/TIT.1985.1057035
 Andras Cziho BS, ETC IL: An optimization of finite-state vector quantization for image compression. Signal Process Image Commun 2000, 15(6):545–558. 10.1016/S09235965(99)000120
 Yahampath P, Pawlak M: On finite-state vector quantization for noisy channels. Commun IEEE Trans 2004, 52(12):2125–2133. 10.1109/TCOMM.2004.838736
 Chang RF, Huang YL: Finite-state vector quantization by exploiting interband and intraband correlations for subband image coding. Image Process IEEE Trans 1996, 5(2):374–378. 10.1109/83.480773
 Jiang S, Yin R, Liu P: A finite-state entropy-constrained vector quantizer for audio MDCT coefficients coding. In International Conference on Audio, Language and Image Processing (ICALIP 2012). Shanghai; July 2012:218–223.
 ISO/IEC JTC1/SC29/WG11: Call for proposals on unified speech and audio coding. 2007. http://mpeg.chiariglione.org/standards/mpegd/unifiedspeechandaudiocoding
 Gray R, Linder T, Li J: A Lagrangian formulation of Zador’s entropy-constrained quantization theorem. Inform Theory IEEE Trans 2002, 48(3):695–707. 10.1109/18.986007
 Nasrabadi N, Rizvi S: Next-state functions for finite-state vector quantization. Image Process IEEE Trans 1995, 4(12):1592–1601. 10.1109/83.475510
 Gray R, Li J: On Zador’s entropy-constrained quantization theorem. In Proceedings on Data Compression Conference (DCC 2001). Snowbird; March 2001:3–12.
 Gray R, Linder T: Mismatch in high-rate entropy-constrained vector quantization. Inform Theory IEEE Trans 2003, 49(5):1204–1217. 10.1109/TIT.2003.810637
 Fuchs G, Subbaraman V, Multrus M: Efficient context adaptive entropy coding for real-time applications. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’11). Prague; May 2011:493–496.
 Nasrabadi N, Choo C, Feng Y: Dynamic finite-state vector quantization of digital images. Commun IEEE Trans 1994, 42(5):2145–2154. 10.1109/26.285150
 Gyorgy A, Linder T, Chou P, Betts B: Do optimal entropy-constrained quantizers have a finite or infinite number of codewords? Inform Theory IEEE Trans 2003, 49(11):3031–3037. 10.1109/TIT.2003.819340
 Yu R, Lin X, Rahardja S, Ko C: A statistics study of the MDCT coefficient distribution for audio. In IEEE International Conference on Multimedia and Expo (ICME ’04), vol. 2. Taipei; June 2004:1483–1486.
 Popat K, Picard R: Cluster-based probability model and its application to image and texture processing. Image Process IEEE Trans 1997, 6(2):268–284. 10.1109/83.551697
 ISO/IEC JTC 1/SC 29N11510: Information technology - MPEG audio technologies - Part 3: unified speech and audio coding. 2010. http://mpeg.chiariglione.org/standards/mpegd/unifiedspeechandaudiocoding
 Neuendorf M, Gournay P, Multrus M, Lecomte J, Bessette B, Geiger R, Bayer S, Fuchs G, Hilpert J, Rettelbach N, Salami R, Schuller G, Lefebvre R, Grill B: Unified speech and audio coding scheme for high quality at low bitrates. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’09). Taipei; April 2009:1–4.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.