Expanded three-channel mid/side coding for three-dimensional multichannel audio systems
- Shi Dong^{1},
- Ruimin Hu^{1},
- Xiaochen Wang^{1},
- Yuhong Yang^{1} and
- Weiping Tu^{1}
https://doi.org/10.1186/1687-4722-2014-10
© Dong et al.; licensee Springer. 2014
Received: 1 November 2013
Accepted: 12 March 2014
Published: 24 March 2014
Abstract
Three-dimensional (3D) audio technologies are booming with the success of 3D video technology. The surge in the number of audio channels produces data volumes that exceed the available transmission bandwidth and storage capacity, so signal compression for 3D audio systems has become an important task. This paper investigates the conventional mid/side (M/S) coding method and discusses the signal correlation property of three-dimensional multichannel systems. Then, based on the channel triple, a three-channel dependent M/S coding (3D-M/S) method is proposed to reduce interchannel redundancy, and the corresponding transform matrices are presented. Furthermore, a framework is proposed to enable 3D-M/S to compress any number of audio channels. Finally, the masking threshold of the perceptual audio core codec is modified, which guarantees that the final coding noise meets the perceptual threshold constraint of the original channel signals. Objective and subjective tests with panning signals indicate an increase in coding efficiency compared to independent channel coding and a more moderate complexity increase than a PCA method.
Introduction
Recently, 3D audio has attracted growing attention and developed quickly, following the booming market for 3D movies. Many 3D audio technologies are now being introduced into audio applications to replace surround sound systems and to provide superior sound localization and an immersive feeling. Wave field synthesis (WFS), Ambisonics and vector-based amplitude panning (VBAP) are the three most well-developed technologies. WFS follows the Huygens principle to reconstruct the original sound field [1]. Research institutions such as Fraunhofer IDMT and IRCAM in France have studied WFS intensively and are attempting to bring it into theaters and live concert transmission. Ambisonics uses spherical harmonic functions to record the sound field and drive the loudspeakers; its loudspeakers require a rigorous configuration and give a good sound field reconstruction in the center [2]. VBAP follows the tangent law in three-dimensional space, using three adjacent loudspeakers to form a sound vector. Because of its simplicity, VBAP is the most common algorithm for 3D signal panning [3]. A 3D system such as the 22.2 multichannel system proposed by NHK in Japan uses VBAP to generate 3D sound images [4]. The 22.2 multichannel system is also included in the developing MPEG-H standard for rendering 3D audio scenes.
There is a clear trend that 3D audio technology will gradually mature and replace stereo and surround sound [5]. However, a common feature of 3D audio technologies is the large number of sound channels. For instance, a WFS system typically contains dozens or even hundreds of audio channels. The 22.2 system has three layers and 24 audio channels. Although an Ambisonics system can have a flexible order and channel number, it usually uses dozens of channels because fewer channels cause quality deterioration. Compared with two-channel stereo and 5.1 surround sound, the increase in audio channels causes a dramatic increase in 3D audio data. A report from Fraunhofer shows that 37 Mbps is needed for live transmission of WFS [6]. For the 22.2 multichannel system, the uncompressed data rate also reaches 28 Mbps [7]. Currently, storage media and transmission bandwidth can hardly accommodate such data rates, so the compression of 3D multichannel audio signals has become an important subject.
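As a rough sanity check on the 28 Mbps figure, the uncompressed PCM rate is simply channels × sampling rate × bit depth. The sketch below assumes 48-kHz sampling and 24-bit samples; these format values are illustrative assumptions that are not stated in this section, and the helper name `pcm_bitrate_mbps` is hypothetical.

```python
def pcm_bitrate_mbps(channels: int, sample_rate_hz: int, bits_per_sample: int) -> float:
    """Uncompressed PCM bitrate in Mbps (10^6 bits per second)."""
    return channels * sample_rate_hz * bits_per_sample / 1e6


# Assumed PCM format: 48 kHz, 24 bits per sample (not given in the text).
rate_22_2 = pcm_bitrate_mbps(24, 48_000, 24)   # 24 channels of the 22.2 system
rate_stereo = pcm_bitrate_mbps(2, 48_000, 24)  # two-channel stereo for comparison

print(f"22.2 system: {rate_22_2:.2f} Mbps")
print(f"stereo:      {rate_stereo:.2f} Mbps")
```

Under these assumptions the 22.2 system needs roughly twelve times the stereo rate, which is the scale of increase the paragraph above describes.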
The well-known Spatial Audio Coding (SAC) models the signals as virtual sound sources in the frequency domain, extracts the interchannel level difference (ICLD), interchannel time difference (ICTD) and interchannel coherence (IC) to represent the direction and width of the virtual sound source, and downmixes the channels to reduce redundancy [8–11]. The idea of using downmixed sources with spatial parameters was later developed into Spatial Audio Object Coding (SAOC) for efficiently coding multiple input spatial audio objects with interactive and personalized rendering ability [12]. Recently, other investigations have been published to increase the compression efficiency for multichannel 3D audio signals. In 2007, Goodwin and Jot proposed a PCA-based multichannel compression framework for parametric coding [13], which can enhance specific audio scenarios and provide robust spatial audio coding. In 2008, Cheng et al. proposed the Spatially Squeezed Surround Audio Coding (S^{3}AC) for parametrically compressing the Ambisonics signal [14]. In 2009, Hellerud used an interchannel prediction-based coding method to remove the redundancy between Ambisonics channels [15], which has low algorithmic delay but high computational complexity. Tzagkarakis used a sinusoidal model and linear prediction to parameterize the separate spot microphone channels and then downmixed the residual signals. This coding scheme is more suitable for multichannel signals with weak correlation, and such scenarios require independent channel decoding [16]. In 2010, Pinto et al. utilized a space/time-frequency transform to decompose the WFS signals into plane waves and evanescent waves. By discarding the evanescent waves and perceptually coding the plane wave signals, a coding gain is obtained. Coding efficiency increases with the number of audio channels, because the accuracy of the transform decomposition depends on the spatial resolution, that is, the number of WFS channels [17, 18].
In 2013, Cheng further proposed a Spatial Localization Quantization Point (SLQP) codec that uses localization cues to compress 3D audio signals [19, 20]. Since SLQP extracts the spatial cues and downmixes the channels, it achieves a high compression ratio for SLQP signals and other 3D audio systems.
In order to increase the coding efficiency at high bitrates, some non-parametric coding schemes were developed. Yang proposed a scalable multichannel codec that uses the Karhunen-Loeve Transform (KLT) to remove the interchannel redundancy [21]. Mid/side (M/S) coding was introduced by J.D. Johnston [22] and adopted by many audio codecs such as MPEG2-Layer III and MPEG4-AAC. In 2003, Liu et al. proposed a bit allocation method for M/S coding based on allocation entropy, which increases the objective quality by allocating more bits to the high-energy channel in M/S coding [23]. In 2008, Derrien et al. proposed an error model for M/S coding. The error model enables tuning of the quantizer used for channels M and S at the encoder with respect to the distortion of L and R at the decoder side, which increases the coding efficiency of M/S without much added complexity [24]. Since M/S coding works as the simplest interchannel prediction, Krueger generalized it by using linear prediction instead of the M/S transformation and a residual signal instead of the difference signal [25]. In 2012, Schafer extended Krueger’s method to the multichannel case, with low algorithmic delay [26]. Recently, M/S coding was combined with parametric stereo coding at low bitrates in the MPEG-USAC standard [27] by predicting the residual channel using spatial cue-based parameters, which aimed to bridge the stereo quality gap between low and high bitrates [28]. M/S coding also works alone at high bitrates, utilizing a novel complex prediction to achieve better performance [29].
The above model-based and parametric codecs can offer a considerable compression ratio. However, those methods need to know the direction of the real audio source to perform object-oriented coding, or must estimate a virtual source direction to perform downmixing and parametric coding. In practice, for example in live recording, it is very difficult to obtain the real audio source direction. Downmixing and parametric coding also cause interchannel interference such as ‘tone leakage’ artifacts when channel signals differ greatly [30]. Furthermore, the computational complexity of an audio codec should be acceptable while maintaining enough coding efficiency, and parametric coding can only achieve a performance gain at low bitrates. This paper focuses on the situation in which only the multichannel signals of the audio sources are recorded, without their directions. We consider high-quality/high-bitrate applications and focus on the non-parametric coding method. Section ‘M/S coding in 3D space’ describes the conventional M/S coding process and presents a three-channel dependent M/S coding (3D-M/S) method. The main idea is to expand M/S coding to three-dimensional audio by designing new transform matrices, which remove the redundancy of three channels in 3D space rather than just two channels in the horizontal plane. Section ‘3D-M/S psychoacoustic model’ discusses the psychoacoustic model for transformed 3D-M/S signals. Section ‘Framework for general channel configuration’ specifies a new framework that enables 3D-M/S to be applied to a more general channel configuration. Section ‘Experiment’ compares 3D-M/S coding with PCA coding and independent channel coding in terms of compression ratio and computational complexity. Section ‘Conclusion’ summarizes and concludes this paper.
M/S coding in 3D space
Conventional M/S coding
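For orientation, conventional M/S coding replaces a left/right pair by its sum and difference. The following is a minimal sketch, not the formulation used by the authors: it assumes the orthonormal 1/√2 scaling (a factor of 1/2 is another common choice), and the function names and sample values are hypothetical.

```python
import math

K = 1 / math.sqrt(2)  # assumed orthonormal scaling


def ms_encode(left, right):
    """Return (mid, side) as the scaled sum and difference of two channels."""
    mid = [K * (l + r) for l, r in zip(left, right)]
    side = [K * (l - r) for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode; the same scaled sum/difference recovers left and right."""
    left = [K * (m + s) for m, s in zip(mid, side)]
    right = [K * (m - s) for m, s in zip(mid, side)]
    return left, right


def energy(x):
    return sum(v * v for v in x)


# Two strongly correlated channels (made-up samples for illustration).
left = [0.50, -0.25, 0.75, 0.10]
right = [0.48, -0.27, 0.70, 0.12]

mid, side = ms_encode(left, right)
rec_left, rec_right = ms_decode(mid, side)

print(f"energy L/R: {energy(left):.4f} / {energy(right):.4f}")
print(f"energy M/S: {energy(mid):.4f} / {energy(side):.4f}")
```

With correlated inputs almost all of the energy moves into the mid channel, so the side channel can be quantized with far fewer bits; with this scaling the total energy is unchanged and the transform is its own inverse.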
M/S coding in three-dimensional space
where $\theta ,\phi \in \left[0,\frac{\pi}{2}\right]$, which determine the gain factor of the three channels.
An example is shown in Figure 2. It can be observed that when the source is close to the center of two or three channels, the corresponding matrix produces difference signals with a lower dynamic range than the original channel signals. Under a given masking threshold, far fewer bits are required to quantize the difference signals, which yields the coding gain.
Transformed channel signals with five matrices
Matrix | C _{ M } | C _{ S } | C _{ T } |
---|---|---|---|
M _{0} | C _{1} | C _{2} | C _{3} |
M _{1} | C _{1} | $\frac{\sqrt{2}}{2}S\text{sin}\theta (\text{cot}\theta +\text{sin}\phi )$ | $\frac{\sqrt{2}}{2}S\text{sin}\theta (\text{cot}\theta -\text{sin}\phi )$ |
M _{2} | $\frac{\sqrt{2}}{2}S\text{sin}\theta (\text{cot}\theta +\text{cos}\phi )$ | C _{2} | $\frac{\sqrt{2}}{2}S\text{sin}\theta (\text{cot}\theta -\text{cos}\phi )$ |
M _{3} | $S\text{sin}\theta \text{sin}(\phi +\frac{\pi}{4})$ | $S\text{sin}\theta \text{sin}(\phi -\frac{\pi}{4})$ | C _{3} |
M _{4} | $\frac{\sqrt{6}}{3}S\text{sin}\theta \left(\text{cot}\theta +\text{sin}\left(\phi +\frac{\pi}{4}\right)\right)$ | $S\text{sin}\theta \text{sin}(\phi -\frac{\pi}{4})$ | $\frac{\sqrt{6}}{3}S\text{sin}\theta \left(\frac{\sqrt{2}}{2}\text{sin}\left(\phi +\frac{\pi}{4}\right)-\text{cot}\theta \right)$ |
where i,j∈{1,2,3}; V_{01} = (0,C_{2},C_{3}), V_{02} = (C_{1},0,C_{3}) and V_{03} = (C_{1},C_{2},0) are the two-channel projections of the input vector V_{0}; and ${\mathbf{V}}_{\mathbf{1}}=\left(0,\frac{\sqrt{2}}{2},\frac{\sqrt{2}}{2}\right)$, ${\mathbf{V}}_{\mathbf{2}}=\left(\frac{\sqrt{2}}{2},0,\frac{\sqrt{2}}{2}\right)$, ${\mathbf{V}}_{\mathbf{3}}=\left(\frac{\sqrt{2}}{2},\frac{\sqrt{2}}{2},0\right)$, ${\mathbf{V}}_{\mathbf{4}}=\left(\frac{\sqrt{3}}{3},\frac{\sqrt{3}}{3},\frac{\sqrt{3}}{3}\right)$ are the summation vectors of each transform matrix.
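The M_1 to M_3 rows of the table can be checked numerically. The sketch below rests on two assumptions that are inferred from the table rather than stated in this section: the panned channel signals are C_1 = S sinθ cosφ, C_2 = S sinθ sinφ and C_3 = S cosθ, and each matrix passes one channel through while replacing the other two by their sum and difference scaled by √2/2. All function names are hypothetical.

```python
import math

R = math.sqrt(2) / 2  # assumed orthonormal sum/difference scaling


def panned_channels(s, theta, phi):
    # Assumed channel gains, inferred from the M_3 row of the table.
    return (s * math.sin(theta) * math.cos(phi),
            s * math.sin(theta) * math.sin(phi),
            s * math.cos(theta))


def apply_m1(c1, c2, c3):
    # C_1 passes through; C_2 and C_3 form the sum/difference pair.
    return c1, R * (c3 + c2), R * (c3 - c2)


def apply_m2(c1, c2, c3):
    # C_2 passes through; C_1 and C_3 form the sum/difference pair.
    return R * (c3 + c1), c2, R * (c3 - c1)


def apply_m3(c1, c2, c3):
    # C_3 passes through; C_1 and C_2 form the sum/difference pair.
    return R * (c1 + c2), R * (c2 - c1), c3


def table_rows(s, theta, phi, c):
    # Closed-form entries copied from the M_1, M_2 and M_3 rows of the table.
    a = s * math.sin(theta)
    cot = math.cos(theta) / math.sin(theta)
    return {
        "M1": (c[0], R * a * (cot + math.sin(phi)), R * a * (cot - math.sin(phi))),
        "M2": (R * a * (cot + math.cos(phi)), c[1], R * a * (cot - math.cos(phi))),
        "M3": (a * math.sin(phi + math.pi / 4), a * math.sin(phi - math.pi / 4), c[2]),
    }


# A source panned almost midway between C_1 and C_2 (illustrative angles).
s, theta, phi = 1.0, math.radians(80), math.radians(42)
c = panned_channels(s, theta, phi)
computed = {"M1": apply_m1(*c), "M2": apply_m2(*c), "M3": apply_m3(*c)}
expected = table_rows(s, theta, phi, c)

max_error = max(abs(x - y) for k in computed for x, y in zip(computed[k], expected[k]))
print(f"largest deviation from the table: {max_error:.2e}")
print(f"|C_2| = {abs(c[1]):.3f}, |M_3 difference| = {abs(computed['M3'][1]):.3f}")
```

For a source near the center of C_1 and C_2, the M_3 difference channel is much smaller than either original channel, which is the dynamic-range reduction that the matrix switching is meant to exploit.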
3D-M/S psychoacoustic model
The same results can be deduced for other matrices.
Framework for general channel configuration
Experiment
The experiment used five channels (C_{1}, C_{2}, C_{3}, C_{4}, C_{5}) in the spherical 22.2 multichannel configuration, as shown in Figure 4. Considering that PCA is theoretically the best decorrelation transform and that independent channel coding is widely used for 22.2 multichannel compression, the experiment compared the proposed 3D-M/S method with PCA and independent channel coding in terms of bitrate, complexity and objective quality. Three MPEG test sequences (es01 voice signal, sc03 symphony music signal, si02 castanets transient signal; mono, 48-kHz sampling) were used as moving virtual sources following the VBAP rule, and four sequences (si03, si01, sc01, es02) were used as discrete fixed-position virtual sources. The virtual sources and their respective azimuth and altitude panning angles were generated on a per-frame basis. Here, only point virtual sources were used to test the best performance of the three methods, as subband signals can be regarded as point sources in subband coding when the bandwidths are small enough. Signals with decorrelated elements are beyond the scope of the VBAP model and will decrease the coding performance, because their difference signals retain high energy, which depends on the correlation and the energy of the decorrelated elements. Uncorrelated signals with independent audio content are tested at the end.
Discrete virtual source setting and objective results
Method | θ | φ = $\frac{7\pi}{32}$ | φ = $\frac{5\pi}{32}$ | φ = $\frac{3\pi}{32}$ | φ = $\frac{\pi}{32}$ | Average |
---|---|---|---|---|---|---|
Ind | $\frac{11\pi}{36}$ | 20.76 | – | – | – | 20.76 |
3D-M/S | $\frac{11\pi}{36}$ | 21.45 | – | – | – | 21.45 |
PCA | $\frac{11\pi}{36}$ | 21.95 | – | – | – | 21.95 |
Ind | $\frac{13\pi}{36}$ | 15.37 | 21.14 | – | – | 18.26 |
3D-M/S | $\frac{13\pi}{36}$ | 16.07 | 21.88 | – | – | 18.98 |
PCA | $\frac{13\pi}{36}$ | 16.54 | 22.74 | – | – | 19.64 |
Ind | $\frac{15\pi}{36}$ | 23.48 | 23.28 | 22.08 | – | 22.95 |
3D-M/S | $\frac{15\pi}{36}$ | 23.77 | 22.93 | 22.54 | – | 23.08 |
PCA | $\frac{15\pi}{36}$ | 23.86 | 23.78 | 23.49 | – | 23.71 |
Ind | $\frac{17\pi}{36}$ | 21.82 | 19.06 | 18.93 | 19.36 | 19.79 |
3D-M/S | $\frac{17\pi}{36}$ | 21.88 | 18.52 | 19.02 | 19.51 | 19.73 |
PCA | $\frac{17\pi}{36}$ | 21.92 | 21.36 | 19.18 | 20.57 | 20.76 |
The 3D-M/S and PCA transforms were applied in each subband in the frequency domain. The three encoders were implemented on the basis of FAAC-1.28, and the decoders on the basis of FAAD2-2.7. AAC-LC was used as the core codec, and only the long window was enabled for simplicity. To avoid the influence of the dynamic bandwidth setting of FAAC, the experiment fixed the bandwidth at 12 kHz with 35 subbands.
Independent channel coding: Audio signals were sent into the core codec and compressed directly.
3D-M/S: The vector was calculated using the subband energy of three channels from AAC psychoacoustic module with no extra energy computation. Then 3D-M/S matrix switching was performed and 3 bits were used per mode parameter. The transformed signals were sent into the core codec, and the masking threshold was modified accordingly.
PCA: The eigenvectors were calculated for each subband. Subband signals were transformed using eigenvector matrix and then sent into core codec. The covariance matrix was quantized and transmitted to the decoder following a previous KLT-based multichannel audio coding scheme [21], with 4 bits per non-redundant element.
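The side-information cost of the two transform methods follows from the settings above. The sketch below is a back-of-the-envelope estimate under stated assumptions: a 1024-sample frame for the AAC-LC long window, one mode index (3D-M/S) or one quantized covariance matrix (PCA) per subband per frame, and six non-redundant entries in a symmetric 3×3 covariance matrix. The helper name `side_info_kbps` is hypothetical.

```python
SAMPLE_RATE_HZ = 48_000  # sampling rate of the test sequences
FRAME_LENGTH = 1024      # assumed samples per AAC-LC long-window frame
SUBBANDS = 35            # subbands at the fixed 12-kHz bandwidth


def side_info_kbps(bits_per_subband: int) -> float:
    """Side-information rate in kbps if every subband is refreshed each frame."""
    frames_per_second = SAMPLE_RATE_HZ / FRAME_LENGTH
    return bits_per_subband * SUBBANDS * frames_per_second / 1000


# 3D-M/S: one 3-bit mode index selects among the five matrices M_0..M_4.
ms_kbps = side_info_kbps(3)

# PCA: 6 non-redundant covariance entries (assumed) at 4 bits each.
pca_kbps = side_info_kbps(6 * 4)

print(f"3D-M/S mode parameters:    {ms_kbps:.2f} kbps")
print(f"PCA covariance parameters: {pca_kbps:.2f} kbps")
```

Under these assumptions the estimates come out close to the 4.9 kbps and 39.3 kbps parameter bitrates reported for 3D-M/S and PCA, an eightfold gap that comes entirely from the bits spent per subband.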
Objective evaluation
Bitrate setup and overall SNR (bitrates in kbps per channel)
Method | Quality (SNR) | C _{ 1 } | C _{ 2 } | C _{ 3 } | C _{ 4 } | C _{ 5 } | P _{ 1 } | P _{ 2 } | P _{ 3 } | Average |
---|---|---|---|---|---|---|---|---|---|---|
Ind | 16.08 | 61.6 | 60.8 | 60.3 | 58.7 | 58.7 | – | – | – | 61.2 |
3D-M/S | 18.21 | 229.5 | 21.7 | 57.9 | 4.9 | 4.9 | 4.9 | 64.7 | ||
PCA | 18.87 | 133.2 | 25.2 | 42.8 | 39.3 | 39.3 | 39.3 | 63.8 |
Secondly, the PCA parameter bitrate of 39.3 kbps/channel is considerably higher than that of 3D-M/S. If the three channels have little correlation (e.g. channels with different contents or ambient sound), the transformed signals will not save any bits, and coding efficiency decreases. To test the three methods under this condition, virtual sources of three different signals were fixed at three channels and all coded at 64 kbps. The experimental result is shown in Figure 6. Independent channel coding achieves the best performance in this case, while 3D-M/S degrades by about 1 dB and PCA by nearly 7 dB. This is because PCA spends too many bits on parameters that no longer bring any coding gain, whereas the mode parameters of 3D-M/S cost only 4.9 kbps/channel. This overhead does not reduce the coding efficiency much at medium and high bitrates, which are the main application scenario of M/S coding. The high parameter bitrate of PCA can be alleviated by reducing the refresh rate of the PCA parameters, but this also decreases the coding performance on VBAP signals.
Time complexity
Method | Encoder time (s) | Decoder time (s) | Ratio |
---|---|---|---|
Ind | 2.382 | 0.223 | 100.0% |
3D-M/S | 2.604 | 0.306 | 111.7% |
PCA | 2.977 | 0.416 | 130.2% |
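The Ratio column appears to be the total (encoder plus decoder) running time relative to independent channel coding. This reading is an inference from the numbers rather than something stated in the text; the short sketch below recomputes it from the table values.

```python
# (encoder seconds, decoder seconds) copied from the time complexity table.
timings = {
    "Ind": (2.382, 0.223),
    "3D-M/S": (2.604, 0.306),
    "PCA": (2.977, 0.416),
}

# Assumed definition: total time normalized by the independent-coding total.
baseline = sum(timings["Ind"])
ratios = {name: 100 * sum(times) / baseline for name, times in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}%")
```

Read this way, 3D-M/S adds roughly 12% to the running time of independent coding, while PCA adds roughly 30%.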
Subjective evaluation
From the above results on three point sources and uncorrelated signals, it can be observed that both the PCA and 3D-M/S methods achieve about a 13% SNR improvement for each channel. However, 3D-M/S reaches similar performance with much lower complexity than PCA. This can be explained by regarding the fixed transform matrices as special vectors in PCA. These special vectors are chosen based on the assumption that channel signals are either quite similar or quite different. This assumption may not always hold given the diversity of subband signals, but it makes a good compromise between coding efficiency and complexity.
Conclusion
This paper proposed a 3D-M/S coding method, which inherits the low complexity of conventional M/S coding. Moreover, 3D-M/S performs the sum and difference coding triple by triple, rather than pair by pair as in the conventional method. This structure is more suitable for a 3D multichannel audio configuration, because three adjacent channels form a triangle and have the maximum redundancy among spatially configured 3D audio channels. It is also convenient to unfold a 3D multichannel audio structure into plane triangles. Combined with the proposed framework, the 3D-M/S and PCA methods can be applied to more than three channels. An experiment on VBAP signals demonstrates the performance of the proposed method, which has relatively low complexity compared to PCA, in comparison with PCA and independent channel coding. Considering the development of 3D audio technology and its requirement for compression efficiency, a low-complexity 3D audio codec is promising and preferable for practical applications.
Declarations
Acknowledgements
This work was supported by the National Natural Science Foundation of China (nos. 61231015, 61102127, 61201340, 61201169) and Natural Science Foundation of Hubei (nos. 2011CDB451, 2012FFB04205).
Authors’ Affiliations
References
- Berkhout AJ, de Vries D, Vogel P: Acoustic control by wave field synthesis. J. Acoust. Soc. Am 1993, 93(5):2764-2778. 10.1121/1.405852
- Gerzon MA: Ambisonics in multichannel broadcasting and video. J. Audio Eng. Soc 1985, 33(11):859-871.
- Cooperstock J: Multimodal telepresence systems. IEEE Signal Process. Mag 2011, 28: 77-86.
- Staff A: Multichannel audio systems and techniques. J. Audio Eng. Soc 2005, 53(4):329-335.
- Rumsey F: Cinema sound for the 3-D era. J. Audio Eng. Soc 2013, 61(5):340-344.
- Nettingsmeier J: Birds on the wire - WFS live transmission project report. Tech. rep., Fraunhofer 2008.
- Sakaida S, Iguchi K, Nakajima N, Nishida Y, Ichigaya A, Nakasu E, Kurozumi M, Gohshi S: The super hi-vision codec. IEEE International Conference on Image Processing, 2007. ICIP 2007, Volume 1 2007, I-21–I-24.
- Baumgarte F, Faller C: Binaural cue coding-part I: psychoacoustic fundamentals and design principles. IEEE Trans. Speech Audio Process 2003, 11(6):509-519. 10.1109/TSA.2003.818109
- Faller C, Baumgarte F: Binaural cue coding-part II: schemes and applications. IEEE Trans. Speech Audio Process 2003, 11(6):520-531. 10.1109/TSA.2003.818108
- Oomen W, Schuijers E, Brinker den B, Breebaart J: Advances in parametric coding for high-quality audio. Audio Engineering Society Convention 114 2003.
- Breebaart J, van de Par S, Kohlrausch A, Schuijers E: Parametric coding of stereo audio. EURASIP J. Adv. Sig. Pr 2005, 2005(9):561917.
- Herre J, Disch S: New concepts in parametric coding of spatial audio: from SAC to SAOC. 2007 IEEE International Conference on Multimedia and Expo 2007, 1894-1897.
- Goodwin M, Jot J: Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement. IEEE International Conference on Acoustics, Speech and Signal Processing, 2007. ICASSP 2007, Volume 1 2007, I-9–I-12.
- Cheng B, Ritz C, Burnett I: A spatial squeezing approach to ambisonic audio compression. IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008 2008, 369-372.
- Hellerud E, Solvang A, Svensson U: Spatial redundancy in Higher Order Ambisonics and its use for lowdelay lossless compression. IEEE International Conference on Acoustics, Speech and Signal Processing, 2009. ICASSP 2009 2009, 269-272.
- Tzagkarakis C, Mouchtaris A, Tsakalides P: A multichannel sinusoidal model applied to spot microphone signals for immersive audio. IEEE Trans. Audio Speech Lang. Process 2009, 17(8):1483-1497.
- Pinto F, Vetterli M: Wave field coding in the spacetime frequency domain. IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008 2008, 365-368.
- Pinto F, Vetterli M: Space-time-frequency processing of acoustic wave fields: theory, algorithms, and applications. IEEE Trans. Signal Process 2010, 58(9):4608-4620.
- Cheng B: Spatial squeezing techniques for low bit-rate multichannel audio coding. PhD thesis. University of Wollongong 2011.
- Cheng B, Ritz C, Burnett I, Zheng X: A general compression approach to multi-channel three-dimensional audio. IEEE Trans. Audio Speech Lang. Process 2013, 21(8):1676-1688.
- Yang D, Ai H, Kyriakakis C, Kuo CC: High-fidelity multichannel audio coding with Karhunen-Loeve transform. IEEE Trans. Speech Audio Process 2003, 11(4):365-380. 10.1109/TSA.2003.814375
- Johnston J, Ferreira A: Sum-difference stereo transform coding. 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1992. ICASSP-92, Volume 2 1992, 569-572.
- Liu CM, Lee WC, Hsiao YH: M/S coding based on allocation entropy. Proceedings of the 6th International Conference on Digital Audio Effects (DAFx-03) 2003.
- Derrien O, Richard G: A new model-based algorithm for optimizing the MPEG-AAC in MS-Stereo. IEEE Trans. Audio Speech Lang. Process 2008, 16(8):1373-1382.
- Krueger H, Vary P: A new approach for low-delay joint-stereo coding. 2008 ITG Conference on Voice Communication (SprachKommunikation) 2008, 1-4.
- Schafer M, Vary P: Hierarchical multi-channel audio coding based on time-domain linear prediction. 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO) 2012, 2148-2152.
- Neuendorf M, Multrus M, Rettelbach N, Fuchs G, Robilliard J, Lecomte J, Wilde S, Bayer S, Disch S, Helmrich C, Lefebvre R, Gournay P, Bessette B, Lapierre J, Kjorling K, Purnhagen H, Villemoes L, Oomen W, Schuijers E, Kikuiri K, Chinen T, Norimatsu T, Seng CK, Oh E, Kim M, Quackenbush S, Grill B: MPEG unified speech and audio coding-the ISO/MPEG standard for high-efficiency audio coding of all content types. In Audio Engineering Society Convention 132. Audio Engineering Society; 2012.
- Multrus M, Neuendorf M, Lecomte J, Fuchs G, Bayer S, Robilliard J, Nagel F, Wilde S, Fischer D, Hilpert J, Rettelbach N, Helmrich C, Disch S, Geiger R, Grill B: MPEG unified speech and audio coding - bridging the gap. In Microelectronic Systems. Edited by: Heuberger A, Elst G, Hanke R. Berlin, Heidelberg: Springer Berlin Heidelberg; 2011:351-362.
- Helmrich C, Carlsson P, Disch S, Edler B, Hilpert J, Neusinger M, Purnhagen H, Robilliard J, Villemoes L, Rettelbach N: Efficient transform coding of two-channel audio signals by means of complex-valued stereo prediction. 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2011, 497-500.
- Liu CM, Hsu HW, Lee WC: Compression artifacts in perceptual audio coding. IEEE Trans. Audio Speech Lang. Process 2008, 16(4):681-695.
- Zotter F, Frank M: All-round ambisonic panning and decoding. J. Audio Eng. Soc 2012, 60(10):807-820.
- Ando A, Sugimoto T, Irie K: Coding of 22.2 multichannel audio signal by MPEG-AAC. IEICE Tech. Rep., Volume 113 of EA2013-46 2013, 75-80.
- Ando A: Conversion of multichannel sound signal maintaining physical properties of sound in reproduced sound field. IEEE Trans. Audio Speech Lang. Process 2011, 19(6):1467-1475.
- ITU-T: Method for the subjective assessment of intermediate sound quality (MUSHRA). 2001.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.