Signal Processing Implementation and Comparison of Automotive Spatial Sound Rendering Strategies
© M. R. Bai and J.-R. Hong. 2009
Received: 9 September 2008
Accepted: 8 June 2009
Published: 24 August 2009
Design and implementation strategies of spatial sound rendering are investigated in this paper for automotive scenarios. Six design methods are implemented for various rendering modes with different numbers of passengers. Specifically, the downmixing algorithms aimed at balancing the front and back reproductions are developed for the 5.1-channel input. The other five algorithms, based on inverse filtering, are implemented in two approaches. The first approach utilizes binaural Head-Related Transfer Functions (HRTFs) measured in the car interior, whereas the second approach, named the point-receiver model, targets a point receiver positioned at the center of the passenger's head. The proposed processing algorithms were compared via objective and subjective experiments under various listening conditions. Test data were processed by the multivariate analysis of variance (MANOVA) method and the least significant difference (Fisher's LSD) method as a post hoc test to justify the statistical significance of the experimental data. The results indicate that inverse filtering algorithms are preferred for the single passenger mode. For the multipassenger mode, however, downmixing algorithms generally outperformed the other processing techniques.
With rapid growth in digital telecommunication and display technologies, multimedia audiovisual presentation has become a reality for automobiles. However, there remain numerous challenges in automotive audio reproduction due to the notoriously difficult automotive listening environment. In the car interior, the confined space lacks natural reverberation. This may degrade the perceived spaciousness of audio rendering. Localization of sound images may also be obscured by strong reflections from the window panels, dashboard, and seats. In addition, the loudspeakers and seats are generally not in proper positions and orientations, which may further degrade the rendering performance [2, 3]. To address these problems, a comprehensive study of automotive multichannel audio rendering strategies is undertaken in this paper. Rendering approaches for different numbers of passengers are presented and compared.
In spatial sound rendering, binaural audio has emerged as an audio technology with many promising applications [4–10]. It proves effective in recreating stereo images by compensating for the asymmetric positions of loudspeakers in the car environment. However, this approach suffers from the problem of the limited "sweet spot" in which the system remains effective [7, 8]. To overcome this limitation, several methods that allow for more accurate spatial sound field synthesis were suggested in the past. The Ambisonics technique originally proposed by Gerzon is a series of recording and replay techniques using multichannel mixing technology that can be used live or in the studio. The Wave Field Synthesis (WFS) technique is another promising method for creating a sweet-spot-free rendering environment [12–14]. Nevertheless, the requirement of a large number of loudspeakers, and hence the high processing complexity, limits its implementation in practical systems.
Notwithstanding the eager quest for advanced rendering methods in academia, the majority of the off-the-shelf automotive audio systems still rely on simple systems with panning and equalization functions. For instance, Pioneer's Multi-Channel Acoustic Calibration (MCACC) system attempts to compensate for the acoustical responses between the listener's head position and the loudspeaker by using a 9-band equalizer. A theoretical treatment with rigorous evaluation of the approaches that have been developed for this difficult problem has rarely been seen.
If binaural audio and the WFS are regarded as two extremes in terms of the number of loudspeaker channels, this paper focuses on pragmatic, compromise approaches to automotive audio spatializers targeted at economical cars with four available loudspeakers for 5.1-channel input contents. In these approaches, it is necessary to downmix the audio signals to decrease the number of audio channels between the inputs and the outputs. By combining various inverse filtering and downmixing techniques, six rendering strategies are proposed for various passengers' sitting modes. One of the six methods is based on downmixing, whereas the remaining five methods are based on inverse filtering.
The proposed approaches have been implemented in a real car by using a fixed-point digital signal processor (DSP). Extensive objective and subjective experiments were conducted to compare the presented rendering strategies for various listening scenarios. In order to justify the statistical significance of the results, the data of the subjective listening tests are processed by the multivariate analysis of variance (MANOVA) method, followed by the least significant difference method (Fisher's LSD) as a post hoc test. In light of these tests, it is hoped that viable rendering strategies capable of delivering a compelling and immersive listening experience in automotive environments can be found.
2. Downmixing-Based Strategy
Next, the frontal channels are weighted (by a gain of 0.65) and delayed (by 20 milliseconds) to produce the back channels.
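The weighting-and-delay step can be illustrated with a minimal sketch. The function name `make_back_channels`, the array layout (samples by channels), and the choice to keep the output the same length as the input are assumptions made for illustration; only the gain of 0.65 and the 20 ms delay come from the text.

```python
import numpy as np

def make_back_channels(front, fs, weight=0.65, delay_ms=20.0):
    """Derive back channels from the front channels (hypothetical helper).

    front    : array of shape (n_samples, n_channels), e.g. columns [FL, FR]
    fs       : sampling rate in Hz
    weight   : gain applied to the front channels (0.65 in the text)
    delay_ms : delay applied to the front channels (20 ms in the text)
    """
    front = np.asarray(front, dtype=float)
    delay = int(round(fs * delay_ms / 1000.0))   # delay in samples
    back = np.zeros_like(front)
    n = front.shape[0]
    if delay < n:
        # Shift by `delay` samples and scale; the tail that would run past
        # the end of the buffer is dropped (an assumption of this sketch).
        back[delay:] = weight * front[:n - delay]
    return back
```

At a sampling rate of 48 kHz, for example, the 20 ms delay corresponds to 960 samples.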
3. Inverse Filtering-Based Approaches
Besides the aforementioned downmixing-based strategy, five other strategies are based on inverse filtering. These design strategies are further divided into two categories. The first category is based on the Head-Related Transfer Functions (HRTFs) that account for the diffraction and shadowing effects due to the head, ears, and torso. Three rendering strategies are developed to reproduce four virtual images located at ±30° and ±110°, in accordance with the 5.1 deployment stated in ITU-R Rec. BS.775-1. For the 5.1-channel inputs and four loudspeakers, the center channel has to be attenuated by 3 dB and mixed into the front-left and the front-right channels. The HRTF database measured by the MIT Media Laboratory [19, 20] is employed as the matching model, whereas the HRTFs measured in the car are used as the acoustical plant. The second category, named "the point-receiver model," regards the passenger's head as a simple point receiver at the center.
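As a small illustration of the center-channel handling described above, the following sketch folds the center channel into the two front channels with 3 dB of attenuation. The function name `fold_center` and its signature are hypothetical; the linear gain follows from the usual amplitude conversion 10^(−3/20) ≈ 0.708.

```python
import numpy as np

def fold_center(fl, fr, c, atten_db=3.0):
    """Mix the center channel into front-left and front-right (hypothetical helper).

    fl, fr, c : 1-D arrays of equal length (front-left, front-right, center)
    atten_db  : attenuation applied to the center channel, in dB
    """
    gain = 10.0 ** (-atten_db / 20.0)   # -3 dB as a linear amplitude gain
    fl = np.asarray(fl, dtype=float)
    fr = np.asarray(fr, dtype=float)
    c = np.asarray(c, dtype=float)
    return fl + gain * c, fr + gain * c
```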
3.1. Multichannel Inverse Filtering
The regularization parameter β that weights the input power against the performance error can be used to prevent the singularity of H from saturating the filters. If β is too small, there will be sharp peaks in the frequency responses of the CCS filters, whereas if β is too large, the cancellation performance will be rather poor. The criterion for choosing the regularization parameter depends on a preset gain threshold. Inverse fast Fourier transforms (IFFTs) along with circular shifts (hence the modeling delay) are needed to obtain causal FIR filters.
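The design steps described here (regularized inversion per frequency bin, inverse FFT, circular shift) can be sketched as follows. The sketch assumes the common Tikhonov-regularized least-squares form C = (HᴴH + βI)⁻¹HᴴM, which is a standard choice and may differ in detail from the expression referred to as (3) in this paper. The function name, the array layout, and the default modeling delay of half the FFT length are likewise assumptions.

```python
import numpy as np

def design_inverse_filters(H, M, beta=0.01, delay=None):
    """Regularized frequency-domain inverse filter design (illustrative sketch).

    H     : complex array (n_fft, n_ctrl, n_spk), plant responses on the full FFT grid
    M     : complex array (n_fft, n_ctrl, n_in), matching-model responses
    beta  : regularization parameter (assumed Tikhonov form)
    delay : modeling delay in samples; defaults to n_fft // 2 (an assumption)
    Returns real FIR filters of shape (n_fft, n_spk, n_in).
    """
    H = np.asarray(H, dtype=complex)
    M = np.asarray(M, dtype=complex)
    n_fft, _, n_spk = H.shape
    Hh = np.conj(np.swapaxes(H, 1, 2))         # Hermitian transpose per bin
    A = Hh @ H + beta * np.eye(n_spk)          # H^H H + beta * I per bin
    C = np.linalg.solve(A, Hh @ M)             # regularized inverse per bin
    c = np.fft.ifft(C, axis=0)                 # back to the time domain
    if delay is None:
        delay = n_fft // 2
    c = np.roll(c, delay, axis=0)              # circular shift gives a causal FIR
    # For plants and models derived from real impulse responses the
    # imaginary part is only rounding error.
    return np.real(c)
```

A larger `beta` limits the filter gain at frequencies where the plant is nearly singular, at the cost of a larger matching error, which mirrors the trade-off stated in the text.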
In general, it is not robust to implement the inverse filters based on the measured room responses, which usually have many noninvertible zeros (deep troughs). In this paper, a generalized complex smoothing technique suggested by Hatziantoniou and Mourjopoulos is employed to smooth out the peaks and dips of the acoustical frequency responses before the design of the inverse filters.
3.2. Inverse Filtering-Based Approaches and Formulation
3.2.1. HRTF Model
where the superscripts distinguish the ipsilateral and the contralateral paths. The subscripts 30 and 110 in the matching model matrix signify the azimuth angles of the HRTFs. The HRTFs are assumed to be symmetric, so the HRTFs for the opposite side are generated by swapping the ipsilateral and contralateral sides of those at 30° and 110°. The acoustical plants H are the frequency response functions between the inputs to the loudspeakers and the outputs from the microphones mounted in the ears of the Knowles Electronics Manikin for Acoustic Research (KEMAR) [19, 20]. This leads to a 4 × 4 matrix inversion problem, which is computationally demanding to solve. In order to yield a more tractable solution, the current research has separated this problem into two parts: the front side and the back side. Specifically, the frontal loudspeakers are responsible for generating the sound images at ±30°, while the back loudspeakers are responsible for generating the sound images at ±110°. In this approach, the plant, the matching model, and the inverse filter matrices are given by
where the superscripts denote the front side and the back side. The inverse matrices are calculated using (3). In comparison with the formulation in (4) and (5), a great saving in computation can be attained by applying this approach. The number of inverse filters is reduced from sixteen (one 4 × 4 matrix) to eight (two 2 × 2 matrices).
To be specific, there are two HRTFs: one for the ipsilateral side and another for the contralateral side. Both HRTFs refer to the transfer functions between a source positioned at the specified azimuth with respect to the head center and the two ears. Although the loudspeakers in the car are not symmetrically deployed, the matching model of the inverse filter design in the present study is chosen to be symmetrical. For the asymmetrical acoustical plants, we can calculate the inverse filters using (3). The loudspeaker setups are not symmetrical for the front left virtual sound and the front right virtual sound, and hence the acoustical plants are not symmetrical. This results in different solutions for the inverse filters.
Next, the situation with two passengers sitting on different seats, for example, the front left and the back right seats, is examined. This problem involves four control points for two passengers' ears, four loudspeakers, and four input channels. Following the steps from the single passenger case, the design of the inverse filter can be divided into two parts. Accordingly, two 4 × 2 matrices of the acoustical plants, two 4 × 2 matrices of the matching models, and two 2 × 2 matrices of the inverse filters are expressed as follows:
The subscripts of the plant transfer functions are as follows: the first subscript i = 1, 2 refers to the left and right ears of passenger 1, i = 3, 4 refers to the left and right ears of passenger 2, and the second subscript, running over 1, 2, 3, 4, refers to the four loudspeakers. In the 4 × 2 matching model matrices, the first and second rows are identical to the third and fourth rows. Specifically, rows 1 and 2 are for passenger 1 while rows 3 and 4 are for passenger 2. The two HRTF inversion methods outlined in (6)–(8) and (9)–(11) were used to generate the following test systems.
HRTF-Based Inverse Filtering (HIF1) for Single Passenger
HRTF-Based Inverse Filtering (HIF2) for Two Passengers
In this section, two HRTF-based inverse filtering strategies designed for two passengers and 5.1-channel input are presented. The first approach, named the HIF2 method, considers four control points for two passengers. The associated system matrices take the form formulated in (9) to (11). The two 2 × 2 inverse filter matrices are calculated as before. The block diagram of the HIF2 method follows that of the HIF1 method.
HRTF-Based Inverse Filtering (HIF2-S) for Two Passengers
This approach is named the HIF2-S method. In (12), the design procedure of the HIF2-S method is divided into two steps. First, the inverse filters for a single passenger sitting in each of the respective positions are designed. Next, by adding the filter coefficients obtained in the first step, two 2 × 2 inverse filter matrices are obtained. The block diagram of the HIF2-S method follows that of the HIF1 method.
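The superposition step amounts to an element-wise sum of the two single-passenger filter sets. A minimal sketch follows; the function name `superpose_filters` and the array layout (taps by outputs by inputs) are assumptions, and the single-passenger filters are taken as already designed for each seat.

```python
import numpy as np

def superpose_filters(c_seat_a, c_seat_b):
    """Combine two single-passenger inverse filter sets by adding coefficients.

    c_seat_a, c_seat_b : arrays of identical shape, e.g. (n_taps, 2, 2), holding
                         the FIR coefficients designed for each seat separately
    Returns the summed coefficients with the same shape.
    """
    c_a = np.asarray(c_seat_a, dtype=float)
    c_b = np.asarray(c_seat_b, dtype=float)
    if c_a.shape != c_b.shape:
        raise ValueError("filter sets must have identical shapes")
    return c_a + c_b
```

Because the sum is taken once at design time, the run-time cost stays at that of a single 2 × 2 filter matrix per side.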
3.2.2. Point-Receiver Model
where each element denotes the transfer function from the corresponding loudspeaker to the control point. The frequency response function measured in an anechoic chamber, using the same type of loudspeakers as in the car, is designated as the matching model. The point-receiver model was used to generate the following test systems.
Point-Receiver-Based Inverse Filtering (PIF1) for Single Passenger
Point-Receiver-Based Inverse Filtering (PIF2-S) for Two Passengers
For the rendering scenario with two passengers and 5.1-channel input, the aforementioned filter superposition idea is employed in the point-receiver-based inverse filtering approach (PIF2-S). The structure of this rendering approach is similar to that of the PIF1 approach, as shown in Figure 6. A PIF2 system analogous to the HIF2 system was considered in initial tests but was eliminated from final testing because it performed poorly in an informal experiment compared with the other approaches.
4. Objective and Subjective Evaluations
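The descriptions of the six automotive audio rendering approaches.
Method | No. of input channels | No. of passengers | Design strategy
DWD | 5.1 | 1 or more | Downmixing + weighting & delay
HIF1 | 5.1 | 1 | HRTF-based inverse filtering
HIF2 | 5.1 | 2 | HRTF-based inverse filtering
HIF2-S | 5.1 | 2 | HRTF-based inverse filtering
PIF1 | 5.1 | 1 | Point-receiver-based inverse filtering
PIF2-S | 5.1 | 2 | Point-receiver-based inverse filtering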
4.1. Objective Experiments
4.1.1. The HRTF-Based Model
4.1.2. The Point-Receiver-Based Model
4.2. Subjective Experiments
The definitions of the subjective attributes.
Attribute | Description
Preference | Overall preference in considering timbral and spatial attributes
Fullness | Dominance of low-frequency sound
Brightness | Dominance of high-frequency sound
Artifact | Any extraneous disturbances to the signal
Localization | Determination by a subject of the apparent source direction
Frontal | The clarity of the frontal image or the phantom center
Proximity | The sound is dominated by the loudspeaker closest to the subject
Envelopment | Perceived quality of listening within a reverberant environment
4.2.1. Experiment I
Experiment I is intended for evaluating the rendering algorithms designed for one passenger in the FL seat or the BR seat. The DWD, HIF1, and PIF1 methods are compared in this experiment. Because only four loudspeakers are available in this car, the center channel of the 5.1-channel input is attenuated by 3 dB and mixed into the frontal channels to serve as the hidden reference. In addition, the four channels of input signals are summed and lowpass filtered (with a 4 kHz cutoff frequency) to serve as the anchor.
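The construction of the anchor can be sketched as below. The text specifies only the summation and the 4 kHz cutoff; the Hamming-windowed sinc FIR design, the 101-tap length, and the function names are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def lowpass_fir(cutoff_hz, fs, num_taps=101):
    """Windowed-sinc lowpass FIR (Hamming window), normalized to unit DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    fc = cutoff_hz / fs                      # cutoff in cycles per sample
    h = 2.0 * fc * np.sinc(2.0 * fc * n)     # ideal lowpass impulse response
    h *= np.hamming(num_taps)                # taper to control stopband ripple
    return h / np.sum(h)

def make_anchor(channels, fs, cutoff_hz=4000.0):
    """Sum the input channels and lowpass filter the result (illustrative sketch).

    channels : array of shape (n_channels, n_samples), e.g. the four input channels
    fs       : sampling rate in Hz
    """
    mono = np.sum(np.asarray(channels, dtype=float), axis=0)
    # mode="same" keeps the output aligned with, and as long as, the input
    return np.convolve(mono, lowpass_fir(cutoff_hz, fs), mode="same")
```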
Figures 13(a) and 13(b) show the means and spreads of the grades on the subjective attributes for the FL position, while Figures 13(c) and 13(d) show the results for the BR position. For the FL position, the results of the post hoc test indicate that the grades of the HIF1 method in preference and fullness are significantly higher than those of the DWD and the PIF1 methods. In brightness, only the grade of the PIF1 method is significantly higher than that of the hidden reference, while no significant difference between the DWD method and the HIF1 method is found. In addition, there is no significant difference among the methods in the attributes artifact, localization, proximity, and envelopment. In the attribute frontal, however, the inverse filter-based methods received significantly higher grades than the hidden reference and the DWD method.
4.2.2. Experiment II
Experiment II is intended for evaluating the rendering algorithms designed for two passengers in the FL seat and BR seat and the 5.1-channel input. Four methods including the DWD method, the HIF2 method, the HIF2-S method, and the PIF2-S method are compared in this experiment. The hidden reference and the anchor are identical to those defined in Experiment I.
The summary of the rendering strategies recommended for various listening scenarios.
First, for the rendering scenario with a single passenger and the 5.1-channel inputs, the HIF1 method is suggested for the passenger sitting in the FL seat, whereas the PIF1 method would be the preferred choice for the passenger sitting in the BR seat. Second, for the two-passenger scenario, the HIF2-S method received high grades in most subjective attributes. However, no significant difference in the attributes preference, brightness, artifact, localization, and frontal was found between the DWD method and the HIF2-S method. Considering the computational complexity, the DWD method should be the most preferred choice for the two-passenger scenario. Overall, the inverse filtering approaches did not perform as well for the multipassenger scenario as they did for the single passenger scenario. The number of inverse filters increases drastically with the number of passengers, rendering approaches of this kind impractical in automotive applications.
The work was supported by the National Science Council in Taiwan, China, under the project no. NSC91-2212-E009-032.
- Kahana Y, Nelson PA, Yoon S: Experiments on the synthesis of virtual acoustic sources in automotive interiors. Proceedings of the 16th International Conference on Spatial Sound Reproduction and Applications of the Audio Engineering Society, March 1999, Paris, France.
- Crockett B, Smithers M, Benjamin E: Next generation automotive sound research and technologies. Proceedings of the 120th Convention of Audio Engineering Society, 2006, Paris, France, paper no. 6649.
- Bai MR, Lee CC: Comparative study of design and implementation strategies of automotive virtual surround audio systems. To appear in Journal of the Audio Engineering Society.
- Damaske P, Mellert V: A procedure for generating directionally accurate sound images in the upper-half space using two loudspeakers. Acustica 1969, 22: 154-162.
- Begault DR: 3-D Sound for Virtual Reality and Multimedia. AP Professional, Cambridge, Mass, USA; 1994.
- Gardner WG: Transaural 3D audio. MIT Media Laboratory; 1995.
- Bai MR, Lee C-C: Development and implementation of cross-talk cancellation system in spatial audio reproduction based on subband filtering. Journal of Sound and Vibration 2006, 290(3-5): 1269-1289. 10.1016/j.jsv.2005.05.016
- Bai MR, Lee C-C: Objective and subjective analysis of effects of listening angle on crosstalk cancellation in spatial sound reproduction. The Journal of the Acoustical Society of America 2006, 120(4): 1976-1989. 10.1121/1.2257986
- Bai MR, Shih G-Y, Lee C-C: Comparative study of audio spatializers for dual-loudspeaker mobile phones. The Journal of the Acoustical Society of America 2007, 121(1): 298-309. 10.1121/1.2387121
- Takeuchi T, Nelson PA: Optimal source distribution for binaural synthesis over loudspeakers. The Journal of the Acoustical Society of America 2002, 112(6): 2786-2797. 10.1121/1.1513363
- Menzies D, Al-Akaidi M: Nearfield binaural synthesis and ambisonics. The Journal of the Acoustical Society of America 2007, 121(3): 1559-1563. 10.1121/1.2434761
- Gauthier P-A, Berry A, Woszczyk W: Sound-field reproduction in-room using optimal control techniques: simulations in the frequency domain. The Journal of the Acoustical Society of America 2005, 117(2): 662-678. 10.1121/1.1850032
- Betlehem T, Abhayapala TD: Theory and design of sound field reproduction in reverberant rooms. The Journal of the Acoustical Society of America 2005, 117(4): 2100-2111. 10.1121/1.1863032
- Theile G, Wittek H: Wave field synthesis: a promising spatial audio rendering concept. Acoustical Science and Technology 2004, 25(6): 393-399. 10.1250/ast.25.393
- Pioneer: MCACC Multi-Channel Acoustic Calibration. August 2008, http://www.pioneerelectronics.com/PUSA/PressRoom/Press+Releases/Car+Audio+Video/Computer+Technology+and+Car+Audio+Converge+in+Pioneer+Single+CD+Player+with+Hard+Disk+Drive%2C+Memory+Stick%2C+MP3+Playback
- Bai MR, Shih G-Y: Upmixing and downmixing two-channel stereo audio for consumer electronics. IEEE Transactions on Consumer Electronics 2007, 53(3): 1011-1019.
- Sharma S: Applied Multivariate Techniques. John Wiley & Sons, New York, NY, USA; 1996.
- ITU-R Rec. BS.775-1: Multi-channel stereophonic sound system with or without accompanying picture. International Telecommunications Union, Geneva, Switzerland, 1994.
- Gardner WG, Martin KD: KEMAR HRTF measurements. MIT's Media Lab, August 2008, http://sound.media.mit.edu/resources/KEMAR.html
- Gardner WG, Martin KD: HRTF measurements of a KEMAR. The Journal of the Acoustical Society of America 1995, 97(6): 3907-3908. 10.1121/1.412407
- Noble B: Applied Linear Algebra. Prentice-Hall, Englewood Cliffs, NJ, USA; 1988.
- Hatziantoniou PD, Mourjopoulos JN: Errors in real-time room acoustics dereverberation. Journal of the Audio Engineering Society 2004, 52(9): 883-899.
- Hatziantoniou PD, Mourjopoulos JN: Generalized fractional-octave smoothing of audio and acoustic responses. Journal of the Audio Engineering Society 2000, 48(4): 259-280.
- ITU-R BS.1534-1: Method for the subjective assessment of intermediate sound quality (MUSHRA). International Telecommunications Union, Geneva, Switzerland, 2001.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.