Robust acoustic reflector localization using a modified EM algorithm
EURASIP Journal on Audio, Speech, and Music Processing volume 2024, Article number: 22 (2024)
Abstract
In robotics, echolocation has been used to detect acoustic reflectors, e.g., walls, as it helps a robotic platform navigate in darkness and detect transparent surfaces. However, the transfer function or response of an acoustic system, e.g., loudspeakers/emitters, introduces non-ideal behavior, such as a phase lag due to propagation delay. This non-ideal response can hinder the performance of a time-of-arrival (TOA) estimator intended for acoustic reflector localization, especially when multiple reflections must be estimated. In this paper, we therefore propose a robust expectation-maximization (EM) algorithm that takes the response of the acoustic system into account to enhance the TOA estimation accuracy for multiple reflections when the robot is placed in a corner of a room. A non-ideal transfer function is built with two parameters, which are estimated recursively within the estimator. To test the proposed method, a hardware proof-of-concept setup was built with two different designs. The experimental results show that the proposed method could detect an acoustic reflector up to a distance of 1.6 m with \(60\%\) accuracy at a signal-to-noise ratio (SNR) of 0 dB. Compared to the state-of-the-art EM algorithm, our proposed method improves TOA estimation performance by \(10\%\) at low SNR.
1 Introduction
Within the context of robot audition, the use of echolocation for acoustic reflector localization and estimation has been proposed by various researchers in the past [1,2,3]. Within this domain, researchers use acoustic signal processing techniques and propose combining echolocation with state-of-the-art technologies, e.g., laser- and camera-based technologies, to aid a robot in constructing a spatial map of an indoor environment. This can be accomplished with a collocated microphone-loudspeaker combination. One major disadvantage of camera- and laser-based technologies is that they cannot work in complete darkness and cannot detect the transparent surfaces typically found in an office environment. This makes accurate construction of a spatial map of an environment difficult.
The process involved in the aforementioned echolocation techniques is to probe the environment with a known sound so that the reflected signal acquired by a microphone can be processed to estimate the time of arrival (TOA) of the acoustic echo, from which a robot can estimate its distance to the acoustic reflector. Traditionally, TOA information is extracted from room impulse response (RIR) estimates (Fig. 1), normally using a peak-picking approach [2,3,4,5,6]. The RIR is broadly divided into two distinct parts: the direct path together with the early reflections, and the late reflections, which comprise a stochastic dense tail [7]. The direct-path component is the shortest path a sound can take, i.e., it provides information about the distance between the transmitter and receiver, while early reflections help in inferring the distance of the closest acoustic reflector [2, 3, 8]. While TOA estimation enables a robot to determine the distance of an acoustic reflector, the direction-of-arrival (DOA) is required to determine the location of an acoustic source. This is obtained by attaching multiple receivers to a robot [9,10,11]. Recent advances in machine learning have also enabled robotic platforms to use echolocation for terrain classification and for detecting echoes in noisy data. For example, in [12], the authors proposed combining advanced signal filtering with machine learning techniques to accurately classify terrain types for a small mobile robot. One potential use of such a method is to help robot navigation, e.g., distinguishing roads from other surfaces. Moreover, echolocation is used to construct a spatial map of an indoor environment. For example, in [13], the authors propose training a neural network to predict depth maps and grayscale images from sound alone.
The work presented in [13] was later extended in [14] by improving the neural network and reducing the computation time needed to run the model. The contribution of that paper was a full \(360^{\circ}\) 3D depth reconstruction with 4 microphones and a lidar-based SLAM for training a model. One notable difference between model-based and data-driven approaches is that the latter require large data sets to train a neural network. In comparison, the model-based approach finds the feature of interest directly from the signal model.
While ultrasonic sensors are popular within robotics for detecting obstacles, they require specialized hardware to transmit/receive acoustic echoes and could increase the overall cost of a robotic platform. However, most robots intended for human-robot interaction (HRI) already include a collocated microphone-loudspeaker setup, e.g., Softbank’s NAO robot. In our previous work, we proposed a TOA/DOA estimator based on the expectation-maximization (EM) framework [8], but with crude assumptions about the acoustic properties of the acoustic reflectors (point source, ideal reflectors, etc.) and the hardware (ideal response, omnidirectionality). These assumptions lead to a detrimental model mismatch in practical settings, e.g., since loudspeakers/microphones contribute a phase lag due to propagation delay [15], which deteriorates the performance of the TOA/DOA estimator in [8, 16], particularly in the presence of multiple acoustic reflections. This causes a severe problem when using the TOA/DOA estimates in robots for generating a spatial map of an indoor environment using acoustic echoes. Therefore, we propose an algorithm that uses the previously proposed loudspeaker-microphone setup to estimate the distance of an acoustic reflector while also estimating the response of the acoustic systems, which may facilitate simultaneous estimation of multiple acoustic echoes impinging at different TOAs and/or from different DOAs.
Traditionally, the transfer function of a loudspeaker is estimated using a loudspeaker-enclosed microphone (LEM) setup placed within an anechoic environment. However, in [17], the researchers proposed a method to measure the transfer function of the loudspeaker within an echoic environment. This is done by using two loudspeakers, one of which is calibrated, with its transfer function already estimated within an anechoic chamber. The loudspeaker is placed in a fixed location within the environment. The process involves transmitting a white noise signal through the calibrated loudspeaker to measure its impulse response (IR), then replacing it with the uncalibrated loudspeaker and repeating the IR measurement. The transfer function of the uncalibrated loudspeaker is estimated using least squares. Furthermore, TOA estimation can also be influenced by the materials that acoustic reflectors are composed of, e.g., concrete, glass, and cardboard. This is because some materials absorb certain sound frequencies, which could lead to non-ideal characteristics of the observed signals [18]. The aforementioned method requires access to an anechoic chamber, which makes it a time-consuming process; hence, there is a need to estimate the response of the acoustic system directly from the model.
In this paper, we therefore extend the model-based method originally proposed in [19] and later used in our previous work [8] to accommodate the non-ideal transfer function of an acoustic system, i.e., the loudspeaker, the microphone, and the reflecting materials. We take a model-based approach to TOA estimation where the model of the early reflections is used to derive a statistically optimal estimator. More specifically, we include an unknown filter to model the uncertainties of the acoustic system, which may alleviate the need for the loudspeaker IR measurement suggested in [17]. Moreover, to test the proposed method, a proof-of-concept setup is built to conduct experiments using real data.^{Footnote 1}
The remaining part of this paper is organized as follows: Section 2 introduces the problem formulation, and Section 3 proposes the TOA estimation method based on EM. Finally, the experimental results followed by discussion and conclusion can be found in Sections 4, 5, and 6, respectively.
2 Problem formulation
Consider the scenario where a loudspeaker emits a known probe signal, which propagates through an acoustic environment and is recorded by a microphone. This can be mathematically modeled as
\[y(n) = h(n)*s(n) + w(n) = x(n) + w(n),\]
where h(n) is the acoustic impulse response from the loudspeaker to the microphone, s(n) is the known probe signal, w(n) is additive background noise, and \(x(n) = h(n)*s(n)\). The acoustic impulse response can be further modeled by decomposing the reverberation into early and late reverberation components. The early reflections are modeled as time-delayed and filtered versions of the known probe signal, where the filter represents the responses of the loudspeaker, microphone, and acoustic reflectors. Mathematically, we formulate this as
\[y(n) = \sum_{r=1}^{R} g_{r}(n)*s(n-\tau _{r}) + v(n),\]
where R is the number of early reflections, \(g_r\) is the filter pertaining to the \(r^{th}\) reflection, \(\tau _r\) is the delay of the \(r^{th}\) reflection, and v(n) is a noise term comprising both the additive background noise and the late reflections. In the special case where the filter length is \(M = 1\) for all \(r=1,\dots , R\), we get the ideal model used in [8], which does not account for the non-ideal hardware responses that are inevitable in real scenarios. We then assume stationarity and that we have N observations following this model, i.e.,
Here, \(\textbf{D}\) is a cyclic shift register that delays filter gain \(\textbf{g}_{r}\). The matrix \(\textbf{G}_{r}\) has a dimension of \((N-M+1)\times N\) while \(\textbf{S}\) has a dimension of \((N-M+1)\times M\), where N is the length of the signal and M is the filter length. The filter \(\textbf{g}_{r}\) is a \(1\times M\) vector for the \(r^{th}\) reflection. If we assume that the noise term is white Gaussian noise, the maximum likelihood estimator for the unknown filters, \(\textbf{g}_r\), and delays, \(\tau _r\), for \(r=1,\ldots , R\), is given by
Compared to [19], we do not assume that the gain or filter \(\textbf{g}_{r}\) is set to 1. Hence, the problem at hand is to estimate the delay \(\tau _{r}\) and the filter parameters \(\textbf{g}_{r}\). Moreover, in this paper, we are interested in estimating these parameters to localize the position of an acoustic reflector using echolocation, which was not addressed in [19]. Solving (9) for \(\tau _{r}\) and \(\textbf{g}_{r}\) is clearly a computationally complex, multidimensional task. However, as we shall see next, it can be solved with an iterative procedure such as expectation-maximization (EM).
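To make the signal model of this section concrete, the following is a minimal simulation sketch and not the authors' code. It assumes integer sample delays and short FIR filters, and the function name `synthesize_observation` and all numerical values are hypothetical choices made only for illustration.

```python
import numpy as np

def synthesize_observation(s, delays, filters, n_obs, noise_std=0.0, rng=None):
    """Return y(n): delayed, filtered copies of the probe s plus white noise.

    Hypothetical helper; assumes every delay is smaller than n_obs.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.zeros(n_obs)
    for tau, g in zip(delays, filters):
        echo = np.convolve(s, g)              # probe filtered by g_r
        n_copy = min(len(echo), n_obs - tau)  # truncate at the window end
        y[tau:tau + n_copy] += echo[:n_copy]  # delay by tau_r samples
    return y + noise_std * rng.standard_normal(n_obs)

rng = np.random.default_rng(0)
s = rng.standard_normal(200)                  # known probe signal (illustrative)
delays = [30, 55]                             # tau_r in samples (illustrative)
filters = [np.array([1.0, 0.5, 0.3]), np.array([0.6, 0.3, 0.2])]
y = synthesize_observation(s, delays, filters, n_obs=300, noise_std=0.1, rng=rng)
```

Setting every filter to a single tap of value 1 recovers the ideal model with \(M = 1\) mentioned above.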
3 Robust EM-based acoustic reflector localization
The EM algorithm developed in [20] is a general method for solving maximum-likelihood (ML) estimation problems given incomplete data [19]. It is intended to reduce the complexity of parameter estimation. The EM algorithm requires that the complete data be specified. Here, we may define our complete data as all the observations of the individual reflections, each defined as
for \(r=1,\ldots , R\), where \(\textbf{v}_r(n)\) are individual noise terms obtained by arbitrarily decomposing the noise term \(\textbf{v}(n)\) into R components, such that
\[\sum _{r=1}^{R}\textbf{v}_r(n) = \textbf{v}(n).\]
Moreover, we can write the observed signal as the sum of the individual observed reflections, i.e.,
We let the individual noise terms be independent, zero-mean, white Gaussian and distributed as \(\mathcal {N}(\textbf{0},\beta _r\textbf{C})\), where \(\textbf{0}\) is a vector of zeros, \(\textbf{C}=\textrm{E}[\textbf{v}(n)\textbf{v}^{T}(n)]=\sigma _{v}^{2}\textbf{I}_{{N}}\) is the \(N\times N\) covariance matrix of \(\textbf{v}(n)\), \(\sigma _v^2\) is the noise variance, and \(\textrm{E}[\cdot ]\) denotes mathematical expectation. Moreover, the scaling factors, \(\beta _r\), are non-negative, real-valued scalars that satisfy the following:
\[\sum _{r=1}^{R}\beta _r = 1.\]
Here, \(\beta _{r}\) must satisfy the condition above but is otherwise a free variable that could be used to control the rate of convergence. The choice of \(\beta _r\) merits further investigation, as noted in [19], but here we choose \(\beta _r = 1/{R}\). The EM algorithm for the problem at hand is given by
E-step:
M-step:
where \({}^{(i)}\) denotes the iteration index. Under white Gaussian conditions, the estimator in (15) is a maximum likelihood estimator. The M-step can be simplified since the estimator is linear with respect to the unknown filter coefficients. We can thus solve for the filter coefficients first, which yields
If we insert this back into (15), we get
A potential problem with these estimators is that the filter estimates \(\widehat{\textbf{g}}_{r}\) are unconstrained, which may lead to unreasonably large filter coefficients, since the reflections may partly cancel each other out. One way of addressing such problems is by introducing a constraint on the white noise gain of the filter:
This can be solved using the method of Lagrange multipliers, i.e., to solve for the constrained filter, we write
By taking the partial derivative with respect to the filter, we get
That is, the filter estimate becomes
where \(\lambda\) is a tuning parameter that is set empirically and \(\textbf{I}\) is the identity matrix. The estimated \(\tau _{r}\) of an acoustic reflector can be converted into a distance estimate if we assume that the speed of sound is known for the given environment and that we are interested in estimating only the first-order early reflection. This simple conversion can be done as follows:
where c is the speed of sound and d is the distance of an acoustic reflector with respect to a source.
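As an illustration of the TOA-to-distance conversion, here is a small sketch. It rests on assumptions that the text does not spell out: the TOA is expressed in samples, the loudspeaker and microphone are collocated, and the echo travels to the reflector and back, hence the factor of 2. The function names are hypothetical, and the default values for the sampling frequency and the speed of sound are the ones quoted in Section 4.1.

```python
def toa_to_distance(tau_samples, fs=48_000.0, c=343.0):
    """Round-trip TOA in samples -> one-way reflector distance in metres.

    Assumes a collocated loudspeaker-microphone pair (round-trip path).
    """
    return c * tau_samples / (2.0 * fs)

def distance_to_toa(distance_m, fs=48_000.0, c=343.0):
    """Inverse mapping: one-way distance in metres -> round-trip TOA in samples."""
    return 2.0 * distance_m * fs / c

# Under these assumptions, a 0.5 m to 3 m range maps to roughly 140 to 840 samples.
window = (distance_to_toa(0.5), distance_to_toa(3.0))
```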
However, by including the acoustic response in the model, we can estimate multiple reflections originating from two acoustic reflectors, i.e., first-order and second-order reflections. By combining the proposed method with echo labeling [21,22,23], we can estimate the position of multiple acoustic echoes.
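The overall structure of the estimator described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each reflection is modeled as a delayed-probe matrix times a length-M filter, the delay is found by a grid search, the filter by ridge-regularized least squares with weight `lam` (standing in for \(\lambda\)), and the components are initialized by greedy successive cancellation. All function names, the initialization, and the test values are hypothetical.

```python
import numpy as np

def delayed_probe_matrix(s, tau, M, N):
    """N x M matrix whose m-th column is the probe delayed by tau + m samples."""
    S = np.zeros((N, M))
    for m in range(M):
        start = tau + m
        n_copy = min(len(s), N - start)
        if n_copy > 0:
            S[start:start + n_copy, m] = s[:n_copy]
    return S

def fit_component(x, mats, lam):
    """Grid search over delays; ridge least squares for the filter at each delay."""
    best = None
    for tau, S in mats.items():
        M = S.shape[1]
        g = np.linalg.solve(S.T @ S + lam * np.eye(M), S.T @ x)
        cost = np.sum((x - S @ g) ** 2) + lam * (g @ g)
        if best is None or cost < best[0]:
            best = (cost, tau, g)
    return best[1], best[2]

def robust_em(y, s, R, M, taus, lam=1e-3, n_iter=10):
    """EM-style estimation of R delays and length-M filters (illustrative only)."""
    N = len(y)
    beta = 1.0 / R                                   # beta_r = 1/R as in the text
    mats = {tau: delayed_probe_matrix(s, tau, M, N) for tau in taus}
    # Initialization (an assumption of this sketch): greedy successive cancellation.
    tau_hat, g_hat, residual = [], [], y.copy()
    for _ in range(R):
        tau, g = fit_component(residual, mats, lam)
        tau_hat.append(tau)
        g_hat.append(g)
        residual = residual - mats[tau] @ g
    for _ in range(n_iter):
        comps = [mats[t] @ g for t, g in zip(tau_hat, g_hat)]
        residual = y - sum(comps)
        for r in range(R):
            x_r = comps[r] + beta * residual         # E-step: complete-data estimate
            tau_hat[r], g_hat[r] = fit_component(x_r, mats, lam)  # M-step
    return tau_hat, g_hat

# Synthetic check with two reflections (all values are illustrative).
rng = np.random.default_rng(1)
s = rng.standard_normal(500)
true_taus = [30, 55]
true_g = [np.array([1.0, 0.5, 0.3]), np.array([0.6, 0.3, 0.2])]
N = 600
y = 0.1 * rng.standard_normal(N)
for tau, g in zip(true_taus, true_g):
    y += delayed_probe_matrix(s, tau, 3, N) @ g
tau_hat, g_hat = robust_em(y, s, R=2, M=3, taus=range(0, 80))
```

Keeping `lam` strictly positive keeps the small linear system invertible even for candidate delays where the probe barely overlaps the observation window, and it limits the size of the filter coefficients in the spirit of the constraint discussed above.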
4 Experimental results
In this section, we investigate two issues: the performance of the proposed method under different conditions, and the benefit of estimating multiple acoustic echoes. In the first experiment, the proposed method was tested using signals synthesized with the room impulse response generator [24] with the following setup. The synthetic room has dimensions of \(6.38\times 5.4\times 4.05\) m. The analysis window was set to the range from \(\tau_{\min}\) to \(\tau_{\max}\) samples, corresponding to distances of 0.5 m to 3 m, similar to the setup in [25]. This analysis window also helps in estimating the first-order early reflection and prevents the direct-path component from being estimated. Moreover, the probe signal s(n) is a broadband signal of length 2000 samples drawn from a Gaussian burst, zero-padded to form a signal of length 20,000 samples.
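A probe signal of the kind described above could be generated as in the sketch below. It assumes that "Gaussian burst" means a burst of white Gaussian noise; the function name `make_probe` and the seed are hypothetical.

```python
import numpy as np

def make_probe(burst_len=2000, total_len=20_000, seed=0):
    """White Gaussian burst of burst_len samples, zero-padded to total_len."""
    rng = np.random.default_rng(seed)
    probe = np.zeros(total_len)
    probe[:burst_len] = rng.standard_normal(burst_len)  # broadband burst
    return probe

s = make_probe()
```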
4.1 Proof-of-concept
The experimental platform is used to evaluate the performance of the proposed method. The overall system architecture is shown in Fig. 2. Two design variations are used to test the proposed method for estimating the position and distance of an acoustic reflector. The first variation consists of a loudspeaker (Genelec 8030A) with a microphone (G.R.A.S 40 PH) attached to the top of the loudspeaker, as shown in Fig. 3. The distance between the acoustic center of the loudspeaker and the center of the microphone is 0.15 m. The second variation consists of six microphones arranged in a uniform circular array (UCA) of radius 0.2 m with a loudspeaker placed at the center of the UCA, as shown in Fig. 4. The loudspeaker-microphone setup was placed 1.5 m above the floor inside Aalborg University’s Sound Lab, which has dimensions of \(6.38\times 5.4\times 4.05\) m. Both the loudspeaker and the microphones are connected to an audio interface (Presonus 1818VSL), which is in turn connected to a laptop via a USB port. A lidar sensor (TFMini Micro) measures the distance between the wall and the platform and serves as the ground truth for further analysis. To ensure low hardware latency, an ASIO driver^{Footnote 2} is installed. MATLAB is used as the data acquisition tool to record and save the observed signals and for statistical analysis of the proposed method. For multichannel data acquisition, PlayRec [26] is used to transmit and record sound simultaneously. The sampling frequency is set to 48,000 Hz, and the speed of sound is assumed to be 343 m/s.
4.2 Simulated and real results
In the first experiment, the non-ideal characteristic of acoustic systems is modeled by filtering the room impulse response, \(h_\text {RIR}\), using a bandpass filter with the impulse response, \(h_\text {BP}\), to obtain our non-ideal impulse response, \(h_\text {NI}\), i.e.,
\[h_\text {NI}(n) = h_\text {RIR}(n)*h_\text {BP}(n).\]
The bandpass filter was a second-order Butterworth filter with cutoff frequencies, \(\varvec{\omega }=[0.2\pi , 0.6\pi ]\). The non-ideal room impulse response was then applied to a known probe signal, s(n), to generate the observation used for the experiment. Here, the search interval for the delays, or TOAs, was chosen as \(\tau \in [1,80]\) samples, and therefore we set N to 2,080. The number of reflections was set to \(R=3\) because this number gives us better estimates of 2 acoustic reflectors, the number of EM iterations was set to 100, and \(\beta _r=1/R\). Furthermore, the direct-path component was removed from the observed signal using an RIR generator. Using this setup, we ran the Ideal-EM (EMI) method with a filter length \(M=1\) as proposed in [19], and the presented robust-EM method (EMR) with filter length \(M=5\) and \(\lambda = 100\). The resulting cost functions, \(J(\textbf{g},\tau )\) from (19), are depicted in Figs. 5 and 6, respectively. Here, \(J_{1}\), \(J_{2}\), and \(J_{3}\) represent the cost function with \(M =1, \lambda =0\), \(M =5, \lambda =100\), and \(M =15, \lambda =500\), respectively. From the results, we can first see how the ideal impulse response is affected by the bandpass filter applied to it, which smears out the peaks. Consequently, when applying the EMI method, we do not see two clearly defined peaks around the times of arrival of the two components. If we instead use the EMR method, we can model the effects of the bandpass filter, which results in two broader but clearly defined peaks at the TOAs.
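The construction of the non-ideal impulse response can be illustrated with SciPy as below. This is a sketch under assumptions: `scipy.signal.butter` expects band edges normalized to the Nyquist frequency, so \([0.2\pi , 0.6\pi ]\) rad/sample becomes `[0.2, 0.6]`, and for a bandpass design its `N` argument is the order of the lowpass prototype, so the resulting filter has order `2 * N`. Whether "second-order" in the text refers to the prototype or to the final filter is not stated; the toy two-spike RIR and the helper name `make_nonideal_rir` are hypothetical.

```python
import numpy as np
from scipy.signal import butter, lfilter

def make_nonideal_rir(h_rir, band=(0.2, 0.6), order=2):
    """Filter an RIR with a Butterworth bandpass to mimic a non-ideal system.

    band is normalized so that 1.0 is the Nyquist frequency (pi rad/sample).
    """
    b, a = butter(order, band, btype="bandpass")
    return lfilter(b, a, h_rir)

# Toy RIR with two ideal reflections (illustrative values only).
h_rir = np.zeros(512)
h_rir[[40, 65]] = [1.0, 0.6]
h_ni = make_nonideal_rir(h_rir)
```

Each ideal spike in `h_rir` is replaced by a copy of the bandpass impulse response, which is the smearing effect described above.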
Furthermore, we repeat the simulated experiment in a practical setting using the hardware platform in Fig. 3. The platform was placed in a corner of a room at distances of 1 m and 0.65 m from the two walls. The collocated microphone-loudspeaker setup probes the environment with a known sound, and the received echoes are recorded by the microphone. The observed signal was later used to estimate the RIR of the environment using the dual-channel method [27]. This is done by computing \(\widehat{H}(f)=Y(f)/S(f)\) and then taking the inverse DFT to get \(\widehat{h}=\mathcal {F}^{-1}\{\widehat{H}(f)\}\). For EMR, we set \(M=15\), \(\lambda =500\), and \(R=3\). As seen in Fig. 7, the EMR method successfully estimates all the peaks corresponding to the individual acoustic reflectors. In this experiment, both M and \(\lambda\) are set empirically. In a future iteration of this work, these parameters could be selected adaptively.
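The frequency-domain division used for the RIR estimate can be sketched with NumPy as follows. The small constant `eps` is an addition of this sketch (not part of the description above) that guards against division by near-zero probe spectra; the function name, the FFT length, and the toy signals are likewise hypothetical.

```python
import numpy as np

def estimate_rir(y, s, n_taps, eps=1e-8):
    """Estimate an RIR as the inverse DFT of Y(f) / S(f), lightly regularized."""
    n_fft = 2 * len(y)                       # long enough to avoid time aliasing
    Y = np.fft.rfft(y, n_fft)
    S = np.fft.rfft(s, n_fft)
    H = Y * np.conj(S) / (np.abs(S) ** 2 + eps)   # equals Y / S when eps -> 0
    return np.fft.irfft(H, n_fft)[:n_taps]

# Toy check: recover a sparse two-tap RIR from a noise probe (illustrative).
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
h_true = np.zeros(64)
h_true[[10, 25]] = [1.0, 0.5]
y = np.convolve(s, h_true)
h_hat = estimate_rir(y, s, n_taps=64)
```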
4.3 Impact of distances and background noises
In this experiment, we evaluate the performance of the proposed TOA estimator at varying distances. The setup was placed at distances of [0.8, 1.0, 1.5, 2.0, 2.5] m, and 100 acoustic echoes were recorded at each distance. The data was collected using the single-channel setup shown in Fig. 3. Accuracy is defined as the percentage of TOA estimates that are within \(\pm 10\%\) of the ground truth value obtained from the lidar. The proposed method (EMR) is compared with the previous method (EMI) proposed in [19] and with single-channel localization and mapping (ScLAM) [28]. These results are shown in Fig. 8. The data obtained from this experiment is also summarized in Table 1.
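The accuracy criterion above, together with the mean, standard deviation, and RMSE reported in the tables, could be computed as in this sketch. It is written for distance estimates compared against a single ground-truth distance; the helper name and the example numbers are hypothetical and are not data from the paper.

```python
import numpy as np

def evaluate_estimates(d_hat, d_true, tol=0.10):
    """Accuracy (% within +/- tol of ground truth), mean, std, and RMSE."""
    d_hat = np.asarray(d_hat, dtype=float)
    within = np.abs(d_hat - d_true) <= tol * d_true
    return {
        "accuracy_pct": 100.0 * within.mean(),
        "mean": d_hat.mean(),
        "std": d_hat.std(),
        "rmse": np.sqrt(np.mean((d_hat - d_true) ** 2)),
    }

# Made-up estimates against a ground truth of 1.0 m (illustrative only).
stats = evaluate_estimates([1.0, 1.05, 1.2, 0.95], d_true=1.0)
```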
Additionally, the proposed method was evaluated under different background noise levels. To create different noise levels, a separate loudspeaker was placed 6.4 m away from the setup within the lab. This separate loudspeaker was used to produce low signal-to-noise ratio (SNR) conditions by playing an audio clip from YouTube called cocktail party^{Footnote 3}. The SNR is defined as the ratio of the variance of the observed signal, \(\textbf{x}(n)\), to the variance of the background noise, \(\textbf{v}(n)\), i.e.,
\[\text {SNR} = 10\log _{10}\left( \frac{\sigma _{x}^{2}}{\sigma _{v}^{2}}\right) ,\]
where \(\sigma _{x}^{2} = E[\Vert \textbf{x}(n)\Vert ^{2}]\) and \(\sigma _{v}^{2} = E[\Vert \textbf{v}(n)\Vert ^{2}]\). Both the observed signal and the background noise are recorded for 1 s. The background noise was recorded before the system probed the environment with a known signal. Based on this configuration, 4 SNRs, [0, 10, 20, 30] dB, were selected by adjusting the loudness of the separate loudspeaker. Furthermore, 100 audio recordings were obtained at each SNR to evaluate the proposed method (EMR). The evaluation results are shown in Fig. 9. According to Table 1, both the standard deviation \(\sigma\) and the root mean square error (RMSE) of EMI and EMR increase as the distance between the acoustic reflector and the platform increases, while the mean value \(\mu\) is close to the ground truth for distances up to 1.5 m and for all SNRs.
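The SNR definition above can be illustrated with a few lines of NumPy. This sketch assumes the SNR is reported in dB and estimates both powers as mean squares of the two recordings; the function name and the synthetic signals are hypothetical.

```python
import numpy as np

def snr_db(x, v):
    """SNR in dB from an observed recording x and a noise-only recording v."""
    sigma_x2 = np.mean(np.asarray(x, dtype=float) ** 2)
    sigma_v2 = np.mean(np.asarray(v, dtype=float) ** 2)
    return 10.0 * np.log10(sigma_x2 / sigma_v2)

rng = np.random.default_rng(0)
v = rng.standard_normal(48_000)   # 1 s of noise at 48 kHz (synthetic)
x = 10.0 * v                      # ten times the amplitude, i.e. 20 dB above v
snr = snr_db(x, v)
```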
4.4 Evaluation of robust EM using multilateration technique
In this experiment, we test the performance of the proposed method using a multilateration technique. In this way, we can estimate the DOA of the acoustic echoes, which can help robotic platforms locate the source of the acoustic echoes. The idea is that the proposed method estimates TOAs from each of the microphone-loudspeaker combinations, which are then used with a multilateration technique. Multilateration is a localization technique popularly used in telecommunication to estimate the direction and distance of a transmitter/source [29,30,31]. Moreover, multilateration was also used to estimate a robot’s position in 3D space as proposed in [32]. Within the context of this paper, multilateration is used to estimate the location of the acoustic reflector. Multilateration techniques rely on knowledge of the TOAs of the acoustic reflections and also assume that the locations of the sensor nodes are known with respect to the same coordinate system. To locate an acoustic reflector, we need to set a reference with respect to a coordinate system. This information could be obtained from the robot’s motor encoder or from an inertial measurement unit (IMU), but this aspect of robot navigation is beyond the scope of this paper. More specifically, let us assume that we have P microphones and that the source is placed on the same xy-plane. Using (17), we can estimate the TOA and, using (22), the range value vector, \(\textbf{d}\). If the microphones are located on the xy-plane, or 2D plane, at positions \([\textbf{x}_{p}, \textbf{y}_{p}] = [(x_{1}, y_{1}),(x_{2}, y_{2}), \dots , (x_{P}, y_{P})]\), where P is the number of microphones, then based on the range data \(\textbf{d}_{p}\) a circle can be drawn around each microphone. The point of intersection of these individual circles yields the location of the acoustic reflector, as seen in Fig. 10. The true acoustic reflector position (x, y) is at the intersection of all the circles and satisfies the following equations:
\[(x-x_{p})^{2} + (y-y_{p})^{2} = d_{p}^{2}, \quad p=1,\dots ,P,\]
where \(d_{p}\) is the range associated with the \(p^{th}\) microphone.
When the estimates of \(\textbf{d}\) are noisy, the circles will not intersect at a single point. Therefore, a least-squares fit can be used to obtain the acoustic reflector location estimate [33], i.e.,
where
The setup used for this experiment is shown in Fig. 4. Here, the setup was fixed at distances of [0.7, 1.1, 1.5] m from an acoustic reflector. Furthermore, 50 recordings were made at each distance and later evaluated. The results are depicted in Fig. 11 and listed in Table 2. According to Table 2, the \(\sigma\) and RMSE values of the proposed method increase as the platform’s distance to the wall increases, while the \(\mu\) value is close to 0.7 m at an SNR of 30 dB.
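The least-squares multilateration step can be illustrated with the sketch below. It uses one common linearization, subtracting the first circle equation from the others to remove the quadratic terms, which is not necessarily the exact formulation of [33]. The function name, the six-element array geometry, and the test point are hypothetical.

```python
import numpy as np

def multilaterate(mics, d):
    """Least-squares 2D position from microphone positions and ranges.

    mics: (P, 2) array of (x_p, y_p); d: length-P array of ranges d_p.
    Subtracting the first circle equation from the others gives a linear system.
    """
    mics = np.asarray(mics, dtype=float)
    d = np.asarray(d, dtype=float)
    A = 2.0 * (mics[1:] - mics[0])
    b = (np.sum(mics[1:] ** 2, axis=1) - np.sum(mics[0] ** 2)
         - (d[1:] ** 2 - d[0] ** 2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position

# Six microphones on a circle of radius 0.2 m (illustrative geometry).
angles = 2.0 * np.pi * np.arange(6) / 6
mics = 0.2 * np.column_stack([np.cos(angles), np.sin(angles)])
target = np.array([1.0, 0.5])                       # hypothetical position
ranges = np.linalg.norm(mics - target, axis=1)      # noise-free ranges
estimate = multilaterate(mics, ranges)
```

With noisy ranges the same call returns the least-squares compromise between the circles rather than an exact intersection.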
5 Discussion and limitations
Two platform designs were used to test the algorithm: a collocated microphone-loudspeaker setup as seen in Fig. 3 and a uniform circular microphone array with a loudspeaker positioned at the center of the array as seen in Fig. 4. The results of the first experiment revealed that the proposed method can be used to estimate multiple acoustic reflections, since EMR accounts for the acoustic system’s response, which can otherwise hinder the estimation accuracy of multiple acoustic reflections. As seen in Fig. 6, EMR estimates multiple peaks that correspond to acoustic reflectors, while EMI (Fig. 5) estimates a single acoustic reflector. Therefore, estimating multiple acoustic reflectors using the proposed method is beneficial for spatial map construction in an indoor environment.
In the second experiment, the performance of EMR and EMI is evaluated using the proof-of-concept setup described in Section 4.1. The results in Fig. 8 reveal that EMR provides significant improvements in estimating the acoustic reflector, as it accounts for the acoustic system’s response that affects the performance of the TOA estimator, while Fig. 9 shows that the proposed method is \(\sim 10\%\) better than the EMI method over all SNR values, which is on par with the ScLAM technique. According to the results in Fig. 8, the proposed method can estimate an acoustic reflector up to a distance of 1.5 m with \(60\%\) accuracy at a low SNR of 0 dB. Similarly, the proposed method is more robust than EMI against different SNR levels, as seen in Fig. 9. The results in Table 1 show that the proposed method offers a limited range, as it estimates the acoustic reflector’s range up to a distance of 1.5 m with an RMSE of 0.2671 m at a high SNR of 30 dB. At a low SNR of 0 dB, the \(\mu\), \(\sigma\), and RMSE remain similar, which indicates that the proposed method is robust under changing environmental conditions.
In the last experiment, we combined the proposed method with a multilateration technique so that both the direction and the location of the acoustic reflector can be determined by a robotic system as it navigates an indoor environment. Here, we test EMI, EMR, and ScLAM at an SNR of 30 dB and place the multichannel setup at varying distances. According to the results in Fig. 11, all methods can estimate an acoustic reflector up to a distance of 0.7 m with \(80\%\) accuracy. The results in Table 2 also indicate that the \(\mu\), \(\sigma\), and RMSE are similar for all 3 methods (EMI, EMR, and ScLAM). The \(\mu\) value is around 0.6154 m while the RMSE value is 0.16176 m when the setup is placed at a distance of 0.7 m. The \(\mu\) and RMSE values increase as the distance between the wall and the setup increases to 1.1 m and 1.5 m. This reduction in accuracy could be due to the loudspeaker blocking the acoustic echoes from reaching one of the microphones placed behind it, which could affect the TOA estimation. This could result in spurious estimates that reduce the performance of the multilateration technique when locating an acoustic source. Similar performance is seen in the remaining methods. However, for the multilateration technique to work within robotics, the robotic platform requires knowledge of its Cartesian position in the environment, i.e., the positions of the loudspeaker and microphones should be known. One way to acquire this information is by using sensors that track the odometry and orientation of a robot, e.g., the inertial measurement unit. In this paper, however, we assume that the locations of the loudspeaker and microphones are known.
6 Conclusions and future work
The contribution of this paper is a robust expectation-maximization technique for acoustic reflector localization, intended for robotic platforms using echolocation. The proposed method builds on the existing work in [19], which assumed the gain or filter parameters to be the same; in practice, this is not a valid assumption and can hinder the acoustic reflector estimation process. Hence, in this paper, we introduced this uncertainty into the signal formulation. Three experiments were performed in simulated and practical environments. To test the performance of the proposed method, two proof-of-concept platforms were used: one consists of a collocated microphone-loudspeaker arrangement, while the other consists of a uniform circular microphone array with a loudspeaker placed at the center of the array. From our experimental results, we deduce that our proposed method can estimate an acoustic reflector up to a distance of 1.5 m with \(60\%\) accuracy and can be combined with a multilateration technique to locate the direction of an acoustic reflector. Our proposed method can be beneficial to robotic platforms, as it can complement existing laser- and camera-based technologies for generating a spatial map of an indoor environment, as done in our previous works. Our proposed echolocation method can aid a robotic platform in detecting and estimating transparent surfaces and can also estimate multiple acoustic echoes when a robot moves to a corner of a room.
In a future iteration of this work, we aim to implement the proposed method on an existing robotic platform, e.g., Softbank’s NAO robot, to improve the algorithm, and to combine it with echo labeling techniques as proposed in [21] so that multiple acoustic echoes are estimated and categorized to represent an indoor environment. We also intend to test the proposed method using the robotic platform outlined in [28]. This way, we can test the performance of the proposed method against the ScLAM and McLAM algorithms and also evaluate the performance in generating a spatial map of a typical office environment. The current proof-of-concept is a fixed loudspeaker-microphone setup, while in [28], the setup is placed on top of a robotic platform that moves within an indoor environment. Moreover, this method could also be used in a wireless acoustic sensor network (WASN) to detect acoustic sources [28, 34].
Availability of data and materials
Not applicable.
Notes
The dataset and code for this work can be found here: https://doi.org/10.5281/zenodo.5082224
Abbreviations
TOA: Time-of-arrival
EM: Expectation-maximization
UCA: Uniform circular array
SNR: Signal-to-noise ratio
DOA: Direction-of-arrival
aSLAM: Acoustic simultaneous localization and mapping
RIR: Room impulse response
TDOA: Time-difference-of-arrival
ML: Maximum likelihood
\(T_{60}\): Reverberation time (60 dB)
RPM: Revolutions per minute
DREGON: Database of drone audio recordings
NLS: Nonlinear least squares
References
J. Steckel, H. Peremans, BatSLAM: Simultaneous localization and mapping using biomimetic sonar. PLoS ONE 8(1), 1–11 (2013)
R. Kuc, Echolocation with bat buzz emissions: Model and biomimetic sonar for elevation estimation. J. Acoust. Soc. Am. 131(1), 561–568 (2012)
M. Kreković, I. Dokmanić, M. Vetterli, EchoSLAM: Simultaneous localization and mapping with acoustic echoes, Proc. IEEE Int. Conf. Acoust., Speech, Signal Process, IEEE, pp. 11–15 (2016)
S. Tervo, J. Pätynen, T. Lokki, Acoustic reflection localization from room impulse responses. Acta Acustica U. Acustica 98(3), 418–440 (2012)
G. Defrance, L. Daudet, J.-D. Polack, Detecting arrivals within room impulse responses using matching pursuit, Proc. of the 11th Int. Conference on Digital Audio Effects (DAFx-08), Espoo, Finland. vol. 10, pp. 307–316 (2008)
G. Defrance, L. Daudet, J.-D. Polack, Using matching pursuit for estimating mixing time within room impulse responses. Acta Acustica U. Acustica 95(6), 1071–1081 (2009)
G. Moschioni, A new method for measurement of early sound reflections in theaters and halls, Proceedings of the 19th IEEE Instrumentation and Measurement Technology Conference (IEEE Cat. No.00CH37276), IEEE, vol. 1, pp. 425–430 (2002)
U. Saqib, S. Gannot, J. Jensen, Estimation of acoustic echoes using expectation-maximization methods. EURASIP J. Audio Speech Music. Process. 2020(1), 1–15 (2020)
Y. Geng, J. Jung, Sound-source localization system for robotics and industrial automatic control systems based on neural network, 2008 International Conference on Smart Manufacturing Application, IEEE, pp. 311–315 (2008)
S. Dey, S. Boppu, M.S. Manikandan, Design of a real-time automatic source monitoring framework based on sound source localization, 2019 Seventh International Conference on Digital Information Processing and Communications (ICDIPC), IEEE, pp. 35–40 (2019)
H. Zhu, H. Wan, Single sound source localization using convolutional neural networks trained with spiral source, 5th International Conference on Automation, Control and Robotics Engineering (CACRE), IEEE, pp. 720–724 (2020)
N. Riopelle, P. Caspers, D. Sofge, Terrain classification for autonomous vehicles using bat-inspired echolocation, 2018 International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 1–6 (2018)
J.H. Christensen, S. Hornauer, S.X. Yu, BatVision: Learning to see 3D spatial layout with two ears, IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp. 1581–1587 (2020)
E. Tracy, N. Kottege, Catchatter: Acoustic perception for mobile robots. IEEE Robot. Autom. Lett. 6(4), 7209–7216 (2021)
D.W. Gunness, Loudspeaker transfer function averaging and interpolation. J. Audio Eng. Soc. (2001)
U. Saqib, J.R. Jensen, Sound-based distance estimation for indoor navigation in the presence of ego noise, Proc. 27th European Signal Processing Conf. (EUSIPCO), IEEE, pp. 1–5 (2019)
P. Ahgren, P. Stoica, A simple method for estimating the impulse responses of loudspeakers. IEEE Trans. Consum. Electron. 49(4), 889–893 (2003)
Z. Sü, M. Çalışkan, Acoustical design and noise control in metro stations: Case studies of the Ankara metro system. Build. Acoust. 14(3), 203–221 (2007)
M. Feder, E. Weinstein, Parameter estimation of superimposed signals using the EM algorithm. IEEE Trans. Acoust. Speech Signal Process. 36(4), 477–489 (1988)
A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–22 (1977)
I. Dokmanic, R. Parhizkar, A. Walther, Y.M. Lu, M. Vetterli, Acoustic echoes reveal room shape. Proc. Natl. Acad. Sci. 110(30), 12186–12191 (2013)
L. Nguyen, J.V. Miro, X. Qiu, Can a robot hear the shape and dimensions of a room?, International Conference on Intelligent Robots and Systems (IROS), IEEE, pp. 5346–5351 (2019)
M. Boutin, G. Kemper, Can a ground-based vehicle hear the shape of a room? Stud. Appl. Math. 151(1), 352–368 (2023)
E.A.P. Habets, I. Cohen, S. Gannot, Generating nonstationary multisensor signals under a spatial coherence constraint. J. Acoust. Soc. Am. 124(5), 2911–2917 (2008)
U. Saqib, J. Jensen, A model-based approach to acoustic reflector localization using robotic platform, in Proc. IEEE Int. Conf. Intell., Robot, Automation (IROS), IEEE, pp. 1–8 (2018)
R. Humphrey, Playrec: Multichannel MATLAB audio. (2007). http://www.playrec.co.uk. Accessed Mar 2001
H. Herlufsen, Dual channel FFT analysis (part I), Brüel & Kjær Technical Review. (1984)
U. Saqib, J.R. Jensen, A framework for spatial map generation using acoustic echoes for robotic platforms. Robot. Auton. Syst. 150, 104009 (2022)
J. Yang, H. Lee, K. Moessner, Multilateration localization based on singular value decomposition for 3D indoor positioning, Int. Conf. Indoor Positioning and Indoor Navigation, IEEE, pp. 1–8 (2016)
J. Wan, N. Yu, R. Feng, Y. Wu, C. Su, Localization refinement for wireless sensor networks. Comput. Commun. 32(13), 1515–1524 (2009)
Y. Zhou, J. Li, L. Lamont, Multilateration localization in the presence of anchor location uncertainties, IEEE Global Communications Conference (GLOBECOM), IEEE, pp. 309–314 (2012)
A. Yazici, U. Yayan, H. Yücel, An ultrasonic based indoor positioning system, Int. Symposium on Innovations in Intell. Sys. and Applications, IEEE, pp. 585–589 (2011)
C. Chen, K. Yao, in Classical and Modern Direction-of-Arrival Estimation, ed. by T.E. Tuncer, B. Friedlander. Source and node localization in sensor networks (Academic Press, Boston, 2009), pp. 343–383
M. Cobos, F. Antonacci, A. Alexandridis, A. Mouchtaris, B. Lee, A survey of sound source localization methods in wireless acoustic sensor networks. Wirel. Commun. Mob. Comput., pp. 1–24 (2017)
Acknowledgements
Not applicable.
Funding
This work was funded by Aalborg University, Denmark.
Author information
Contributions
JRJ, MGC, and US designed the idea for the manuscript. JRJ and US conducted the experiments. All authors contributed to the writing of this work, and all authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Saqib, U., Græsbøll Christensen, M. & Jensen, J. Robust acoustic reflector localization using a modified EM algorithm. J AUDIO SPEECH MUSIC PROC. 2024, 22 (2024). https://doi.org/10.1186/s13636-024-00340-y
DOI: https://doi.org/10.1186/s13636-024-00340-y