H. Kuttruff, Room acoustics (CRC Press, Germany, 2016). https://doi.org/10.1201/9781315372150.
D. Griesinger, The psychoacoustics of apparent source width, spaciousness and envelopment in performance spaces. Acta Acustica U. Acustica. 83(4), 721–731 (1997).
J. B. Allen, D. A. Berkley, Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am.65(4), 943–950 (1979). https://doi.org/10.1121/1.382599.
S. Gannot, D. Burshtein, E. Weinstein, Signal enhancement using beamforming and non-stationarity with applications to speech. IEEE Trans. Signal Process.49(8), 1614–1626 (2001). https://doi.org/10.1109/78.934132.
I. Cohen, Relative transfer function identification using speech signals. IEEE Trans. Speech Audio Process.12(5), 451–459 (2004). https://doi.org/10.1109/TSA.2004.832975.
S. Markovich, S. Gannot, I. Cohen, Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals. IEEE Trans Audio Speech Lang. Process.17(6), 1071–1086 (2009). https://doi.org/10.1109/TASL.2009.2016395.
O. Schwartz, S. Gannot, E. A. Habets, Multi-microphone speech dereverberation and noise reduction using relative early transfer functions. IEEE/ACM Trans. Audio Speech Lang. Process.23(2), 240–251 (2014). https://doi.org/10.1109/TASLP.2014.2372335.
S. Braun, W. Zhou, E. A. Habets, in 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Narrowband direction-of-arrival estimation for binaural hearing aids using relative transfer functions, (2015), pp. 1–5. https://doi.org/10.1109/WASPAA.2015.7336917.
X. Li, L. Girin, F. Badeig, R. Horaud, in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Reverberant sound localization with a robot head based on direct-path relative transfer function, (2016), pp. 2819–2826. https://doi.org/10.1109/IROS.2016.7759437.
Q. Nguyen, L. Girin, G. Bailly, F. Elisei, D. -C. Nguyen, in Workshop on Crossmodal Learning for Intelligent Robotics in conjunction with IEEE/RSJ IROS. Autonomous sensorimotor learning for sound source localization by a humanoid robot (IEEENew York, 2018).
B. Laufer-Goldshtein, R. Talmon, S. Gannot, et al, Data-driven multi-microphone speaker localization on manifolds. Found. Trends Signal Process.14(1–2), 1–161 (2020).
J. L. Flanagan, A. C. Surendran, E. -E. Jan, Spatially selective sound capture for speech and audio processing. Speech Comm.13(1-2), 207–222 (1993). https://doi.org/10.1016/0167-6393(93)90072-S.
E. E. Jan, P. Svaizer, J. L. Flanagan, in IEEE International Symposium on Circuits and Systems, vol. 2. Matched-filter processing of microphone array for spatial volume selectivity, (1995), pp. 1460–1463. https://doi.org/10.1109/ISCAS.1995.521409.
S. Affes, Y. Grenier, A signal subspace tracking algorithm for microphone array processing of speech. IEEE Trans. Speech Audio Process.5(5), 425–437 (1997). https://doi.org/10.1109/89.622565.
P. Annibale, F. Antonacci, P. Bestagini, A. Brutti, A. Canclini, L. Cristoforetti, E. Habets, W. Kellermann, K. Kowalczyk, A. Lombard, E. Mabande, D. Markovic, P. Naylor, M. Omologo, R. Rabenstein, A. Sarti, P. Svaizer, M. Thomas, The SCENIC project: environment-aware sound sensing and rendering. Procedia Comput. Sci.7:, 150–152 (2011). https://doi.org/10.1016/j.procs.2011.09.039.
I. Dokmanić, R. Scheibler, M. Vetterli, Raking the cocktail party. IEEE J. Sel. Top. Signal Process.9(5), 825–836 (2015). https://doi.org/10.1109/JSTSP.2015.2415761.
K. Kowalczyk, Raking early reflection signals for late reverberation and noise reduction. J. Acoust. Soc. Am. (JASA). 145(3), 257–263 (2019). https://doi.org/10.1121/1.5095535.
F. Ribeiro, D. Ba, C. Zhang, D. Florêncio, in IEEE International Conference on Multimedia and Expo (ICME). Turning enemies into friends: using reflections to improve sound source localization, (2010), pp. 731–736. https://doi.org/10.1109/ICME.2010.5583886.
D. Salvati, C. Drioli, G. L. Foresti, Sound source and microphone localization from acoustic impulse responses. IEEE Signal Process. Lett.23(10), 1459–1463 (2016). https://doi.org/10.1109/LSP.2016.2601878.
D. Di Carlo, A. Deleforge, N. Bertin, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Mirage: 2D source localization using microphone pair augmentation with echoes, (2019), pp. 775–779. https://doi.org/10.1109/ICASSP.2019.8683534.
J. Daniel, S. Kitić, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Time domain velocity vector for retracing the multipath propagation, (2020), pp. 421–425. https://doi.org/10.1109/ICASSP40776.2020.9054561.
A. Asaei, M. Golbabaee, H. Bourlard, V. Cevher, Structured sparsity models for reverberant speech separation. IEEE/ACM Trans. Audio Speech Lang. Process.22(3), 620–633 (2014). https://doi.org/10.1109/TASLP.2013.2297012.
S. Leglaive, R. Badeau, G. Richard, Multichannel audio source separation with probabilistic reverberation priors. IEEE/ACM Trans. Audio Speech Lang. Process.24(12), 2453–2465 (2016). https://doi.org/10.1109/TASLP.2016.2614140.
R. Scheibler, D. Di Carlo, A. Deleforge, I. Dokmanić, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Separake: source separation with a little help from echoes, (2018), pp. 6897–6901. https://doi.org/10.1109/ICASSP.2018.8461345.
L. Remaggi, P. J. Jackson, W. Wang, Modeling the comb filter effect and interaural coherence for binaural source separation. IEEE/ACM Trans. Audio Speech Lang. Process.27(12), 2263–2277 (2019). https://doi.org/10.1109/TASLP.2019.2946043.
K. A. Al-Karawi, D. Y. Mohammed, Early reflection detection using autocorrelation to improve robustness of speaker verification in reverberant conditions. Int. J. Speech Technol.22(4), 1077–1084 (2019). https://doi.org/10.1007/s10772-019-09648-z.
F. Antonacci, J. Filos, M. R. Thomas, E. A. Habets, A. Sarti, P. A. Naylor, S. Tubaro, Inference of room geometry from acoustic impulse responses. IEEE Trans. Audio Speech Lang. Process.20(10), 2683–2695 (2012). https://doi.org/10.1109/TASL.2012.2210877.
I. Dokmanić, R. Parhizkar, A. Walther, Y. M. Lu, M. Vetterli, Acoustic echoes reveal room shape. Proc. Natl. Acad. Sci. U.S.A.110(30), 12186–12191 (2013). https://doi.org/10.1073/pnas.1221464110.
M. Crocco, A. Trucco, A. Del Bue, Uncalibrated 3D room geometry estimation from sound impulse responses. J. Frankl. Inst.354(18), 8678–8709 (2017). https://doi.org/10.1016/j.jfranklin.2017.10.024.
L. Remaggi, P. J. B. Jackson, P. Coleman, W. Wang, Acoustic reflector localization: novel image source reversion and direct localization methods. IEEE/ACM Trans. Audio Speech Lang. Process.25(2), 296–309 (2017). https://doi.org/10.1109/TASLP.2016.2633802.
I. Szoke, M. Skacel, L. Mosner, J. Paliesek, J. H. Cernocky, Building and evaluation of a real room impulse response dataset. IEEE J. Sel. Top. Signal Process.13(4), 863–876 (2019). https://doi.org/10.1109/JSTSP.2019.2917582.
A. F. Genovese, H. Gamper, V. Pulkki, N. Raghuvanshi, I. J. Tashev, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Blind room volume estimation from single-channel noisy speech, (2019), pp. 231–235. https://doi.org/10.1109/ICASSP.2019.8682951.
E. Hadad, F. Heese, P. Vary, S. Gannot, in 14th International Workshop on Acoustic Signal Enhancement (IWAENC). Multichannel audio database in various acoustic environments, (2014), pp. 313–317. https://doi.org/10.1109/IWAENC.2014.6954309.
N. Bertin, E. Camberlein, R. Lebarbenchon, E. Vincent, S. Sivasankaran, I. Illina, F. Bimbot, VoiceHome-2, an extended corpus for multichannel speech processing in real homes. Speech Commun.106:, 68–78 (2019). https://doi.org/10.1016/j.specom.2018.11.002.
C. Gaultier, S. Kataria, A. Deleforge, in Lecture Notes in Computer Science, vol. 10169 LNCS. VAST: the virtual acoustic space traveler dataset, (2017), pp. 68–79. https://doi.org/10.1007/978-3-319-53547-0_7.
C. Kim, A. Misra, K. Chin, T. Hughes, A. Narayanan, T. N. Sainath, M. Bacchiani, in Interspeech 2017. Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home (ISCAStockholm, 2017), pp. 379–383.
L. Perotin, R. Serizel, E. Vincent, A. Guerin, CRNN-based multiple DoA estimation using acoustic intensity features for ambisonics recordings. IEEE J. Sel. Top. Signal Process.13(1), 22–33 (2019). https://doi.org/10.1109/JSTSP.2019.2900164.
D. Di Carlo, C. Elvira, A. Deleforge, N. Bertin, R. Gribonval, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Blaster: an off-grid method for blind and regularized acoustic echoes retrieval, (2020), pp. 156–160. https://doi.org/10.1109/ICASSP40776.2020.9054647.
S. M. Schimmel, M. F. Muller, N. Dillier, in IEEE International Conference on Acoustics, Speech and Signal Processing. A fast and accurate “shoebox” room acoustics simulator, (2009), pp. 241–244. https://doi.org/10.1109/ICASSP.2009.4959565.
E. A. Habets, Room impulse response generator. Technische Universiteit Eindhoven, Tech. Rep. 2(2.4), 1 (2006).
R. Scheibler, E. Bezzam, I. Dokmanić, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Pyroomacoustics: a Python package for audio room simulations and array processing algorithms (Calgary, 2018). https://doi.org/10.1109/ICASSP.2018.8461310.
D. Diaz-Guerra, A. Miguel, J. R. Beltran, gpurir: a Python library for room impulse response simulation with GPU acceleration. Multimedia Tools Appl.80(4), 5653–5671 (2021). https://doi.org/10.1007/s11042-020-09905-3.
J. Čmejla, T. Kounovský, S. Gannot, Z. Koldovský, P. Tandeitnik, in European Signal Processing Conference (EUSIPCO). Mirage: multichannel database of room impulse responses measured on high-resolution cube-shaped grid, (2021), pp. 56–60. https://doi.org/10.23919/Eusipco47968.2020.9287646.
D. B. Paul, J. M. Baker, in Proceedings of the Workshop on Speech and Natural Language. The design for the Wall Street Journal-based CSR corpus (Association for Computational Linguistics, 1992), pp. 357–362. https://doi.org/10.3115/1075527.1075614.
O. Cramer, The variation of the specific heat ratio and the speed of sound in air with temperature, pressure, humidity, and co 2 concentration. J. Acoust. Soc. Am.93(5), 2510–2516 (1993). https://doi.org/10.1121/1.405827.
A. Farina, Simultaneous Measurement of Impulse Response and Distortion with a Swept-Sine Technique. Journal of The Audio Engineering Society (Audio Engineering Society, New York, 2000).
A. Farina, in Audio Eng. Soc. Convention (AES), 3. Advancements in impulse response measurements by sine sweeps, (2007), pp. 1626–1646.
M. Ravanelli, A. Sosi, P. Svaizer, M. Omologo, in European Signal Processing Conference (EUSIPCO). Impulse response estimation for robust speech recognition in a reverberant environment (IEEENew York, 2012), pp. 1668–1672.
I. Dokmanić, J. Ranieri, M. Vetterli, in European Signal Processing Conference (EUSIPCO). Relax and unfold: Microphone localization with Euclidean distance matrices (IEEENew York, 2015), pp. 265–269. https://doi.org/10.1109/EUSIPCO.2015.7362386.
M. Crocco, A. Del Bue, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Estimation of TDOA for room reflections by iterative weighted l1 constraint, (2016), pp. 3201–3205. https://doi.org/10.1109/ICASSP.2016.7472268.
A. Plinge, F. Jacob, R. Haeb-Umbach, G. A. Fink, Acoustic microphone geometry calibration. IEEE Signal Process. Mag., 14–28 (2016). https://doi.org/10.1109/MSP.2016.2555198.
A. Beck, P. Stoica, J. Li, Exact and approximate solutions of source localization problems. IEEE Trans. Signal Process.56(5), 1770–1778 (2008). https://doi.org/10.1109/TSP.2007.909342.
Y. E. Baba, A. Walther, E. A. P. Habets, 3D room geometry inference based on room impulse response stacks. IEEE/ACM Trans. Audio Speech Lang. Process.26(5), 857–872 (2018). https://doi.org/10.1109/TASLP.2017.2784298.
J. Eaton, N. D. Gaubitch, A. H. Moore, P. A. Naylor, Estimation of room acoustic parameters: the ACE challenge. IEEE/ACM Trans. Audio Speech Lang. Process.24:, 1681–1693 (2016).
G. Defrance, L. Daudet, J. -D. Polack, Finding the onset of a room impulse response: straightforward?IEEE/ACM Trans. Audio Speech Lang. Process.124(4), 248–254 (2008).
D. Di Carlo, P. Tandeitnik, C. Foy, N. Bertin, A. Deleforge, S. Gannot, Zenodo (2021). https://doi.org/10.5281/zenodo.4626590.
J. Eaton, N. D. Gaubitch, A. H. Moore, P. A. Naylor, in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). The ACE challenge–corpus description and performance evaluation, (2015), pp. 1–5. https://doi.org/10.1109/WASPAA.2015.7336912.
J. M. Eargle, in Handbook of Recording Engineering. Characteristics of performance and recording spaces (SpringerNew York, 1996), pp. 57–65.
P. A. Naylor, N. D. Gaubitch, Speech dereverberation (Springer, United Kingdom, 2010).
M. R. Schroeder, New method of measuring reverberation time. J. Acoust. Soc. Am.37(6), 1187–1188 (1965).
W. T. Chu, Comparison of reverberation measurements using schroeder’s impulse method and decay-curve averaging method. J. Acoust. Soc. Am.63(5), 1444–1450 (1978).
N. Xiang, Evaluation of reverberation times using a nonlinear regression approach. J. Acoust. Soc. Am.98(4), 2112–2121 (1995).
S. Gannot, E. Vincent, S. Markovich-Golan, A. Ozerov, A consolidated perspective on multi-microphone speech enhancement and source separation. IEEE/ACM Trans. Audio Speech Lang. Process.25(4), 692–730 (2017). https://doi.org/10.1109/TASLP.2016.2647702.
H. L. Van Trees, Optimum array processing: part IV of detection, estimation, and modulation theory (Wiley, United States, 2004).
R. Scheibler, I. Dokmanić, M. Vetterli, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Raking echoes in the time domain, (2015), pp. 554–558. https://doi.org/10.1109/ICASSP.2015.7178030.
H. A. Javed, A. H. Moore, P. A. Naylor, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Spherical microphone array acoustic rake receivers, (2016), pp. 111–115. https://doi.org/10.1109/ICASSP.2016.7471647.
L. Condat, A. Hirabayashi, Cadzow denoising upgraded: a new projection method for the recovery of Dirac pulses from noisy linear measurements. Sampling Theory Signal Image Process.14(1), 17–47 (2015). https://doi.org/10.1007/BF03549586.
M. Miyoshi, Y. Kaneda, Inverse filtering of room acoustics. IEEE/ACM Trans. Acoust. Speech Signal Process.36(2), 145–152 (1988). https://doi.org/10.1109/29.1509.
S. Gannot, M. Moonen, Subspace methods for multimicrophone speech dereverberation. EURASIP J. Adv. Signal Process.2003(11), 1–17 (2003). https://doi.org/10.1155/S1110865703305049.
J. Benesty, J. Chen, Y. Huang, J. Dmochowski, On microphone-array beamforming from a mimo acoustic signal processing perspective. IEEE Trans. Audio Speech Lang. Process.15(3), 1053–1065 (2007). https://doi.org/10.1109/TASL.2006.885251.
M. R. Thomas, I. J. Tashev, F. Lim, P. A. Naylor, in International Workshop on Acoustic Signal Enhancement (IWAENC). Optimal beamforming as a time domain equalization problem with application to room acoustics (IEEE, 2014), pp. 75–79. https://doi.org/10.1109/IWAENC.2014.6953341.
I. Kodrasi, S. Doclo, in Hands-free Speech Communications and Microphone Arrays (HSCMA). EVD-based multi-channel dereverberation of a moving speaker using different RETF estimation methods, (2017), pp. 116–120. https://doi.org/10.1109/HSCMA.2017.7895573.
N. Gößling, S. Doclo, in International Workshop on Acoustic Signal Enhancement (IWAENC). Relative transfer function estimation exploiting spatially separated microphones in a diffuse noise field, (2018), pp. 146–150. https://doi.org/10.1109/IWAENC.2018.8521295.
S. Markovich-Golan, S. Gannot, W. Kellermann, in European Signal Processing Conference (EUSIPCO). Performance analysis of the covariance-whitening and the covariance-subtraction methods for estimating the relative transfer function, (2018), pp. 2499–2503. https://doi.org/10.23919/EUSIPCO.2018.8553007.
M. Kuster, Objective sound field analysis based on the coherence estimated from two microphone signals. J. Acoust. Soc. Am.131(4), 3284–3284 (2012). https://doi.org/10.1121/1.4708280.
O. Schwartz, S. Gannot, E. A. Habets, in 24th European Signal Processing Conference (EUSIPCO). Joint estimation of late reverberant and speech power spectral densities in noisy environments using Frobenius norm, (2016), pp. 1123–1127. https://doi.org/10.1109/EUSIPCO.2016.7760423.
T. H. Falk, C. Zheng, W. -Y. Chan, A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech. IEEE/ACM Trans. Audio Speech Lang. Process.18(7), 1766–1774 (2010). https://doi.org/10.1109/TASL.2010.2052247.
A. W. Rix, J. G. Beerends, M. P. Hollier, A. P. Hekstra, in IEEE International Conference on Acoustics, Speech, and Signal (ICASSP), vol. 2. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, (2001), pp. 749–752. https://doi.org/10.1109/ICASSP.2001.941023.
J. S. Bradley, H. Sato, M. Picard, On the importance of early reflections for speech in rooms. J. Acoust. Soc. Am.113(6), 3233–3244 (2003). https://doi.org/10.1121/1.1570439.
H Peic Tukuljac, A. Deleforge, R. Gribonval, in Advances in Neural Information Processing Systems (NeurIPS), 31, ed. by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett. MULAN: A Blind and Off-Grid Method for Multichannel Echo Retrieval (Curran Associates, Inc.New York, 2018). https://proceedings.neurips.cc/paper/2018/file/c9f95a0a5af052bffce5c89917335f67-Paper.pdf.
M. Crocco, A. Trucco, A. Del Bue, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Room reflector estimation from sound by greedy iterative approach, (2018), pp. 6877–6881. https://doi.org/10.1109/ICASSP.2018.8461640.
S. Tervo, T. Tossavainen, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 3D room geometry estimation from measured impulse responses, (2012), pp. 513–516. https://doi.org/10.1109/ICASSP.2012.6287929.
O. Shih, A. Rowe, in ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN). Can a phone hear the shape of a room?, (2019), pp. 277–288. https://doi.org/10.1145/3302506.3310407.
U. Saqib, S. Gannot, J. R. Jensen, Estimation of acoustic echoes using expectation-maximization methods. EURASIP J. Audio Speech Music (2020). https://doi.org/10.1186/s13636-020-00179-z.
A. Beck, P. Stoica, J. Li, Exact and approximate solutions of source localization problems. IEEE Trans. Signal Process.56(5), 1770–1778 (2008). https://doi.org/10.1109/TSP.2007.909342.