I. McCowan, Microphone arrays: a tutorial (Queensland University, Australia, 2001), p. 1
F. Gustafsson, F. Gunnarsson, in 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP’03). Positioning using time-difference of arrival measurements, vol 6 (2003), pp. VI–553
Z. Khan, M.M. Kamal, N. Hamzah, K. Othman, N. Khan, in 2008 IEEE International RF and Microwave Conference. Analysis of performance for multiple signal classification (MUSIC) in estimating direction of arrival (2008), pp. 524–529
K. Nakadai, K. Nakamura, in Wiley Encyclopedia of Electrical and Electronics Engineering. Sound source localization and separation (New York: John Wiley & Sons, 2015), pp. 1–18
S.A. Vorobyov, Principles of minimum variance robust adaptive beamforming design. Signal Process. 93, 3264 (2013)
M. Fuhry, L. Reichel, A new Tikhonov regularization method. Numerical Algorithms 59, 433 (2012)
S. Amari, S.C. Douglas, A. Cichocki, H.H. Yang, in First IEEE Signal Processing Workshop on Signal Processing Advances in Wireless Communications. Multichannel blind deconvolution and equalization using the natural gradient (1997), pp. 101–104
M. Kawamoto, K. Matsuoka, N. Ohnishi, A method of blind separation for convolved non-stationary signals. Neurocomputing 22, 157 (1998)
T. Takatani, T. Nishikawa, H. Saruwatari, K. Shikano, in Seventh International Symposium on Signal Processing and Its Applications, 2003. Proceedings. High-fidelity blind separation for convolutive mixture of acoustic signals using SIMO-model-based independent component analysis, vol 2 (2003), pp. 77–80
D.W. Schobben, P. Sommen, A frequency domain blind signal separation method based on decorrelation. IEEE Trans. Signal Process. 50, 1855 (2002)
S. Makino, H. Sawada, S. Araki, in Blind Speech Separation. Frequency-domain blind source separation (Dordrecht: Springer, 2007), pp. 47–78
H. Buchner, R. Aichner, W. Kellermann, A generalization of blind source separation algorithms for convolutive mixtures based on second-order statistics. IEEE Trans. Speech Audio Process. 13, 120 (2004)
T. Kim, I. Lee, T.-W. Lee, in 2006 Fortieth Asilomar Conference on Signals, Systems and Computers. Independent vector analysis: definition and algorithms (2006), pp. 1393–1396
Y. Wang, D. Wang, Towards scaling up classification-based speech separation. IEEE Trans. Audio Speech Lang. Process. 21, 1381 (2013)
S. Mobin, B. Cheung, B. Olshausen, Generalization challenges for neural architectures in audio source separation, arXiv preprint arXiv:1803.08629 (2018)
P.-S. Huang, M. Kim, M. Hasegawa-Johnson, P. Smaragdis, Joint optimization of masks and deep recurrent neural networks for monaural source separation. IEEE/ACM Trans. Audio Speech Lang. Process. 23, 2136 (2015)
J.R. Hershey, Z. Chen, J. Le Roux, S. Watanabe, in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Deep clustering: discriminative embeddings for segmentation and separation (2016), pp. 31–35
M. Kolbæk, D. Yu, Z.-H. Tan, J. Jensen, Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks. IEEE/ACM Trans. Audio Speech Lang. Process. 25, 1901 (2017)
Y. Luo, N. Mesgarani, Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation. IEEE/ACM Trans. Audio Speech Lang. Process. 27, 1256 (2019)
K. Furuya, S. Sakauchi, A. Kataoka, in 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. Speech dereverberation by combining MINT-based blind deconvolution and modified spectral subtraction, vol 1 (2006), p. I–I
T. Nakatani, B.-H. Juang, T. Yoshioka, K. Kinoshita, M. Miyoshi, in 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. Importance of energy and spectral features in Gaussian source model for speech dereverberation (New Paltz: IEEE, 2007), pp. 299–302
T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, B.-H. Juang, in 2008 IEEE International Conference on Acoustics, Speech and Signal Processing. Blind speech dereverberation with multi-channel linear prediction based on short time Fourier transform representation (2008), pp. 85–88
T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, B.-H. Juang, Speech dereverberation based on variance-normalized delayed linear prediction. IEEE Trans. Audio Speech Lang. Process. 18, 1717 (2010)
T. Yoshioka, T. Nakatani, M. Miyoshi, H.G. Okuno, Blind separation and dereverberation of speech mixtures by joint optimization. IEEE Trans. Audio Speech Lang. Process. 19, 69 (2010)
A. Jukić, N. Mohammadiha, T. van Waterschoot, T. Gerkmann, S. Doclo, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Multi-channel linear prediction-based speech dereverberation with low-rank power spectrogram approximation (2015), pp. 96–100
F. Weninger, S. Watanabe, Y. Tachioka, B. Schuller, in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition (2014), pp. 4623–4627
D.S. Williamson, D. Wang, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Speech dereverberation and denoising using complex ratio masks (2017), pp. 5590–5594
J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, T. Nakatani, in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Joint optimization of neural network-based WPE dereverberation and acoustic model for robust online ASR (2019), pp. 6655–6659
K. Kinoshita, M. Delcroix, H. Kwon, T. Mori, T. Nakatani, in Interspeech. Neural network-based spectrum estimation for online WPE dereverberation (2017), pp. 384–388
M. Delcroix, T. Yoshioka, A. Ogawa, Y. Kubo, M. Fujimoto, N. Ito, K. Kinoshita, M. Espi, S. Araki, T. Hori, et al., Strategies for distant speech recognition in reverberant environments. EURASIP J. Adv. Signal Process. 2015, 1 (2015)
W. Yang, G. Huang, W. Zhang, J. Chen, J. Benesty, in 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC). Dereverberation with differential microphone arrays and the weighted-prediction-error method (2018), pp. 376–380
M. Togami, in 2015 23rd European Signal Processing Conference (EUSIPCO). Multichannel online speech dereverberation under noisy environments (2015), pp. 1078–1082
L. Drude, C. Boeddeker, J. Heymann, R. Haeb-Umbach, K. Kinoshita, M. Delcroix, T. Nakatani, in Interspeech. Integrating neural network based beamforming and weighted prediction error dereverberation (2018), pp. 3043–3047
T. Nakatani, K. Kinoshita, A unified convolutional beamformer for simultaneous denoising and dereverberation. IEEE Signal Process. Lett. 26, 903 (2019)
G. Wichern, J. Antognini, M. Flynn, L.R. Zhu, E. McQuinn, D. Crow, E. Manilow, J.L. Roux, WHAM!: Extending speech separation to noisy environments, arXiv preprint arXiv:1907.01160 (2019)
C. Ma, D. Li, X. Jia, Two-stage model and optimal SI-SNR for monaural multi-speaker speech separation in noisy environment, arXiv preprint arXiv:2004.06332 (2020)
T. Yoshioka, Z. Chen, C. Liu, X. Xiao, H. Erdogan, D. Dimitriadis, in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Low-latency speaker-independent continuous speech separation (2019), pp. 6980–6984
Z.-Q. Wang, D. Wang, in Interspeech. Integrating spectral and spatial features for multi-channel speaker separation (2018), pp. 2718–2722
J. Wu, Z. Chen, J. Li, T. Yoshioka, Z. Tan, E. Lin, Y. Luo, L. Xie, An end-to-end architecture of online multi-channel speech separation, arXiv preprint arXiv:2009.03141 (2020)
T. Nakatani, R. Takahashi, T. Ochiai, K. Kinoshita, R. Ikeshita, M. Delcroix, S. Araki, in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). DNN-supported mask-based convolutional beamforming for simultaneous denoising, dereverberation, and source separation (2020), pp. 6399–6403
Y. Fu, J. Wu, Y. Hu, M. Xing, L. Xie, in 2021 IEEE Spoken Language Technology Workshop (SLT). DESNET: A multi-channel network for simultaneous speech dereverberation, enhancement and separation (2021), pp. 857–864
T. Ochiai, M. Delcroix, R. Ikeshita, K. Kinoshita, T. Nakatani, S. Araki, in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Beam-TasNet: Time-domain audio separation network meets frequency-domain beamformer (2020), pp. 6384–6388
J. Le Roux, S. Wisdom, H. Erdogan, J.R. Hershey, in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). SDR–half-baked or well done? (2019), pp. 626–630
O. Ronneberger, P. Fischer, T. Brox, in International Conference on Medical Image Computing and Computer-Assisted Intervention. U-Net: Convolutional networks for biomedical image segmentation (Cham: Springer, 2015), pp. 234–241
O. Ernst, S.E. Chazan, S. Gannot, J. Goldberger, in 2018 26th European Signal Processing Conference (EUSIPCO). Speech dereverberation using fully convolutional networks (2018), pp. 390–394
V. Kothapally, W. Xia, S. Ghorbani, J.H. Hansen, W. Xue, J. Huang, SkipConvNet: Skip convolutional neural network for speech dereverberation using optimally smoothed spectral mapping, arXiv preprint arXiv:2007.09131 (2020)
J. Yamagishi, C. Veaux, K. MacDonald, et al., CSTR VCTK Corpus: English multi-speaker corpus for CSTR voice cloning toolkit (version 0.92) (2019)
A.W. Rix, J.G. Beerends, M.P. Hollier, A.P. Hekstra, in 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 01CH37221). Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, vol 2 (2001), pp. 749–752
C.H. Taal, R.C. Hendriks, R. Heusdens, J. Jensen, in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing. A short-time objective intelligibility measure for time-frequency weighted noisy speech (2010), pp. 4214–4217
K. Kinoshita, M. Delcroix, T. Nakatani, M. Miyoshi, Suppression of late reverberation effect on speech signal using long-term multiple-step linear prediction. IEEE Trans. Audio Speech Lang. Process. 17, 534 (2009)
S.-I. Amari, A. Cichocki, H.H. Yang, et al., in Advances in Neural Information Processing Systems. A new learning algorithm for blind signal separation (1996), pp. 757–763
S. Wold, K. Esbensen, P. Geladi, Principal component analysis. Chemom. Intell. Lab. Syst. 2, 37 (1987)
R. Gu, J. Wu, S. Zhang, L. Chen, Y. Xu, M. Yu, D. Su, Y. Zou, D. Yu, End-to-end multi-channel speech separation, arXiv preprint arXiv:1905.06286 (2019)
Y. Zhao, Z.-Q. Wang, D. Wang, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). A two-stage algorithm for noisy and reverberant speech enhancement (2017), pp. 5580–5584
F. Chollet, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Xception: Deep learning with depthwise separable convolutions (2017)
K. He, X. Zhang, S. Ren, J. Sun, in Proceedings of the IEEE International Conference on Computer Vision. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification (2015), pp. 1026–1034
Y. Bengio, J. Louradour, R. Collobert, J. Weston, in Proceedings of the 26th Annual International Conference on Machine Learning. Curriculum learning (2009), pp. 41–48
D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014)
F. Bahmaninezhad, J. Wu, R. Gu, S.-X. Zhang, Y. Xu, M. Yu, D. Yu, A comprehensive study of speech separation: spectrogram vs waveform separation, arXiv preprint arXiv:1905.07497 (2019)
J.B. Allen, D.A. Berkley, Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65, 943 (1979)
N. Zeghidour, D. Grangier, Wavesplit: End-to-end speech separation by speaker clustering, arXiv preprint arXiv:2002.08933 (2020)