TY - JOUR AU - Campbell, J. P. PY - 1997 DA - 1997// TI - Speaker recognition: a tutorial JO - Proc. IEEE VL - 85 UR - https://doi.org/10.1109/5.628714 DO - 10.1109/5.628714 ID - Campbell1997 ER - TY - STD TI - D. A. Reynolds, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). An overview of automatic speaker recognition technology, (2002), pp. 4072–4075. ID - ref2 ER - TY - JOUR AU - Hansen, J. H. AU - Hasan, T. PY - 2015 DA - 2015// TI - Speaker recognition by machines and humans: a tutorial review JO - IEEE Signal Proc. Mag. VL - 32 UR - https://doi.org/10.1109/MSP.2015.2462851 DO - 10.1109/MSP.2015.2462851 ID - Hansen2015 ER - TY - JOUR AU - Dehak, N. AU - Kenny, P. J. AU - Dehak, R. AU - Dumouchel, P. AU - Ouellet, P. PY - 2011 DA - 2011// TI - Front-end factor analysis for speaker verification JO - IEEE Trans. Audio Speech Lang. Process. VL - 19 UR - https://doi.org/10.1109/TASL.2010.2064307 DO - 10.1109/TASL.2010.2064307 ID - Dehak2011 ER - TY - STD TI - E. Variani, X. Lei, E. McDermott, I. L. Moreno, J. Gonzalez-Dominguez, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Deep neural networks for small footprint text-dependent speaker verification, (2014), pp. 4052–4056. ID - ref5 ER - TY - STD TI - L. Li, Y. Chen, Y. Shi, Z. Tang, D. Wang, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Deep speaker feature learning for text-independent speaker verification, (2017), pp. 1542–1546. ID - ref6 ER - TY - STD TI - J. S. Chung, A. Nagrani, A. Zisserman, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). VoxCeleb2: deep speaker recognition, (2018), pp. 1086–1090. ID - ref7 ER - TY - STD TI - J. -W. Jung, H. S. Heo, J. -H. Kim, H. -J. Shim, H. -J. Yu, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). 
RawNet: advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification, (2019), pp. 1268–1272. ID - ref8 ER - TY - STD TI - K. Okabe, T. Koshinaka, K. Shinoda, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Attentive statistics pooling for deep speaker embedding, (2018), pp. 2252–2256. ID - ref9 ER - TY - STD TI - W. Cai, J. Chen, M. Li, in Proceedings of Odyssey: The Speaker and Language Recognition Workshop. Exploring the encoding layer and loss function in end-to-end speaker and language recognition system, (2018), pp. 74–81. ID - ref10 ER - TY - STD TI - W. Xie, A. Nagrani, J. S. Chung, A. Zisserman, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Utterance-level aggregation for speaker recognition in the wild, (2019), pp. 5791–5795. ID - ref11 ER - TY - STD TI - N. Chen, J. Villalba, N. Dehak, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Tied mixture of factor analyzers layer to combine frame level representations in neural speaker embeddings, (2019), pp. 2948–2952. ID - ref12 ER - TY - STD TI - L. Li, D. Wang, C. Xing, T. F. Zheng, in 10th International Symposium on Chinese Spoken Language Processing (ISCSLP). Max-margin metric learning for speaker recognition, (2016), pp. 1–4. ID - ref13 ER - TY - STD TI - W. Ding, L. He, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). MTGAN: speaker verification through multitasking triplet generative adversarial networks, (2018), pp. 3633–3637. ID - ref14 ER - TY - STD TI - J. Wang, K. -C. Wang, M. T. Law, F. Rudzicz, M. Brudno, in International Conference on Acoustics, Speech and Signal Processing (ICASSP). Centroid-based deep metric learning for speaker recognition, (2019), pp. 3652–3656. ID - ref15 ER - TY - STD TI - Z. Bai, X. -L. Zhang, J. 
Chen, Partial AUC optimization based deep speaker embeddings with class-center learning for text-independent speaker verification. Int. Conf. Acoust. Speech Signal Process. (ICASSP), 6819–6823 (2020). ID - ref16 ER - TY - STD TI - Z. Gao, Y. Song, I. McLoughlin, P. Li, Y. Jiang, L. -R. Dai, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Improving aggregation and loss function for better embedding learning in end-to-end speaker verification system, (2019), pp. 361–365. ID - ref17 ER - TY - STD TI - J. Zhou, T. Jiang, Z. Li, L. Li, Q. Hong, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Deep speaker embedding extraction with channel-wise feature responses and additive supervision softmax loss function, (2019), pp. 2883–2887. ID - ref18 ER - TY - STD TI - R. Li, N. L. D. Tuo, M. Yu, D. Su, D. Yu, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Boundary discriminative large margin cosine loss for text-independent speaker verification, (2019), pp. 6321–6325. ID - ref19 ER - TY - STD TI - S. Wang, J. Rohdin, L. Burget, O. Plchot, Y. Qian, K. Yu, J. Cernocky, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). On the usage of phonetic information for text-independent speaker embedding extraction, (2019), pp. 1148–1152. ID - ref20 ER - TY - STD TI - T. Stafylakis, J. Rohdin, O. Plchot, P. Mizera, L. Burget, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Self-supervised speaker embeddings, (2019), pp. 2863–2867. ID - ref21 ER - TY - STD TI - S. O. Sadjadi, C. Greenberg, E. Singer, D. Reynolds, L. Mason, J. Hernandez-Cordero, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). The 2018 NIST Speaker Recognition Evaluation, (2019), pp. 1483–1487. 
ID - ref22 ER - TY - STD TI - D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, S. Khudanpur, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). X-vectors: robust DNN embeddings for speaker recognition, (2018), pp. 5329–5333. ID - ref23 ER - TY - STD TI - S. Ioffe, in European Conference on Computer Vision (ECCV). Probabilistic linear discriminant analysis, (2006), pp. 531–542. ID - ref24 ER - TY - STD TI - S. J. Prince, J. H. Elder, in 2007 IEEE 11th International Conference on Computer Vision. Probabilistic linear discriminant analysis for inferences about identity, (2007), pp. 1–8. ID - ref25 ER - TY - JOUR AU - Reynolds, D. A. AU - Quatieri, T. F. AU - Dunn, R. B. PY - 2000 DA - 2000// TI - Speaker verification using adapted Gaussian mixture models JO - Digit. Signal Proc. VL - 10 UR - https://doi.org/10.1006/dspr.1999.0361 DO - 10.1006/dspr.1999.0361 ID - Reynolds2000 ER - TY - STD TI - B. J. Borgström, A. McCree, in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Discriminatively trained Bayesian speaker comparison of i-vectors, (2013), pp. 7659–7662. ID - ref27 ER - TY - STD TI - A. McCree, G. Sell, D. Garcia-Romero, in INTERSPEECH. Extended variability modeling and unsupervised adaptation for PLDA speaker recognition, (2017), pp. 1552–1556. ID - ref28 ER - TY - STD TI - C. M. Bishop, Pattern recognition and machine learning (Springer, 2006). https://www.springer.com/gp/book/9780387310732. UR - https://www.springer.com/gp/book/9780387310732 ID - ref29 ER - TY - STD TI - A. Blum, J. Hopcroft, R. Kannan, Foundations of data science (Cambridge University Press, 2015). http://www.cs.cornell.edu/jeh/bookmay2015.pdf. UR - http://www.cs.cornell.edu/jeh/bookmay2015.pdf ID - ref30 ER - TY - STD TI - D. Garcia-Romero, C. Y. Espy-Wilson, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). 
Analysis of i-vector length normalization in speaker recognition systems, (2011), pp. 249–252. ID - ref31 ER - TY - STD TI - W. Rudin, Real and complex analysis, 3rd Ed (McGraw-Hill, 1986). https://www.amazon.com/Real-Complex-Analysis-Higher-Mathematics/dp/0070542341. UR - https://www.amazon.com/Real-Complex-Analysis-Higher-Mathematics/dp/0070542341 ID - ref32 ER - TY - STD TI - G. Salton, Automatic text processing: The transformation, analysis, and retrieval of information by computer (Addison-Wesley, 1989). https://books.google.co.jp/books/about/Automatic_Text_Processing.html?id=wb8SAQAAMAAJ&redir_esc=y. ID - ref33 ER - TY - STD TI - G. G. Chowdhury, Introduction to modern information retrieval, 3rd Ed (Neal-Schuman Publishers, 2010). https://www.amazon.com/Introduction-Modern-Information-Retrieval-3rd/dp/1555707157. UR - https://www.amazon.com/Introduction-Modern-Information-Retrieval-3rd/dp/1555707157 ID - ref34 ER - TY - STD TI - C. Xing, D. Wang, C. Liu, Y. Lin, in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Normalized word embedding and orthogonal transform for bilingual word translation, (2015), pp. 1006–1011. ID - ref35 ER - TY - STD TI - N. Dehak, R. Dehak, P. Kenny, N. Brümmer, P. Ouellet, P. Dumouchel, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification, (2009), pp. 1559–1562. ID - ref36 ER - TY - STD TI - P. Kenny, in Proceedings of Odyssey: The Speaker and Language Recognition Workshop. Bayesian speaker verification with heavy-tailed priors, (2010), pp. 14–14. https://www.isca-speech.org/archive_open/odyssey_2010/od10_014.html. UR - https://www.isca-speech.org/archive_open/odyssey_2010/od10_014.html ID - ref37 ER - TY - STD TI - S. 
Sra, Directional statistics in machine learning: a brief review. Appl. Directional Stat. Mod. Methods Case Stud., 225 (2018). ID - ref38 ER - TY - STD TI - K. V. Mardia, P. E. Jupp, Directional statistics (John Wiley & Sons, Inc, 2009). https://onlinelibrary.wiley.com/doi/book/10.1002/9780470316979. UR - https://onlinelibrary.wiley.com/doi/book/10.1002/9780470316979 ID - ref39 ER - TY - STD TI - A. Nagrani, J. S. Chung, A. Zisserman, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). VoxCeleb: a large-scale speaker identification dataset, (2017), pp. 2616–2620. ID - ref40 ER - TY - STD TI - D. Snyder, G. Chen, D. Povey, Musan: a music, speech, and noise corpus. arXiv preprint arXiv:1510.08484 (2015). http://arxiv.org/abs/1510.08484. ID - ref41 ER - TY - STD TI - T. Ko, V. Peddinti, D. Povey, M. L. Seltzer, S. Khudanpur, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). A study on data augmentation of reverberant speech for robust speech recognition, (2017), pp. 5220–5224. ID - ref42 ER - TY - STD TI - D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, et al., in IEEE workshop on automatic speech recognition and understanding. The Kaldi speech recognition toolkit, (2011). https://infoscience.epfl.ch/record/192584. UR - https://infoscience.epfl.ch/record/192584 ID - ref43 ER - TY - JOUR AU - Lyu, S. AU - Simoncelli, E. P. PY - 2009 DA - 2009// TI - Nonlinear extraction of independent components of natural images using radial gaussianization JO - Neural Comput. VL - 21 UR - https://doi.org/10.1162/neco.2009.04-08-773 DO - 10.1162/neco.2009.04-08-773 ID - Lyu2009 ER - TY - STD TI - L. Li, Z. Tang, D. Wang, T. F. Zheng, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Full-info training for deep speaker feature learning, (2018), pp. 5369–5373. 
ID - ref45 ER - TY - STD TI - L. Li, Z. Tang, Y. Shi, D. Wang, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Gaussian-constrained training for speaker verification, (2019), pp. 6036–6040. ID - ref46 ER - TY - STD TI - Y. Zhang, L. Li, D. Wang, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). VAE-based regularization for deep speaker embedding, (2019), pp. 4020–4024. ID - ref47 ER - TY - STD TI - X. Wang, L. Li, D. Wang, in Proceedings of APSIPA ASC. VAE-based Domain Adaptation for Speaker Verification, (2019), pp. 535–539. ID - ref48 ER - TY - STD TI - Y. Tu, M. -W. Mak, J. -T. Chien, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Variational domain adversarial learning for speaker verification, (2019), pp. 4315–4319. ID - ref49 ER - TY - STD TI - Y. Cai, L. Li, D. Wang, A. Abel, Deep normalization for speaker vectors. arXiv:2004.04095 (2020). https://arxiv.org/pdf/2004.04095.pdf. UR - https://arxiv.org/pdf/2004.04095.pdf ID - ref50 ER - TY - STD TI - L. Dinh, D. Krueger, Y. Bengio, in ICLR Workshop. NICE: Non-linear independent components estimation, (2015). https://iclr.cc/archive/www/doku.php%3Fid=iclr2015:main.html. UR - https://iclr.cc/archive/www/doku.php%3Fid=iclr2015:main.html ID - ref51 ER - TY - STD TI - L. Dinh, J. Sohl-Dickstein, S. Bengio, in Neural Information Processing Systems - Deep Learning Symposium. Density estimation using real NVP, (2016). ID - ref52 ER - TY - STD TI - D. P. Kingma, P. Dhariwal, in Advances in Neural Information Processing Systems (NIPS). Glow: generative flow with invertible 1x1 convolutions, (2018), pp. 10215–10224. ID - ref53 ER - TY - STD TI - Y. Fan, J. Kang, L. Li, K. Li, H. Chen, S. Cheng, P. Zhang, Z. Zhou, Y. Cai, D. Wang, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 
CN-CELEB: a challenging Chinese speaker recognition dataset, (2020), pp. 7604–7608. ID - ref54 ER - TY - STD TI - H. Aronowitz, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Inter dataset variability compensation for speaker recognition, (2014), pp. 4002–4006. ID - ref55 ER - TY - STD TI - J. Villalba, E. Lleida, in Proceedings of Odyssey: The Speaker and Language Recognition Workshop. Bayesian adaptation of PLDA based speaker recognition to domains with scarce development data, (2012), pp. 47–54. ID - ref56 ER - TY - STD TI - D. Garcia-Romero, A. McCree, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Supervised domain adaptation for i-vector based speaker recognition, (2014), pp. 4047–4051. ID - ref57 ER - TY - STD TI - D. Garcia-Romero, A. McCree, S. Shum, C. Vaquero, in Proceedings of Odyssey: The Speaker and Language Recognition Workshop. Unsupervised domain adaptation for i-vector speaker recognition, (2014), pp. 260–264. ID - ref58 ER - TY - STD TI - H. Aronowitz, in Proceedings of Odyssey: The Speaker and Language Recognition Workshop. Compensating inter-dataset variability in PLDA hyper-parameters for robust speaker recognition, (2014), pp. 280–286. ID - ref59 ER - TY - STD TI - A. Kanagasundaram, D. Dean, S. Sridharan, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Improving out-domain PLDA speaker verification using unsupervised inter-dataset variability compensation approach, (2015), pp. 4654–4658. ID - ref60 ER - TY - JOUR AU - Rahman, M. H. AU - Kanagasundaram, A. AU - Himawan, I. AU - Dean, D. AU - Sridharan, S. PY - 2018 DA - 2018// TI - Improving PLDA speaker verification performance using domain mismatch compensation techniques JO - Comput. Speech Lang. VL - 47 UR - https://doi.org/10.1016/j.csl.2017.08.001 DO - 10.1016/j.csl.2017.08.001 ID - Rahman2018 ER - TY - STD TI - S. Shon, S. Mun, W. Kim, H. 
Ko, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). Autoencoder based domain adaptation for speaker recognition under insufficient channel information, (2017), pp. 1014–1018. ID - ref62 ER - TY - STD TI - Q. Wang, W. Rao, S. Sun, L. Xie, E. S. Chng, H. Li, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Unsupervised domain adaptation via domain adversarial training for speaker recognition, (2018), pp. 4889–4893. ID - ref63 ER - TY - STD TI - L. Li, D. Wang, T. F. Zheng, in Interspeech 2020. Neural discriminant analysis for speaker recognition, (2020). ID - ref64 ER - TY - STD TI - G. Heigold, I. Moreno, S. Bengio, N. Shazeer, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). End-to-end text-dependent speaker verification, (2016), pp. 5115–5119. ID - ref65 ER - TY - STD TI - S. -X. Zhang, Z. Chen, Y. Zhao, J. Li, Y. Gong, in Spoken Language Technology Workshop (SLT). End-to-end attention based text-dependent speaker verification, (2016), pp. 171–178. ID - ref66 ER - TY - STD TI - F. R. rahman Chowdhury, Q. Wang, I. L. Moreno, L. Wan, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Attention-based models for text-dependent speaker verification, (2018), pp. 5359–5363. ID - ref67 ER - TY - STD TI - L. Burget, O. Plchot, S. Cumani, O. Glembek, P. Matějka, N. Brümmer, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Discriminatively trained probabilistic linear discriminant analysis for speaker verification, (2011), pp. 4832–4835. ID - ref68 ER - TY - STD TI - D. A. Van Leeuwen, N. Brümmer, in Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). The distribution of calibrated likelihood-ratios in speaker recognition, (2013), pp. 1619–1623. ID - ref69 ER - TY - STD TI - S. Cumani, P. 
Laface, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Tied normal variance - mean mixtures for linear score calibration, (2019), pp. 6121–6125. ID - ref70 ER -