Articles

  1. The acoustic echo cannot be entirely removed by linear adaptive filters due to the nonlinear relationship between the echo and the far-end signal. Usually, a post-processing module is required to further suppr...

    Authors: Hongsheng Chen, Guoliang Chen, Kai Chen and Jing Lu

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:35

    Content type: Research

  2. Code-switching (CS) refers to the phenomenon of using more than one language in an utterance, and it presents a great challenge to automatic speech recognition (ASR) due to the code-switching property in one utt...

    Authors: Yanhua Long, Shuang Wei, Jie Lian and Yijie Li

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:34

    Content type: Research

  3. Many modern smart devices are equipped with a microphone array and a loudspeaker (or are able to connect to one). Acoustic echo cancellation algorithms, specifically their multi-microphone variants, are essent...

    Authors: Nili Cohen, Gershon Hazan, Boaz Schwartz and Sharon Gannot

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:33

    Content type: Research

  4. The minimum mean-square error (MMSE)-based noise PSD estimators have been used widely for speech enhancement. However, the MMSE noise PSD estimators assume that the noise signal changes at a slower rate than t...

    Authors: Sujan Kumar Roy and Kuldip K. Paliwal

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:32

    Content type: Research

  5. The performance of speech recognition systems trained with neutral utterances degrades significantly when these systems are tested with emotional speech. Since everybody can speak emotionally in the real-world...

    Authors: Masoud Geravanchizadeh, Elnaz Forouhandeh and Meysam Bashirpour

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:31

    Content type: Research

  6. If music is the language of the universe, musical note onsets may be the syllables for this language. Not only do note onsets define the temporal pattern of a musical piece, but their time-frequency characteri...

    Authors: Mina Mounir, Peter Karsmakers and Toon van Waterschoot

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:30

    Content type: Research

  7. To improve the performance of speech enhancement in a complex noise environment, a joint constrained dictionary learning method for single-channel speech enhancement is proposed, which solves the “cross projec...

    Authors: Linhui Sun, Yunyi Bu, Pingan Li and Zihao Wu

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:29

    Content type: Research

  8. The last decade brought significant advances in automatic speech recognition (ASR) thanks to the evolution of deep learning methods. ASR systems evolved from pipeline-based systems, which modeled hand-crafted s...

    Authors: Alexandru-Lucian Georgescu, Alessandro Pappalardo, Horia Cucu and Michaela Blott

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:28

    Content type: Review

  9. Many end-to-end approaches have been proposed to detect predefined keywords. For scenarios of multi-keywords, there are still two bottlenecks that need to be resolved: (1) the distribution of important data th...

    Authors: Gui-Xin Shi, Wei-Qiang Zhang, Guan-Bo Wang, Jing Zhao, Shu-Zhou Chai and Ze-Yu Zhao

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:27

    Content type: Research

  10. Lately, the self-attention mechanism has marked a new milestone in the field of automatic speech recognition (ASR). Nevertheless, its performance is susceptible to environmental intrusions as the system predic...

    Authors: Lujun Li, Yikai Kang, Yuchen Shi, Ludwig Kürzinger, Tobias Watzel and Gerhard Rigoll

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:26

    Content type: Research

  11. Due to the ad hoc nature of wireless acoustic sensor networks, the position of the sensor nodes is typically unknown. This contribution proposes a technique to estimate the position and orientation of the sens...

    Authors: Tobias Gburrek, Joerg Schmalenstroeer and Reinhold Haeb-Umbach

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:25

    Content type: Methodology

  12. Estimating time-frequency domain masks for single-channel speech enhancement using deep learning methods has recently become a popular research field with promising results. In this paper, we propose a novel comp...

    Authors: Ziyi Xu, Samy Elshamy, Ziyue Zhao and Tim Fingscheidt

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:24

    Content type: Research

  13. Multiple sound source localization has been a topic of intense interest in recent years. The Single Source Zone (SSZ) based localization methods achieve good performance due to the detection and utilization of the Time-F...

    Authors: Maoshen Jia, Shang Gao and Changchun Bao

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:23

    Content type: Research

  14. In this paper, we propose a novel feature compensation algorithm based on independent noise estimation, which employs a Gaussian mixture model (GMM) with fewer Gaussian components to rapidly estimate the noise...

    Authors: Yong Lü, Han Lin, Pingping Wu and Yitao Chen

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:22

    Content type: Research

  15. When designing closed-loop electro-acoustic systems, which can commonly be found in hearing aids or public address systems, the most challenging task is canceling and/or suppressing the feedback caused by the ...

    Authors: Marco Gimm, Philipp Bulling and Gerhard Schmidt

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:21

    Content type: Research

  16. Recently, the non-intrusive speech quality assessment method has attracted a lot of attention since it does not require the original reference signals. At the same time, neural networks began to be applied to ...

    Authors: Miao Liu, Jing Wang, Weiming Yi and Fang Liu

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:20

    Content type: Research

  17. Sound event detection (SED), which is typically treated as a supervised problem, aims at detecting types of sound events and corresponding temporal information. It requires estimating onset and offset annotat...

    Authors: Sichen Liu, Feiran Yang, Yin Cao and Jun Yang

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:19

    Content type: Research

  18. Amongst the various characteristics of a speech signal, the expression of emotion is one of the characteristics that exhibits the slowest temporal dynamics. Hence, a performant speech emotion recognition (SER)...

    Authors: Duowei Tang, Peter Kuppens, Luc Geurts and Toon van Waterschoot

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:18

    Content type: Research

  19. Deep learning-based speech enhancement algorithms have shown their powerful ability in removing both stationary and non-stationary noise components from noisy speech observations. But they often introduce arti...

    Authors: Yuxuan Ke, Andong Li, Chengshi Zheng, Renhua Peng and Xiaodong Li

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:17

    Content type: Research

  20. In this study, we present a deep neural network-based online multi-speaker localization algorithm based on a multi-microphone array. Following the W-disjoint orthogonality principle in the spectral domain, tim...

    Authors: Hodaya Hammer, Shlomo E. Chazan, Jacob Goldberger and Sharon Gannot

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:16

    Content type: Research

  21. An amendment to this paper has been published and can be accessed via the original article.

    Authors: Randall Ali, Toon van Waterschoot and Marc Moonen

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:15

    Content type: Correction

    The original article was published in EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:10

  22. Estimating the direction-of-arrival (DOA) of multiple acoustic sources is one of the key technologies for humanoid robots and drones. However, it is a highly challenging problem due to a number of factors, inclu...

    Authors: Zonglong Bai, Liming Shi, Jesper Rindom Jensen, Jinwei Sun and Mads Græsbøll Christensen

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:14

    Content type: Research

  23. Localization of multiple speakers using microphone arrays remains a challenging problem, especially in the presence of noise and reverberation. State-of-the-art localization algorithms generally exploit the sp...

    Authors: Sushmita Thakallapalli, Suryakanth V. Gangashetty and Nilesh Madhu

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:13

    Content type: Research

  24. There has been little work in the literature on the speaker diarization of meetings with multiple distance microphones since the publications in 2012 related to the last National Institute of Standards and Technology (NIST) ...

    Authors: Beatriz Martínez-González, José M. Pardo, José A. Vallejo-Pinto, Rubén San-Segundo and Javier Ferreiros

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:12

    Content type: Research

  25. Nowadays, automatic speech recognition (ASR) systems can achieve increasingly high accuracy rates depending on the methodology applied and datasets used. The rate decreases significantly when the ASR system is ...

    Authors: Kacper Radzikowski, Le Wang, Osamu Yoshie and Robert Nowak

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:11

    Content type: Research

  26. An integrated version of the minimum variance distortionless response (MVDR) beamformer for speech enhancement using a microphone array has been recently developed, which merges the benefits of imposing constr...

    Authors: Randall Ali, Toon van Waterschoot and Marc Moonen

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:10

    Content type: Research

    The Correction to this article has been published in EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:15

  27. The presence of degradations in speech signals, which causes acoustic mismatch between training and operating conditions, deteriorates the performance of many speech-based systems. A variety of enhancement tec...

    Authors: Yuki Saishu, Amir Hossein Poorjam and Mads Græsbøll Christensen

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:9

    Content type: Research

  28. This paper reviews recent research works in infant cry signal analysis and classification tasks. A broad range of literature is reviewed mainly from the aspects of data acquisition, cross domain signal proce...

    Authors: Chunyan Ji, Thosini Bamunu Mudiyanselage, Yutong Gao and Yi Pan

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:8

    Content type: Review

  29. Over the recent years, machine learning techniques have been employed to produce state-of-the-art results in several audio related tasks. The success of these approaches has been largely due to access to large...

    Authors: Rajat Hebbar, Pavlos Papadopoulos, Ramon Reyes, Alexander F. Danvers, Angelina J. Polsinelli, Suzanne A. Moseley, David A. Sbarra, Matthias R. Mehl and Shrikanth Narayanan

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:7

    Content type: Research

  30. We propose an algorithm for the blind separation of single-channel audio signals. It is based on a parametric model that describes the spectral properties of the sounds of musical instruments independently of ...

    Authors: Sören Schulze and Emily J. King

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:6

    Content type: Research

  31. Two novel methods for speaker separation of multi-microphone recordings that can also detect speakers with infrequent activity are presented. The proposed methods are based on a statistical model of the probab...

    Authors: Bracha Laufer-Goldshtein, Ronen Talmon and Sharon Gannot

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:5

    Content type: Research

  32. We propose a method of dynamically registering out-of-vocabulary (OOV) words by assigning the pronunciations of these words to pre-inserted OOV tokens, editing the pronunciations of the tokens. To do this, we ...

    Authors: Norihide Kitaoka, Bohan Chen and Yuya Obashi

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:4

    Content type: Research

  33. Instrumental playing techniques such as vibratos, glissandos, and trills often denote musical expressivity, both in classical and folk contexts. However, most existing approaches to music similarity retrieval f...

    Authors: Vincent Lostanlen, Christian El-Hajj, Mathias Rossignol, Grégoire Lafay, Joakim Andén and Mathieu Lagrange

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:3

    Content type: Research

  34. In this paper, a study addressing the task of tracking multiple concurrent speakers in reverberant conditions is presented. Since both past and future observations can contribute to the current location estima...

    Authors: Yuval Dorfan, Boaz Schwartz and Sharon Gannot

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:2

    Content type: Research

  35. The progressive paradigm is a promising strategy to optimize network performance for speech enhancement purposes. Recent works have shown different strategies to improve the accuracy of speech enhancement solu...

    Authors: Jorge Llombart, Dayana Ribas, Antonio Miguel, Luis Vicente, Alfonso Ortega and Eduardo Lleida

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:1

    Content type: Research

  36. In real applications, environmental effects such as additive noise and room reverberation lead to a mismatch between training and testing signals that substantially reduces the performance of far-field speaker...

    Authors: Masoud Geravanchizadeh and Sina Ghalamiosgouei

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:20

    Content type: Research

  37. In this paper, we investigate the performance of two deep learning paradigms for the audio-based tasks of acoustic scene, environmental sound and domestic activity classification. In particular, a convolutiona...

    Authors: Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Lukas Stappen, Alice Baird, Lukas Koebe and Björn Schuller

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:19

    Content type: Research

  38. In this article, we conduct a comprehensive simulation study for the optimal scores of speaker recognition systems that are based on speaker embedding. For that purpose, we first revisit the optimal scores for...

    Authors: Dong Wang

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:18

    Content type: Research

  39. Depression is a widespread mental health problem around the world with a significant burden on economies. Its early diagnosis and treatment are critical to reduce the costs and even save lives. One key aspect ...

    Authors: Cenk Demiroglu, Aslı Beşirli, Yasin Ozkanca and Selime Çelik

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:17

    Content type: Research

  40. Drone-embedded sound source localization (SSL) has interesting application prospects in challenging search and rescue scenarios with bad lighting conditions or occlusions. However, the problem gets complic...

    Authors: Alif Bin Abdul Qayyum, K. M. Naimul Hassan, Adrita Anika, Md. Farhan Shadiq, Md Mushfiqur Rahman, Md. Tariqul Islam, Sheikh Asif Imran, Shahruk Hossain and Mohammad Ariful Haque

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:16

    Content type: Research

  41. Humanoid robots require microphone arrays to acquire speech signals from the human communication partner while suppressing noise, reverberation, and interferences. Unlike many other applications, microp...

    Authors: Gongping Huang, Jingdong Chen, Jacob Benesty, Israel Cohen and Xudong Zhao

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:15

    Content type: Research

  42. Microphone leakage or crosstalk is a common problem in multichannel close-talk audio recordings (e.g., meetings or live music performances), which occurs when a target signal does not only couple into its dedi...

    Authors: Patrick Meyer, Samy Elshamy and Tim Fingscheidt

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:14

    Content type: Research

  43. A method to locate sound sources using an audio recording system mounted on an unmanned aerial vehicle (UAV) is proposed. The method introduces extension algorithms to apply on top of a baseline approach, whic...

    Authors: Benjamin Yen and Yusuke Hioka

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:13

    Content type: Research

  44. Estimation problems like room geometry estimation and localization of acoustic reflectors are of great interest and importance in robot and drone audition. Several methods for tackling these problems exist, bu...

    Authors: Usama Saqib, Sharon Gannot and Jesper Rindom Jensen

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:12

    Content type: Research

  45. Ego-noise, i.e., the noise a robot causes by its own motions, significantly corrupts the microphone signal and severely impairs the robot’s capability to interact seamlessly with its environment. Therefore, su...

    Authors: Alexander Schmidt, Andreas Brendel, Thomas Haubner and Walter Kellermann

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:11

    Content type: Research

  46. A keyword spotting algorithm implemented on an embedded system using a depthwise separable convolutional neural network classifier is reported. The proposed system was derived from a high-complexity system wit...

    Authors: Peter Mølgaard Sørensen, Bastian Epp and Tobias May

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:10

    Content type: Research

  47. In this work, we present an ensemble for automated audio classification that fuses different types of features extracted from audio files. These features are evaluated, compared, and fused with the goal of pro...

    Authors: Loris Nanni, Yandre M. G. Costa, Rafael L. Aguiar, Rafael B. Mangolin, Sheryl Brahnam and Carlos N. Silla Jr.

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:8

    Content type: Research

  48. In this paper, we introduce a quadratic approach for single-channel noise reduction. The desired signal magnitude is estimated by applying a linear filter to a modified version of the observations’ vector. The...

    Authors: Gal Itzhak, Jacob Benesty and Israel Cohen

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:7

    Content type: Research

  49. To improve the performance of hand-crafted features in detecting playback speech, two discriminative features, constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients,...

    Authors: Jichen Yang, Longting Xu, Bo Ren and Yunyun Ji

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:6

    Content type: Research
