Articles

  1. By means of spatial clustering and time-frequency masking, a mixture of multiple speakers and noise can be separated into the underlying signal components. The parameters of a model, such as a complex angular ...

    Authors: Alexander Bohlender, Lucas Van Severen, Jonathan Sterckx and Nilesh Madhu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:16
  2. To improve the sound quality of hearing devices, equalization filters can be used to achieve acoustic transparency, i.e., listening with the device in the ear is perceptually similar to the open ear. The equal...

    Authors: Henning Schepker, Florian Denk, Birger Kollmeier and Simon Doclo
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:15
  3. Subtitles are a crucial component of the localization of Digital Entertainment Content (DEC, such as movies and TV shows). With an ever-increasing catalog (≈ 2M titles) and localization expansion (30+ languages), automat...

    Authors: Honey Gupta and Mayank Sharma
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:14
  4. In lossless audio compression, the predictive residuals must remain sparse when entropy coding is applied. The sign algorithm (SA) is a conventional method for minimizing the magnitudes of residuals; however, ...

    Authors: Taiyo Mineo and Hayaru Shouno
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:12
  5. Multiple predominant instrument recognition in polyphonic music is addressed using decision level fusion of three transformer-based architectures on an ensemble of visual representations. The ensemble consists...

    Authors: Lekshmi Chandrika Reghunath and Rajeev Rajan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:11
  6. The domain of spatial audio comprises methods for capturing, processing, and reproducing audio content that contains spatial information. Data-based methods are those that operate directly on the spatial infor...

    Authors: Maximo Cobos, Jens Ahrens, Konrad Kowalczyk and Archontis Politis
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:10
  7. Head-related transfer function (HRTF) individualization can improve the perception of binaural sound. The interaural time difference (ITD) of the HRTF is a relevant cue for sound localization, especially in az...

    Authors: Pablo Gutierrez-Parera, Jose J. Lopez, Javier M. Mora-Merchan and Diego F. Larios
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:9
  8. Humans can recognize someone’s identity through their voice and describe the timbral phenomena of voices. Likewise, the singing voice also has timbral phenomena. In vocal pedagogy, vocal teachers listen and th...

    Authors: Yanze Xu, Weiqing Wang, Huahua Cui, Mingyang Xu and Ming Li
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:8
  9. Polyphonic sound event detection aims to detect the types of sound events that occur in given audio clips, and their onset and offset times, in which multiple sound events may occur simultaneously. Deep learni...

    Authors: Haitao Li, Shuguo Yang and Wenwu Wang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:5
  10. In this study, we propose a methodology for separating a singing voice from musical accompaniment in a monaural musical mixture. The proposed method uses robust principal component analysis (RPCA), followed by...

    Authors: Wen-Hsing Lai and Siou-Lin Wang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:4
  11. One of the greatest challenges in the development of binaural machine audition systems is the disambiguation between front and back audio sources, particularly in complex spatial audio scenes. The goal of this...

    Authors: Sławomir K. Zieliński, Paweł Antoniuk, Hyunkook Lee and Dale Johnson
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:3
  12. Conventional automatic speech recognition (ASR) and emerging end-to-end (E2E) speech recognition have achieved promising results after being provided with sufficient resources. However, for low-resource langua...

    Authors: Siqing Qin, Longbiao Wang, Sheng Li, Jianwu Dang and Lixin Pan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:2
  13. In this paper, we propose a novel algorithm for blind source extraction (BSE) of a moving acoustic source recorded by multiple microphones. The algorithm is based on independent vector extraction (IVE) where t...

    Authors: Jakub Janský, Zbyněk Koldovský, Jiří Málek, Tomáš Kounovský and Jaroslav Čmejla
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:1
  14. With the rapid growth of online live streaming platforms, some anchors seek profits and accumulate popularity by mixing inappropriate content into live programs. After being blacklisted, these anchors even fo...

    Authors: Jiacheng Yao, Jing Zhang, Jiafeng Li and Li Zhuo
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:45
  15. We present an unsupervised domain adaptation (UDA) method for a lip-reading model that is an image-based speech recognition model. Most conventional UDA methods cannot be applied when the adaptation data co...

    Authors: Yuki Takashima, Ryoichi Takashima, Ryota Tsunoda, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki and Nobuaki Motoyama
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:44
  16. Deep learning techniques are currently being applied in automated text-to-speech (TTS) systems, resulting in significant improvements in performance. However, these methods require large amounts of text-speech...

    Authors: Zolzaya Byambadorj, Ryota Nishimura, Altangerel Ayush, Kengo Ohta and Norihide Kitaoka
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:42
  17. Voice conversion transforms the voice of a source speaker into that of a target speaker while keeping the linguistic content unchanged. Recently, one-shot voice conversion has gradually become a hot topic for its potentially wide r...

    Authors: Fangkun Liu, Hui Wang, Renhua Peng, Chengshi Zheng and Xiaodong Li
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:40
  18. This paper presents a new dataset of measured multichannel room impulse responses (RIRs) named dEchorate. It includes annotations of early echo timings and 3D positions of microphones, real sources, and image ...

    Authors: Diego Di Carlo, Pinchas Tandeitnik, Cedrić Foy, Nancy Bertin, Antoine Deleforge and Sharon Gannot
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:39
  19. In this paper, a multichannel learning-based network is proposed for sound source separation in a reverberant field. The network can be divided into two parts according to the training strategies. In the first s...

    Authors: You-Siang Chen, Zi-Jie Lin and Mingsian R. Bai
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:38
  20. High-quality rendering of spatial sound fields in real-time is becoming increasingly important with the steadily growing interest in virtual and augmented reality technologies. Typically, a spherical microphon...

    Authors: Johannes M. Arend, Tim Lübeck and Christoph Pörschmann
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:37
  21. Measurements of the directivity of acoustic sound sources must be interpolated in almost all cases, either for spatial upsampling to higher resolution representations of the data, for spatial resampling to ano...

    Authors: David Ackermann, Fabian Brinkmann, Franz Zotter, Malte Kob and Stefan Weinzierl
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:36
  22. The acoustic echo cannot be entirely removed by linear adaptive filters due to the nonlinear relationship between the echo and the far-end signal. Usually, a post-processing module is required to further suppr...

    Authors: Hongsheng Chen, Guoliang Chen, Kai Chen and Jing Lu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:35
  23. Code-switching (CS) refers to the phenomenon of using more than one language in an utterance, and it presents a great challenge to automatic speech recognition (ASR) due to the code-switching property in one utt...

    Authors: Yanhua Long, Shuang Wei, Jie Lian and Yijie Li
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:34
  24. Many modern smart devices are equipped with a microphone array and a loudspeaker (or are able to connect to one). Acoustic echo cancellation algorithms, specifically their multi-microphone variants, are essent...

    Authors: Nili Cohen, Gershon Hazan, Boaz Schwartz and Sharon Gannot
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:33
  25. The minimum mean-square error (MMSE)-based noise PSD estimators have been used widely for speech enhancement. However, the MMSE noise PSD estimators assume that the noise signal changes at a slower rate than t...

    Authors: Sujan Kumar Roy and Kuldip K. Paliwal
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:32
  26. The performance of speech recognition systems trained with neutral utterances degrades significantly when these systems are tested with emotional speech. Since everybody can speak emotionally in the real-world...

    Authors: Masoud Geravanchizadeh, Elnaz Forouhandeh and Meysam Bashirpour
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:31
  27. If music is the language of the universe, musical note onsets may be the syllables for this language. Not only do note onsets define the temporal pattern of a musical piece, but their time-frequency characteri...

    Authors: Mina Mounir, Peter Karsmakers and Toon van Waterschoot
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:30
  28. To improve the performance of speech enhancement in a complex noise environment, a joint constrained dictionary learning method for single-channel speech enhancement is proposed, which solves the “cross projec...

    Authors: Linhui Sun, Yunyi Bu, Pingan Li and Zihao Wu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:29
  29. The last decade brought significant advances in automatic speech recognition (ASR) thanks to the evolution of deep learning methods. ASR systems evolved from pipeline-based systems, that modeled hand-crafted s...

    Authors: Alexandru-Lucian Georgescu, Alessandro Pappalardo, Horia Cucu and Michaela Blott
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:28
  30. Many end-to-end approaches have been proposed to detect predefined keywords. For scenarios of multi-keywords, there are still two bottlenecks that need to be resolved: (1) the distribution of important data th...

    Authors: Gui-Xin Shi, Wei-Qiang Zhang, Guan-Bo Wang, Jing Zhao, Shu-Zhou Chai and Ze-Yu Zhao
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:27
  31. Lately, the self-attention mechanism has marked a new milestone in the field of automatic speech recognition (ASR). Nevertheless, its performance is susceptible to environmental intrusions as the system predic...

    Authors: Lujun Li, Yikai Kang, Yuchen Shi, Ludwig Kürzinger, Tobias Watzel and Gerhard Rigoll
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:26
  32. Due to the ad hoc nature of wireless acoustic sensor networks, the position of the sensor nodes is typically unknown. This contribution proposes a technique to estimate the position and orientation of the sens...

    Authors: Tobias Gburrek, Joerg Schmalenstroeer and Reinhold Haeb-Umbach
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:25
  33. Estimating time-frequency domain masks for single-channel speech enhancement using deep learning methods has recently become a popular research field with promising results. In this paper, we propose a novel comp...

    Authors: Ziyi Xu, Samy Elshamy, Ziyue Zhao and Tim Fingscheidt
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:24
  34. Multiple sound source localization has been a topic of active interest in recent years. The Single Source Zone (SSZ) based localization methods achieve good performance due to the detection and utilization of the Time-F...

    Authors: Maoshen Jia, Shang Gao and Changchun Bao
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:23
  35. In this paper, we propose a novel feature compensation algorithm based on independent noise estimation, which employs a Gaussian mixture model (GMM) with fewer Gaussian components to rapidly estimate the noise...

    Authors: Yong Lü, Han Lin, Pingping Wu and Yitao Chen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:22
  36. When designing closed-loop electro-acoustic systems, which can commonly be found in hearing aids or public address systems, the most challenging task is canceling and/or suppressing the feedback caused by the ...

    Authors: Marco Gimm, Philipp Bulling and Gerhard Schmidt
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:21
  37. Recently, the non-intrusive speech quality assessment method has attracted a lot of attention since it does not require the original reference signals. At the same time, neural networks began to be applied to ...

    Authors: Miao Liu, Jing Wang, Weiming Yi and Fang Liu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:20
  38. Sound event detection (SED), which is typically treated as a supervised problem, aims at detecting types of sound events and corresponding temporal information. It requires estimating onset and offset annotat...

    Authors: Sichen Liu, Feiran Yang, Yin Cao and Jun Yang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:19
  39. Amongst the various characteristics of a speech signal, the expression of emotion is one of the characteristics that exhibits the slowest temporal dynamics. Hence, a performant speech emotion recognition (SER)...

    Authors: Duowei Tang, Peter Kuppens, Luc Geurts and Toon van Waterschoot
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:18
  40. Deep learning-based speech enhancement algorithms have shown their powerful ability in removing both stationary and non-stationary noise components from noisy speech observations. But they often introduce arti...

    Authors: Yuxuan Ke, Andong Li, Chengshi Zheng, Renhua Peng and Xiaodong Li
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:17
  41. In this study, we present a deep neural network-based online multi-speaker localization algorithm based on a multi-microphone array. Following the W-disjoint orthogonality principle in the spectral domain, tim...

    Authors: Hodaya Hammer, Shlomo E. Chazan, Jacob Goldberger and Sharon Gannot
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:16
  42. An amendment to this paper has been published and can be accessed via the original article.

    Authors: Randall Ali, Toon van Waterschoot and Marc Moonen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:15

    The original article was published in EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:10

  43. Estimating the direction-of-arrival (DOA) of multiple acoustic sources is one of the key technologies for humanoid robots and drones. However, it is a most challenging problem due to a number of factors, inclu...

    Authors: Zonglong Bai, Liming Shi, Jesper Rindom Jensen, Jinwei Sun and Mads Græsbøll Christensen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:14
  44. Localization of multiple speakers using microphone arrays remains a challenging problem, especially in the presence of noise and reverberation. State-of-the-art localization algorithms generally exploit the sp...

    Authors: Sushmita Thakallapalli, Suryakanth V. Gangashetty and Nilesh Madhu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:13
  45. There has been little work in the literature on the speaker diarization of meetings with multiple distant microphones since the publications in 2012 related to the last National Institute of Standards and Technology (NIST) ...

    Authors: Beatriz Martínez-González, José M. Pardo, José A. Vallejo-Pinto, Rubén San-Segundo and Javier Ferreiros
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:12
