
Articles


  1. End-to-end speech-to-text translation aims to directly translate speech in one language into text in another, a challenging cross-modal task, particularly in limited-data scenarios. Multi-task learn...

    Authors: Xin Feng, Yue Zhao, Wei Zong and Xiaona Xu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:36
  2. While deep learning technologies have made remarkable progress in generating deepfakes, their misuse has become a well-known concern. As a result, the ubiquitous usage of deepfakes for increasing false informa...

    Authors: Tahira Kanwal, Rabbia Mahum, Abdul Malik AlSalman, Mohamed Sharaf and Haseeb Hassan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:35
  3. Polyphonic sound source localization and detection (SSLD) task aims to recognize the categories of sound events, identify their onset and offset times, and detect their corresponding direction-of-arrival (DOA)...

    Authors: Mengzhen Ma, Ying Hu, Liang He and Hao Huang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:34
  4. Dysarthria is a speech disorder that affects the ability to communicate due to articulation difficulties. This research proposes a novel method for automatic dysarthria detection (ADD) and automatic dysarthria...

    Authors: Shaik Sajiha, Kodali Radha, Dhulipalla Venkata Rao, Nammi Sneha, Suryanarayana Gunnam and Durga Prasad Bavirisetti
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:33
  5. This work introduces a large dataset comprising impulse responses of spatially distributed sources within a plane parallel to a planar microphone array. The dataset, named MIRACLE, encompasses 856,128 single-c...

    Authors: Adam Kujawski, Art J. R. Pelling and Ennes Sarradj
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:32
  6. A new method for estimating the first and second derivatives of discrete audio signals intended to achieve higher computational precision in analyzing the performance and characteristics of digital audio syste...

    Authors: Marcin Lewandowski
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:31
  7. Time signature detection is a fundamental task in music information retrieval, aiding in music organization. In recent years, the demand for robust and efficient methods in music analysis has amplified, unders...

    Authors: Jeremiah Abimbola, Daniel Kostrzewa and Pawel Kasprowski
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:30
  8. Limited data availability remains a significant challenge for Whisper’s low-resource speech recognition performance, falling short of practical application requirements. While previous studies have successfull...

    Authors: Yunpeng Liu, Xukui Yang and Dan Qu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:29
  9. In the era of advanced text-to-speech (TTS) systems capable of generating high-fidelity, human-like speech from a reference speech, voice cloning (VC), or zero-shot TTS (ZS-TTS), stands out as an impor...

    Authors: Zhiyong Chen, Zhiqi Ai, Youxuan Ma, Xinnuo Li and Shugong Xu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:28
  10. Selective attention is a crucial ability of the auditory system. Computationally, following an auditory object can be illustrated as tracking its acoustic properties, e.g., pitch, timbre, or location in space....

    Authors: Joanna Luberadzka, Hendrik Kayser, Jörg Lücke and Volker Hohmann
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:27
  11. This work studies neural modeling of nonlinear parametric audio circuits, focusing on how the diversity of settings of the target device user controls seen during training affects network generalization. To st...

    Authors: Otto Mikkonen, Alec Wright and Vesa Välimäki
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:26
  12. Visual speech recognition (VSR) is a challenging task that has received increasing interest during the last few decades. Current state of the art employs powerful end-to-end architectures based on deep learnin...

    Authors: David Gimeno-Gómez and Carlos-D. Martínez-Hinarejos
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:25
  13. This article introduces Mi-Go, a tool aimed at evaluating the performance and adaptability of general-purpose speech recognition machine learning models across diverse real-world scenarios. The tool leverages ...

    Authors: Tomasz Wojnar, Jarosław Hryszko and Adam Roman
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:24
  14. Dynamic parameterization of acoustic environments has drawn widespread attention in the field of audio processing. Precise representation of local room acoustic characteristics is crucial when designing audio ...

    Authors: Chunxi Wang, Maoshen Jia, Meiran Li, Changchun Bao and Wenyu Jin
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:23
  15. In robotics, echolocation has been used to detect acoustic reflectors, e.g., walls, as it helps the robotic platform navigate in darkness and detect transparent surfaces. However, the transfer fun...

    Authors: Usama Saqib, Mads Græsbøll Christensen and Jesper Rindom Jensen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:22
  16. Speech signals are often distorted by reverberation and noise, with a widely distributed signal-to-noise ratio (SNR). To address this, our study develops robust, deep neural network (DNN)-based speech enhancem...

    Authors: Zehua Zhang, Lu Zhang, Xuyi Zhuang, Yukun Qian and Mingjiang Wang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:20
  17. Technologies in healthcare, smart homes, security, ecology, and entertainment all deploy audio event detection (AED) in order to detect sound events in an audio recording. Effective AED techniques rely heavily...

    Authors: Sandeep Reddy Kothinti and Mounya Elhilali
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:19
  18. Spoofed speech is becoming a serious threat to society due to advances in artificial intelligence techniques. Therefore, there must be an automated spoofing detector that can be integrated into automatic sp...

    Authors: Rabbia Mahum, Aun Irtaza, Ali Javed, Haitham A. Mahmoud and Haseeb Hassan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:18

    The Correction to this article has been published in EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:21

  19. Most soundfield synthesis approaches deal with extensive and regular loudspeaker arrays, which are often not suitable for home audio systems, due to physical space constraints. In this article, we propose a te...

    Authors: Luca Comanducci, Fabio Antonacci and Augusto Sarti
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:17
  20. Audio augmented reality (AAR), a prominent topic in the field of audio, requires understanding the listening environment of the user for rendering an authentic virtual auditory object. Reverberation time...

    Authors: Shivam Saini, Isaac Engel and Jürgen Peissig
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:16
  21. The vast amount of information stored in audio repositories makes it necessary to develop efficient and automatic methods to search audio content. In that direction, search on speech (SoS) has received...

    Authors: Javier Tejedor and Doroteo T. Toledano
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:15
  22. Song analysis is an active research problem that supports various operations on music access platforms, beginning with identifying who sings a given song. In this s...

    Authors: Serhat Hizlisoy, Recep Sinan Arslan and Emel Çolakoğlu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:14
  23. Accurately representing the sound field with high spatial resolution is crucial for immersive and interactive sound field reproduction technology. In recent studies, there has been a notable emphasis on effici...

    Authors: Zining Liang, Wen Zhang and Thushara D. Abhayapala
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:13
  24. This work constitutes the first approach for automatically classifying the surface that the voiding flow impacts in non-invasive sound uroflowmetry tests using machine learning. Often, the voiding flow impacts...

    Authors: Marcos Lazaro Alvarez, Laura Arjona, Miguel E. Iglesias Martínez and Alfonso Bahillo
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:12
  25. Speech synthesis has made significant strides thanks to the transition from machine learning to deep learning models. Contemporary text-to-speech (TTS) models possess the capability to generate speech of excep...

    Authors: Huda Barakat, Oytun Turk and Cenk Demiroglu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:11
  26. Claimed identities of speakers can be verified by means of automatic speaker verification (ASV) systems, also known as voice biometric systems. Focusing on security and robustness against spoofing attacks on A...

    Authors: Priyanka Gupta, Hemant A. Patil and Rodrigo Capobianco Guido
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:10
  27. Audio effects are a ubiquitous tool in music production due to the interesting ways in which they can shape the sound of music. Guitar effects, the subset of all audio effects focusing on guitar signals, are ...

    Authors: Reemt Hinrichs, Kevin Gerkens, Alexander Lange and Jörn Ostermann
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:9
  28. Recent advancements in deep learning-based speech enhancement models have extensively used attention mechanisms to achieve state-of-the-art methods by demonstrating their effectiveness. This paper proposes a t...

    Authors: Sivaramakrishna Yecchuri and Sunny Dayal Vanambathina
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:8
  29. Chinese traditional music, a vital expression of Chinese cultural heritage, possesses both a profound emotional resonance and artistic allure. This study sets forth to refine and analyze the acoustical feature...

    Authors: Lingyun Xie, Yuehong Wang and Yan Gao
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:7
  30. Speech coding is a method to reduce the amount of data needed to represent speech signals by exploiting their statistical properties. Recently, in the speech coding process, a neural network ...

    Authors: Gebremichael Kibret Sheferaw, Waweru Mwangi, Michael Kimwele and Adane Mamuye
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:6
  31. Melody harmonization, which involves generating a chord progression that complements a user-provided melody, continues to pose a significant challenge. A chord progression must not only be in harmony with the ...

    Authors: Shangda Wu, Yue Yang, Zhaowen Wang, Xiaobing Li and Maosong Sun
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:4
  32. Musical instrument sound synthesis (MISS) often utilizes a text-to-speech framework because of its similarity to speech in terms of generating sounds from symbols. Moreover, a plucked string instrument, such a...

    Authors: Junya Koguchi and Masanori Morise
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:3
  33. Shouted and normal speech classification plays an important role in many speech-related applications. The existing works are often based on magnitude-based features and ignore phase-based features, which are d...

    Authors: Khomdet Phapatanaburi, Longbiao Wang, Meng Liu, Seiichi Nakagawa, Talit Jumphoo and Peerapong Uthansakul
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:2
  34. Acoustic scene classification (ASC) is the process of identifying the acoustic environment or scene from which an audio signal is recorded. In this work, we propose an encoder-decoder-based approach to ASC, wh...

    Authors: Yun-Fei Shao, Xin-Xin Ma, Yong Ma and Wei-Qiang Zhang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:1
  35. Acoustic sensing by multiple devices connected in a wireless acoustic sensor network (WASN) creates new opportunities for multichannel signal processing. However, the autonomy of agents in such a network still...

    Authors: Aleksej Chinaev, Niklas Knaepper and Gerald Enzner
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:55
  36. Target speaker separation aims to separate the speech components of the target speaker from mixed speech and remove extraneous components such as noise. In recent years, deep learning-based speech separation m...

    Authors: Jing Wang, Hanyue Liu, Liang Xu, Wenjing Yang, Weiming Yi and Fang Liu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:53
  37. The task of bandwidth extension addresses the generation of missing high frequencies of audio signals based on knowledge of the low-frequency part of the sound. This task applies to various problems, such as a...

    Authors: Pierre-Amaury Grumiaux and Mathieu Lagrange
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:51
  38. This study focuses on exploring the acoustic differences between synthesized Guzheng pieces and real Guzheng performances, with the aim of improving the quality of synthesized Guzheng music. A dataset with con...

    Authors: Huiwen Xue, Chenxin Sun, Mingcheng Tang, Chenrui Hu, Zhengqing Yuan, Min Huang and Zhongzhe Xiao
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:50
  39. Predominant source separation is the separation of one or more desired predominant signals, such as voice or leading instruments, from polyphonic music. The proposed work uses time-frequency filtering on predo...

    Authors: Lekshmi Chandrika Reghunath and Rajeev Rajan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:49
  40. Speakers with dysarthria often struggle to accurately pronounce words and effectively communicate with others. Automatic speech recognition (ASR) is a powerful tool for extracting the content from speakers wit...

    Authors: Zhaopeng Qian, Kejing Xiao and Chongchong Yu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:48
  41. This article presents the research work on improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling. The speech recognition system is b...

    Authors: Kavya Manohar, Jayan A R and Rajeev Rajan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:47
  42. Speaker embeddings, from the ECAPA-TDNN speaker verification network, were recently introduced as features for the task of clustering microphones in ad hoc arrays. Our previous work demonstrated that, in compa...

    Authors: Stijn Kindt, Jenthe Thienpondt, Luca Becker and Nilesh Madhu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:46

    The Correction to this article has been published in EURASIP Journal on Audio, Speech, and Music Processing 2024 2024:5

  43. Non-parallel data voice conversion (VC) has achieved considerable breakthroughs due to self-supervised pre-trained representation (SSPR) being used in recent years. Features extracted by the pre-trained model ...

    Authors: Hao Huang, Lin Wang, Jichen Yang, Ying Hu and Liang He
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:45
  44. Appropriate background music in e-commerce advertisements can help stimulate consumption and build product image. However, many factors like emotion and product category should be taken into account, which mak...

    Authors: Le Ma, Xinda Wu, Ruiyuan Tang, Chongjun Zhong and Kejun Zhang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:44
  45. Snoring affects 57% of men, 40% of women, and 27% of children in the USA. Moreover, snoring is highly correlated with obstructive sleep apnoea (OSA), which is characterised by loud and frequent snoring. OSA ...

    Authors: Jingtan Li, Mengkai Sun, Zhonghao Zhao, Xingcan Li, Gaigai Li, Chen Wu, Kun Qian, Bin Hu, Yoshiharu Yamamoto and Björn W. Schuller
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:43
  46. Unsupervised anomalous sound detection (ASD) aims to detect unknown anomalous sounds of devices when only normal sound data is available. The autoencoder (AE) and self-supervised learning based methods are two...

    Authors: Jian Guan, Youde Liu, Qiuqiang Kong, Feiyang Xiao, Qiaoxi Zhu, Jiantong Tian and Wenwu Wang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:42


Annual Journal Metrics

  • Citation Impact 2023
    Journal Impact Factor: 1.7
    5-year Journal Impact Factor: 1.6
    Source Normalized Impact per Paper (SNIP): 1.051
    SCImago Journal Rank (SJR): 0.414

    Speed 2023
    Submission to first editorial decision (median days): 17
    Submission to acceptance (median days): 154

    Usage 2023
    Downloads: 368,607
    Altmetric mentions: 70

Funding your APC

Open access funding and policy support by SpringerOpen

We offer a free open access support service to make it easier for you to discover and apply for article-processing charge (APC) funding. Learn more here.