Articles

  1. The human auditory system employs a number of principles to facilitate the selection of perceptually separated streams from a complex sound mixture. The brain leverages multi-scale redundant representations of...

    Authors: Ashwin Bellur, Karan Thakkar and Mounya Elhilali
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:20
  2. Music inpainting is a sub-task of automated music generation that aims to infill incomplete musical pieces to help musicians in their musical composition process. Many methods have been developed for this task...

    Authors: Mauricio Araneda-Hernandez, Felipe Bravo-Marquez, Denis Parra and Rodrigo F. Cádiz
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:19
  3. A two-stage lightweight online dereverberation algorithm for hearing devices is presented in this paper. The approach combines a multi-channel multi-frame linear filter with a single-channel single-frame post-...

    Authors: Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning and Timo Gerkmann
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:18
  4. In the development of acoustic signal processing algorithms, their evaluation in various acoustic environments is of utmost importance. In order to advance evaluation in realistic and reproducible scenarios, s...

    Authors: Thomas Dietzen, Randall Ali, Maja Taseska and Toon van Waterschoot
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:17
  5. Voice activity detection remains a significant challenge in the presence of transients since transients are more dominant than speech, though it has achieved satisfactory performance in quasi-stationary noisy ...

    Authors: Xiao-Yuan Guo, Chun-Xian Gao and Hui Liu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:16
  6. With the rise of deep learning, spoken language understanding (SLU) for command-and-control applications such as a voice-controlled virtual assistant can offer reliable hands-free operation to physically disab...

    Authors: Pu Wang and Hugo Van hamme
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:15
  7. Spoken language recognition has made significant progress in recent years, for which automatic speech recognition has been used as a parallel branch to extract phonetic features. However, there is still a lack...

    Authors: Zimu Li, Yanyan Xu, Dengfeng Ke and Kaile Su
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:14
  8. We present a new dataset of 3000 artificial music tracks with rich annotations based on real instrument samples and generated by algorithmic composition with respect to music theory. Our collection provides gr...

    Authors: Fabian Ostermann, Igor Vatolkin and Martin Ebeling
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:13
  9. Electromagnetic components greatly contribute to the peculiar timbre of analog audio gear. Indeed, distortion effects due to the nonlinear behavior of magnetic materials are known to play an important role in ...

    Authors: Oliviero Massi, Alessandro Ilic Mezza, Riccardo Giampiccolo and Alberto Bernardini
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:12
  10. Most music listeners have an intuitive understanding of the notion of rhythm complexity. Musicologists and scientists, however, have long sought objective ways to measure and model such a distinctively percept...

    Authors: Alessandro Ilic Mezza, Massimiliano Zanoni and Augusto Sarti
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:11
  11. The speech feature model is the basis of speech and noise separation, speech expression, and different styles of speech conversion. With the development of signal processing methods, the feature types and dimensio...

    Authors: Xiaoping Xie, Yongzhen Chen, Rufeng Shen and Dan Tian
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:10
  12. Speech is the most common form of human communication, and many conversations use digital communication links. For efficient transmission, acoustic speech waveforms are usually converted to digital form, with ...

    Authors: Douglas O’Shaughnessy
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:8
  13. The paper uses the K-graphs learning method to construct weighted, connected, undirected multiple graphs, aiming to reveal intrinsic relationships of speech samples in the inter-frame and intra-frame. To benefit ...

    Authors: Tingting Wang, Haiyan Guo, Zirui Ge, Qiquan Zhang and Zhen Yang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:7
  14. Recently, supervised speech separation has made great progress. However, limited by the nature of supervised training, most existing separation methods require ground-truth sources and are trained on synthetic...

    Authors: Jiangyu Han and Yanhua Long
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:6
  15. The aim of this paper is to investigate the influence of personality traits, characterized by the BFI (Big Five Inventory) and its significant revision called BFI-2, on music recommendation error. The BFI-2 de...

    Authors: Mariusz Kleć, Alicja Wieczorkowska, Krzysztof Szklanny and Włodzimierz Strus
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:4
  16. The SincNet architecture has shown significant benefits over traditional Convolutional Neural Networks (CNNs), especially for speaker recognition applications. SincNet comprises parameterized Sinc functions as filt...

    Authors: Prashanth H C, Madhav Rao, Dhanya Eledath and Ramasubramanian V
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:3

    The Correction to this article has been published in EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:9

  17. Music source separation (MSS) aims to isolate musical instrument signals from a given music mixture. Stripes widely exist in music spectrograms, which potentially indicate high-level music information. For exa...

    Authors: Jiale Qian, Xinlu Liu, Yi Yu and Wei Li
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:2
  18. The purpose of this paper is to show a music mixing system that is capable of automatically mixing separate raw recordings with good quality regardless of the music genre. This work recalls selected methods fo...

    Authors: Damian Koszewski, Thomas Görne, Grazina Korvel and Bozena Kostek
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:1
  19. For immersive applications, the generation of binaural sound that matches its visual counterpart is crucial to bring meaningful experiences to people in a virtual environment. Recent studies have shown the pos...

    Authors: Francesc Lluís, Vasileios Chatziioannou and Alex Hofmann
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:33
  20. Speech emotion recognition (SER) is a hot topic in speech signal processing. When the training data and the test data come from different corpora, their feature distributions are different, which leads to the d...

    Authors: Xuan Cao, Maoshen Jia, Jiawei Ru and Tun-wen Pai
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:32
  21. In recent years, there has been a national craze for metaverse concerts. However, existing metaverse concert efforts often focus on immersive visual experiences and lack consideration of the musical and au...

    Authors: Cong Jin, Fengjuan Wu, Jing Wang, Yang Liu, Zixuan Guan and Zhe Han
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:31
  22. Headphones are commonly used in various environments including at home, outside and on public transport. However, the perception and modelling of the interaction of headphone audio and noisy environments is re...

    Authors: Milap Rane, Philip Coleman, Russell Mason and Søren Bech
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:30
  23. In the task of sound event detection and localization (SEDL) in a complex environment, the acoustic signals of different events usually have nonlinear superposition, so the detection and localization effect is...

    Authors: Chaofeng Lan, Lei Zhang, Yuanyuan Zhang, Lirong Fu, Chao Sun, Yulan Han and Meng Zhang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:29
  24. Guitar effects are commonly used in popular music to shape the guitar sound to fit specific genres, or to create more variety within musical compositions. The sound not only is determined by the choice of the ...

    Authors: Reemt Hinrichs, Kevin Gerkens, Alexander Lange and Jörn Ostermann
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:28
  25. Voice activity detection (VAD) based on deep neural networks (DNNs) has demonstrated good performance in adverse acoustic environments. Current DNN-based VAD optimizes a surrogate function, e.g., minimum cross...

    Authors: Xiao-Lei Zhang and Menglong Xu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:27
  26. Automated audio captioning is a cross-modal translation task that aims to generate natural language descriptions for given audio clips. This task has received increasing attention with the release of freely av...

    Authors: Xinhao Mei, Xubo Liu, Mark D. Plumbley and Wenwu Wang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:26
  27. Large-scale sound recognition data sets typically consist of acoustic recordings obtained from multimedia libraries. As a consequence, modalities other than audio can often be exploited to improve the outputs ...

    Authors: Wim Boes and Hugo Van hamme
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:25
  28. In this article, we adapted five recent SSL methods to the task of audio classification. The first two methods, namely Deep Co-Training (DCT) and Mean Teacher (MT), involve two collaborative neural networks. T...

    Authors: Léo Cances, Etienne Labbé and Thomas Pellegrini
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:23
  29. In this paper, we propose a supervised single-channel speech enhancement method that combines Kullback-Leibler (KL) divergence-based non-negative matrix factorization (NMF) and a hidden Markov model (NMF-HMM)....

    Authors: Yang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen and Mads Græsbøll Christensen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:22
  30. Automatic speech and music activity detection (SMAD) is an enabling task that can help segment, index, and pre-process audio content in radio broadcast and TV programs. However, due to copyright concerns and t...

    Authors: Yun-Ning Hung, Chih-Wei Wu, Iroro Orife, Aaron Hipple, William Wolcott and Alexander Lerch
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:21
  31. Speech emotion recognition is a key branch of affective computing. Nowadays, it is common to detect emotional diseases through speech emotion recognition. Various detection methods of emotion recognition, such...

    Authors: Jinxing Gao, Diqun Yan and Mingyu Dong
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:20
  32. Most state-of-the-art speech systems use deep neural networks (DNNs). These systems require a large amount of data to be learned. Hence, training state-of-the-art frameworks on under-resourced speech challenge...

    Authors: Vincent Roger, Jérôme Farinas and Julien Pinquier
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:19
  33. PlugSonic is a series of web- and mobile-based applications designed to edit samples and apply audio effects (PlugSonic Sample) and create and experience dynamic and navigable soundscapes and sonic narratives ...

    Authors: Marco Comunità, Andrea Gerino and Lorenzo Picinali
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:18
  34. Language recognition based on embedding aims to maximize inter-class variance and minimize intra-class variance. Previous research is limited to the training constraint of a single centroid, which cannot ac...

    Authors: Minghang Ju, Yanyan Xu, Dengfeng Ke and Kaile Su
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:17
  35. By means of spatial clustering and time-frequency masking, a mixture of multiple speakers and noise can be separated into the underlying signal components. The parameters of a model, such as a complex angular ...

    Authors: Alexander Bohlender, Lucas Van Severen, Jonathan Sterckx and Nilesh Madhu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:16
  36. To improve the sound quality of hearing devices, equalization filters can be used to achieve acoustic transparency, i.e., listening with the device in the ear is perceptually similar to the open ear. The equal...

    Authors: Henning Schepker, Florian Denk, Birger Kollmeier and Simon Doclo
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:15
  37. Subtitles are a crucial component of Digital Entertainment Content (DEC, such as movies and TV shows) localization. With an ever-increasing catalog (≈ 2M titles) and localization expansion (30+ languages), automat...

    Authors: Honey Gupta and Mayank Sharma
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:14
  38. In lossless audio compression, the predictive residuals must remain sparse when entropy coding is applied. The sign algorithm (SA) is a conventional method for minimizing the magnitudes of residuals; however, ...

    Authors: Taiyo Mineo and Hayaru Shouno
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:12
  39. Multiple predominant instrument recognition in polyphonic music is addressed using decision level fusion of three transformer-based architectures on an ensemble of visual representations. The ensemble consists...

    Authors: Lekshmi Chandrika Reghunath and Rajeev Rajan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:11
  40. The domain of spatial audio comprises methods for capturing, processing, and reproducing audio content that contains spatial information. Data-based methods are those that operate directly on the spatial infor...

    Authors: Maximo Cobos, Jens Ahrens, Konrad Kowalczyk and Archontis Politis
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:10
  41. Head-related transfer function (HRTF) individualization can improve the perception of binaural sound. The interaural time difference (ITD) of the HRTF is a relevant cue for sound localization, especially in az...

    Authors: Pablo Gutierrez-Parera, Jose J. Lopez, Javier M. Mora-Merchan and Diego F. Larios
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:9
  42. Humans can recognize someone’s identity through their voice and describe the timbral phenomena of voices. Likewise, the singing voice also has timbral phenomena. In vocal pedagogy, vocal teachers listen and th...

    Authors: Yanze Xu, Weiqing Wang, Huahua Cui, Mingyang Xu and Ming Li
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:8
  43. Polyphonic sound event detection aims to detect the types of sound events that occur in given audio clips, and their onset and offset times, in which multiple sound events may occur simultaneously. Deep learni...

    Authors: Haitao Li, Shuguo Yang and Wenwu Wang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:5
  44. In this study, we propose a methodology for separating a singing voice from musical accompaniment in a monaural musical mixture. The proposed method uses robust principal component analysis (RPCA), followed by...

    Authors: Wen-Hsing Lai and Siou-Lin Wang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:4

Annual Journal Metrics

  • 2022 Citation Impact
    2.4 - 2-year Impact Factor
    2.0 - 5-year Impact Factor
    1.081 - SNIP (Source Normalized Impact per Paper)
    0.458 - SJR (SCImago Journal Rank)

    2022 Speed
    16 days submission to first editorial decision for all manuscripts (Median)
    185 days submission to accept (Median)

    2022 Usage 
    249,941 downloads
    65 Altmetric mentions 
