Articles

W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision

Non-parallel data voice conversion (VC) has achieved considerable breakthroughs due to self-supervised pre-trained representation (SSPR) being used in recent years. Features extracted by the pre-trained model ...

Authors: Hao Huang, Lin Wang, Jichen Yang, Ying Hu and Liang He

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:45

Content type: Empirical Research

Published on: 28 October 2023

YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation

Appropriate background music in e-commerce advertisements can help stimulate consumption and build product image. However, many factors like emotion and product category should be taken into account, which mak...

Authors: Le Ma, Xinda Wu, Ruiyuan Tang, Chongjun Zhong and Kejun Zhang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:44

Content type: Methodology

Published on: 19 October 2023

Battling with the low-resource condition for snore sound recognition: introducing a meta-learning strategy

Snoring affects 57 % of men, 40 % of women, and 27 % of children in the USA. Besides, snoring is highly correlated with obstructive sleep apnoea (OSA), which is characterised by loud and frequent snoring. OSA ...

Authors: Jingtan Li, Mengkai Sun, Zhonghao Zhao, Xingcan Li, Gaigai Li, Chen Wu, Kun Qian, Bin Hu, Yoshiharu Yamamoto and Björn W. Schuller

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:43

Content type: Empirical Research

Published on: 13 October 2023

Transformer-based autoencoder with ID constraint for unsupervised anomalous sound detection

Unsupervised anomalous sound detection (ASD) aims to detect unknown anomalous sounds of devices when only normal sound data is available. The autoencoder (AE) and self-supervised learning based methods are two...

Authors: Jian Guan, Youde Liu, Qiuqiang Kong, Feiyang Xiao, Qiaoxi Zhu, Jiantong Tian and Wenwu Wang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:42

Content type: Methodology

Published on: 13 October 2023

Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments

In recent years, the speaker-independent, single-channel speech separation problem has made significant progress with the development of deep neural networks (DNNs). However, separating the speech of each inte...

Authors: Chunxi Wang, Maoshen Jia and Xinfeng Zhang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:41

Content type: Methodology

Published on: 12 October 2023

Speech emotion recognition based on Graph-LSTM neural network

Currently, Graph Neural Networks have been extended to the field of speech signal processing. It is the more compact and flexible way to represent speech sequences by graphs. However, the structures of the rel...

Authors: Yan Li, Yapeng Wang, Xu Yang and Sio-Kei Im

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:40

Content type: Empirical Research

Published on: 11 October 2023

An acoustic echo canceller optimized for hands-free speech telecommunication in large vehicle cabins

Acoustic echo cancelation (AEC) is a system identification problem that has been addressed by various techniques and most commonly by normalized least mean square (NLMS) adaptive algorithms. However, performin...

Authors: Amin Saremi, Balaji Ramkumar, Ghazaleh Ghaffari and Zonghua Gu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:39

Content type: Empirical Research

Published on: 7 October 2023

Direction-of-arrival and power spectral density estimation using a single directional microphone and group-sparse optimization

In this paper, two approaches are proposed for estimating the direction of arrival (DOA) and power spectral density (PSD) of stationary point sources by using a single, rotating, directional microphone. These ...

Authors: Elisa Tengan, Thomas Dietzen, Filip Elvander and Toon van Waterschoot

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:38

Content type: Methodology

Published on: 4 October 2023

Cascade algorithms for combined acoustic feedback cancelation and noise reduction

This paper presents three cascade algorithms for combined acoustic feedback cancelation (AFC) and noise reduction (NR) in speech applications. A prediction error method (PEM)-based adaptive feedback cancelatio...

Authors: Santiago Ruiz, Toon van Waterschoot and Marc Moonen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:37

Content type: Methodology

Published on: 21 September 2023

Learning-based robust speaker counting and separation with the aid of spatial coherence

A three-stage approach is proposed for speaker counting and speech separation in noisy and reverberant environments. In the spatial feature extraction, a spatial coherence matrix (SCM) is computed using whiten...

Authors: Yicheng Hsu and Mingsian R. Bai

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:36

Content type: Empirical Research

Published on: 20 September 2023

Acoustic object canceller: removing a known signal from monaural recording using blind synchronization

In this paper, we propose a technique for removing a specific type of interference from a monaural recording. Nonstationary interferences are generally challenging to eliminate from such recordings. However, i...

Authors: Takao Kawamura, Kouei Yamaoka, Yukoh Wakabayashi, Nobutaka Ono and Ryoichi Miyazaki

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:35

Content type: Methodology

Published on: 11 September 2023

The power of humorous audio: exploring emotion regulation in traffic congestion through EEG-based study

Traffic congestion can lead to negative driving emotions, significantly increasing the likelihood of traffic accidents. Reducing negative driving emotions as a means to mitigate speeding, reckless overtaking, ...

Authors: Lekai Zhang, Yingfan Wang, Kailun He, Hailong Zhang, Baixi Xing, Xiaofeng Liu and Fo Hu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:34

Content type: Empirical Research

Published on: 7 September 2023

Learning domain-heterogeneous speaker recognition systems with personalized continual federated learning

Speaker recognition, the process of automatically identifying a speaker based on individual characteristics in speech signals, presents significant challenges when addressing heterogeneous-domain conditions. F...

Authors: Zhiyong Chen and Shugong Xu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:33

Content type: Methodology

Published on: 5 September 2023

Dual input neural networks for positional sound source localization

In many signal processing applications, metadata may be advantageously used in conjunction with a high dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) a...

Authors: Eric Grinstein, Vincent W. Neo and Patrick A. Naylor

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:32

Content type: Methodology

Published on: 30 August 2023

Training audio transformers for cover song identification

In the past decades, convolutional neural networks (CNNs) have been commonly adopted in audio perception tasks, which aim to learn latent representations. However, for audio analysis, CNNs may exhibit limitati...

Authors: Te Zeng and Francis C. M. Lau

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:31

Content type: Methodology

Published on: 25 August 2023

Channel and temporal-frequency attention UNet for monaural speech enhancement

The presence of noise and reverberation significantly impedes speech clarity and intelligibility. To mitigate these effects, numerous deep learning-based network models have been proposed for speech enhancemen...

Authors: Shiyun Xu, Zehua Zhang and Mingjiang Wang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:30

Content type: Empirical Research

Published on: 14 August 2023

Microphone utility estimation in acoustic sensor networks using single-channel signal features

In multichannel signal processing with distributed sensors, choosing the optimal subset of observed sensor signals to be exploited is crucial in order to maximize algorithmic performance and reduce computation...

Authors: Michael Günther, Andreas Brendel and Walter Kellermann

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:29

Content type: Empirical Research

Published on: 3 August 2023

Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting

Personalized voice triggering is a key technology in voice assistants and serves as the first step for users to activate the voice assistant. Personalized voice triggering involves keyword spotting (KWS) and s...

Authors: Xingwei Liang, Zehua Zhang and Ruifeng Xu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:28

Content type: Empirical Research

Published on: 1 July 2023

Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization

The goal of sound event detection and localization (SELD) is to identify each individual sound event class and its activity time from a piece of audio, while estimating its spatial location at the time of acti...

Authors: Yuting Zhou and Hongjie Wan

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:27

Content type: Methodology

Published on: 30 June 2023

Automatic detection of attachment style in married couples through conversation analysis

Analysis of couple interactions using speech processing techniques is an increasingly active multi-disciplinary field that poses challenges such as automatic relationship quality assessment and behavioral codi...

Authors: Tuğçe Melike Koçak, Büşra Çilem Dibek, Esma Nafiye Polat, Nilüfer Kafesçioğlu and Cenk Demiroğlu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:26

Content type: Review

Published on: 31 May 2023

Parallel processing of distributed beamforming and multichannel linear prediction for speech denoising and dereverberation in wireless acoustic sensor networks

More and more smart home devices with microphones come into our life in these years; it is highly desirable to connect these microphones as wireless acoustic sensor networks (WASNs) so that these devices can b...

Authors: Zhe Han, Yuxuan Ke, Xiaodong Li and Chengshi Zheng

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:25

Content type: Methodology

Published on: 22 May 2023

Variational Autoencoders for chord sequence generation conditioned on Western harmonic music complexity

In recent years, the adoption of deep learning techniques has allowed to obtain major breakthroughs in the automatic music generation research field, sparking a renewed interest in generative music. A great de...

Authors: Luca Comanducci, Davide Gioiosa, Massimiliano Zanoni, Fabio Antonacci and Augusto Sarti

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:24

Content type: Empirical Research

Published on: 15 May 2023

Paralinguistic and spectral feature extraction for speech emotion classification using machine learning techniques

Emotion plays a dominant role in speech. The same utterance with different emotions can lead to a completely different meaning. The ability to perform various of emotion during speaking is also one of the typi...

Authors: Tong Liu and Xiaochen Yuan

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:23

Content type: Methodology

Published on: 15 May 2023

Speech emotion recognition based on emotion perception

Speech emotion recognition (SER) is a hot topic in speech signal processing. With the advanced development of the cheap computing power and proliferation of research in data-driven methods, deep learning appro...

Authors: Gang Liu, Shifang Cai and Ce Wang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:22

Content type: Empirical Research

Published on: 12 May 2023

Time-domain adaptive attention network for single-channel speech separation

Recent years have witnessed a great progress in single-channel speech separation by applying self-attention based networks. Despite the excellent performance in mining relevant long-sequence contextual informa...

Authors: Kunpeng Wang, Hao Zhou, Jingxiang Cai, Wenna Li and Juan Yao

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:21

Content type: Methodology

Published on: 11 May 2023

Explicit-memory multiresolution adaptive framework for speech and music separation

The human auditory system employs a number of principles to facilitate the selection of perceptually separated streams from a complex sound mixture. The brain leverages multi-scale redundant representations of...

Authors: Ashwin Bellur, Karan Thakkar and Mounya Elhilali

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:20

Content type: Empirical Research

Published on: 9 May 2023

MUSIB: musical score inpainting benchmark

Music inpainting is a sub-task of automated music generation that aims to infill incomplete musical pieces to help musicians in their musical composition process. Many methods have been developed for this task...

Authors: Mauricio Araneda-Hernandez, Felipe Bravo-Marquez, Denis Parra and Rodrigo F. Cádiz

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:19

Content type: Empirical Research

Published on: 5 May 2023

A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices

A two-stage lightweight online dereverberation algorithm for hearing devices is presented in this paper. The approach combines a multi-channel multi-frame linear filter with a single-channel single-frame post-...

Authors: Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning and Timo Gerkmann

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:18

Content type: Empirical Research

Published on: 1 May 2023

MYRiAD: a multi-array room acoustic database

In the development of acoustic signal processing algorithms, their evaluation in various acoustic environments is of utmost importance. In order to advance evaluation in realistic and reproducible scenarios, s...

Authors: Thomas Dietzen, Randall Ali, Maja Taseska and Toon van Waterschoot

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:17

Content type: Empirical Research

Published on: 26 April 2023

Voice activity detection in the presence of transient based on graph

Voice activity detection remains a significant challenge in the presence of transients since transients are more dominant than speech, though it has achieved satisfactory performance in quasi-stationary noisy ...

Authors: Xiao-Yuan Guo, Chun-Xian Gao and Hui Liu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:16

Content type: Empirical Research

Published on: 20 April 2023

Benefits of pre-trained mono- and cross-lingual speech representations for spoken language understanding of Dutch dysarthric speech

With the rise of deep learning, spoken language understanding (SLU) for command-and-control applications such as a voice-controlled virtual assistant can offer reliable hands-free operation to physically disab...

Authors: Pu Wang and Hugo Van hamme

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:15

Content type: Empirical Research

Published on: 7 April 2023

Three-stage training and orthogonality regularization for spoken language recognition

Spoken language recognition has made significant progress in recent years, for which automatic speech recognition has been used as a parallel branch to extract phonetic features. However, there is still a lack...

Authors: Zimu Li, Yanyan Xu, Dengfeng Ke and Kaile Su

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:14

Content type: Methodology

Published on: 6 April 2023

AAM: a dataset of Artificial Audio Multitracks for diverse music information retrieval tasks

We present a new dataset of 3000 artificial music tracks with rich annotations based on real instrument samples and generated by algorithmic composition with respect to music theory. Our collection provides gr...

Authors: Fabian Ostermann, Igor Vatolkin and Martin Ebeling

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:13

Content type: Empirical Research

Published on: 23 March 2023

Deep learning-based wave digital modeling of rate-dependent hysteretic nonlinearities for virtual analog applications

Electromagnetic components greatly contribute to the peculiar timbre of analog audio gear. Indeed, distortion effects due to the nonlinear behavior of magnetic materials are known to play an important role in ...

Authors: Oliviero Massi, Alessandro Ilic Mezza, Riccardo Giampiccolo and Alberto Bernardini

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:12

Content type: Methodology

Published on: 8 March 2023

A latent rhythm complexity model for attribute-controlled drum pattern generation

Most music listeners have an intuitive understanding of the notion of rhythm complexity. Musicologists and scientists, however, have long sought objective ways to measure and model such a distinctively percept...

Authors: Alessandro Ilic Mezza, Massimiliano Zanoni and Augusto Sarti

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:11

Content type: Empirical Research

Published on: 17 February 2023

Research on monaural speech segregation based on feature selection

Speech feature model is the basis of speech and noise separation, speech expression, and different styles of speech conversion. With the development of signal processing methods, the feature types and dimensio...

Authors: Xiaoping Xie, Yongzhen Chen, Rufeng Shen and Dan Tian

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:10

Content type: Research

Published on: 16 February 2023

Correction: Trainable windows for SincNet architecture

Authors: H. C. Prashanth, Madhav Rao, Dhanya Eledath and V. Ramasubramanian

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:9

Content type: Correction

Published on: 9 February 2023

The original article was published in EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:3

Review of methods for coding of speech signals

Speech is the most common form of human communication, and many conversations use digital communication links. For efficient transmission, acoustic speech waveforms are usually converted to digital form, with ...

Authors: Douglas O’Shaughnessy

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:8

Content type: Review

Published on: 7 February 2023

An MMSE graph spectral magnitude estimator for speech signals residing on an undirected multiple graph

The paper uses the K-graphs learning method to construct weighted, connected, undirected multiple graphs, aiming to reveal intrinsic relationships of speech samples in the inter-frame and intra-frame. To benefit ...

Authors: Tingting Wang, Haiyan Guo, Zirui Ge, Qiquan Zhang and Zhen Yang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:7

Content type: Methodology

Published on: 3 February 2023

Heterogeneous separation consistency training for adaptation of unsupervised speech separation

Recently, supervised speech separation has made great progress. However, limited by the nature of supervised training, most existing separation methods require ground-truth sources and are trained on synthetic...

Authors: Jiangyu Han and Yanhua Long

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:6

Content type: Methodology

Published on: 20 January 2023

Sound event triage: detecting sound events considering priority of classes

We propose a new task for sound event detection (SED): sound event triage (SET). The goal of SET is to detect an arbitrary number of high-priority event classes while allowing misdetections of low-priority eve...

Authors: Noriyuki Tonami and Keisuke Imoto

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:5

Content type: Methodology

Published on: 20 January 2023

Beyond the Big Five personality traits for music recommendation systems

The aim of this paper is to investigate the influence of personality traits, characterized by the BFI (Big Five Inventory) and its significant revision called BFI-2, on music recommendation error. The BFI-2 de...

Authors: Mariusz Kleć, Alicja Wieczorkowska, Krzysztof Szklanny and Włodzimierz Strus

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:4

Content type: Empirical Research

Published on: 19 January 2023

Trainable windows for SincNet architecture

SincNet architecture has shown significant benefits over traditional Convolutional Neural Networks (CNN), especially for speaker recognition applications. SincNet comprises parameterized Sinc functions as filt...

Authors: Prashanth H C, Madhav Rao, Dhanya Eledath and Ramasubramanian V

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:3

Content type: Empirical Research

Published on: 19 January 2023

The Correction to this article has been published in EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:9

Stripe-Transformer: deep stripe feature learning for music source separation

Music source separation (MSS) is to isolate musical instrument signals from the given music mixture. Stripes widely exist in music spectrograms, which potentially indicate high-level music information. For exa...

Authors: Jiale Qian, Xinlu Liu, Yi Yu and Wei Li

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:2

Content type: Empirical Research

Published on: 12 January 2023

Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders

The purpose of this paper is to show a music mixing system that is capable of automatically mixing separate raw recordings with good quality regardless of the music genre. This work recalls selected methods fo...

Authors: Damian Koszewski, Thomas Görne, Grazina Korvel and Bozena Kostek

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2023 2023:1

Content type: Empirical Research

Published on: 5 January 2023

Points2Sound: from mono to binaural audio using 3D point cloud scenes

For immersive applications, the generation of binaural sound that matches its visual counterpart is crucial to bring meaningful experiences to people in a virtual environment. Recent studies have shown the pos...

Authors: Francesc Lluís, Vasileios Chatziioannou and Alex Hofmann

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:33

Content type: Empirical Research

Published on: 29 December 2022

Cross-corpus speech emotion recognition using subspace learning and domain adaption

Speech emotion recognition (SER) is a hot topic in speech signal processing. When the training data and the test data come from different corpus, their feature distributions are different, which leads to the d...

Authors: Xuan Cao, Maoshen Jia, Jiawei Ru and Tun-wen Pai

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:32

Content type: Methodology

Published on: 27 December 2022

MetaMGC: a music generation framework for concerts in metaverse

In recent years, there has been a national craze for metaverse concerts. However, existing meta-universe concert efforts often focus on immersive visual experiences and lack consideration of the musical and au...

Authors: Cong Jin, Fengjuan Wu, Jing Wang, Yang Liu, Zixuan Guan and Zhe Han

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:31

Content type: Empirical Research

Published on: 13 December 2022

Quantifying headphone listening experience in virtual sound environments using distraction

Headphones are commonly used in various environments including at home, outside and on public transport. However, the perception and modelling of the interaction of headphone audio and noisy environments is re...

Authors: Milap Rane, Philip Coleman, Russell Mason and Søren Bech

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:30

Content type: Empirical Research

Published on: 9 December 2022

Attention mechanism combined with residual recurrent neural network for sound event detection and localization

In the task of sound event detection and localization (SEDL) in a complex environment, the acoustic signals of different events usually have nonlinear superposition, so the detection and localization effect is...

Authors: Chaofeng Lan, Lei Zhang, Yuanyuan Zhang, Lirong Fu, Chao Sun, Yulan Han and Meng Zhang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:29

Content type: Empirical Research

Published on: 5 December 2022