Articles

Convolutional neural networks for the classification of guitar effects and extraction of the parameter settings of single and multi-guitar effects from instrument mixes

Guitar effects are commonly used in popular music to shape the guitar sound to fit specific genres, or to create more variety within musical compositions. The sound not only is determined by the choice of the ...

Authors: Reemt Hinrichs, Kevin Gerkens, Alexander Lange and Jörn Ostermann

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:28

Content type: Empirical Research Published on: 23 October 2022

AUC optimization for deep learning-based voice activity detection

Voice activity detection (VAD) based on deep neural networks (DNN) has demonstrated good performance in adverse acoustic environments. Current DNN-based VAD optimizes a surrogate function, e.g., minimum cross...

Authors: Xiao-Lei Zhang and Menglong Xu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:27

Content type: Empirical Research Published on: 22 October 2022

Automated audio captioning: an overview of recent progress and new challenges

Automated audio captioning is a cross-modal translation task that aims to generate natural language descriptions for given audio clips. This task has received increasing attention with the release of freely av...

Authors: Xinhao Mei, Xubo Liu, Mark D. Plumbley and Wenwu Wang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:26

Content type: Review Published on: 9 October 2022

Multi-encoder attention-based architectures for sound recognition with partial visual assistance

Large-scale sound recognition data sets typically consist of acoustic recordings obtained from multimedia libraries. As a consequence, modalities other than audio can often be exploited to improve the outputs ...

Authors: Wim Boes and Hugo Van hamme

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:25

Content type: Methodology Published on: 8 October 2022

Correction: N-dimensional N-microphone sound source localization

Authors: Ali Parsayan and Seyed Mohammad Ahadi

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:24

Content type: Correction Published on: 27 September 2022

The original article was published in EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:27

Comparison of semi-supervised deep learning algorithms for audio classification

In this article, we adapted five recent SSL methods to the task of audio classification. The first two methods, namely Deep Co-Training (DCT) and Mean Teacher (MT), involve two collaborative neural networks. T...

Authors: Léo Cances, Etienne Labbé and Thomas Pellegrini

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:23

Content type: Empirical Research Published on: 19 September 2022

A speech enhancement algorithm based on a non-negative hidden Markov model and Kullback-Leibler divergence

In this paper, we propose a supervised single-channel speech enhancement method that combines Kullback-Leibler (KL) divergence-based non-negative matrix factorization (NMF) and a hidden Markov model (NMF-HMM)....

Authors: Yang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen and Mads Græsbøll Christensen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:22

Content type: Methodology Published on: 8 September 2022

A large TV dataset for speech and music activity detection

Automatic speech and music activity detection (SMAD) is an enabling task that can help segment, index, and pre-process audio content in radio broadcast and TV programs. However, due to copyright concerns and t...

Authors: Yun-Ning Hung, Chih-Wei Wu, Iroro Orife, Aaron Hipple, William Wolcott and Alexander Lerch

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:21

Content type: Empirical Research Published on: 3 September 2022

Black-box adversarial attacks through speech distortion for speech emotion recognition

Speech emotion recognition is a key branch of affective computing. Nowadays, it is common to detect emotional diseases through speech emotion recognition. Various detection methods of emotion recognition, such...

Authors: Jinxing Gao, Diqun Yan and Mingyu Dong

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:20

Content type: Review Published on: 17 August 2022

Deep neural networks for automatic speech processing: a survey from large corpora to limited data

Most state-of-the-art speech systems use deep neural networks (DNNs). These systems require a large amount of data to be learned. Hence, training state-of-the-art frameworks on under-resourced speech challenge...

Authors: Vincent Roger, Jérôme Farinas and Julien Pinquier

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:19

Content type: Review Published on: 17 August 2022

PlugSonic: a web- and mobile-based platform for dynamic and navigable binaural audio

PlugSonic is a series of web- and mobile-based applications designed to edit samples and apply audio effects (PlugSonic Sample) and create and experience dynamic and navigable soundscapes and sonic narratives ...

Authors: Marco Comunità, Andrea Gerino and Lorenzo Picinali

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:18

Content type: Software Published on: 15 July 2022

Masked multi-center angular margin loss for language recognition

Language recognition based on embedding aims to maximize inter-class variance and minimize intra-class variance. Previous research is limited to the training constraint of a single centroid, which cannot ac...

Authors: Minghang Ju, Yanyan Xu, Dengfeng Ke and Kaile Su

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:17

Content type: Methodology Published on: 7 July 2022

DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models

By means of spatial clustering and time-frequency masking, a mixture of multiple speakers and noise can be separated into the underlying signal components. The parameters of a model, such as a complex angular ...

Authors: Alexander Bohlender, Lucas Van Severen, Jonathan Sterckx and Nilesh Madhu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:16

Content type: Methodology Published on: 18 June 2022

Robust single- and multi-loudspeaker least-squares-based equalization for hearing devices

To improve the sound quality of hearing devices, equalization filters can be used to achieve acoustic transparency, i.e., listening with the device in the ear is perceptually similar to the open ear. The equal...

Authors: Henning Schepker, Florian Denk, Birger Kollmeier and Simon Doclo

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:15

Content type: Empirical Research Published on: 11 June 2022

Language agnostic missing subtitle detection

Subtitles are a crucial component of Digital Entertainment Content (DEC such as movies and TV shows) localization. With ever increasing catalog (≈ 2M titles) and localization expansion (30+ languages), automat...

Authors: Honey Gupta and Mayank Sharma

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:14

Content type: Research Published on: 11 June 2022

Data-based spatial audio processing

Authors: Maximo Cobos, Jens Ahrens, Konrad Kowalczyk and Archontis Politis

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:13

Content type: Editorial Published on: 8 June 2022

Improving sign-algorithm convergence rate using natural gradient for lossless audio compression

In lossless audio compression, the predictive residuals must remain sparse when entropy coding is applied. The sign algorithm (SA) is a conventional method for minimizing the magnitudes of residuals; however, ...

Authors: Taiyo Mineo and Hayaru Shouno

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:12

Content type: Methodology Published on: 21 May 2022

Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music

Multiple predominant instrument recognition in polyphonic music is addressed using decision level fusion of three transformer-based architectures on an ensemble of visual representations. The ensemble consists...

Authors: Lekshmi Chandrika Reghunath and Rajeev Rajan

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:11

Content type: Empirical Research Published on: 16 May 2022

An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction

The domain of spatial audio comprises methods for capturing, processing, and reproducing audio content that contains spatial information. Data-based methods are those that operate directly on the spatial infor...

Authors: Maximo Cobos, Jens Ahrens, Konrad Kowalczyk and Archontis Politis

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:10

Content type: Review Published on: 16 May 2022

Interaural time difference individualization in HRTF by scaling through anthropometric parameters

Head-related transfer function (HRTF) individualization can improve the perception of binaural sound. The interaural time difference (ITD) of the HRTF is a relevant cue for sound localization, especially in az...

Authors: Pablo Gutierrez-Parera, Jose J. Lopez, Javier M. Mora-Merchan and Diego F. Larios

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:9

Content type: Research Published on: 12 May 2022

Paralinguistic singing attribute recognition using supervised machine learning for describing the classical tenor solo singing voice in vocal pedagogy

Humans can recognize someone’s identity through their voice and describe the timbral phenomena of voices. Likewise, the singing voice also has timbral phenomena. In vocal pedagogy, vocal teachers listen and th...

Authors: Yanze Xu, Weiqing Wang, Huahua Cui, Mingyang Xu and Ming Li

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:8

Content type: Empirical Research Published on: 15 April 2022

Estimation of playable piano fingering by pitch-difference fingering match model

Most existing statistical models used to predict piano fingering apply explicit constraints among fingers and between fingers and notes; however, they disregard the relationship among notes. Furthermore, the s...

Authors: Xin Guan, Haoyue Zhao and Qiang Li

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:7

Content type: Research Published on: 11 April 2022

On the selection of the number of beamformers in beamforming-based binaural reproduction

In recent years, spatial audio reproduction has been widely researched with many studies focusing on headphone-based spatial reproduction. A popular format for spatial audio is higher order Ambisonics (HOA), w...

Authors: Itay Ifergan and Boaz Rafaely

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:6

Content type: Research Published on: 30 March 2022

Improved capsule routing for weakly labeled sound event detection

Polyphonic sound event detection aims to detect the types of sound events that occur in given audio clips, and their onset and offset times, in which multiple sound events may occur simultaneously. Deep learni...

Authors: Haitao Li, Shuguo Yang and Wenwu Wang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:5

Content type: Research Published on: 7 March 2022

RPCA-DRNN technique for monaural singing voice separation

In this study, we propose a methodology for separating a singing voice from musical accompaniment in a monaural musical mixture. The proposed method uses robust principal component analysis (RPCA), followed by...

Authors: Wen-Hsing Lai and Siou-Lin Wang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:4

Content type: Research Published on: 5 February 2022

Automatic discrimination between front and back ensemble locations in HRTF-convolved binaural recordings of music

One of the greatest challenges in the development of binaural machine audition systems is the disambiguation between front and back audio sources, particularly in complex spatial audio scenes. The goal of this...

Authors: Sławomir K. Zieliński, Paweł Antoniuk, Hyunkook Lee and Dale Johnson

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:3

Content type: Empirical Research Published on: 15 January 2022

Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling

Conventional automatic speech recognition (ASR) and emerging end-to-end (E2E) speech recognition have achieved promising results after being provided with sufficient resources. However, for low-resource langua...

Authors: Siqing Qin, Longbiao Wang, Sheng Li, Jianwu Dang and Lixin Pan

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:2

Content type: Research Published on: 12 January 2022

Auxiliary function-based algorithm for blind extraction of a moving speaker

In this paper, we propose a novel algorithm for blind source extraction (BSE) of a moving acoustic source recorded by multiple microphones. The algorithm is based on independent vector extraction (IVE) where t...

Authors: Jakub Janský, Zbyněk Koldovský, Jiří Málek, Tomáš Kounovský and Jaroslav Čmejla

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:1

Content type: Research Published on: 4 January 2022

Anchor voiceprint recognition in live streaming via RawNet-SA and gated recurrent unit

With the sharp booming of online live streaming platforms, some anchors seek profits and accumulate popularity by mixing inappropriate content into live programs. After being blacklisted, these anchors even fo...

Authors: Jiacheng Yao, Jing Zhang, Jiafeng Li and Li Zhuo

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:45

Content type: Research Published on: 20 December 2021

Unsupervised domain adaptation for lip reading based on cross-modal knowledge distillation

We present an unsupervised domain adaptation (UDA) method for a lip-reading model that is an image-based speech recognition model. Most conventional UDA methods cannot be applied when the adaptation data co...

Authors: Yuki Takashima, Ryoichi Takashima, Ryota Tsunoda, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki and Nobuaki Motoyama

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:44

Content type: Research Published on: 11 December 2021

A recursive expectation-maximization algorithm for speaker tracking and separation

The problem of blind and online speaker localization and separation using multiple microphones is addressed based on the recursive expectation-maximization (REM) procedure. A two-stage REM-based algorithm is p...

Authors: Ofer Schwartz and Sharon Gannot

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:43

Content type: Research Published on: 4 December 2021

Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation

Deep learning techniques are currently being applied in automated text-to-speech (TTS) systems, resulting in significant improvements in performance. However, these methods require large amounts of text-speech...

Authors: Zolzaya Byambadorj, Ryota Nishimura, Altangerel Ayush, Kengo Ohta and Norihide Kitaoka

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:42

Content type: Research Published on: 4 December 2021

Spherical harmonic covariance and magnitude function encodings for beamformer design

Microphone and speaker array designs have increasingly diverged from simple topologies due to diversity of physical host geometries and use cases. Effective beamformer design must now account for variation in ...

Authors: Yuancheng Luo

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:41

Content type: Research Published on: 3 December 2021

U²-VC: one-shot voice conversion using two-level nested U-structure

Voice conversion aims to transform a source speaker's voice into that of the target speaker, while keeping the linguistic content unchanged. Recently, one-shot voice conversion has gradually become a hot topic for its potentially wide r...

Authors: Fangkun Liu, Hui Wang, Renhua Peng, Chengshi Zheng and Xiaodong Li

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:40

Content type: Research Published on: 24 November 2021

dEchorate: a calibrated room impulse response dataset for echo-aware signal processing

This paper presents a new dataset of measured multichannel room impulse responses (RIRs) named dEchorate. It includes annotations of early echo timings and 3D positions of microphones, real sources, and image ...

Authors: Diego Di Carlo, Pinchas Tandeitnik, Cedrić Foy, Nancy Bertin, Antoine Deleforge and Sharon Gannot

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:39

Content type: Empirical Research Published on: 23 November 2021

A multichannel learning-based approach for sound source separation in reverberant environments

In this paper, a multichannel learning-based network is proposed for sound source separation in a reverberant field. The network can be divided into two parts according to the training strategies. In the first s...

Authors: You-Siang Chen, Zi-Jie Lin and Mingsian R. Bai

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:38

Content type: Research Published on: 20 November 2021

Efficient binaural rendering of spherical microphone array data by linear filtering

High-quality rendering of spatial sound fields in real-time is becoming increasingly important with the steadily growing interest in virtual and augmented reality technologies. Typically, a spherical microphon...

Authors: Johannes M. Arend, Tim Lübeck and Christoph Pörschmann

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:37

Content type: Research Published on: 6 November 2021

Comparative evaluation of interpolation methods for the directivity of musical instruments

Measurements of the directivity of acoustic sound sources must be interpolated in almost all cases, either for spatial upsampling to higher resolution representations of the data, for spatial resampling to ano...

Authors: David Ackermann, Fabian Brinkmann, Franz Zotter, Malte Kob and Stefan Weinzierl

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:36

Content type: Research Published on: 30 October 2021

Nonlinear residual echo suppression based on dual-stream DPRNN

The acoustic echo cannot be entirely removed by linear adaptive filters due to the nonlinear relationship between the echo and the far-end signal. Usually, a post-processing module is required to further suppr...

Authors: Hongsheng Chen, Guoliang Chen, Kai Chen and Jing Lu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:35

Content type: Research Published on: 7 September 2021

Pronunciation augmentation for Mandarin-English code-switching speech recognition

Code-switching (CS) refers to the phenomenon of using more than one language in an utterance, and it presents a great challenge to automatic speech recognition (ASR) due to the code-switching property in one utt...

Authors: Yanhua Long, Shuang Wei, Jie Lian and Yijie Li

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:34

Content type: Research Published on: 30 August 2021

An online algorithm for echo cancellation, dereverberation and noise reduction based on a Kalman-EM Method

Many modern smart devices are equipped with a microphone array and a loudspeaker (or are able to connect to one). Acoustic echo cancellation algorithms, specifically their multi-microphone variants, are essent...

Authors: Nili Cohen, Gershon Hazan, Boaz Schwartz and Sharon Gannot

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:33

Content type: Research Published on: 28 August 2021

A noise PSD estimation algorithm using derivative-based high-pass filter in non-stationary noise conditions

The minimum mean-square error (MMSE)-based noise PSD estimators have been used widely for speech enhancement. However, the MMSE noise PSD estimators assume that the noise signal changes at a slower rate than t...

Authors: Sujan Kumar Roy and Kuldip K. Paliwal

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:32

Content type: Research Published on: 14 August 2021

Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition

The performance of speech recognition systems trained with neutral utterances degrades significantly when these systems are tested with emotional speech. Since everybody can speak emotionally in the real-world...

Authors: Masoud Geravanchizadeh, Elnaz Forouhandeh and Meysam Bashirpour

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:31

Content type: Research Published on: 4 August 2021

Musical note onset detection based on a spectral sparsity measure

If music is the language of the universe, musical note onsets may be the syllables for this language. Not only do note onsets define the temporal pattern of a musical piece, but their time-frequency characteri...

Authors: Mina Mounir, Peter Karsmakers and Toon van Waterschoot

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:30

Content type: Research Published on: 28 July 2021

Single-channel speech enhancement based on joint constrained dictionary learning

To improve the performance of speech enhancement in a complex noise environment, a joint constrained dictionary learning method for single-channel speech enhancement is proposed, which solves the “cross projec...

Authors: Linhui Sun, Yunyi Bu, Pingan Li and Zihao Wu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:29

Content type: Research Published on: 27 July 2021

Performance vs. hardware requirements in state-of-the-art automatic speech recognition

The last decade brought significant advances in automatic speech recognition (ASR) thanks to the evolution of deep learning methods. ASR systems evolved from pipeline-based systems, that modeled hand-crafted s...

Authors: Alexandru-Lucian Georgescu, Alessandro Pappalardo, Horia Cucu and Michaela Blott

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:28

Content type: Review Published on: 21 July 2021

Timestamp-aligning and keyword-biasing end-to-end ASR front-end for a KWS system

Many end-to-end approaches have been proposed to detect predefined keywords. For scenarios of multi-keywords, there are still two bottlenecks that need to be resolved: (1) the distribution of important data th...

Authors: Gui-Xin Shi, Wei-Qiang Zhang, Guan-Bo Wang, Jing Zhao, Shu-Zhou Chai and Ze-Yu Zhao

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:27

Content type: Research Published on: 8 July 2021

Adversarial joint training with self-attention mechanism for robust end-to-end speech recognition

Lately, the self-attention mechanism has marked a new milestone in the field of automatic speech recognition (ASR). Nevertheless, its performance is susceptible to environmental intrusions as the system predic...

Authors: Lujun Li, Yikai Kang, Yuchen Shi, Ludwig Kürzinger, Tobias Watzel and Gerhard Rigoll

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:26

Content type: Research Published on: 5 July 2021

Geometry calibration in wireless acoustic sensor networks utilizing DoA and distance information

Due to the ad hoc nature of wireless acoustic sensor networks, the position of the sensor nodes is typically unknown. This contribution proposes a technique to estimate the position and orientation of the sens...

Authors: Tobias Gburrek, Joerg Schmalenstroeer and Reinhold Haeb-Umbach

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:25

Content type: Methodology Published on: 2 July 2021

Components loss for neural networks in mask-based speech enhancement

Estimating time-frequency domain masks for single-channel speech enhancement using deep learning methods has recently become a popular research field with promising results. In this paper, we propose a novel comp...

Authors: Ziyi Xu, Samy Elshamy, Ziyue Zhao and Tim Fingscheidt

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:24

Content type: Research Published on: 2 July 2021
