Articles

Multi-source localization by using offset residual weight

Multiple sound source localization is a hot issue of concern in recent years. The Single Source Zone (SSZ) based localization methods achieve good performance due to the detection and utilization of the Time-F...

Authors: Maoshen Jia, Shang Gao and Changchun Bao

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:23

Content type: Research Published on: 24 June 2021
Feature compensation based on independent noise estimation for robust speech recognition

In this paper, we propose a novel feature compensation algorithm based on independent noise estimation, which employs a Gaussian mixture model (GMM) with fewer Gaussian components to rapidly estimate the noise...

Authors: Yong Lü, Han Lin, Pingping Wu and Yitao Chen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:22

Content type: Research Published on: 16 June 2021
Residual feedback suppression with extended model-based postfilters

When designing closed-loop electro-acoustic systems, which can commonly be found in hearing aids or public address systems, the most challenging task is canceling and/or suppressing the feedback caused by the ...

Authors: Marco Gimm, Philipp Bulling and Gerhard Schmidt

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:21

Content type: Research Published on: 28 May 2021
Neural network-based non-intrusive speech quality assessment using attention pooling function

Recently, the non-intrusive speech quality assessment method has attracted a lot of attention since it does not require the original reference signals. At the same time, neural networks began to be applied to ...

Authors: Miao Liu, Jing Wang, Weiming Yi and Fang Liu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:20

Content type: Research Published on: 17 May 2021
Frequency-dependent auto-pooling function for weakly supervised sound event detection

Sound event detection (SED), which is typically treated as a supervised problem, aims at detecting types of sound events and corresponding temporal information. It requires to estimate onset and offset annotat...

Authors: Sichen Liu, Feiran Yang, Yin Cao and Jun Yang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:19

Content type: Research Published on: 17 May 2021
End-to-end speech emotion recognition using a novel context-stacking dilated convolution neural network

Amongst the various characteristics of a speech signal, the expression of emotion is one of the characteristics that exhibits the slowest temporal dynamics. Hence, a performant speech emotion recognition (SER)...

Authors: Duowei Tang, Peter Kuppens, Luc Geurts and Toon van Waterschoot

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:18

Content type: Research Published on: 12 May 2021
Low-complexity artificial noise suppression methods for deep learning-based speech enhancement algorithms

Deep learning-based speech enhancement algorithms have shown their powerful ability in removing both stationary and non-stationary noise components from noisy speech observations. But they often introduce arti...

Authors: Yuxuan Ke, Andong Li, Chengshi Zheng, Renhua Peng and Xiaodong Li

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:17

Content type: Research Published on: 12 April 2021
Dynamically localizing multiple speakers based on the time-frequency domain

In this study, we present a deep neural network-based online multi-speaker localization algorithm based on a multi-microphone array. Following the W-disjoint orthogonality principle in the spectral domain, tim...

Authors: Hodaya Hammer, Shlomo E. Chazan, Jacob Goldberger and Sharon Gannot

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:16

Content type: Research Published on: 8 April 2021
Correction to: An integrated MVDR beamformer for speech enhancement using a local microphone array and external microphones

An amendment to this paper has been published and can be accessed via the original article.

Authors: Randall Ali, Toon van Waterschoot and Marc Moonen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:15

Content type: Correction Published on: 6 April 2021

The original article was published in EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:10
Acoustic DOA estimation using space alternating sparse Bayesian learning

Estimating the direction-of-arrival (DOA) of multiple acoustic sources is one of the key technologies for humanoid robots and drones. However, it is a most challenging problem due to a number of factors, inclu...

Authors: Zonglong Bai, Liming Shi, Jesper Rindom Jensen, Jinwei Sun and Mads Græsbøll Christensen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:14

Content type: Research Published on: 6 April 2021
NMF-weighted SRP for multi-speaker direction of arrival estimation: robustness to spatial aliasing while exploiting sparsity in the atom-time domain

Localization of multiple speakers using microphone arrays remains a challenging problem, especially in the presence of noise and reverberation. State-of-the-art localization algorithms generally exploit the sp...

Authors: Sushmita Thakallapalli, Suryakanth V. Gangashetty and Nilesh Madhu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:13

Content type: Research Published on: 3 March 2021
Analysis of transition cost and model parameters in speaker diarization for meetings

There has been little work in the literature on the speaker diarization of meetings with multiple distance microphones since the publications in 2012 related to the last National Institute of Standards (NIST) ...

Authors: Beatriz Martínez-González, José M. Pardo, José A. Vallejo-Pinto, Rubén San-Segundo and Javier Ferreiros

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:12

Content type: Research Published on: 24 February 2021
Accent modification for speech recognition of non-native speakers using neural style transfer

Nowadays automatic speech recognition (ASR) systems can achieve higher and higher accuracy rates depending on the methodology applied and datasets used. The rate decreases significantly when the ASR system is ...

Authors: Kacper Radzikowski, Le Wang, Osamu Yoshie and Robert Nowak

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:11

Content type: Research Published on: 18 February 2021
An integrated MVDR beamformer for speech enhancement using a local microphone array and external microphones

An integrated version of the minimum variance distortionless response (MVDR) beamformer for speech enhancement using a microphone array has been recently developed, which merges the benefits of imposing constr...

Authors: Randall Ali, Toon van Waterschoot and Marc Moonen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:10

Content type: Research Published on: 10 February 2021

The Correction to this article has been published in EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:15
A CNN-based approach to identification of degradations in speech signals

The presence of degradations in speech signals, which causes acoustic mismatch between training and operating conditions, deteriorates the performance of many speech-based systems. A variety of enhancement tec...

Authors: Yuki Saishu, Amir Hossein Poorjam and Mads Græsbøll Christensen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:9

Content type: Research Published on: 5 February 2021
A review of infant cry analysis and classification

This paper reviews recent research works in infant cry signal analysis and classification tasks. A broad range of literatures are reviewed mainly from the aspects of data acquisition, cross domain signal proce...

Authors: Chunyan Ji, Thosini Bamunu Mudiyanselage, Yutong Gao and Yi Pan

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:8

Content type: Review Published on: 5 February 2021
Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices

Over the recent years, machine learning techniques have been employed to produce state-of-the-art results in several audio related tasks. The success of these approaches has been largely due to access to large...

Authors: Rajat Hebbar, Pavlos Papadopoulos, Ramon Reyes, Alexander F. Danvers, Angelina J. Polsinelli, Suzanne A. Moseley, David A. Sbarra, Matthias R. Mehl and Shrikanth Narayanan

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:7

Content type: Research Published on: 3 February 2021
Sparse pursuit and dictionary learning for blind source separation in polyphonic music recordings

We propose an algorithm for the blind separation of single-channel audio signals. It is based on a parametric model that describes the spectral properties of the sounds of musical instruments independently of ...

Authors: Sören Schulze and Emily J. King

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:6

Content type: Research Published on: 28 January 2021
Audio source separation by activity probability detection with maximum correlation and simplex geometry

Two novel methods for speaker separation of multi-microphone recordings that can also detect speakers with infrequent activity are presented. The proposed methods are based on a statistical model of the probab...

Authors: Bracha Laufer-Goldshtein, Ronen Talmon and Sharon Gannot

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:5

Content type: Research Published on: 28 January 2021
Dynamic out-of-vocabulary word registration to language model for speech recognition

We propose a method of dynamically registering out-of-vocabulary (OOV) words by assigning the pronunciations of these words to pre-inserted OOV tokens, editing the pronunciations of the tokens. To do this, we ...

Authors: Norihide Kitaoka, Bohan Chen and Yuya Obashi

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:4

Content type: Research Published on: 25 January 2021
Time–frequency scattering accurately models auditory similarities between instrumental playing techniques

Instrumental playing techniques such as vibratos, glissandos, and trills often denote musical expressivity, both in classical and folk contexts. However, most existing approaches to music similarity retrieval f...

Authors: Vincent Lostanlen, Christian El-Hajj, Mathias Rossignol, Grégoire Lafay, Joakim Andén and Mathieu Lagrange

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:3

Content type: Research Published on: 11 January 2021
Forward-backward recursive expectation-maximization for concurrent speaker tracking

In this paper, a study addressing the task of tracking multiple concurrent speakers in reverberant conditions is presented. Since both past and future observations can contribute to the current location estima...

Authors: Yuval Dorfan, Boaz Schwartz and Sharon Gannot

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:2

Content type: Research Published on: 9 January 2021
Progressive loss functions for speech enhancement with deep neural networks

The progressive paradigm is a promising strategy to optimize network performance for speech enhancement purposes. Recent works have shown different strategies to improve the accuracy of speech enhancement solu...

Authors: Jorge Llombart, Dayana Ribas, Antonio Miguel, Luis Vicente, Alfonso Ortega and Eduardo Lleida

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:1

Content type: Research Published on: 7 January 2021
Binaural speaker identification using the equalization-cancelation technique

In real applications, environmental effects such as additive noise and room reverberation lead to a mismatch between training and testing signals that substantially reduces the performance of far-field speaker...

Authors: Masoud Geravanchizadeh and Sina Ghalamiosgouei

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:20

Content type: Research Published on: 3 December 2020
Towards cross-modal pre-training and learning tempo-spatial characteristics for audio recognition with convolutional and recurrent neural networks

In this paper, we investigate the performance of two deep learning paradigms for the audio-based tasks of acoustic scene, environmental sound and domestic activity classification. In particular, a convolutiona...

Authors: Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Lukas Stappen, Alice Baird, Lukas Koebe and Björn Schuller

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:19

Content type: Research Published on: 2 December 2020
A simulation study on optimal scores for speaker recognition

In this article, we conduct a comprehensive simulation study for the optimal scores of speaker recognition systems that are based on speaker embedding. For that purpose, we first revisit the optimal scores for...

Authors: Dong Wang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:18

Content type: Research Published on: 25 November 2020
Depression-level assessment from multi-lingual conversational speech data using acoustic and text features

Depression is a widespread mental health problem around the world with a significant burden on economies. Its early diagnosis and treatment are critical to reduce the costs and even save lives. One key aspect ...

Authors: Cenk Demiroglu, Aslı Beşirli, Yasin Ozkanca and Selime Çelik

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:17

Content type: Research Published on: 17 November 2020
DOANet: a deep dilated convolutional neural network approach for search and rescue with drone-embedded sound source localization

Drone-embedded sound source localization (SSL) has interesting application perspective in challenging search and rescue scenarios due to bad lighting conditions or occlusions. However, the problem gets complic...

Authors: Alif Bin Abdul Qayyum, K. M. Naimul Hassan, Adrita Anika, Md. Farhan Shadiq, Md Mushfiqur Rahman, Md. Tariqul Islam, Sheikh Asif Imran, Shahruk Hossain and Mohammad Ariful Haque

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:16

Content type: Research Published on: 5 November 2020
Steerable differential beamformers with planar microphone arrays

Humanoid robots require to use microphone arrays to acquire speech signals from the human communication partner while suppressing noise, reverberation, and interferences. Unlike many other applications, microp...

Authors: Gongping Huang, Jingdong Chen, Jacob Benesty, Israel Cohen and Xudong Zhao

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:15

Content type: Research Published on: 4 November 2020
Multichannel speaker interference reduction using frequency domain adaptive filtering

Microphone leakage or crosstalk is a common problem in multichannel close-talk audio recordings (e.g., meetings or live music performances), which occurs when a target signal does not only couple into its dedi...

Authors: Patrick Meyer, Samy Elshamy and Tim Fingscheidt

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:14

Content type: Research Published on: 4 November 2020
Noise power spectral density scaled SNR response estimation with restricted range search for sound source localisation using unmanned aerial vehicles

A method to locate sound sources using an audio recording system mounted on an unmanned aerial vehicle (UAV) is proposed. The method introduces extension algorithms to apply on top of a baseline approach, whic...

Authors: Benjamin Yen and Yusuke Hioka

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:13

Content type: Research Published on: 22 September 2020
Estimation of acoustic echoes using expectation-maximization methods

Estimation problems like room geometry estimation and localization of acoustic reflectors are of great interest and importance in robot and drone audition. Several methods for tackling these problems exist, bu...

Authors: Usama Saqib, Sharon Gannot and Jesper Rindom Jensen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:12

Content type: Research Published on: 8 August 2020
Motor data-regularized nonnegative matrix factorization for ego-noise suppression

Ego-noise, i.e., the noise a robot causes by its own motions, significantly corrupts the microphone signal and severely impairs the robot’s capability to interact seamlessly with its environment. Therefore, su...

Authors: Alexander Schmidt, Andreas Brendel, Thomas Haubner and Walter Kellermann

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:11

Content type: Research Published on: 31 July 2020
A depthwise separable convolutional neural network for keyword spotting on an embedded system

A keyword spotting algorithm implemented on an embedded system using a depthwise separable convolutional neural network classifier is reported. The proposed system was derived from a high-complexity system wit...

Authors: Peter Mølgaard Sørensen, Bastian Epp and Tobias May

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:10

Content type: Research Published on: 25 June 2020
Joint speaker localization and array calibration using expectation-maximization

Ad hoc acoustic networks comprising multiple nodes, each of which consists of several microphones, are addressed. From the ad hoc nature of the node constellation, microphone positions are unknown. Hence, typi...

Authors: Yuval Dorfan, Ofer Schwartz and Sharon Gannot

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:9

Content type: Research Published on: 9 June 2020
Ensemble of convolutional neural networks to improve animal audio classification

In this work, we present an ensemble for automated audio classification that fuses different types of features extracted from audio files. These features are evaluated, compared, and fused with the goal of pro...

Authors: Loris Nanni, Yandre M. G. Costa, Rafael L. Aguiar, Rafael B. Mangolin, Sheryl Brahnam and Carlos N. Silla Jr.

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:8

Content type: Research Published on: 26 May 2020
Quadratic approach for single-channel noise reduction

In this paper, we introduce a quadratic approach for single-channel noise reduction. The desired signal magnitude is estimated by applying a linear filter to a modified version of the observations’ vector. The...

Authors: Gal Itzhak, Jacob Benesty and Israel Cohen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:7

Content type: Research Published on: 15 April 2020
Discriminative features based on modified log magnitude spectrum for playback speech detection

In order to improve the performance of hand-crafted features to detect playback speech, two discriminative features, constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients,...

Authors: Jichen Yang, Longting Xu, Bo Ren and Yunyun Ji

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:6

Content type: Research Published on: 7 April 2020
Multiclass audio segmentation based on recurrent neural networks for broadcast domain data

This paper presents a new approach based on recurrent neural networks (RNN) to the multiclass audio segmentation task whose goal is to classify an audio signal as speech, music, noise or a combination of these...

Authors: Pablo Gimeno, Ignacio Viñals, Alfonso Ortega, Antonio Miguel and Eduardo Lleida

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:5

Content type: Research Published on: 5 March 2020
Binaural sound localization based on deep neural network and affinity propagation clustering in mismatched HRTF condition

Binaural sound source localization is an important and widely used perceptually based method and it has been applied to machine learning studies by many researchers based on head-related transfer function (HRT...

Authors: Jing Wang, Jin Wang, Kai Qian, Xiang Xie and Jingming Kuang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:4

Content type: Research Published on: 10 February 2020
Segment boundary detection directed attention for online end-to-end speech recognition

Attention-based encoder-decoder models have recently shown competitive performance for automatic speech recognition (ASR) compared to conventional ASR systems. However, how to employ attention models for onlin...

Authors: Junfeng Hou, Wu Guo, Yan Song and Li-Rong Dai

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:3

Content type: Research Published on: 30 January 2020
The aerodynamics of voiced stop closures

Experimental data combining complementary measures based on the oral airflow signal is presented in this paper, exploring the view that European Portuguese voiced stops are produced in a similar fashion to Ger...

Authors: Luis M. T. Jesus and Maria Conceição Costa

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:2

Content type: Research Published on: 28 January 2020
Improving dysarthric speech recognition using empirical mode decomposition and convolutional neural network

In this paper, we use empirical mode decomposition and Hurst-based mode selection (EMDH) along with deep learning architecture using a convolutional neural network (CNN) to improve the recognition of dysarthri...

Authors: Mohammed Sidi Yakoub, Sid-ahmed Selouani, Brahim-Fares Zaidi and Asma Bouchair

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:1

Content type: Research Published on: 13 January 2020
Unsupervised adaptation of PLDA models for broadcast diarization

We present a novel model adaptation approach to deal with data variability for speaker diarization in a broadcast environment. Expensive human annotated data can be used to mitigate the domain mismatch by mean...

Authors: Ignacio Viñals, Alfonso Ortega, Jesús Villalba, Antonio Miguel and Eduardo Lleida

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:24

Content type: Research Published on: 27 December 2019
Online/offline score informed music signal decomposition: application to minus one

In this paper, we propose a score-informed source separation framework based on non-negative matrix factorization (NMF) and dynamic time warping (DTW) that suits for both offline and online systems. The propos...

Authors: Antonio Jesús Munoz-Montoro, Julio José Carabias-Orti, Pedro Vera-Candeas, Francisco Jesús Canadas-Quesada and Nicolás Ruiz-Reyes

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:23

Content type: Research Published on: 23 December 2019
A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept

Text-to-speech (TTS) synthesis systems have been widely used in general-purpose applications based on the generation of speech. Nonetheless, there are some domains, such as storytelling or voice output aid dev...

Authors: Marc Freixes, Francesc Alías and Joan Claudi Socoró

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:22

Content type: Research Published on: 16 December 2019
Signal enhancement for communication systems used by fire fighters

So-called full-face masks are essential for fire fighters to ensure respiratory protection in smoke diving incidents. While such masks are absolutely necessary for protection purposes on one hand, they impair the...

Authors: Michael Brodersen, Achim Volmer and Gerhard Schmidt

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:21

Content type: Research Published on: 12 December 2019
Speech enhancement methods based on binaural cue coding

According to the encoding and decoding mechanism of binaural cue coding (BCC), in this paper, the speech and noise are considered as left channel signal and right channel signal of the BCC framework, respectiv...

Authors: Xianyun Wang and Changchun Bao

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:20

Content type: Research Published on: 11 December 2019
Introducing phonetic information to speaker embedding for speaker verification

Phonetic information is one of the most essential components of a speech signal, playing an important role for many speech processing tasks. However, it is difficult to integrate phonetic information into spea...

Authors: Yi Liu, Liang He, Jia Liu and Michael T. Johnson

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:19

Content type: Research Published on: 5 December 2019
A new joint CTC-attention-based speech recognition model with multi-level multi-head attention

A method called joint connectionist temporal classification (CTC)-attention-based speech recognition has recently received increasing focus and has achieved impressive performance. A hybrid end-to-end architec...

Authors: Chu-Xiong Qin, Wen-Lin Zhang and Dan Qu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:18

Content type: Research Published on: 28 October 2019
