Articles
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:24 -
Comparison of semi-supervised deep learning algorithms for audio classification
In this article, we adapted five recent SSL methods to the task of audio classification. The first two methods, namely Deep Co-Training (DCT) and Mean Teacher (MT), involve two collaborative neural networks. T...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:23 -
A speech enhancement algorithm based on a non-negative hidden Markov model and Kullback-Leibler divergence
In this paper, we propose a supervised single-channel speech enhancement method that combines Kullback-Leibler (KL) divergence-based non-negative matrix factorization (NMF) and a hidden Markov model (NMF-HMM)....
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:22 -
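The abstract above names Kullback-Leibler-divergence-based NMF as one half of the proposed NMF-HMM combination. As background only, here is a minimal sketch of the classic Lee-Seung multiplicative updates for KL-NMF; the function name and toy matrix are illustrative, and the paper's HMM component is not reproduced:

```python
import numpy as np

def kl_nmf(V, rank, n_iter=500, eps=1e-10):
    """Factor a nonnegative matrix V (e.g., an F x T magnitude
    spectrogram) as V ~ W @ H by minimizing the generalized
    Kullback-Leibler divergence with the classic Lee-Seung
    multiplicative update rules."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)  # update activations
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)  # update bases
    return W, H

# Toy check: an exactly rank-2 nonnegative matrix should be
# reconstructed closely by a rank-2 factorization.
V = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 1.0]])
W, H = kl_nmf(V, rank=2)
err = float(np.abs(V - W @ H).max())
```

The multiplicative form keeps W and H nonnegative at every iteration, which is what makes the factors interpretable as spectral bases and activations.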
A large TV dataset for speech and music activity detection
Automatic speech and music activity detection (SMAD) is an enabling task that can help segment, index, and pre-process audio content in radio broadcast and TV programs. However, due to copyright concerns and t...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:21 -
Black-box adversarial attacks through speech distortion for speech emotion recognition
Speech emotion recognition is a key branch of affective computing. Nowadays, it is common to detect emotional disorders through speech emotion recognition. Various detection methods of emotion recognition, such...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:20 -
Deep neural networks for automatic speech processing: a survey from large corpora to limited data
Most state-of-the-art speech systems use deep neural networks (DNNs). These systems require a large amount of data to be learned. Hence, training state-of-the-art frameworks on under-resourced speech challenge...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:19 -
PlugSonic: a web- and mobile-based platform for dynamic and navigable binaural audio
PlugSonic is a series of web- and mobile-based applications designed to edit samples and apply audio effects (PlugSonic Sample) and create and experience dynamic and navigable soundscapes and sonic narratives ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:18 -
Masked multi-center angular margin loss for language recognition
Language recognition based on embedding aims to maximize inter-class variance and minimize intra-class variance. Previous research is limited to the training constraint of a single centroid, which cannot ac...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:17 -
DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models
By means of spatial clustering and time-frequency masking, a mixture of multiple speakers and noise can be separated into the underlying signal components. The parameters of a model, such as a complex angular ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:16 -
Robust single- and multi-loudspeaker least-squares-based equalization for hearing devices
To improve the sound quality of hearing devices, equalization filters can be used to achieve acoustic transparency, i.e., listening with the device in the ear is perceptually similar to the open ear. The equal...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:15 -
Language agnostic missing subtitle detection
Subtitles are a crucial component of Digital Entertainment Content (DEC, such as movies and TV shows) localization. With an ever-increasing catalog (≈ 2M titles) and expanding localization (30+ languages), automat...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:14 -
Data-based spatial audio processing
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:13 -
Improving sign-algorithm convergence rate using natural gradient for lossless audio compression
In lossless audio compression, the predictive residuals must remain sparse when entropy coding is applied. The sign algorithm (SA) is a conventional method for minimizing the magnitudes of residuals; however, ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:12 -
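The sign algorithm mentioned above updates an adaptive predictor using only the sign of the residual, which minimizes the mean absolute residual (an L1 criterion) and so favors the sparse residuals that entropy coding needs. A minimal sketch of a plain SA linear predictor, without the paper's natural-gradient modification (the function name, step size, and toy signal are illustrative):

```python
import numpy as np

def sa_predict(x, order=4, mu=0.005):
    """Adaptive linear prediction with the sign algorithm (SA):
    w <- w + mu * sign(e) * u, where u holds the past `order`
    samples and e is the prediction residual. Following only the
    sign of the error minimizes the mean absolute residual."""
    w = np.zeros(order)
    residuals = np.empty(len(x))
    for n in range(len(x)):
        u = x[n - order:n][::-1] if n >= order else np.zeros(order)
        e = x[n] - w @ u          # prediction residual
        residuals[n] = e
        w += mu * np.sign(e) * u  # sign update
    return residuals, w

# Toy check: a sine is linearly predictable, so after adaptation the
# residual magnitude should be far below the raw signal magnitude.
x = np.sin(0.05 * np.arange(8000))
res, w = sa_predict(x)
late = float(np.abs(res[-1000:]).mean())
```

SA's fixed per-sample step is also what makes its convergence slow, which is the weakness a natural-gradient reformulation targets.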
Transformer-based ensemble method for multiple predominant instruments recognition in polyphonic music
Multiple predominant instrument recognition in polyphonic music is addressed using decision level fusion of three transformer-based architectures on an ensemble of visual representations. The ensemble consists...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:11 -
An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction
The domain of spatial audio comprises methods for capturing, processing, and reproducing audio content that contains spatial information. Data-based methods are those that operate directly on the spatial infor...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:10 -
Interaural time difference individualization in HRTF by scaling through anthropometric parameters
Head-related transfer function (HRTF) individualization can improve the perception of binaural sound. The interaural time difference (ITD) of the HRTF is a relevant cue for sound localization, especially in az...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:9 -
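As background for ITD individualization by anthropometric scaling, a common closed-form reference is Woodworth's spherical-head ITD model, in which the head radius is the natural parameter to personalize. This sketch illustrates that general idea only, not the paper's specific scaling method:

```python
import numpy as np

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Woodworth's spherical-head model of the interaural time
    difference: ITD = (a / c) * (sin(theta) + theta) for a source at
    azimuth theta in the horizontal plane. Individualization amounts
    to replacing the default head radius a with one estimated from a
    listener's anthropometric measurements."""
    theta = np.deg2rad(azimuth_deg)
    return (head_radius_m / c) * (np.sin(theta) + theta)

itd_front = woodworth_itd(0.0)            # median-plane source: zero ITD
itd_side = woodworth_itd(90.0)            # fully lateral source, ~0.66 ms
itd_side_big = woodworth_itd(90.0, 0.10)  # larger head -> larger ITD
```

Scaling the ITD curve by head size in this way preserves its shape across azimuth while matching the listener's maximum lateral delay.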
Paralinguistic singing attribute recognition using supervised machine learning for describing the classical tenor solo singing voice in vocal pedagogy
Humans can recognize someone’s identity through their voice and describe the timbral phenomena of voices. Likewise, the singing voice also has timbral phenomena. In vocal pedagogy, vocal teachers listen and th...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:8 -
Estimation of playable piano fingering by pitch-difference fingering match model
Most existing statistical models used to predict piano fingering apply explicit constraints among fingers and between fingers and notes; however, they disregard the relationship among notes. Furthermore, the s...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:7 -
On the selection of the number of beamformers in beamforming-based binaural reproduction
In recent years, spatial audio reproduction has been widely researched with many studies focusing on headphone-based spatial reproduction. A popular format for spatial audio is higher order Ambisonics (HOA), w...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:6 -
Improved capsule routing for weakly labeled sound event detection
Polyphonic sound event detection aims to detect the types of sound events that occur in given audio clips, together with their onset and offset times, where multiple sound events may occur simultaneously. Deep learni...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:5 -
RPCA-DRNN technique for monaural singing voice separation
In this study, we propose a methodology for separating a singing voice from musical accompaniment in a monaural musical mixture. The proposed method uses robust principal component analysis (RPCA), followed by...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:4 -
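RPCA, the first stage named in the abstract, decomposes a mixture spectrogram into a low-rank part (repetitive accompaniment) and a sparse part (voice). A minimal sketch of RPCA via the standard inexact augmented Lagrange multiplier (IALM) iteration on a toy matrix; the parameters follow common defaults, not necessarily the paper's:

```python
import numpy as np

def soft(x, tau):
    """Elementwise soft-thresholding (proximal operator of the L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca(M, n_iter=300, rho=1.05):
    """Decompose M ~ L + S with L low-rank and S sparse via the
    inexact augmented Lagrange multiplier (IALM) iteration."""
    lam = 1.0 / np.sqrt(max(M.shape))
    norm2 = np.linalg.norm(M, 2)
    mu = 1.25 / norm2
    Y = M / max(norm2, np.abs(M).max() / lam)  # standard dual init
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # L-step: singular value thresholding
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * soft(sig, 1.0 / mu)) @ Vt
        # S-step: elementwise shrinkage
        S = soft(M - L + Y / mu, lam / mu)
        Y += mu * (M - L - S)
        mu *= rho
    return L, S

# Toy check: a rank-1 matrix plus a few large spikes should separate
# into its low-rank and sparse parts.
rng = np.random.default_rng(1)
L0 = rng.standard_normal((20, 1)) @ rng.standard_normal((1, 20))
S0 = np.zeros((20, 20))
S0[rng.integers(0, 20, 10), rng.integers(0, 20, 10)] = 5.0
L, S = rpca(L0 + S0)
rel = float(np.linalg.norm(L - L0) / np.linalg.norm(L0))
```

In the singing-voice setting, M would be a magnitude spectrogram and the separated parts would be turned back into audio with time-frequency masking; the paper then refines the result with a deep recurrent network, which is not sketched here.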
Automatic discrimination between front and back ensemble locations in HRTF-convolved binaural recordings of music
One of the greatest challenges in the development of binaural machine audition systems is the disambiguation between front and back audio sources, particularly in complex spatial audio scenes. The goal of this...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:3 -
Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling
Conventional automatic speech recognition (ASR) and emerging end-to-end (E2E) speech recognition have achieved promising results after being provided with sufficient resources. However, for low-resource langua...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:2 -
Auxiliary function-based algorithm for blind extraction of a moving speaker
In this paper, we propose a novel algorithm for blind source extraction (BSE) of a moving acoustic source recorded by multiple microphones. The algorithm is based on independent vector extraction (IVE) where t...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2022 2022:1 -
Anchor voiceprint recognition in live streaming via RawNet-SA and gated recurrent unit
With the rapid boom of online live streaming platforms, some anchors seek profits and accumulate popularity by mixing inappropriate content into live programs. After being blacklisted, these anchors even fo...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:45 -
Unsupervised domain adaptation for lip reading based on cross-modal knowledge distillation
We present an unsupervised domain adaptation (UDA) method for a lip-reading model, which is an image-based speech recognition model. Most conventional UDA methods cannot be applied when the adaptation data co...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:44 -
A recursive expectation-maximization algorithm for speaker tracking and separation
The problem of blind and online speaker localization and separation using multiple microphones is addressed based on the recursive expectation-maximization (REM) procedure. A two-stage REM-based algorithm is p...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:43 -
Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation
Deep learning techniques are currently being applied in automated text-to-speech (TTS) systems, resulting in significant improvements in performance. However, these methods require large amounts of text-speech...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:42 -
Spherical harmonic covariance and magnitude function encodings for beamformer design
Microphone and speaker array designs have increasingly diverged from simple topologies due to diversity of physical host geometries and use cases. Effective beamformer design must now account for variation in ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:41 -
U2-VC: one-shot voice conversion using two-level nested U-structure
Voice conversion transforms a source speaker's voice into that of a target speaker while keeping the linguistic content unchanged. Recently, one-shot voice conversion has gradually become a hot topic for its potentially wide r...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:40 -
dEchorate: a calibrated room impulse response dataset for echo-aware signal processing
This paper presents a new dataset of measured multichannel room impulse responses (RIRs) named dEchorate. It includes annotations of early echo timings and 3D positions of microphones, real sources, and image ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:39 -
A multichannel learning-based approach for sound source separation in reverberant environments
In this paper, a multichannel learning-based network is proposed for sound source separation in a reverberant field. The network can be divided into two parts according to the training strategies. In the first s...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:38 -
Efficient binaural rendering of spherical microphone array data by linear filtering
High-quality rendering of spatial sound fields in real-time is becoming increasingly important with the steadily growing interest in virtual and augmented reality technologies. Typically, a spherical microphon...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:37 -
Comparative evaluation of interpolation methods for the directivity of musical instruments
Measurements of the directivity of acoustic sound sources must be interpolated in almost all cases, either for spatial upsampling to higher resolution representations of the data, for spatial resampling to ano...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:36 -
Nonlinear residual echo suppression based on dual-stream DPRNN
The acoustic echo cannot be entirely removed by linear adaptive filters due to the nonlinear relationship between the echo and the far-end signal. Usually, a post-processing module is required to further suppr...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:35 -
Pronunciation augmentation for Mandarin-English code-switching speech recognition
Code-switching (CS) refers to the phenomenon of using more than one language in an utterance, and it presents a great challenge to automatic speech recognition (ASR) due to the code-switching property in one utt...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:34 -
An online algorithm for echo cancellation, dereverberation and noise reduction based on a Kalman-EM Method
Many modern smart devices are equipped with a microphone array and a loudspeaker (or are able to connect to one). Acoustic echo cancellation algorithms, specifically their multi-microphone variants, are essent...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:33 -
A noise PSD estimation algorithm using derivative-based high-pass filter in non-stationary noise conditions
The minimum mean-square error (MMSE)-based noise PSD estimators have been used widely for speech enhancement. However, the MMSE noise PSD estimators assume that the noise signal changes at a slower rate than t...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:32 -
Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition
The performance of speech recognition systems trained with neutral utterances degrades significantly when these systems are tested with emotional speech. Since everybody can speak emotionally in the real-world...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:31 -
Musical note onset detection based on a spectral sparsity measure
If music is the language of the universe, musical note onsets may be the syllables for this language. Not only do note onsets define the temporal pattern of a musical piece, but their time-frequency characteri...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:30 -
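For context on the sparsity-based detection function proposed above, a classic baseline onset detection function is half-wave-rectified spectral flux. This sketch computes that baseline on a synthetic burst; it is an illustration of the task, not the paper's sparsity measure:

```python
import numpy as np

def spectral_flux(x, frame=512, hop=256):
    """Half-wave-rectified spectral flux: the framewise sum of positive
    magnitude-spectrum increases, a classic onset detection function."""
    n = (len(x) - frame) // hop + 1
    win = np.hanning(frame)
    mags = np.array([np.abs(np.fft.rfft(win * x[i * hop:i * hop + frame]))
                     for i in range(n)])
    rise = np.maximum(np.diff(mags, axis=0), 0.0)  # keep increases only
    return rise.sum(axis=1)  # one value per frame transition

# Toy check: silence with a single noise burst starting at sample 8192;
# the flux should peak at the frame transition where the burst begins.
rng = np.random.default_rng(0)
x = np.zeros(16384)
x[8192:8704] = rng.standard_normal(512)
flux = spectral_flux(x)
peak = int(np.argmax(flux))  # expected near 8192 / 256 = 32
```

Peaks in such a detection function are then picked (e.g., against an adaptive threshold) to obtain the onset times themselves.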
Single-channel speech enhancement based on joint constrained dictionary learning
To improve the performance of speech enhancement in a complex noise environment, a joint constrained dictionary learning method for single-channel speech enhancement is proposed, which solves the “cross projec...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:29 -
Performance vs. hardware requirements in state-of-the-art automatic speech recognition
The last decade brought significant advances in automatic speech recognition (ASR) thanks to the evolution of deep learning methods. ASR systems evolved from pipeline-based systems that modeled hand-crafted s...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:28 -
Timestamp-aligning and keyword-biasing end-to-end ASR front-end for a KWS system
Many end-to-end approaches have been proposed to detect predefined keywords. For scenarios of multi-keywords, there are still two bottlenecks that need to be resolved: (1) the distribution of important data th...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:27 -
Adversarial joint training with self-attention mechanism for robust end-to-end speech recognition
Lately, the self-attention mechanism has marked a new milestone in the field of automatic speech recognition (ASR). Nevertheless, its performance is susceptible to environmental intrusions as the system predic...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:26 -
Geometry calibration in wireless acoustic sensor networks utilizing DoA and distance information
Due to the ad hoc nature of wireless acoustic sensor networks, the position of the sensor nodes is typically unknown. This contribution proposes a technique to estimate the position and orientation of the sens...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:25 -
Components loss for neural networks in mask-based speech enhancement
Estimating time-frequency domain masks for single-channel speech enhancement using deep learning methods has recently become a popular research field with promising results. In this paper, we propose a novel comp...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:24 -
Multi-source localization by using offset residual weight
Multiple sound source localization has been a topic of intense interest in recent years. The Single Source Zone (SSZ)-based localization methods achieve good performance due to the detection and utilization of the Time-F...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:23 -
Feature compensation based on independent noise estimation for robust speech recognition
In this paper, we propose a novel feature compensation algorithm based on independent noise estimation, which employs a Gaussian mixture model (GMM) with fewer Gaussian components to rapidly estimate the noise...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:22 -
Residual feedback suppression with extended model-based postfilters
When designing closed-loop electro-acoustic systems, which can commonly be found in hearing aids or public address systems, the most challenging task is canceling and/or suppressing the feedback caused by the ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:21 -
Neural network-based non-intrusive speech quality assessment using attention pooling function
Recently, the non-intrusive speech quality assessment method has attracted a lot of attention since it does not require the original reference signals. At the same time, neural networks began to be applied to ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2021 2021:20
Annual Journal Metrics
Citation Impact 2023
Journal Impact Factor: 1.7
5-year Journal Impact Factor: 1.6
Source Normalized Impact per Paper (SNIP): 1.051
SCImago Journal Rank (SJR): 0.414

Speed 2023
Submission to first editorial decision (median days): 17
Submission to acceptance (median days): 154

Usage 2023
Downloads: 368,607
Altmetric mentions: 70
ISSN: 1687-4722 (electronic)