The self-taught vocal interface
Speech technology is firmly rooted in daily life, most notably in command-and-control (C&C) applications. C&C usability downgrades quickly, however, when used by people with non-standard speech. We pursue a fu...
Page 7 of 11
Speech technology is firmly rooted in daily life, most notably in command-and-control (C&C) applications. C&C usability downgrades quickly, however, when used by people with non-standard speech. We pursue a fu...
In this paper, we propose a wideband (WB) to super-wideband audio bandwidth extension (BWE) method based on temporal smoothing cepstral coefficients (TSCC). A temporal relationship of audio signals is included...
Although the field of automatic speaker or speech recognition has been extensively studied over the past decades, the lack of robustness has remained a major challenge. The missing data technique (MDT) is a pr...
The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equiv...
Building a voice-operated system for learning disabled users is a difficult task that requires a considerable amount of time and effort. Due to the wide spectrum of disabilities and their different related pho...
Feature-based vocoders, e.g., STRAIGHT, offer a way to manipulate the perceived characteristics of the speech signal in speech transformation and synthesis. For the harmonic model, which provide excellent perc...
In this paper, we propose a semi-blind, imperceptible, and robust digital audio watermarking algorithm. The proposed algorithm is based on cascading two well-known transforms: the discrete wavelet transform an...
Model-based speech enhancement algorithms that employ trained models, such as codebooks, hidden Markov models, Gaussian mixture models, etc., containing representations of speech such as linear predictive coef...
The task of automatic retrieval and extraction of lyrics from the web is of great importance to different Music Information Retrieval applications. However, despite its importance, very little research has bee...
This paper studies a novel audio segmentation-by-classification approach based on factor analysis. The proposed technique compensates the within-class variability by using class-dependent factor loading matric...
This paper proposes a new speech enhancement (SE) algorithm utilizing constraints to the Wiener gain function which is capable of working at 10 dB and lower signal-to-noise ratios (SNRs). The wavelet threshold...
The current paper examines influences of speech rate on Fujisaki model parameters based on read speech from the BonnTempo-Corpus containing productions by 12 native speakers of German at five different intende...
In many speech communication applications, robust localization and tracking of multiple speakers in noisy and reverberant environments are of major importance. Several algorithms to tackle this problem have be...
In a bid to enhance the search performance, this paper presents an improved version of reduced candidate mechanism (RCM), an algebraic codebook search conducted on an algebraic code-excited linear prediction (...
In this paper, we advocate the use of the uncompressed form of i-vector and depend on subspace modeling using probabilistic linear discriminant analysis (PLDA) in handling the speaker and session (or channel) var...
This paper proposes two novel approaches for parameter estimation of a superpositional intonation model. These approaches present linguistic and paralinguistic assumptions for initializing a pre-existing stand...
In this paper, unsupervised learning is used to separate percussive and harmonic sounds from monaural non-vocal polyphonic signals. Our algorithm is based on a modified non-negative matrix factorization (NMF) ...
Composers may not provide instructions for playing their works, especially for instrument solos, and therefore, different musicians may give very different interpretations of the same work. Such differences us...
To achieve a better trade-off between the vector dimension and the memory requirements of a vector quantizer (VQ), an entropy-constrained VQ (ECVQ) scheme with finite memory, called finite-state ECVQ (FS-ECVQ)...
This paper investigates the estimation of underlying articulatory targets of Thai vowels as invariant representation of vocal tract shapes by means of analysis-by-synthesis based on acoustic data. The basic id...
The paper describes an auditory processing-based feature extraction strategy for robust speech recognition in environments, where conventional automatic speech recognition (ASR) approaches are not successful. ...
In this paper, a two-stage scheme is proposed to deal with the difficult problem of acoustic echo cancellation (AEC) in single-channel scenario in the presence of noise. In order to overcome the major challeng...
Neural network language models (NNLM) have been proved to be quite powerful for sequence modeling, including feed-forward NNLM (FNNLM), recurrent NNLM (RNNLM), etc. One main issue concerned for NNLM is the hea...
When several acoustic sources are simultaneously active in a meeting room scenario, and both the position of the sources and the identity of the time-overlapped sound classes have been estimated, the problem o...
Speech enhancement has an increasing demand in mobile communications and faces a great challenge in a real ambient noisy environment. This paper develops an effective spatial-frequency domain speech enhancemen...
It was recently shown that delta-sigma quantization (DSQ) can be used for optimal multiple description (MD) coding of Gaussian sources. The DSQ scheme combined oversampling, prediction, and noise-shaping in or...
Previously, a dereverberation method based on generalized spectral subtraction (GSS) using multi-channel least mean-squares (MCLMS) has been proposed. The results of speech recognition experiments showed that ...
The robustness of n-gram language models depends on the quality of text data on which they have been trained. The text corpora collected from various resources such as web pages or electronic documents are charac...
We present a feature enhancement method that uses neural networks (NNs) to map the reverberant feature in a log-melspectral domain to its corresponding anechoic feature. The mapping is done by cascade NNs trai...
Decision tree-clustered context-dependent hidden semi-Markov models (HSMMs) are typically used in statistical parametric speech synthesis to represent probability densities of acoustic features given contextua...
Eigenphone-based speaker adaptation outperforms conventional maximum likelihood linear regression (MLLR) and eigenvoice methods when there is sufficient adaptation data. However, it suffers from severe over-fi...
Three-dimensional (3D) audio technologies are booming with the success of 3D video technology. The surge in audio channels makes its huge data unacceptable for transmitting bandwidth and storage media, and the...
An approach is proposed for creating location-specific audio textures for virtual location-exploration services. The presented approach creates audio textures by processing a small amount of audio recorded at ...
In this paper, an analytical approach to estimate the instantaneous frequencies of a multicomponent signal is presented. A non-stationary signal composed of oscillation modes or resonances is described by a mu...
This paper presents an optical music recognition (OMR) system to process the handwritten musical scores of Kunqu Opera written in Gong-Che Notation (GCN). First, it introduces the background of Kunqu Opera and GC...
Speech production and speech perception studies were conducted to compare (de)voicing in the Romance languages European Portuguese (EP) and Italian. For the speech production part, velar stops in two positions...
We present in this paper a voice conversion (VC) method for a person with an articulation disorder resulting from athetoid cerebral palsy. The movement of such speakers is limited by their athetoid symptoms, a...
We propose a novel approach of integrating exemplar-based template matching with statistical modeling to improve continuous speech recognition. We choose the template unit to be context-dependent phone segment...
This paper proposes a new aliasing cancelation algorithm for the transition between non-aliased coding and transform coding with time domain aliasing cancelation (TDAC). It is effectively utilized for unified ...
We propose an integrative method of recognizing gestures such as pointing, accompanying speech. Speech generated simultaneously with gestures can assist in the recognition of gestures, and since this occurs in...
Bandwidth extension is an effective technique for enhancing the quality of audio signals by reconstructing their high-frequency components. In this paper, a novel blind bandwidth extension method is proposed b...
Prosody and prosodic boundaries carry significant information regarding linguistics and paralinguistics and are important aspects of speech. In the field of prosodic event detection, many local acoustic featur...
In this paper, we propose a novel noise-robustness method known as weighted sub-band histogram equalization (WS-HEQ) to improve speech recognition accuracy in noise-corrupted environments. Considering the obse...
The framework of voice conversion system is expected to emphasize both the static and dynamic characteristics of the speech signal. The conventional approaches like Mel frequency cepstrum coefficients and line...
This paper investigates real-time N-dimensional wideband sound source localization in outdoor (far-field) and low-degree reverberation cases, using a simple N-microphone arrangement. Outdoor sound source localiza...
Affective computing, especially from speech, is one of the key steps toward building more natural and effective human-machine interaction. In recent years, several emotional speech corpora in different languag...
The performance of thresholding-based methods for speech enhancement largely depends upon the estimation of the exact threshold value. In this paper, a new thresholding-based speech enhancement approach, where...
This paper investigates multi-modal aspects of audiovisual quality assessment for interactive communication services. It shows how perceived auditory and visual qualities integrate to an overall audiovisual qu...
Query-by-Example Spoken Term Detection (QbE STD) aims at retrieving data from a speech data repository given an acoustic query containing the term of interest as input. Nowadays, it has been receiving much int...
The recurrent neural network language model (RNNLM) has shown significant promise for statistical language modeling. In this work, a new class-based output layer method is introduced to further improve the RNN...
Learn more about the impact the EURASIP Journal on Audio, Speech, and Music Processing has worldwide
2022 Citation Impact
2.4 - 2-year Impact Factor
2.0 - 5-year Impact Factor
1.081 - SNIP (Source Normalized Impact per Paper)
0.458 - SJR (SCImago Journal Rank)
2023 Speed
17 days submission to first editorial decision for all manuscripts (Median)
154 days submission to accept (Median)
2023 Usage
368,607 downloads
70 Altmetric mentions