Articles
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:30
Physical task stress and speaker variability in voice quality
The presence of physical task stress induces changes in the speech production system which in turn produces changes in speaking behavior. This results in measurable acoustic correlates including changes to for...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:29 -
Speech enhancement based on Bayesian decision and spectral amplitude estimation
In this paper, a single-channel speech enhancement method based on Bayesian decision and spectral amplitude estimation is proposed, in which the speech detection module and spectral amplitude estimation module...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:28 -
Biomimetic spectro-temporal features for music instrument recognition in isolated notes and solo phrases
The identity of musical instruments is reflected in the acoustic attributes of musical notes played with them. Recently, it has been argued that these characteristics of musical identity (or timbre) can be bes...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:27 -
Exploiting spectro-temporal locality in deep learning based acoustic event detection
In recent years, deep learning has not only permeated the computer vision and speech recognition research fields but also fields such as acoustic event detection (AED). One of the aims of AED is to detect and ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:26 -
Phone recognition with hierarchical convolutional deep maxout networks
Deep convolutional neural networks (CNNs) have recently been shown to outperform fully connected deep neural networks (DNNs) both on low-resource and on large-scale speech tasks. Experiments indicate that conv...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:25 -
Multimodal voice conversion based on non-negative matrix factorization
A multimodal voice conversion (VC) method for noisy environments is proposed. In our previous non-negative matrix factorization (NMF)-based VC method, source and target exemplars are extracted from parallel tr...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:24 -
The Latin Music Mood Database
In this paper we present the Latin Music Mood Database, an extension of the Latin Music Database but for the task of music mood/emotion classification. The method for assigning mood labels to the musical recor...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:23 -
Regularized minimum class variance extreme learning machine for language recognition
Support vector machines (SVMs) have played an important role in the state-of-the-art language recognition systems. The recently developed extreme learning machine (ELM) tends to have better scalability and ach...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:22 -
Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion
Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia inf...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:21 -
Advanced acoustic modelling techniques in MP3 speech recognition
The automatic recognition of MP3 compressed speech presents a challenge to the current systems due to the lossy nature of compression which causes irreversible degradation of the speech wave. This article eval...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:20 -
Emotion in the singing voice—a deeper look at acoustic features in the light of automatic classification
We investigate the automatic recognition of emotions in the singing voice and study the worth and role of a variety of relevant acoustic parameters. The data set contains phrases and vocalises sung by eight re...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:19 -
An improved i-vector extraction algorithm for speaker verification
Over recent years, the i-vector-based framework has been proven to provide state-of-the-art performance in speaker verification. Each utterance is projected onto a total factor space and is represented by a low-di...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:18 -
Exploiting foreign resources for DNN-based ASR
Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specifi...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:17 -
Singer identification using perceptual features and cepstral coefficients of an audio signal from Indian video songs
Singer identification is a difficult topic in music information retrieval because background instrumental music accompanies the singing voice, which reduces system performance. One of the main disadvantag...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:16 -
Stereo-based histogram equalization for robust speech recognition
Optimal automatic speech recognition (ASR) takes place when the recognition system is tested under circumstances identical to those in which it was trained. However, in the actual real world, there exist many ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:15 -
Robust design of Farrow-structure-based steerable broadband beamformers with sparse tap weights via convex optimization
The Farrow-structure-based steerable broadband beamformer (FSBB) is particularly useful in the applications where sound source of interest may move around a wide angular range. However, in contrast with conven...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:14 -
ViSQOL: an objective speech quality model
This paper presents an objective speech quality model, ViSQOL, the Virtual Speech Quality Objective Listener. It is a signal-based, full-reference, intrusive metric that models human speech quality perception ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:13 -
Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification
Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:12 -
Lightweight multi-DOA tracking of mobile speech sources
Estimating the directions of arrival (DOAs) of multiple simultaneous mobile sound sources is an important step for various audio signal processing applications. In this contribution, we present an approach tha...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:11 -
An acoustic data transmission system based on audio data hiding: method and performance evaluation
Acoustic data transmission (ADT) forms a branch of the audio data hiding techniques with its capability of communicating data in short-range aerial space between a loudspeaker and a microphone. In this paper, ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:10 -
Evaluation of linguistic and prosodic features for detection of Alzheimer’s disease in Turkish conversational speech
Automatic diagnosis and monitoring of Alzheimer’s disease can have a significant impact on society as well as the well-being of patients. The part of the brain cortex that processes language abilities is one o...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:9 -
Voice conversion using speaker-dependent conditional restricted Boltzmann machine
This paper presents a voice conversion (VC) method that utilizes conditional restricted Boltzmann machines (CRBMs) for each speaker to obtain high-order speaker-independent spaces where voice features are conv...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:8 -
An investigation of supervector regression for forensic voice comparison on small data
Automatic forensic voice comparison (FVC) systems employed in forensic casework have often relied on Gaussian Mixture Model - Universal Background Models (GMM-UBMs) for modelling with relatively little researc...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:7 -
SIFT-based local spectrogram image descriptor: a novel feature for robust music identification
Music identification via audio fingerprinting has been an active research field in recent years. In the real-world environment, music queries are often deformed by various interferences which typically include...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:6 -
A signal subspace approach to spatio-temporal prediction for multichannel speech enhancement
The spatio-temporal-prediction (STP) method for multichannel speech enhancement has recently been proposed. This approach makes it theoretically possible to attenuate the residual noise without distorting spee...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:5 -
Within and cross-corpus speech emotion recognition using latent topic model-based features
Owing to the suprasegmental behavior of emotional speech, turn-level features have demonstrated a better success than frame-level features for recognition-related tasks. Conventionally, such features are obtai...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:4 -
A novel hybrid of genetic algorithm and ANN for developing a high efficient method for vocal fold pathology diagnosis
In this paper, an initial feature vector based on the combination of the wavelet packet decomposition (WPD) and the Mel frequency cepstral coefficients (MFCCs) is proposed. For optimizing the initial feature v...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:3 -
Noisy training for deep neural networks in speech recognition
Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:2 -
Simulation of tremulous voices using a biomechanical model
Vocal tremor has been simulated using a high-dimensional discrete vocal fold model. Specifically, respiratory, phonatory, and articulatory tremors have been modeled as instabilities in six parameters of the mo...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:1 -
Homogenous ensemble phonotactic language recognition based on SVM supervector reconstruction
Currently, acoustic spoken language recognition (SLR) and phonotactic SLR systems are widely used language recognition systems. To achieve better performance, researchers combine multiple subsystems with the r...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:42 -
The self-taught vocal interface
Speech technology is firmly rooted in daily life, most notably in command-and-control (C&C) applications. C&C usability downgrades quickly, however, when used by people with non-standard speech. We pursue a fu...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:43 -
Audio bandwidth extension based on temporal smoothing cepstral coefficients
In this paper, we propose a wideband (WB) to super-wideband audio bandwidth extension (BWE) method based on temporal smoothing cepstral coefficients (TSCC). A temporal relationship of audio signals is included...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:41 -
A sub-band-based feature reconstruction approach for robust speaker recognition
Although the field of automatic speaker or speech recognition has been extensively studied over the past decades, the lack of robustness has remained a major challenge. The missing data technique (MDT) is a pr...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:40 -
Sparse coding of the modulation spectrum for noise-robust automatic speech recognition
The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equiv...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:36 -
A cross-lingual adaptation approach for rapid development of speech recognizers for learning disabled users
Building a voice-operated system for learning disabled users is a difficult task that requires a considerable amount of time and effort. Due to the wide spectrum of disabilities and their different related pho...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:39 -
A uniform phase representation for the harmonic model in speech synthesis applications
Feature-based vocoders, e.g., STRAIGHT, offer a way to manipulate the perceived characteristics of the speech signal in speech transformation and synthesis. For the harmonic model, which provides excellent perc...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:38 -
An imperceptible and robust audio watermarking algorithm
In this paper, we propose a semi-blind, imperceptible, and robust digital audio watermarking algorithm. The proposed algorithm is based on cascading two well-known transforms: the discrete wavelet transform an...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:37 -
Robust Bayesian estimation for context-based speech enhancement
Model-based speech enhancement algorithms that employ trained models, such as codebooks, hidden Markov models, Gaussian mixture models, etc., containing representations of speech such as linear predictive coef...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:35 -
The Ethnic Lyrics Fetcher tool
The task of automatic retrieval and extraction of lyrics from the web is of great importance to different Music Information Retrieval applications. However, despite its importance, very little research has bee...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:27 -
Audio segmentation-by-classification approach based on factor analysis in broadcast news domain
This paper studies a novel audio segmentation-by-classification approach based on factor analysis. The proposed technique compensates the within-class variability by using class-dependent factor loading matric...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:34 -
A modified Wiener filtering method combined with wavelet thresholding multitaper spectrum for speech enhancement
This paper proposes a new speech enhancement (SE) algorithm utilizing constraints to the Wiener gain function which is capable of working at 10 dB and lower signal-to-noise ratios (SNRs). The wavelet threshold...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:32 -
The influence of speech rate on Fujisaki model parameters
The current paper examines influences of speech rate on Fujisaki model parameters based on read speech from the BonnTempo-Corpus containing productions by 12 native speakers of German at five different intende...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:33 -
Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios
In many speech communication applications, robust localization and tracking of multiple speakers in noisy and reverberant environments are of major importance. Several algorithms to tackle this problem have be...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:31 -
An efficient algebraic codebook search for ACELP speech coder
In a bid to enhance the search performance, this paper presents an improved version of reduced candidate mechanism (RCM), an algebraic codebook search conducted on an algebraic code-excited linear prediction (...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:30 -
PLDA in the i-supervector space for text-independent speaker verification
In this paper, we advocate the use of the uncompressed form of i-vector and depend on subspace modeling using probabilistic linear discriminant analysis (PLDA) in handling the speaker and session (or channel) var...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:29 -
Linguistically motivated parameter estimation methods for a superpositional intonation model
This paper proposes two novel approaches for parameter estimation of a superpositional intonation model. These approaches present linguistic and paralinguistic assumptions for initializing a pre-existing stand...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:28 -
Percussive/harmonic sound separation by non-negative matrix factorization with smoothness/sparseness constraints
In this paper, unsupervised learning is used to separate percussive and harmonic sounds from monaural non-vocal polyphonic signals. Our algorithm is based on a modified non-negative matrix factorization (NMF) ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:26 -
Musical note analysis of solo violin recordings using recursive regularization
Composers may not provide instructions for playing their works, especially for instrument solos, and therefore, different musicians may give very different interpretations of the same work. Such differences us...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:25 -
A memory efficient finite-state source coding algorithm for audio MDCT coefficients
To achieve a better trade-off between the vector dimension and the memory requirements of a vector quantizer (VQ), an entropy-constrained VQ (ECVQ) scheme with finite memory, called finite-state ECVQ (FS-ECVQ)...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:22
Annual Journal Metrics
Citation Impact 2023
Journal Impact Factor: 1.7
5-year Journal Impact Factor: 1.6
Source Normalized Impact per Paper (SNIP): 1.051
SCImago Journal Rank (SJR): 0.414
Speed 2023
Submission to first editorial decision (median days): 17
Submission to acceptance (median days): 154
Usage 2023
Downloads: 368,607
Altmetric mentions: 70
ISSN: 1687-4722 (electronic)