Articles
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:30
Physical task stress and speaker variability in voice quality
The presence of physical task stress induces changes in the speech production system which in turn produces changes in speaking behavior. This results in measurable acoustic correlates including changes to for...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:29 -
Speech enhancement based on Bayesian decision and spectral amplitude estimation
In this paper, a single-channel speech enhancement method based on Bayesian decision and spectral amplitude estimation is proposed, in which the speech detection module and spectral amplitude estimation module...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:28 -
Biomimetic spectro-temporal features for music instrument recognition in isolated notes and solo phrases
The identity of musical instruments is reflected in the acoustic attributes of musical notes played with them. Recently, it has been argued that these characteristics of musical identity (or timbre) can be bes...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:27 -
Exploiting spectro-temporal locality in deep learning based acoustic event detection
In recent years, deep learning has not only permeated the computer vision and speech recognition research fields but also fields such as acoustic event detection (AED). One of the aims of AED is to detect and ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:26 -
Phone recognition with hierarchical convolutional deep maxout networks
Deep convolutional neural networks (CNNs) have recently been shown to outperform fully connected deep neural networks (DNNs) both on low-resource and on large-scale speech tasks. Experiments indicate that conv...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:25 -
Multimodal voice conversion based on non-negative matrix factorization
A multimodal voice conversion (VC) method for noisy environments is proposed. In our previous non-negative matrix factorization (NMF)-based VC method, source and target exemplars are extracted from parallel tr...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:24 -
The Latin Music Mood Database
In this paper we present the Latin Music Mood Database, an extension of the Latin Music Database but for the task of music mood/emotion classification. The method for assigning mood labels to the musical recor...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:23 -
Regularized minimum class variance extreme learning machine for language recognition
Support vector machines (SVMs) have played an important role in the state-of-the-art language recognition systems. The recently developed extreme learning machine (ELM) tends to have better scalability and ach...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:22 -
Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion
Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia inf...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:21 -
Advanced acoustic modelling techniques in MP3 speech recognition
The automatic recognition of MP3 compressed speech presents a challenge to the current systems due to the lossy nature of compression which causes irreversible degradation of the speech wave. This article eval...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:20 -
Emotion in the singing voice—a deeper look at acoustic features in the light of automatic classification
We investigate the automatic recognition of emotions in the singing voice and study the worth and role of a variety of relevant acoustic parameters. The data set contains phrases and vocalises sung by eight re...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:19 -
An improved i-vector extraction algorithm for speaker verification
Over recent years, the i-vector-based framework has been proven to provide state-of-the-art performance in speaker verification. Each utterance is projected onto a total factor space and is represented by a low-di...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:18 -
Exploiting foreign resources for DNN-based ASR
Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specifi...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:17 -
Singer identification using perceptual features and cepstral coefficients of an audio signal from Indian video songs
Singer identification is a difficult topic in music information retrieval because background instrumental music accompanies the singing voice, which reduces system performance. One of the main disadvantag...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:16 -
Stereo-based histogram equalization for robust speech recognition
Optimal automatic speech recognition (ASR) takes place when the recognition system is tested under circumstances identical to those in which it was trained. However, in the actual real world, there exist many ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:15 -
Robust design of Farrow-structure-based steerable broadband beamformers with sparse tap weights via convex optimization
The Farrow-structure-based steerable broadband beamformer (FSBB) is particularly useful in the applications where sound source of interest may move around a wide angular range. However, in contrast with conven...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:14 -
ViSQOL: an objective speech quality model
This paper presents an objective speech quality model, ViSQOL, the Virtual Speech Quality Objective Listener. It is a signal-based, full-reference, intrusive metric that models human speech quality perception ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:13 -
Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification
Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:12 -
Lightweight multi-DOA tracking of mobile speech sources
Estimating the directions of arrival (DOAs) of multiple simultaneous mobile sound sources is an important step for various audio signal processing applications. In this contribution, we present an approach tha...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:11 -
An acoustic data transmission system based on audio data hiding: method and performance evaluation
Acoustic data transmission (ADT) forms a branch of the audio data hiding techniques with its capability of communicating data in short-range aerial space between a loudspeaker and a microphone. In this paper, ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:10 -
Evaluation of linguistic and prosodic features for detection of Alzheimer’s disease in Turkish conversational speech
Automatic diagnosis and monitoring of Alzheimer’s disease can have a significant impact on society as well as the well-being of patients. The part of the brain cortex that processes language abilities is one o...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:9 -
Voice conversion using speaker-dependent conditional restricted Boltzmann machine
This paper presents a voice conversion (VC) method that utilizes conditional restricted Boltzmann machines (CRBMs) for each speaker to obtain high-order speaker-independent spaces where voice features are conv...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:8 -
An investigation of supervector regression for forensic voice comparison on small data
Automatic forensic voice comparison (FVC) systems employed in forensic casework have often relied on Gaussian Mixture Model - Universal Background Models (GMM-UBMs) for modelling with relatively little researc...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:7 -
SIFT-based local spectrogram image descriptor: a novel feature for robust music identification
Music identification via audio fingerprinting has been an active research field in recent years. In the real-world environment, music queries are often deformed by various interferences which typically include...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:6 -
A signal subspace approach to spatio-temporal prediction for multichannel speech enhancement
The spatio-temporal-prediction (STP) method for multichannel speech enhancement has recently been proposed. This approach makes it theoretically possible to attenuate the residual noise without distorting spee...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:5 -
Within and cross-corpus speech emotion recognition using latent topic model-based features
Owing to the suprasegmental behavior of emotional speech, turn-level features have demonstrated a better success than frame-level features for recognition-related tasks. Conventionally, such features are obtai...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:4 -
A novel hybrid of genetic algorithm and ANN for developing a high efficient method for vocal fold pathology diagnosis
In this paper, an initial feature vector based on the combination of the wavelet packet decomposition (WPD) and the Mel frequency cepstral coefficients (MFCCs) is proposed. For optimizing the initial feature v...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:3 -
Noisy training for deep neural networks in speech recognition
Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:2 -
Simulation of tremulous voices using a biomechanical model
Vocal tremor has been simulated using a high-dimensional discrete vocal fold model. Specifically, respiratory, phonatory, and articulatory tremors have been modeled as instabilities in six parameters of the mo...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:1 -
Homogenous ensemble phonotactic language recognition based on SVM supervector reconstruction
Currently, acoustic spoken language recognition (SLR) and phonotactic SLR systems are widely used language recognition systems. To achieve better performance, researchers combine multiple subsystems with the r...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:42 -
The self-taught vocal interface
Speech technology is firmly rooted in daily life, most notably in command-and-control (C&C) applications. C&C usability downgrades quickly, however, when used by people with non-standard speech. We pursue a fu...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:43 -
Audio bandwidth extension based on temporal smoothing cepstral coefficients
In this paper, we propose a wideband (WB) to super-wideband audio bandwidth extension (BWE) method based on temporal smoothing cepstral coefficients (TSCC). A temporal relationship of audio signals is included...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:41 -
A sub-band-based feature reconstruction approach for robust speaker recognition
Although the field of automatic speaker or speech recognition has been extensively studied over the past decades, the lack of robustness has remained a major challenge. The missing data technique (MDT) is a pr...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:40 -
Sparse coding of the modulation spectrum for noise-robust automatic speech recognition
The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equiv...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:36 -
A cross-lingual adaptation approach for rapid development of speech recognizers for learning disabled users
Building a voice-operated system for learning disabled users is a difficult task that requires a considerable amount of time and effort. Due to the wide spectrum of disabilities and their different related pho...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:39 -
A uniform phase representation for the harmonic model in speech synthesis applications
Feature-based vocoders, e.g., STRAIGHT, offer a way to manipulate the perceived characteristics of the speech signal in speech transformation and synthesis. For the harmonic model, which provides excellent perc...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:38 -
An imperceptible and robust audio watermarking algorithm
In this paper, we propose a semi-blind, imperceptible, and robust digital audio watermarking algorithm. The proposed algorithm is based on cascading two well-known transforms: the discrete wavelet transform an...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:37 -
Robust Bayesian estimation for context-based speech enhancement
Model-based speech enhancement algorithms that employ trained models, such as codebooks, hidden Markov models, Gaussian mixture models, etc., containing representations of speech such as linear predictive coef...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:35 -
The Ethnic Lyrics Fetcher tool
The task of automatic retrieval and extraction of lyrics from the web is of great importance to different Music Information Retrieval applications. However, despite its importance, very little research has bee...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:27 -
Audio segmentation-by-classification approach based on factor analysis in broadcast news domain
This paper studies a novel audio segmentation-by-classification approach based on factor analysis. The proposed technique compensates the within-class variability by using class-dependent factor loading matric...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:34 -
A modified Wiener filtering method combined with wavelet thresholding multitaper spectrum for speech enhancement
This paper proposes a new speech enhancement (SE) algorithm utilizing constraints to the Wiener gain function which is capable of working at 10 dB and lower signal-to-noise ratios (SNRs). The wavelet threshold...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:32 -
The influence of speech rate on Fujisaki model parameters
The current paper examines influences of speech rate on Fujisaki model parameters based on read speech from the BonnTempo-Corpus containing productions by 12 native speakers of German at five different intende...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:33 -
Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios
In many speech communication applications, robust localization and tracking of multiple speakers in noisy and reverberant environments are of major importance. Several algorithms to tackle this problem have be...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:31 -
An efficient algebraic codebook search for ACELP speech coder
In a bid to enhance the search performance, this paper presents an improved version of reduced candidate mechanism (RCM), an algebraic codebook search conducted on an algebraic code-excited linear prediction (...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:30 -
PLDA in the i-supervector space for text-independent speaker verification
In this paper, we advocate the use of the uncompressed form of i-vector and depend on subspace modeling using probabilistic linear discriminant analysis (PLDA) in handling the speaker and session (or channel) var...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:29 -
Linguistically motivated parameter estimation methods for a superpositional intonation model
This paper proposes two novel approaches for parameter estimation of a superpositional intonation model. These approaches present linguistic and paralinguistic assumptions for initializing a pre-existing stand...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:28 -
Percussive/harmonic sound separation by non-negative matrix factorization with smoothness/sparseness constraints
In this paper, unsupervised learning is used to separate percussive and harmonic sounds from monaural non-vocal polyphonic signals. Our algorithm is based on a modified non-negative matrix factorization (NMF) ...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:26 -
Musical note analysis of solo violin recordings using recursive regularization
Composers may not provide instructions for playing their works, especially for instrument solos, and therefore, different musicians may give very different interpretations of the same work. Such differences us...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:25 -
A memory efficient finite-state source coding algorithm for audio MDCT coefficients
To achieve a better trade-off between the vector dimension and the memory requirements of a vector quantizer (VQ), an entropy-constrained VQ (ECVQ) scheme with finite memory, called finite-state ECVQ (FS-ECVQ)...
Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:22
Annual Journal Metrics
Citation Impact 2023
Journal Impact Factor: 1.7
5-year Journal Impact Factor: 1.6
Source Normalized Impact per Paper (SNIP): 1.051
SCImago Journal Rank (SJR): 0.414
Speed 2023
Submission to first editorial decision (median days): 17
Submission to acceptance (median days): 154
Usage 2023
Downloads: 368,607
Altmetric mentions: 70
ISSN: 1687-4722 (electronic)