Articles

JND-based spatial parameter quantization of multichannel audio signals

In multichannel spatial audio coding (SAC), the accurate representations of virtual sounds and the efficient compressions of spatial parameters are the key to perfect reproduction of spatial sound effects in 3...

Authors: Li Gao, Ruimin Hu, Xiaochen Wang, Gang Li, Yuhong Yang and Weiping Tu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:13

Content type: Research Published on: 21 May 2016
Audio bandwidth extension using ensemble of recurrent neural networks

In audio communication systems, the perceptual audio quality of the reproduced audio signals such as the naturalness of the sound is limited by the available audio bandwidth. In this paper, a wideband to super...

Authors: Xin Liu and Chang-Chun Bao

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:12

Content type: Research Published on: 12 May 2016
Novel adaptive muting technique for packet loss concealment of ITU-T G.722 using optimized parametric shaping functions

Adaptive muting method using an optimized parametric shaping function as a part of the ITU-T G.722 Appendix IV packet loss concealment algorithm is proposed. The packet loss concealment algorithm incorporating...

Authors: Bong-Ki Lee and Joon-Hyuk Chang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:11

Content type: Research Published on: 21 April 2016
Wise teachers train better DNN acoustic models

Automatic speech recognition is becoming more ubiquitous as recognition performance improves, capable devices increase in number, and areas of new application open up. Neural network acoustic models that can u...

Authors: Ryan Price, Ken-ichi Iso and Koichi Shinoda

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:10

Content type: Research Published on: 12 April 2016
Semi-supervised feature selection for audio classification based on constraint compensated Laplacian score

Audio classification, classifying audio segments into broad categories such as speech, non-speech, and silence, is an important front-end problem in speech signal processing. Dozens of features have been propo...

Authors: Xu-Kui Yang, Liang He, Dan Qu, Wei-Qiang Zhang and Michael T. Johnson

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:9

Content type: Research Published on: 15 March 2016
Prosodic mapping of text font based on the dimensional theory of emotions: a case study on style and size

Current text-to-speech systems do not support the effective provision of the semantics and the cognitive aspects of the documents’ typographic cues (e.g., font type, style, and size). A novel approach is intro...

Authors: Dimitrios Tsonos and Georgios Kouroupetroglou

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:8

Content type: Research Published on: 15 March 2016
Localization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks

Time-frequency (T-F) masking is an effective method for stereo speech source separation. However, reliable estimation of the T-F mask from sound mixtures is a challenging task, especially when room reverberati...

Authors: Yang Yu, Wenwu Wang and Peng Han

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:7

Content type: Research Published on: 4 March 2016
Developing a unit selection voice given audio without corresponding text

Today, a large amount of audio data is available on the web in the form of audiobooks, podcasts, video lectures, video blogs, news bulletins, etc. In addition, we can effortlessly record and store audio data s...

Authors: Tejas Godambe, Sai Krishna Rallabandi, Suryakanth V. Gangashetty, Ashraf Alkhairy and Afshan Jafri

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:6

Content type: Research Published on: 1 March 2016
iSargam: music notation representation for Indian Carnatic music

Indian classical music, including its two varieties, Carnatic and Hindustani music, has a rich music tradition and enjoys a wide audience from various parts of the world. The Carnatic music which is more popul...

Authors: Stanly Mammen, Ilango Krishnamurthi, A. Jalaja Varma and G. Sujatha

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:5

Content type: Research Published on: 16 February 2016
Hybrid statistical/unit-selection Turkish speech synthesis using suffix units

Unit selection based text-to-speech synthesis (TTS) has been the dominant TTS approach of the last decade. Despite its success, unit selection approach has its disadvantages. One of the most significant disadv...

Authors: Cenk Demiroğlu and Ekrem Güner

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:4

Content type: Research Published on: 2 February 2016
Grid-based approximation for voice conversion in low resource environments

The goal of voice conversion is to modify a source speaker’s speech to sound as if spoken by a target speaker. Common conversion methods are based on Gaussian mixture modeling (GMM). They aim to statistically ...

Authors: Hadas Benisty, David Malah and Koby Crammer

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:3

Content type: Research Published on: 21 January 2016
Detecting fingering of overblown flute sound using sparse feature learning

In woodwind instruments such as a flute, producing a higher-pitched tone than a standard tone by increasing the blowing pressure is called overblowing, and this allows several distinct fingerings for the same ...

Authors: Yoonchang Han and Kyogu Lee

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:2

Content type: Research Published on: 21 January 2016
Comparison of ALBAYZIN query-by-example spoken term detection 2012 and 2014 evaluations

Query-by-example spoken term detection (QbE STD) aims at retrieving data from a speech repository given an acoustic query containing the term of interest as input. Nowadays, it is receiving much interest due t...

Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez and Carmen Garcia-Mateo

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:1

Content type: Research Published on: 13 January 2016
Speech signal modeling using multivariate distributions

Using a proper distribution function for speech signal or for its representations is of crucial importance in statistical-based speech processing algorithms. Although the most commonly used probability density...

Authors: Ali Aroudi, Hadi Veisi, Hossein Sameti and Zahra Mafakheri

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:35

Content type: Research Published on: 30 December 2015
A multichannel diffuse power estimator for dereverberation in the presence of multiple sources

Using a recently proposed informed spatial filter, it is possible to effectively and robustly reduce reverberation from speech signals captured in noisy environments using multiple microphones. Late reverberat...

Authors: Sebastian Braun and Emanuël A. P. Habets

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:34

Content type: Research Published on: 4 December 2015
Albayzín-2014 evaluation: audio segmentation and classification in broadcast news domains

Audio segmentation is important as a pre-processing task to improve the performance of many speech technology tasks and, therefore, it has an undoubted research interest. This paper describes the database, the...

Authors: Diego Castán, David Tavarez, Paula Lopez-Otero, Javier Franco-Pedroso, Héctor Delgado, Eva Navas, Laura Docio-Fernández, Daniel Ramos, Javier Serrano, Alfonso Ortega and Eduardo Lleida

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:33

Content type: Research Published on: 1 December 2015
Small-parallel exemplar-based voice conversion in noisy environments using affine non-negative matrix factorization

The need to have a large amount of parallel data is a large hurdle for the practical use of voice conversion (VC). This paper presents a novel framework of exemplar-based VC that only requires a small number o...

Authors: Ryo Aihara, Takao Fujii, Toru Nakashika, Tetsuya Takiguchi and Yasuo Ariki

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:32

Content type: Research Published on: 25 November 2015
Semi-fragile digital speech watermarking for online speaker recognition

In this paper, a semi-fragile and blind digital speech watermarking technique for online speaker recognition systems based on the discrete wavelet packet transform (DWPT) and quantization index modulation (QIM...

Authors: Mohammad Ali Nematollahi, Mohammad Ali Akhaee, S. A. R. Al-Haddad and Hamurabi Gamboa-Rosales

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:31

Content type: Research Published on: 21 October 2015
Erratum to: Efficient voice activity detection algorithm using long-term spectral flatness measure

Authors: Yanna Ma and Akinori Nishihara

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:30

Content type: Erratum Published on: 20 October 2015

The original article was published in EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:87
Physical task stress and speaker variability in voice quality

The presence of physical task stress induces changes in the speech production system which in turn produces changes in speaking behavior. This results in measurable acoustic correlates including changes to for...

Authors: Keith W. Godin and John H. L. Hansen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:29

Content type: Research Published on: 8 October 2015
Speech enhancement based on Bayesian decision and spectral amplitude estimation

In this paper, a single-channel speech enhancement method based on Bayesian decision and spectral amplitude estimation is proposed, in which the speech detection module and spectral amplitude estimation module...

Authors: Feng Deng and Chang-Chun Bao

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:28

Content type: Research Published on: 7 October 2015
Biomimetic spectro-temporal features for music instrument recognition in isolated notes and solo phrases

The identity of musical instruments is reflected in the acoustic attributes of musical notes played with them. Recently, it has been argued that these characteristics of musical identity (or timbre) can be bes...

Authors: Kailash Patil and Mounya Elhilali

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:27

Content type: Research Published on: 17 September 2015
Exploiting spectro-temporal locality in deep learning based acoustic event detection

In recent years, deep learning has not only permeated the computer vision and speech recognition research fields but also fields such as acoustic event detection (AED). One of the aims of AED is to detect and ...

Authors: Miquel Espi, Masakiyo Fujimoto, Keisuke Kinoshita and Tomohiro Nakatani

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:26

Content type: Research Published on: 14 September 2015
Phone recognition with hierarchical convolutional deep maxout networks

Deep convolutional neural networks (CNNs) have recently been shown to outperform fully connected deep neural networks (DNNs) both on low-resource and on large-scale speech tasks. Experiments indicate that conv...

Authors: László Tóth

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:25

Content type: Research Published on: 4 September 2015
Multimodal voice conversion based on non-negative matrix factorization

A multimodal voice conversion (VC) method for noisy environments is proposed. In our previous non-negative matrix factorization (NMF)-based VC method, source and target exemplars are extracted from parallel tr...

Authors: Kenta Masaka, Ryo Aihara, Tetsuya Takiguchi and Yasuo Ariki

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:24

Content type: Research Published on: 4 September 2015
The Latin Music Mood Database

In this paper we present the Latin Music Mood Database, an extension of the Latin Music Database but for the task of music mood/emotion classification. The method for assigning mood labels to the musical recor...

Authors: Carolina L. dos Santos and Carlos N. Silla Jr

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:23

Content type: Research Published on: 21 August 2015
Regularized minimum class variance extreme learning machine for language recognition

Support vector machines (SVMs) have played an important role in the state-of-the-art language recognition systems. The recently developed extreme learning machine (ELM) tends to have better scalability and ach...

Authors: Jiaming Xu, Wei-Qiang Zhang, Jia Liu and Shanhong Xia

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:22

Content type: Research Published on: 13 August 2015
Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia inf...

Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Carmen Garcia-Mateo, Antonio Cardenal, Julian David Echeverry-Correa, Alejandro Coucheiro-Limeres, Julia Olcoz and Antonio Miguel

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:21

Content type: Research Published on: 7 August 2015
Advanced acoustic modelling techniques in MP3 speech recognition

The automatic recognition of MP3 compressed speech presents a challenge to the current systems due to the lossy nature of compression which causes irreversible degradation of the speech wave. This article eval...

Authors: Michal Borsky, Petr Pollak and Petr Mizera

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:20

Content type: Research Published on: 28 July 2015
Emotion in the singing voice—a deeper look at acoustic features in the light of automatic classification

We investigate the automatic recognition of emotions in the singing voice and study the worth and role of a variety of relevant acoustic parameters. The data set contains phrases and vocalises sung by eight re...

Authors: Florian Eyben, Gláucia L Salomão, Johan Sundberg, Klaus R Scherer and Björn W Schuller

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:19

Content type: Research Published on: 30 June 2015
An improved i-vector extraction algorithm for speaker verification

Over recent years, i-vector-based framework has been proven to provide state-of-the-art performance in speaker verification. Each utterance is projected onto a total factor space and is represented by a low-di...

Authors: Wei Li, Tianfan Fu and Jie Zhu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:18

Content type: Research Published on: 27 June 2015
Exploiting foreign resources for DNN-based ASR

Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specifi...

Authors: Petr Motlicek, David Imseng, Blaise Potard, Philip N. Garner and Ivan Himawan

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:17

Content type: Research Published on: 26 June 2015
Singer identification using perceptual features and cepstral coefficients of an audio signal from Indian video songs

Singer identification is a difficult topic in music information retrieval because background instrumental music is included with singing voice which reduces performance of a system. One of the main disadvantag...

Authors: Tushar Ratanpara and Narendra Patel

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:16

Content type: Research Published on: 25 June 2015
Stereo-based histogram equalization for robust speech recognition

Optimal automatic speech recognition (ASR) takes place when the recognition system is tested under circumstances identical to those in which it was trained. However, in the actual real world, there exist many ...

Authors: Randa Al-Wakeel, Mahmoud Shoman, Magdy Aboul-Ela and Sherif Abdou

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:15

Content type: Research Published on: 9 June 2015
Robust design of Farrow-structure-based steerable broadband beamformers with sparse tap weights via convex optimization

The Farrow-structure-based steerable broadband beamformer (FSBB) is particularly useful in the applications where sound source of interest may move around a wide angular range. However, in contrast with conven...

Authors: Tiannan Wang and Huawei Chen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:14

Content type: Research Published on: 4 June 2015
ViSQOL: an objective speech quality model

This paper presents an objective speech quality model, ViSQOL, the Virtual Speech Quality Objective Listener. It is a signal-based, full-reference, intrusive metric that models human speech quality perception ...

Authors: Andrew Hines, Jan Skoglund, Anil C Kokaram and Naomi Harte

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:13

Content type: Research Published on: 17 May 2015
Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification

Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this...

Authors: Zhaofeng Zhang, Longbiao Wang, Atsuhiko Kai, Takanori Yamada, Weifeng Li and Masahiro Iwahashi

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:12

Content type: Research Published on: 12 May 2015
Lightweight multi-DOA tracking of mobile speech sources

Estimating the directions of arrival (DOAs) of multiple simultaneous mobile sound sources is an important step for various audio signal processing applications. In this contribution, we present an approach tha...

Authors: Caleb Rascon, Gibran Fuentes and Ivan Meza

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:11

Content type: Research Published on: 7 May 2015
An acoustic data transmission system based on audio data hiding: method and performance evaluation

Acoustic data transmission (ADT) forms a branch of the audio data hiding techniques with its capability of communicating data in short-range aerial space between a loudspeaker and a microphone. In this paper, ...

Authors: Kiho Cho, Jae Choi and Nam Soo Kim

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:10

Content type: Research Published on: 18 April 2015
Evaluation of linguistic and prosodic features for detection of Alzheimer’s disease in Turkish conversational speech

Automatic diagnosis and monitoring of Alzheimer’s disease can have a significant impact on society as well as the well-being of patients. The part of the brain cortex that processes language abilities is one o...

Authors: Ali Khodabakhsh, Fatih Yesil, Ekrem Guner and Cenk Demiroglu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:9

Content type: Research Published on: 25 March 2015
Voice conversion using speaker-dependent conditional restricted Boltzmann machine

This paper presents a voice conversion (VC) method that utilizes conditional restricted Boltzmann machines (CRBMs) for each speaker to obtain high-order speaker-independent spaces where voice features are conv...

Authors: Toru Nakashika, Tetsuya Takiguchi and Yasuo Ariki

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:8

Content type: Research Published on: 25 February 2015
An investigation of supervector regression for forensic voice comparison on small data

Automatic forensic voice comparison (FVC) systems employed in forensic casework have often relied on Gaussian Mixture Model - Universal Background Models (GMM-UBMs) for modelling with relatively little researc...

Authors: Chee Cheun Huang, Julien Epps and Tharmarajah Thiruvaran

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:7

Content type: Research Published on: 24 February 2015
SIFT-based local spectrogram image descriptor: a novel feature for robust music identification

Music identification via audio fingerprinting has been an active research field in recent years. In the real-world environment, music queries are often deformed by various interferences which typically include...

Authors: Xiu Zhang, Bilei Zhu, Linwei Li, Wei Li, Xiaoqiang Li, Wei Wang, Peizhong Lu and Wenqiang Zhang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:6

Content type: Research Published on: 12 February 2015
A signal subspace approach to spatio-temporal prediction for multichannel speech enhancement

The spatio-temporal-prediction (STP) method for multichannel speech enhancement has recently been proposed. This approach makes it theoretically possible to attenuate the residual noise without distorting spee...

Authors: Adam Borowicz

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:5

Content type: Research Published on: 10 February 2015
Within and cross-corpus speech emotion recognition using latent topic model-based features

Owing to the suprasegmental behavior of emotional speech, turn-level features have demonstrated a better success than frame-level features for recognition-related tasks. Conventionally, such features are obtai...

Authors: Mohit Shah, Chaitali Chakrabarti and Andreas Spanias

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:4

Content type: Research Published on: 25 January 2015
A novel hybrid of genetic algorithm and ANN for developing a high efficient method for vocal fold pathology diagnosis

In this paper, an initial feature vector based on the combination of the wavelet packet decomposition (WPD) and the Mel frequency cepstral coefficients (MFCCs) is proposed. For optimizing the initial feature v...

Authors: Vahid Majidnezhad

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:3

Content type: Research Published on: 21 January 2015
Noisy training for deep neural networks in speech recognition

Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however...

Authors: Shi Yin, Chao Liu, Zhiyong Zhang, Yiye Lin, Dong Wang, Javier Tejedor, Thomas Fang Zheng and Yinguo Li

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:2

Content type: Research Published on: 20 January 2015
Simulation of tremulous voices using a biomechanical model

Vocal tremor has been simulated using a high-dimensional discrete vocal fold model. Specifically, respiratory, phonatory, and articulatory tremors have been modeled as instabilities in six parameters of the mo...

Authors: Rubén Fraile, Juan Ignacio Godino-Llorente and Malte Kob

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:1

Content type: Research Published on: 8 January 2015
Homogenous ensemble phonotactic language recognition based on SVM supervector reconstruction

Currently, acoustic spoken language recognition (SLR) and phonotactic SLR systems are widely used language recognition systems. To achieve better performance, researchers combine multiple subsystems with the r...

Authors: Wei-Wei Liu, Wei-Qiang Zhang, Michael T Johnson and Jia Liu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:42

Content type: Research Published on: 24 December 2014
The self-taught vocal interface

Speech technology is firmly rooted in daily life, most notably in command-and-control (C&C) applications. C&C usability downgrades quickly, however, when used by people with non-standard speech. We pursue a fu...

Authors: Bart Ons, Jort F Gemmeke and Hugo Van hamme

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:43

Content type: Research Published on: 19 December 2014
