Articles

  1. Query-by-example spoken term detection (QbE STD) aims at retrieving data from a speech repository given an acoustic query containing the term of interest as input. Nowadays, it is receiving much interest due t...

    Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez and Carmen Garcia-Mateo

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:1

    Content type: Research

    Published on:

  2. Using a proper distribution function for speech signal or for its representations is of crucial importance in statistical-based speech processing algorithms. Although the most commonly used probability density...

    Authors: Ali Aroudi, Hadi Veisi, Hossein Sameti and Zahra Mafakheri

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:35

    Content type: Research

    Published on:

  3. Using a recently proposed informed spatial filter, it is possible to effectively and robustly reduce reverberation from speech signals captured in noisy environments using multiple microphones. Late reverberat...

    Authors: Sebastian Braun and Emanuël A. P. Habets

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:34

    Content type: Research

    Published on:

  4. Audio segmentation is important as a pre-processing task to improve the performance of many speech technology tasks and, therefore, it has an undoubted research interest. This paper describes the database, the...

    Authors: Diego Castán, David Tavarez, Paula Lopez-Otero, Javier Franco-Pedroso, Héctor Delgado, Eva Navas, Laura Docio-Fernández, Daniel Ramos, Javier Serrano, Alfonso Ortega and Eduardo Lleida

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:33

    Content type: Research

    Published on:

  5. The need to have a large amount of parallel data is a large hurdle for the practical use of voice conversion (VC). This paper presents a novel framework of exemplar-based VC that only requires a small number o...

    Authors: Ryo Aihara, Takao Fujii, Toru Nakashika, Tetsuya Takiguchi and Yasuo Ariki

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:32

    Content type: Research

    Published on:

  6. In this paper, a semi-fragile and blind digital speech watermarking technique for online speaker recognition systems based on the discrete wavelet packet transform (DWPT) and quantization index modulation (QIM...

    Authors: Mohammad Ali Nematollahi, Mohammad Ali Akhaee, S. A. R. Al-Haddad and Hamurabi Gamboa-Rosales

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:31

    Content type: Research

    Published on:

  7. The presence of physical task stress induces changes in the speech production system which in turn produces changes in speaking behavior. This results in measurable acoustic correlates including changes to for...

    Authors: Keith W. Godin and John H. L. Hansen

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:29

    Content type: Research

    Published on:

  8. The identity of musical instruments is reflected in the acoustic attributes of musical notes played with them. Recently, it has been argued that these characteristics of musical identity (or timbre) can be bes...

    Authors: Kailash Patil and Mounya Elhilali

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:27

    Content type: Research

    Published on:

  9. In recent years, deep learning has not only permeated the computer vision and speech recognition research fields but also fields such as acoustic event detection (AED). One of the aims of AED is to detect and ...

    Authors: Miquel Espi, Masakiyo Fujimoto, Keisuke Kinoshita and Tomohiro Nakatani

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:26

    Content type: Research

    Published on:

  10. A multimodal voice conversion (VC) method for noisy environments is proposed. In our previous non-negative matrix factorization (NMF)-based VC method, source and target exemplars are extracted from parallel tr...

    Authors: Kenta Masaka, Ryo Aihara, Tetsuya Takiguchi and Yasuo Ariki

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:24

    Content type: Research

    Published on:

  11. In this paper we present the Latin Music Mood Database, an extension of the Latin Music Database but for the task of music mood/emotion classification. The method for assigning mood labels to the musical recor...

    Authors: Carolina L. dos Santos and Carlos N. Silla Jr

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:23

    Content type: Research

    Published on:

  12. Support vector machines (SVMs) have played an important role in the state-of-the-art language recognition systems. The recently developed extreme learning machine (ELM) tends to have better scalability and ach...

    Authors: Jiaming Xu, Wei-Qiang Zhang, Jia Liu and Shanhong Xia

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:22

    Content type: Research

    Published on:

  13. Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia inf...

    Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Carmen Garcia-Mateo, Antonio Cardenal, Julian David Echeverry-Correa, Alejandro Coucheiro-Limeres, Julia Olcoz and Antonio Miguel

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:21

    Content type: Research

    Published on:

  14. The automatic recognition of MP3 compressed speech presents a challenge to the current systems due to the lossy nature of compression which causes irreversible degradation of the speech wave. This article eval...

    Authors: Michal Borsky, Petr Pollak and Petr Mizera

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:20

    Content type: Research

    Published on:

  15. We investigate the automatic recognition of emotions in the singing voice and study the worth and role of a variety of relevant acoustic parameters. The data set contains phrases and vocalises sung by eight re...

    Authors: Florian Eyben, Gláucia L Salomão, Johan Sundberg, Klaus R Scherer and Björn W Schuller

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:19

    Content type: Research

    Published on:

  16. Over recent years, the i-vector-based framework has been proven to provide state-of-the-art performance in speaker verification. Each utterance is projected onto a total factor space and is represented by a low-di...

    Authors: Wei Li, Tianfan Fu and Jie Zhu

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:18

    Content type: Research

    Published on:

  17. Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specifi...

    Authors: Petr Motlicek, David Imseng, Blaise Potard, Philip N. Garner and Ivan Himawan

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:17

    Content type: Research

    Published on:

  18. Singer identification is a difficult topic in music information retrieval because background instrumental music is included with singing voice which reduces performance of a system. One of the main disadvantag...

    Authors: Tushar Ratanpara and Narendra Patel

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:16

    Content type: Research

    Published on:

  19. Optimal automatic speech recognition (ASR) takes place when the recognition system is tested under circumstances identical to those in which it was trained. However, in the actual real world, there exist many ...

    Authors: Randa Al-Wakeel, Mahmoud Shoman, Magdy Aboul-Ela and Sherif Abdou

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:15

    Content type: Research

    Published on:

  20. The Farrow-structure-based steerable broadband beamformer (FSBB) is particularly useful in the applications where sound source of interest may move around a wide angular range. However, in contrast with conven...

    Authors: Tiannan Wang and Huawei Chen

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:14

    Content type: Research

    Published on:

  21. This paper presents an objective speech quality model, ViSQOL, the Virtual Speech Quality Objective Listener. It is a signal-based, full-reference, intrusive metric that models human speech quality perception ...

    Authors: Andrew Hines, Jan Skoglund, Anil C Kokaram and Naomi Harte

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:13

    Content type: Research

    Published on:

  22. Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this...

    Authors: Zhaofeng Zhang, Longbiao Wang, Atsuhiko Kai, Takanori Yamada, Weifeng Li and Masahiro Iwahashi

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:12

    Content type: Research

    Published on:

  23. Estimating the directions of arrival (DOAs) of multiple simultaneous mobile sound sources is an important step for various audio signal processing applications. In this contribution, we present an approach tha...

    Authors: Caleb Rascon, Gibran Fuentes and Ivan Meza

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:11

    Content type: Research

    Published on:

  24. Acoustic data transmission (ADT) forms a branch of the audio data hiding techniques with its capability of communicating data in short-range aerial space between a loudspeaker and a microphone. In this paper, ...

    Authors: Kiho Cho, Jae Choi and Nam Soo Kim

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:10

    Content type: Research

    Published on:

  25. Automatic diagnosis and monitoring of Alzheimer’s disease can have a significant impact on society as well as the well-being of patients. The part of the brain cortex that processes language abilities is one o...

    Authors: Ali Khodabakhsh, Fatih Yesil, Ekrem Guner and Cenk Demiroglu

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:9

    Content type: Research

    Published on:

  26. This paper presents a voice conversion (VC) method that utilizes conditional restricted Boltzmann machines (CRBMs) for each speaker to obtain high-order speaker-independent spaces where voice features are conv...

    Authors: Toru Nakashika, Tetsuya Takiguchi and Yasuo Ariki

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:8

    Content type: Research

    Published on:

  27. Automatic forensic voice comparison (FVC) systems employed in forensic casework have often relied on Gaussian Mixture Model - Universal Background Models (GMM-UBMs) for modelling with relatively little researc...

    Authors: Chee Cheun Huang, Julien Epps and Tharmarajah Thiruvaran

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:7

    Content type: Research

    Published on:

  28. Music identification via audio fingerprinting has been an active research field in recent years. In the real-world environment, music queries are often deformed by various interferences which typically include...

    Authors: Xiu Zhang, Bilei Zhu, Linwei Li, Wei Li, Xiaoqiang Li, Wei Wang, Peizhong Lu and Wenqiang Zhang

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:6

    Content type: Research

    Published on:

  29. Owing to the suprasegmental behavior of emotional speech, turn-level features have demonstrated a better success than frame-level features for recognition-related tasks. Conventionally, such features are obtai...

    Authors: Mohit Shah, Chaitali Chakrabarti and Andreas Spanias

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:4

    Content type: Research

    Published on:

  30. In this paper, an initial feature vector based on the combination of the wavelet packet decomposition (WPD) and the Mel frequency cepstral coefficients (MFCCs) is proposed. For optimizing the initial feature v...

    Authors: Vahid Majidnezhad

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:3

    Content type: Research

    Published on:

  31. Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however...

    Authors: Shi Yin, Chao Liu, Zhiyong Zhang, Yiye Lin, Dong Wang, Javier Tejedor, Thomas Fang Zheng and Yinguo Li

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:2

    Content type: Research

    Published on:

  32. Vocal tremor has been simulated using a high-dimensional discrete vocal fold model. Specifically, respiratory, phonatory, and articulatory tremors have been modeled as instabilities in six parameters of the mo...

    Authors: Rubén Fraile, Juan Ignacio Godino-Llorente and Malte Kob

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2015 2015:1

    Content type: Research

    Published on:

  33. Currently, acoustic spoken language recognition (SLR) and phonotactic SLR systems are widely used language recognition systems. To achieve better performance, researchers combine multiple subsystems with the r...

    Authors: Wei-Wei Liu, Wei-Qiang Zhang, Michael T Johnson and Jia Liu

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:42

    Content type: Research

    Published on:

  34. Speech technology is firmly rooted in daily life, most notably in command-and-control (C&C) applications. C&C usability downgrades quickly, however, when used by people with non-standard speech. We pursue a fu...

    Authors: Bart Ons, Jort F Gemmeke and Hugo Van hamme

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:43

    Content type: Research

    Published on:

  35. The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equiv...

    Authors: Sara Ahmadi, Seyed Mohammad Ahadi, Bert Cranen and Lou Boves

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:36

    Content type: Research

    Published on:

  36. Building a voice-operated system for learning disabled users is a difficult task that requires a considerable amount of time and effort. Due to the wide spectrum of disabilities and their different related pho...

    Authors: Marek Bohac, Michaela Kucharova, Zoraida Callejas, Jan Nouza and Petr Červa

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:39

    Content type: Research

    Published on:

  37. In this paper, we propose a semi-blind, imperceptible, and robust digital audio watermarking algorithm. The proposed algorithm is based on cascading two well-known transforms: the discrete wavelet transform an...

    Authors: Ali Al-Haj

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:37

    Content type: Research

    Published on:

  38. Model-based speech enhancement algorithms that employ trained models, such as codebooks, hidden Markov models, Gaussian mixture models, etc., containing representations of speech such as linear predictive coef...

    Authors: Devireddy Hanumantha Rao Naidu and Sriram Srinivasan

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:35

    Content type: Research

    Published on:

  39. The task of automatic retrieval and extraction of lyrics from the web is of great importance to different Music Information Retrieval applications. However, despite its importance, very little research has bee...

    Authors: Rafael P Ribeiro, Murilo AP Almeida and Carlos N Silla Jr

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:27

    Content type: Research

    Published on:

  40. This paper studies a novel audio segmentation-by-classification approach based on factor analysis. The proposed technique compensates for the within-class variability by using class-dependent factor loading matric...

    Authors: Diego Castán, Alfonso Ortega, Antonio Miguel and Eduardo Lleida

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:34

    Content type: Research

    Published on:

  41. This paper proposes a new speech enhancement (SE) algorithm utilizing constraints to the Wiener gain function which is capable of working at 10 dB and lower signal-to-noise ratios (SNRs). The wavelet threshold...

    Authors: Yanna Ma and Akinori Nishihara

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:32

    Content type: Research

    Published on:

  42. The current paper examines influences of speech rate on Fujisaki model parameters based on read speech from the BonnTempo-Corpus containing productions by 12 native speakers of German at five different intende...

    Authors: Hansjörg Mixdorff, Adrian Leemann and Volker Dellwo

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:33

    Content type: Research

    Published on:

  43. In many speech communication applications, robust localization and tracking of multiple speakers in noisy and reverberant environments are of major importance. Several algorithms to tackle this problem have be...

    Authors: Stephan Gerlach, Jörg Bitzer, Stefan Goetze and Simon Doclo

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2014 2014:31

    Content type: Research

    Published on:
