Articles

  1. Recurrent neural networks (RNNs) have shown an ability to model temporal dependencies. However, the problem of exploding or vanishing gradients has limited their application. In recent years, long short-term m...

    Authors: Jian Kang, Wei-Qiang Zhang, Wei-Wei Liu, Jia Liu and Michael T. Johnson

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:6

    Content type: Research

  2. The speech intelligibility of indoor public address systems is degraded by reverberation and background noise. This paper proposes a preprocessing method that combines speech enhancement and inverse filtering ...

    Authors: Huan-Yu Dong and Chang-Myung Lee

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:3

    Content type: Research

  3. Query-by-example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given an acoustic (spoken) query containing the term of interest as the input. This paper presents the systems su...

    Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Jorge Proença, Fernando Perdigão, Fernando García-Granada, Emilio Sanchis, Anna Pompili and Alberto Abad

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:2

    Content type: Research

  4. Automatic extraction of acoustic regions of interest from recordings captured in realistic clinical environments is a necessary preprocessing step in any cry analysis system. In this study, we propose a hidden...

    Authors: Gaurav Naithani, Jaana Kivinummi, Tuomas Virtanen, Outi Tammela, Mikko J. Peltola and Jukka M. Leppänen

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:1

    Content type: Research

  5. Audio signals are a type of high-dimensional data, and their clustering is critical. However, distance calculation failures, inefficient index trees, and cluster overlaps, derived from the equidistance, redund...

    Authors: Wenfa Li, Gongming Wang and Ke Li

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:26

    Content type: Research

  6. In speech enhancement, noise power spectral density (PSD) estimation plays a key role in determining appropriate de-noising gains. In this paper, we propose a robust noise PSD estimator for binaural speech enha...

    Authors: Youna Ji, Yonghyun Baek and Young-cheol Park

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:25

    Content type: Research

  7. Large vocabulary continuous speech recognition (LVCSR) has naturally been demanded for transcribing daily conversations, while developing spoken text data to train LVCSR is costly and time-consuming. In this p...

    Authors: Vataya Chunwijitra and Chai Wutiwiwatchai

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:24

    Content type: Research

  8. Robustness against background noise is a major research area for speech-related applications such as speech recognition and speaker recognition. One of the many solutions for this problem is to detect speech-d...

    Authors: Gökay Dişken, Zekeriya Tüfekci and Ulus Çevik

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:23

    Content type: Research

  9. Within search-on-speech, Spoken Term Detection (STD) aims to retrieve data from a speech repository given a textual representation of a search term. This paper presents an international open evaluation for sea...

    Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Luis Serrano, Inma Hernaez, Alejandro Coucheiro-Limeres, Javier Ferreiros, Julia Olcoz and Jorge Llombart

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:22

    Content type: Research

  10. The task of speaker diarization is to answer the question "who spoke when?" In this paper, we present different clustering approaches which consist of Evolutionary Computation Algorithms (ECAs) such as Genetic...

    Authors: Karim Dabbabi, Salah Hajji and Adnen Cherif

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:21

    Content type: Research

  11. An artificial neural network is an important model for training features of voice conversion (VC) tasks. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as Mel Cepstr...

    Authors: Zhaojie Luo, Jinhui Chen, Tetsuya Takiguchi and Yasuo Ariki

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:18

    Content type: Research

  12. Audio fingerprinting has been an active research field typically used for music identification. Robust audio fingerprinting technology is used to successfully perform content-based audio identification regardl...

    Authors: Dominic Williams, Akash Pooransingh and Jesse Saitoo

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:17

    Content type: Research

  13. In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. Voice conversion is a technique where only speaker-specific information in the source spee...

    Authors: Toru Nakashika and Yasuhiro Minami

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:16

    Content type: Research

  14. Onset detection still has room for improvement, especially when dealing with polyphonic music signals. For certain purposes in which the correctness of the result is a must, user intervention is hence required...

    Authors: Jose J. Valero-Mas and José M. Iñesta

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:15

    Content type: Research

  15. Speech synthesis has been applied in many kinds of practical applications. Currently, state-of-the-art speech synthesis uses statistical methods based on hidden Markov model (HMM). Speech synthesized by statis...

    Authors: Gia-Nhu Nguyen and Trung-Nghia Phung

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:14

    Content type: Research

  16. The autocorrelation domain is well suited to separating clean speech from noise. In this paper, a method is proposed to decrease the effects of noise on the clean speech signal, autocorrelation-based noise ...

    Authors: Gholamreza Farahani

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:13

    Content type: Research

  17. Various musical descriptors have been developed for Cover Song Identification (CSI). However, different descriptors are based on various assumptions, designed for representing distinct characteristics of music...

    Authors: Ning Chen, Mingyu Li and Haidong Xiao

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:12

    Content type: Research

  18. Automatic sound event classification (SEC) has attracted growing attention in recent years. Feature extraction is a critical factor in SEC systems, and the deep neural network (DNN) algorithms have achiev...

    Authors: Junjie Zhang, Jie Yin, Qi Zhang, Jun Shi and Yan Li

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:11

    Content type: Research

  19. This paper outlines a package synchronization scheme for blind speech watermarking in the discrete wavelet transform (DWT) domain. Following two-level DWT decomposition, watermark bits and synchronization code...

    Authors: Hwai-Tsu Hu, Shiow-Jyu Lin and Ling-Yuan Hsu

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:10

    Content type: Research

  20. With the exponential growth in computing power and progress in speech recognition technology, spoken dialog systems (SDSs), with which a user interacts through natural speech, have been widely used in human-compu...

    Authors: Chung-Hsien Wu, Ming-Hsiang Su and Wei-Bin Liang

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:9

    Content type: Research

  21. The benefit of auditory models for solving three music recognition tasks—onset detection, pitch estimation, and instrument recognition—is analyzed. Appropriate features are introduced which enable the use of s...

    Authors: Klaus Friedrichs, Nadja Bauer, Rainer Martin and Claus Weihs

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:7

    Content type: Research

  22. The incorporation of grammatical information into speech recognition systems is often used to increase performance in morphologically rich languages. However, this introduces demands for sufficiently large tra...

    Authors: Gregor Donaj and Zdravko Kačič

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:6

    Content type: Research

  23. This article presents original results of a statistical analysis of the Polish language, based on an orthographic and phonemic language corpus. The phonemic language corpus for Polish was developed by using automatic ...

    Authors: Piotr Kłosowski

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:5

    Content type: Research

  24. This research paper presents parametrization of emotional speech using a pool of common features utilized in emotion recognition such as fundamental frequency, formants, energy, MFCC, PLP, and LPC coefficients. T...

    Authors: Dorota Kamińska, Tomasz Sapiński and Gholamreza Anbarjafari

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:3

    Content type: Research

  25. Cantor Digitalis is a performative singing synthesizer that is composed of two main parts: a chironomic control interface and a parametric voice synthesizer. The control interface is based on a pen/touch graph...

    Authors: Lionel Feugère, Christophe d’Alessandro, Boris Doval and Olivier Perrotin

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:2

    Content type: Research

  26. Present-day IP transport platforms being what they are, it will never be possible to rule out conflicts between the available services. The logical consequence of this assertion is the inevitable conclusion th...

    Authors: Tadeus Uhl, Stefan Paulsen and Krzysztof Nowicki

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:1

    Content type: Research

  27. In this study, we investigate the effect of tiny acoustic differences on the efficiency of prosodic information transmission. Study participants listened to textually ambiguous sentences, which could be unders...

    Authors: Bohan Chen, Norihide Kitaoka and Kazuya Takeda

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:19

    Content type: Research

  28. Statistics of pauses appearing in Polish are described as a potential source of biometric information for automatic speaker recognition. The usage of three main types of acoustic pauses (silent, filled and bre...

    Authors: Magdalena Igras-Cybulska, Bartosz Ziółko, Piotr Żelasko and Marcin Witkowski

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:18

    Content type: Research

  29. We present an algorithm for the estimation of fundamental frequencies in voiced audio signals. The method is based on an autocorrelation of a signal with a segment of the same signal. During operation, frequen...

    Authors: Michael Staudacher, Viktor Steixner, Andreas Griessner and Clemens Zierhofer

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:17

    Content type: Research

  30. We present a novel non-iterative and rigorously motivated approach for estimating hidden Markov models (HMMs) and factorial hidden Markov models (FHMMs) of high-dimensional signals. Our approach utilizes the a...

    Authors: Yochay R. Yeminy, Yosi Keller and Sharon Gannot

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:16

    Content type: Research

  31. Substantial amounts of resources are usually required to robustly develop a language model for an open vocabulary speech recognition system as out-of-vocabulary (OOV) words can hurt recognition accuracy. In th...

    Authors: Vataya Chunwijitra, Ananlada Chotimongkol and Chai Wutiwiwatchai

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:15

    Content type: Research

  32. A new voice activity detection algorithm based on long-term pitch divergence is presented. The long-term pitch divergence not only decomposes speech signals with a bionic decomposition but also makes full use ...

    Authors: Xu-Kui Yang, Liang He, Dan Qu and Wei-Qiang Zhang

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:14

    Content type: Research

  33. In multichannel spatial audio coding (SAC), the accurate representations of virtual sounds and the efficient compressions of spatial parameters are the key to perfect reproduction of spatial sound effects in 3...

    Authors: Li Gao, Ruimin Hu, Xiaochen Wang, Gang Li, Yuhong Yang and Weiping Tu

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:13

    Content type: Research

  34. An adaptive muting method using an optimized parametric shaping function as a part of the ITU-T G.722 Appendix IV packet loss concealment algorithm is proposed. The packet loss concealment algorithm incorporating...

    Authors: Bong-Ki Lee and Joon-Hyuk Chang

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:11

    Content type: Research

  35. Automatic speech recognition is becoming more ubiquitous as recognition performance improves, capable devices increase in number, and areas of new application open up. Neural network acoustic models that can u...

    Authors: Ryan Price, Ken-ichi Iso and Koichi Shinoda

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:10

    Content type: Research

  36. Audio classification, classifying audio segments into broad categories such as speech, non-speech, and silence, is an important front-end problem in speech signal processing. Dozens of features have been propo...

    Authors: Xu-Kui Yang, Liang He, Dan Qu, Wei-Qiang Zhang and Michael T. Johnson

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:9

    Content type: Research

  37. Current text-to-speech systems do not support the effective provision of the semantics and the cognitive aspects of the documents’ typographic cues (e.g., font type, style, and size). A novel approach is intro...

    Authors: Dimitrios Tsonos and Georgios Kouroupetroglou

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:8

    Content type: Research

  38. Time-frequency (T-F) masking is an effective method for stereo speech source separation. However, reliable estimation of the T-F mask from sound mixtures is a challenging task, especially when room reverberati...

    Authors: Yang Yu, Wenwu Wang and Peng Han

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:7

    Content type: Research

  39. Today, a large amount of audio data is available on the web in the form of audiobooks, podcasts, video lectures, video blogs, news bulletins, etc. In addition, we can effortlessly record and store audio data s...

    Authors: Tejas Godambe, Sai Krishna Rallabandi, Suryakanth V. Gangashetty, Ashraf Alkhairy and Afshan Jafri

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:6

    Content type: Research

  40. Indian classical music, including its two varieties, Carnatic and Hindustani music, has a rich music tradition and enjoys a wide audience from various parts of the world. The Carnatic music which is more popul...

    Authors: Stanly Mammen, Ilango Krishnamurthi, A. Jalaja Varma and G. Sujatha

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2016 2016:5

    Content type: Research
