Articles

  1. Text-to-speech (TTS) synthesis systems have been widely used in general-purpose applications based on the generation of speech. Nonetheless, there are some domains, such as storytelling or voice output aid dev...

    Authors: Marc Freixes, Francesc Alías and Joan Claudi Socoró
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:22
  2. So-called full-face masks are essential for fire fighters to ensure respiratory protection in smoke diving incidents. While such masks are absolutely necessary for protection purposes on one hand, they impair the...

    Authors: Michael Brodersen, Achim Volmer and Gerhard Schmidt
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:21
  3. According to the encoding and decoding mechanism of binaural cue coding (BCC), in this paper, the speech and noise are considered as left channel signal and right channel signal of the BCC framework, respectiv...

    Authors: Xianyun Wang and Changchun Bao
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:20
  4. Phonetic information is one of the most essential components of a speech signal, playing an important role for many speech processing tasks. However, it is difficult to integrate phonetic information into spea...

    Authors: Yi Liu, Liang He, Jia Liu and Michael T. Johnson
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:19
  5. A method called joint connectionist temporal classification (CTC)-attention-based speech recognition has recently received increasing focus and has achieved impressive performance. A hybrid end-to-end architec...

    Authors: Chu-Xiong Qin, Wen-Lin Zhang and Dan Qu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:18
  6. Voice conversion (VC) is a technique of exclusively converting speaker-specific information in the source speech while preserving the associated phonemic information. Non-negative matrix factorization (NMF)-ba...

    Authors: Yuki Takashima, Toru Nakashika, Tetsuya Takiguchi and Yasuo Ariki
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:17
  7. Search on speech (SoS) is a challenging area due to the huge amount of information stored in audio and video repositories. Spoken term detection (STD) is an SoS-related task aiming to retrieve data from a spee...

    Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Ana R. Montalvo, Jose M. Ramirez, Mikel Peñagarikano and Luis Javier Rodriguez-Fuentes
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:16
  8. Voice-enabled interaction systems in domestic environments have attracted significant interest recently, being the focus of smart home research projects and commercial voice assistant home devices. Within the ...

    Authors: Panagiotis Giannoulis, Gerasimos Potamianos and Petros Maragos
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:15
  9. Speech emotion recognition methods combining articulatory information with acoustic features have been previously shown to improve recognition performance. Collection of articulatory data on a large scale may ...

    Authors: Mohit Shah, Ming Tu, Visar Berisha, Chaitali Chakrabarti and Andreas Spanias
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:14
  10. The huge amount of information stored in audio and video repositories makes search on speech (SoS) a priority area nowadays. Within SoS, Query-by-Example Spoken Term Detection (QbE STD) aims to retrieve data f...

    Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Mikel Peñagarikano, Luis Javier Rodriguez-Fuentes and Antonio Moreno-Sandoval
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:13
  11. In this paper, we apply a latent class model (LCM) to the task of speaker diarization. LCM is similar to Patrick Kenny’s variational Bayes (VB) method in that it uses soft information and avoids premature hard...

    Authors: Liang He, Xianhong Chen, Can Xu, Yi Liu, Jia Liu and Michael T. Johnson
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:12
  12. We propose a new method for music detection from broadcasting contents using the convolutional neural networks with a Mel-scale kernel. In this detection task, music segments should be annotated from the broad...

    Authors: Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim and Oh-Wook Kwon
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:11
  13. Singing voice analysis has been a topic of research to assist several applications in the domain of music information retrieval system. One such major area is singer identification (SID). There has been enormo...

    Authors: Deepali Y. Loni and Shaila Subbaraman
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:10
  14. Audio signals represent a wide diversity of acoustic events, from background environmental noise to spoken communication. Machine learning models such as neural networks have already been proposed for audio si...

    Authors: Diego de Benito-Gorron, Alicia Lozano-Diez, Doroteo T. Toledano and Joaquin Gonzalez-Rodriguez
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:9
  15. There are many studies on detecting human speech from artificially generated speech and automatic speaker verification (ASV) that aim to detect and identify whether the given speech belongs to a given speaker....

    Authors: Zeyan Oo, Longbiao Wang, Khomdet Phapatanaburi, Meng Liu, Seiichi Nakagawa, Masahiro Iwahashi and Jianwu Dang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:8
  16. In this paper, an adaptive averaging a priori SNR estimation employing critical band processing is proposed. The proposed method modifies the current decision-directed a priori SNR estimation to achieve faster...

    Authors: Lara Nahma, Pei Chee Yong, Hai Huyen Dam and Sven Nordholm
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:7
  17. In response to renewed interest in virtual and augmented reality, the need for high-quality spatial audio systems has emerged. The reproduction of immersive and realistic virtual sound requires high resolution...

    Authors: Zamir Ben-Hur, David Lou Alon, Boaz Rafaely and Ravish Mehra
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:5
  18. This paper proposes two novel linguistic features extracted from text input for prosody generation in a Mandarin text-to-speech system. The first feature is the punctuation confidence (PC), which measures the ...

    Authors: Chen-Yu Chiang, Yu-Ping Hung, Han-Yun Yeh, I-Bin Liao and Chen-Ming Pan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:4
  19. Current automatic speech recognition (ASR) systems achieve over 90–95% accuracy, depending on the methodology applied and datasets used. However, the level of accuracy decreases significantly when the same ASR...

    Authors: Kacper Radzikowski, Robert Nowak, Le Wang and Osamu Yoshie
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:3
  20. Filter banks on spectrums play an important role in many audio applications. Traditionally, the filters are linearly distributed on perceptual frequency scale such as Mel scale. To make the output smoother, th...

    Authors: Teng Zhang and Ji Wu
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:1
  21. This paper deals with a project of Automatic Bird Species Recognition Based on Bird Vocalization. Eighteen bird species of 6 different families were analyzed. At first, human factor cepstral coefficients repre...

    Authors: Jiri Stastny, Michal Munk and Lubos Juranek
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:19
  22. A transfer learning-based end-to-end speech recognition approach is presented in two levels in our framework. Firstly, a feature extraction approach combining multilingual deep neural network (DNN) training wi...

    Authors: Chu-Xiong Qin, Dan Qu and Lian-Hai Zhang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:18
  23. In this paper, a web-based spoken dialog generation environment is developed which enables users to edit dialogs with a video virtual assistant and also to select the 3D motions and tone of voice for the assis...

    Authors: Ryota Nishimura, Daisuke Yamamoto, Takahiro Uchiya and Ichi Takumi
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:17
  24. In this paper, a robust and highly imperceptible audio watermarking technique is presented based on discrete cosine transform (DCT) and singular value decomposition (SVD). The low-frequency components of the a...

    Authors: Aniruddha Kanhe and Aghila Gnanasekaran
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:16
  25. The emerging field of computational acoustic monitoring aims at retrieving high-level information from acoustic scenes recorded by some network of sensors. These networks gather large amounts of data requiring...

    Authors: Vincent Lostanlen, Grégoire Lafay, Joakim Andén and Mathieu Lagrange
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:15
  26. Several factors contribute to the performance of speaker diarization systems. For instance, the appropriate selection of speech features is one of the key aspects that affect speaker diarization systems. The o...

    Authors: Abraham Woubie Zewoudie, Jordi Luque and Javier Hernando
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:14
  27. Recently, sound recognition has been used to identify sounds, such as the sound of a car, or a river. However, sounds have nuances that may be better described by adjective-noun pairs such as “slow car” and ve...

    Authors: Sebastian Säger, Benjamin Elizalde, Damian Borth, Christian Schulze, Bhiksha Raj and Ian Lane
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:12
  28. As the foundation of many applications, multipitch estimation problem has always been the focus of acoustic music processing; however, existing algorithms perform deficiently due to its complexity. In this pap...

    Authors: Xingda Li, Yujing Guan, Yingnian Wu and Zhongbo Zhang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:11
  29. Voice activity detection (VAD) is an important preprocessing step for various speech applications to identify speech and non-speech periods in input signals. In this paper, we propose a deep neural network (DN...

    Authors: Suci Dwijayanti, Kei Yamamori and Masato Miyoshi
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:10
  30. The performance of automatic speech recognition systems degrades in the presence of emotional states and in adverse environments (e.g., noisy conditions). This greatly limits the deployment of speech recogniti...

    Authors: Meysam Bashirpour and Masoud Geravanchizadeh
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:9
  31. The successful treatment of hearing loss depends on the individual practitioner’s experience and skill. So far, there is no standard available to evaluate the practitioner’s testing skills. To assess every pra...

    Authors: Alexander Kocian, Guido Cattani, Stefano Chessa and Wilko Grolman
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:8
  32. Recurrent neural networks (RNNs) have shown an ability to model temporal dependencies. However, the problem of exploding or vanishing gradients has limited their application. In recent years, long short-term m...

    Authors: Jian Kang, Wei-Qiang Zhang, Wei-Wei Liu, Jia Liu and Michael T. Johnson
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:6
  33. The speech intelligibility of indoor public address systems is degraded by reverberation and background noise. This paper proposes a preprocessing method that combines speech enhancement and inverse filtering ...

    Authors: Huan-Yu Dong and Chang-Myung Lee
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:3
  34. Query-by-example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given an acoustic (spoken) query containing the term of interest as the input. This paper presents the systems su...

    Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Jorge Proença, Fernando Perdigão, Fernando García-Granada, Emilio Sanchis, Anna Pompili and Alberto Abad
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:2
  35. Automatic extraction of acoustic regions of interest from recordings captured in realistic clinical environments is a necessary preprocessing step in any cry analysis system. In this study, we propose a hidden...

    Authors: Gaurav Naithani, Jaana Kivinummi, Tuomas Virtanen, Outi Tammela, Mikko J. Peltola and Jukka M. Leppänen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:1
  36. Audio signals are a type of high-dimensional data, and their clustering is critical. However, distance calculation failures, inefficient index trees, and cluster overlaps, derived from the equidistance, redund...

    Authors: Wenfa Li, Gongming Wang and Ke Li
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:26
  37. In speech enhancement, noise power spectral density (PSD) estimation plays a key role in determining appropriate de-noising gains. In this paper, we propose a robust noise PSD estimator for binaural speech enha...

    Authors: Youna Ji, Yonghyun Baek and Young-cheol Park
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:25
  38. Large vocabulary continuous speech recognition (LVCSR) has naturally been demanded for transcribing daily conversations, while developing spoken text data to train LVCSR is costly and time-consuming. In this p...

    Authors: Vataya Chunwijitra and Chai Wutiwiwatchai
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:24
  39. Robustness against background noise is a major research area for speech-related applications such as speech recognition and speaker recognition. One of the many solutions for this problem is to detect speech-d...

    Authors: Gökay Dişken, Zekeriya Tüfekci and Ulus Çevik
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:23
  40. Within search-on-speech, Spoken Term Detection (STD) aims to retrieve data from a speech repository given a textual representation of a search term. This paper presents an international open evaluation for sea...

    Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Luis Serrano, Inma Hernaez, Alejandro Coucheiro-Limeres, Javier Ferreiros, Julia Olcoz and Jorge Llombart
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:22
  41. The task of speaker diarization is to answer the question "who spoke when?" In this paper, we present different clustering approaches which consist of Evolutionary Computation Algorithms (ECAs) such as Genetic...

    Authors: Karim Dabbabi, Salah Hajji and Adnen Cherif
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:21
  42. An artificial neural network is an important model for training features of voice conversion (VC) tasks. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as Mel Cepstr...

    Authors: Zhaojie Luo, Jinhui Chen, Tetsuya Takiguchi and Yasuo Ariki
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:18

Annual Journal Metrics

  • Citation Impact 2023
    Journal Impact Factor: 1.7
    5-year Journal Impact Factor: 1.6
    Source Normalized Impact per Paper (SNIP): 1.051
    SCImago Journal Rank (SJR): 0.414

  • Speed 2023
    Submission to first editorial decision (median days): 17
    Submission to acceptance (median days): 154

  • Usage 2023
    Downloads: 368,607
    Altmetric mentions: 70
