Articles

  1. In this article, we conduct a comprehensive simulation study for the optimal scores of speaker recognition systems that are based on speaker embedding. For that purpose, we first revisit the optimal scores for...

    Authors: Dong Wang

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:18

    Content type: Research

  2. Depression is a widespread mental health problem around the world with a significant burden on economies. Its early diagnosis and treatment are critical to reduce the costs and even save lives. One key aspect ...

    Authors: Cenk Demiroglu, Aslı Beşirli, Yasin Ozkanca and Selime Çelik

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:17

    Content type: Research

  3. Drone-embedded sound source localization (SSL) has interesting application perspective in challenging search and rescue scenarios due to bad lighting conditions or occlusions. However, the problem gets complic...

    Authors: Alif Bin Abdul Qayyum, K. M. Naimul Hassan, Adrita Anika, Md. Farhan Shadiq, Md Mushfiqur Rahman, Md. Tariqul Islam, Sheikh Asif Imran, Shahruk Hossain and Mohammad Ariful Haque

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:16

    Content type: Research

  4. Humanoid robots need microphone arrays to acquire speech signals from the human communication partner while suppressing noise, reverberation, and interference. Unlike many other applications, microp...

    Authors: Gongping Huang, Jingdong Chen, Jacob Benesty, Israel Cohen and Xudong Zhao

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:15

    Content type: Research

  5. Microphone leakage or crosstalk is a common problem in multichannel close-talk audio recordings (e.g., meetings or live music performances), which occurs when a target signal does not only couple into its dedi...

    Authors: Patrick Meyer, Samy Elshamy and Tim Fingscheidt

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:14

    Content type: Research

  6. A method to locate sound sources using an audio recording system mounted on an unmanned aerial vehicle (UAV) is proposed. The method introduces extension algorithms to apply on top of a baseline approach, whic...

    Authors: Benjamin Yen and Yusuke Hioka

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:13

    Content type: Research

  7. Estimation problems like room geometry estimation and localization of acoustic reflectors are of great interest and importance in robot and drone audition. Several methods for tackling these problems exist, bu...

    Authors: Usama Saqib, Sharon Gannot and Jesper Rindom Jensen

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:12

    Content type: Research

  8. Ego-noise, i.e., the noise a robot causes by its own motions, significantly corrupts the microphone signal and severely impairs the robot’s capability to interact seamlessly with its environment. Therefore, su...

    Authors: Alexander Schmidt, Andreas Brendel, Thomas Haubner and Walter Kellermann

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:11

    Content type: Research

  9. A keyword spotting algorithm implemented on an embedded system using a depthwise separable convolutional neural network classifier is reported. The proposed system was derived from a high-complexity system wit...

    Authors: Peter Mølgaard Sørensen, Bastian Epp and Tobias May

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:10

    Content type: Research

  10. In this work, we present an ensemble for automated audio classification that fuses different types of features extracted from audio files. These features are evaluated, compared, and fused with the goal of pro...

    Authors: Loris Nanni, Yandre M. G. Costa, Rafael L. Aguiar, Rafael B. Mangolin, Sheryl Brahnam and Carlos N. Silla Jr.

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:8

    Content type: Research

  11. In this paper, we introduce a quadratic approach for single-channel noise reduction. The desired signal magnitude is estimated by applying a linear filter to a modified version of the observations’ vector. The...

    Authors: Gal Itzhak, Jacob Benesty and Israel Cohen

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:7

    Content type: Research

  12. In order to improve the performance of hand-crafted features to detect playback speech, two discriminative features, constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients,...

    Authors: Jichen Yang, Longting Xu, Bo Ren and Yunyun Ji

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:6

    Content type: Research

  13. This paper presents a new approach based on recurrent neural networks (RNN) to the multiclass audio segmentation task whose goal is to classify an audio signal as speech, music, noise or a combination of these...

    Authors: Pablo Gimeno, Ignacio Viñals, Alfonso Ortega, Antonio Miguel and Eduardo Lleida

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:5

    Content type: Research

  14. Binaural sound source localization is an important and widely used perceptually based method and it has been applied to machine learning studies by many researchers based on head-related transfer function (HRT...

    Authors: Jing Wang, Jin Wang, Kai Qian, Xiang Xie and Jingming Kuang

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:4

    Content type: Research

  15. Attention-based encoder-decoder models have recently shown competitive performance for automatic speech recognition (ASR) compared to conventional ASR systems. However, how to employ attention models for onlin...

    Authors: Junfeng Hou, Wu Guo, Yan Song and Li-Rong Dai

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:3

    Content type: Research

  16. Experimental data combining complementary measures based on the oral airflow signal is presented in this paper, exploring the view that European Portuguese voiced stops are produced in a similar fashion to Ger...

    Authors: Luis M. T. Jesus and Maria Conceição Costa

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:2

    Content type: Research

  17. In this paper, we use empirical mode decomposition and Hurst-based mode selection (EMDH) along with deep learning architecture using a convolutional neural network (CNN) to improve the recognition of dysarthri...

    Authors: Mohammed Sidi Yakoub, Sid-ahmed Selouani, Brahim-Fares Zaidi and Asma Bouchair

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2020 2020:1

    Content type: Research

  18. We present a novel model adaptation approach to deal with data variability for speaker diarization in a broadcast environment. Expensive human annotated data can be used to mitigate the domain mismatch by mean...

    Authors: Ignacio Viñals, Alfonso Ortega, Jesús Villalba, Antonio Miguel and Eduardo Lleida

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:24

    Content type: Research

  19. In this paper, we propose a score-informed source separation framework based on non-negative matrix factorization (NMF) and dynamic time warping (DTW) that suits both offline and online systems. The propos...

    Authors: Antonio Jesús Munoz-Montoro, Julio José Carabias-Orti, Pedro Vera-Candeas, Francisco Jesús Canadas-Quesada and Nicolás Ruiz-Reyes

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:23

    Content type: Research

  20. Text-to-speech (TTS) synthesis systems have been widely used in general-purpose applications based on the generation of speech. Nonetheless, there are some domains, such as storytelling or voice output aid dev...

    Authors: Marc Freixes, Francesc Alías and Joan Claudi Socoró

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:22

    Content type: Research

  21. So-called full-face masks are essential for fire fighters to ensure respiratory protection in smoke diving incidents. While such masks are absolutely necessary for protection purposes on one hand, they impair the...

    Authors: Michael Brodersen, Achim Volmer and Gerhard Schmidt

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:21

    Content type: Research

  22. According to the encoding and decoding mechanism of binaural cue coding (BCC), in this paper, the speech and noise are considered as left channel signal and right channel signal of the BCC framework, respectiv...

    Authors: Xianyun Wang and Changchun Bao

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:20

    Content type: Research

  23. Phonetic information is one of the most essential components of a speech signal, playing an important role for many speech processing tasks. However, it is difficult to integrate phonetic information into spea...

    Authors: Yi Liu, Liang He, Jia Liu and Michael T. Johnson

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:19

    Content type: Research

  24. A method called joint connectionist temporal classification (CTC)-attention-based speech recognition has recently received increasing focus and has achieved impressive performance. A hybrid end-to-end architec...

    Authors: Chu-Xiong Qin, Wen-Lin Zhang and Dan Qu

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:18

    Content type: Research

  25. Voice conversion (VC) is a technique of exclusively converting speaker-specific information in the source speech while preserving the associated phonemic information. Non-negative matrix factorization (NMF)-ba...

    Authors: Yuki Takashima, Toru Nakashika, Tetsuya Takiguchi and Yasuo Ariki

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:17

    Content type: Research

  26. Search on speech (SoS) is a challenging area due to the huge amount of information stored in audio and video repositories. Spoken term detection (STD) is an SoS-related task aiming to retrieve data from a spee...

    Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Ana R. Montalvo, Jose M. Ramirez, Mikel Peñagarikano and Luis Javier Rodriguez-Fuentes

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:16

    Content type: Research

  27. Voice-enabled interaction systems in domestic environments have attracted significant interest recently, being the focus of smart home research projects and commercial voice assistant home devices. Within the ...

    Authors: Panagiotis Giannoulis, Gerasimos Potamianos and Petros Maragos

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:15

    Content type: Research

  28. Speech emotion recognition methods combining articulatory information with acoustic features have been previously shown to improve recognition performance. Collection of articulatory data on a large scale may ...

    Authors: Mohit Shah, Ming Tu, Visar Berisha, Chaitali Chakrabarti and Andreas Spanias

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:14

    Content type: Research

  29. The huge amount of information stored in audio and video repositories makes search on speech (SoS) a priority area nowadays. Within SoS, Query-by-Example Spoken Term Detection (QbE STD) aims to retrieve data f...

    Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Mikel Peñagarikano, Luis Javier Rodriguez-Fuentes and Antonio Moreno-Sandoval

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:13

    Content type: Research

  30. In this paper, we apply a latent class model (LCM) to the task of speaker diarization. LCM is similar to Patrick Kenny’s variational Bayes (VB) method in that it uses soft information and avoids premature hard...

    Authors: Liang He, Xianhong Chen, Can Xu, Yi Liu, Jia Liu and Michael T. Johnson

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:12

    Content type: Research

  31. We propose a new method for music detection from broadcasting contents using the convolutional neural networks with a Mel-scale kernel. In this detection task, music segments should be annotated from the broad...

    Authors: Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim and Oh-Wook Kwon

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:11

    Content type: Research

  32. Singing voice analysis has been a topic of research to assist several applications in the domain of music information retrieval system. One such major area is singer identification (SID). There has been enormo...

    Authors: Deepali Y. Loni and Shaila Subbaraman

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:10

    Content type: Research

  33. Audio signals represent a wide diversity of acoustic events, from background environmental noise to spoken communication. Machine learning models such as neural networks have already been proposed for audio si...

    Authors: Diego de Benito-Gorron, Alicia Lozano-Diez, Doroteo T. Toledano and Joaquin Gonzalez-Rodriguez

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:9

    Content type: Research

  34. There are many studies on detecting human speech from artificially generated speech and automatic speaker verification (ASV) that aim to detect and identify whether the given speech belongs to a given speaker....

    Authors: Zeyan Oo, Longbiao Wang, Khomdet Phapatanaburi, Meng Liu, Seiichi Nakagawa, Masahiro Iwahashi and Jianwu Dang

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:8

    Content type: Research

  35. In this paper, an adaptive averaging a priori SNR estimation employing critical band processing is proposed. The proposed method modifies the current decision-directed a priori SNR estimation to achieve faster...

    Authors: Lara Nahma, Pei Chee Yong, Hai Huyen Dam and Sven Nordholm

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:7

    Content type: Research

  36. In response to renewed interest in virtual and augmented reality, the need for high-quality spatial audio systems has emerged. The reproduction of immersive and realistic virtual sound requires high resolution...

    Authors: Zamir Ben-Hur, David Lou Alon, Boaz Rafaely and Ravish Mehra

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:5

    Content type: Research

  37. This paper proposes two novel linguistic features extracted from text input for prosody generation in a Mandarin text-to-speech system. The first feature is the punctuation confidence (PC), which measures the ...

    Authors: Chen-Yu Chiang, Yu-Ping Hung, Han-Yun Yeh, I-Bin Liao and Chen-Ming Pan

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:4

    Content type: Research

  38. Current automatic speech recognition (ASR) systems achieve over 90–95% accuracy, depending on the methodology applied and datasets used. However, the level of accuracy decreases significantly when the same ASR...

    Authors: Kacper Radzikowski, Robert Nowak, Le Wang and Osamu Yoshie

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:3

    Content type: Research

  39. Filter banks on spectrums play an important role in many audio applications. Traditionally, the filters are linearly distributed on perceptual frequency scale such as Mel scale. To make the output smoother, th...

    Authors: Teng Zhang and Ji Wu

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:1

    Content type: Research

  40. This paper deals with a project of Automatic Bird Species Recognition Based on Bird Vocalization. Eighteen bird species of 6 different families were analyzed. At first, human factor cepstral coefficients repre...

    Authors: Jiri Stastny, Michal Munk and Lubos Juranek

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:19

    Content type: Research

  41. A transfer learning-based end-to-end speech recognition approach is presented in two levels in our framework. Firstly, a feature extraction approach combining multilingual deep neural network (DNN) training wi...

    Authors: Chu-Xiong Qin, Dan Qu and Lian-Hai Zhang

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:18

    Content type: Research

  42. In this paper, a web-based spoken dialog generation environment is developed that enables users to edit dialogs with a video virtual assistant and also to select the 3D motions and tone of voice for the assis...

    Authors: Ryota Nishimura, Daisuke Yamamoto, Takahiro Uchiya and Ichi Takumi

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:17

    Content type: Research

  43. In this paper, a robust and highly imperceptible audio watermarking technique is presented based on discrete cosine transform (DCT) and singular value decomposition (SVD). The low-frequency components of the a...

    Authors: Aniruddha Kanhe and Aghila Gnanasekaran

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:16

    Content type: Research

  44. The emerging field of computational acoustic monitoring aims at retrieving high-level information from acoustic scenes recorded by some network of sensors. These networks gather large amounts of data requiring...

    Authors: Vincent Lostanlen, Grégoire Lafay, Joakim Andén and Mathieu Lagrange

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:15

    Content type: Research

  45. Several factors contribute to the performance of speaker diarization systems. For instance, the appropriate selection of speech features is one of the key aspects that affect speaker diarization systems. The o...

    Authors: Abraham Woubie Zewoudie, Jordi Luque and Javier Hernando

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:14

    Content type: Research

  46. Recently, sound recognition has been used to identify sounds, such as the sound of a car, or a river. However, sounds have nuances that may be better described by adjective-noun pairs such as “slow car” and ve...

    Authors: Sebastian Säger, Benjamin Elizalde, Damian Borth, Christian Schulze, Bhiksha Raj and Ian Lane

    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:12

    Content type: Research
