Articles


Non-parallel dictionary learning for voice conversion using non-negative Tucker decomposition

Voice conversion (VC) is a technique of exclusively converting speaker-specific information in the source speech while preserving the associated phonemic information. Non-negative matrix factorization (NMF)-ba...

Authors: Yuki Takashima, Toru Nakashika, Tetsuya Takiguchi and Yasuo Ariki

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:17

Content type: Research

Published on: 11 September 2019
ALBAYZIN 2018 spoken term detection evaluation: a multi-domain international evaluation in Spanish

Search on speech (SoS) is a challenging area due to the huge amount of information stored in audio and video repositories. Spoken term detection (STD) is an SoS-related task aiming to retrieve data from a spee...

Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Ana R. Montalvo, Jose M. Ramirez, Mikel Peñagarikano and Luis Javier Rodriguez-Fuentes

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:16

Content type: Research

Published on: 2 September 2019
Room-localized speech activity detection in multi-microphone smart homes

Voice-enabled interaction systems in domestic environments have attracted significant interest recently, being the focus of smart home research projects and commercial voice assistant home devices. Within the ...

Authors: Panagiotis Giannoulis, Gerasimos Potamianos and Petros Maragos

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:15

Content type: Research

Published on: 27 August 2019
Articulation constrained learning with application to speech emotion recognition

Speech emotion recognition methods combining articulatory information with acoustic features have been previously shown to improve recognition performance. Collection of articulatory data on a large scale may ...

Authors: Mohit Shah, Ming Tu, Visar Berisha, Chaitali Chakrabarti and Andreas Spanias

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:14

Content type: Research

Published on: 20 August 2019
Search on speech from spoken queries: the Multi-domain International ALBAYZIN 2018 Query-by-Example Spoken Term Detection Evaluation

The huge amount of information stored in audio and video repositories makes search on speech (SoS) a priority area nowadays. Within SoS, Query-by-Example Spoken Term Detection (QbE STD) aims to retrieve data f...

Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Mikel Peñagarikano, Luis Javier Rodriguez-Fuentes and Antonio Moreno-Sandoval

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:13

Content type: Research

Published on: 19 July 2019
Latent class model with application to speaker diarization

In this paper, we apply a latent class model (LCM) to the task of speaker diarization. LCM is similar to Patrick Kenny’s variational Bayes (VB) method in that it uses soft information and avoids premature hard...

Authors: Liang He, Xianhong Chen, Can Xu, Yi Liu, Jia Liu and Michael T. Johnson

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:12

Content type: Research

Published on: 9 July 2019
Music detection from broadcast contents using convolutional neural networks with a Mel-scale kernel

We propose a new method for music detection in broadcast content using convolutional neural networks with a Mel-scale kernel. In this detection task, music segments should be annotated from the broad...

Authors: Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim and Oh-Wook Kwon

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:11

Content type: Research

Published on: 26 June 2019
Robust singer identification of Indian playback singers

Singing voice analysis has been a topic of research to assist several applications in the domain of music information retrieval. One such major area is singer identification (SID). There has been enormo...

Authors: Deepali Y. Loni and Shaila Subbaraman

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:10

Content type: Research

Published on: 17 June 2019
Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset

Audio signals represent a wide diversity of acoustic events, from background environmental noise to spoken communication. Machine learning models such as neural networks have already been proposed for audio si...

Authors: Diego de Benito-Gorron, Alicia Lozano-Diez, Doroteo T. Toledano and Joaquin Gonzalez-Rodriguez

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:9

Content type: Research

Published on: 17 June 2019
Replay attack detection with auditory filter-based relative phase features

There are many studies on detecting human speech from artificially generated speech and automatic speaker verification (ASV) that aim to detect and identify whether the given speech belongs to a given speaker....

Authors: Zeyan Oo, Longbiao Wang, Khomdet Phapatanaburi, Meng Liu, Seiichi Nakagawa, Masahiro Iwahashi and Jianwu Dang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:8

Content type: Research

Published on: 10 June 2019
An adaptive a priori SNR estimator for perceptual speech enhancement

In this paper, an adaptive averaging a priori SNR estimation employing critical band processing is proposed. The proposed method modifies the current decision-directed a priori SNR estimation to achieve faster...

Authors: Lara Nahma, Pei Chee Yong, Hai Huyen Dam and Sven Nordholm

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:7

Content type: Research

Published on: 7 June 2019
Feature trajectory dynamic time warping for clustering of speech segments

Dynamic time warping (DTW) can be used to compute the similarity between two sequences of generally differing length. We propose a modification to DTW that performs individual and independent pairwise alignmen...

Authors: Lerato Lerato and Thomas Niesler

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:6

Content type: Research

Published on: 4 April 2019
Loudness stability of binaural sound with spherical harmonic representation of sparse head-related transfer functions

In response to renewed interest in virtual and augmented reality, the need for high-quality spatial audio systems has emerged. The reproduction of immersive and realistic virtual sound requires high resolution...

Authors: Zamir Ben-Hur, David Lou Alon, Boaz Rafaely and Ravish Mehra

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:5

Content type: Research

Published on: 15 March 2019
Punctuation-generation-inspired linguistic features for Mandarin prosody generation

This paper proposes two novel linguistic features extracted from text input for prosody generation in a Mandarin text-to-speech system. The first feature is the punctuation confidence (PC), which measures the ...

Authors: Chen-Yu Chiang, Yu-Ping Hung, Han-Yun Yeh, I-Bin Liao and Chen-Ming Pan

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:4

Content type: Research

Published on: 21 February 2019
Dual supervised learning for non-native speech recognition

Current automatic speech recognition (ASR) systems achieve over 90–95% accuracy, depending on the methodology applied and datasets used. However, the level of accuracy decreases significantly when the same ASR...

Authors: Kacper Radzikowski, Robert Nowak, Le Wang and Osamu Yoshie

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:3

Content type: Research

Published on: 14 January 2019
Decision tree SVM model with Fisher feature selection for speech emotion recognition

The overall recognition rate decreases as emotional confusion increases when multiple speech emotions are recognized. To solve this problem, we propose a speech emotion recognition method based on the dec...

Authors: Linhui Sun, Sheng Fu and Fu Wang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:2

Content type: Research

Published on: 7 January 2019
Discriminative frequency filter banks learning with neural networks

Filter banks applied to spectra play an important role in many audio applications. Traditionally, the filters are linearly distributed on a perceptual frequency scale such as the Mel scale. To make the output smoother, th...

Authors: Teng Zhang and Ji Wu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2019 2019:1

Content type: Research

Published on: 3 January 2019
Automatic bird species recognition based on birds vocalization

This paper describes a project on automatic bird species recognition based on bird vocalization. Eighteen bird species from six different families were analyzed. First, human factor cepstral coefficients repre...

Authors: Jiri Stastny, Michal Munk and Lubos Juranek

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:19

Content type: Research

Published on: 14 December 2018
Towards end-to-end speech recognition with transfer learning

Our framework presents a transfer learning-based end-to-end speech recognition approach at two levels. Firstly, a feature extraction approach combining multilingual deep neural network (DNN) training wi...

Authors: Chu-Xiong Qin, Dan Qu and Lian-Hai Zhang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:18

Content type: Research

Published on: 21 November 2018
Web-based environment for user generation of spoken dialog for virtual assistants

In this paper, we develop a web-based spoken dialog generation environment that enables users to edit dialogs with a video virtual assistant and also to select the 3D motions and tone of voice for the assis...

Authors: Ryota Nishimura, Daisuke Yamamoto, Takahiro Uchiya and Ichi Takumi

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:17

Content type: Research

Published on: 16 November 2018
Robust image-in-audio watermarking technique based on DCT-SVD transform

In this paper, a robust and highly imperceptible audio watermarking technique is presented based on discrete cosine transform (DCT) and singular value decomposition (SVD). The low-frequency components of the a...

Authors: Aniruddha Kanhe and Aghila Gnanasekaran

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:16

Content type: Research

Published on: 1 October 2018
Relevance-based quantization of scattering features for unsupervised mining of environmental audio

The emerging field of computational acoustic monitoring aims at retrieving high-level information from acoustic scenes recorded by a network of sensors. These networks gather large amounts of data requiring...

Authors: Vincent Lostanlen, Grégoire Lafay, Joakim Andén and Mathieu Lagrange

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:15

Content type: Research

Published on: 29 September 2018
The use of long-term features for GMM- and i-vector-based speaker diarization systems

Several factors contribute to the performance of speaker diarization systems. For instance, the appropriate selection of speech features is one of the key aspects that affect speaker diarization systems. The o...

Authors: Abraham Woubie Zewoudie, Jordi Luque and Javier Hernando

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:14

Content type: Research

Published on: 26 September 2018
From raw audio to a seamless mix: creating an automated DJ system for Drum and Bass

We present the open-source implementation of the first fully automatic and comprehensive DJ system, able to generate seamless music mixes using songs from a given library much like a human DJ does.

Authors: Len Vande Veire and Tijl De Bie

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:13

Content type: Research

Published on: 24 September 2018
AudioPairBank: towards a large-scale tag-pair-based audio content analysis

Recently, sound recognition has been used to identify sounds, such as the sound of a car or a river. However, sounds have nuances that may be better described by adjective-noun pairs such as “slow car” and ve...

Authors: Sebastian Säger, Benjamin Elizalde, Damian Borth, Christian Schulze, Bhiksha Raj and Ian Lane

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:12

Content type: Research

Published on: 15 September 2018
Piano multipitch estimation using sparse coding embedded deep learning

As the foundation of many applications, the multipitch estimation problem has always been a focus of acoustic music processing; however, existing algorithms perform poorly because of its complexity. In this pap...

Authors: Xingda Li, Yujing Guan, Yingnian Wu and Zhongbo Zhang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:11

Content type: Research

Published on: 12 September 2018
Enhancement of speech dynamics for voice activity detection using DNN

Voice activity detection (VAD) is an important preprocessing step for various speech applications to identify speech and non-speech periods in input signals. In this paper, we propose a deep neural network (DN...

Authors: Suci Dwijayanti, Kei Yamamori and Masato Miyoshi

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:10

Content type: Research

Published on: 12 September 2018
Robust emotional speech recognition based on binaural model and emotional auditory mask in noisy environments

The performance of automatic speech recognition systems degrades in the presence of emotional states and in adverse environments (e.g., noisy conditions). This greatly limits the deployment of speech recogniti...

Authors: Meysam Bashirpour and Masoud Geravanchizadeh

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:9

Content type: Research

Published on: 28 August 2018
An artificial patient for pure-tone audiometry

The successful treatment of hearing loss depends on the individual practitioner’s experience and skill. So far, there is no standard available to evaluate the practitioner’s testing skills. To assess every pra...

Authors: Alexander Kocian, Guido Cattani, Stefano Chessa and Wilko Grolman

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:8

Content type: Research

Published on: 27 July 2018
Wind noise reduction for a closely spaced microphone array in a car environment

This work studies a wind noise reduction approach for communication applications in a car environment. An endfire array consisting of two microphones is considered as a substitute for an ordinary cardioid micr...

Authors: Simon Grimm and Jürgen Freudenberger

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:7

Content type: Research

Published on: 27 July 2018
Advanced recurrent network-based hybrid acoustic models for low resource speech recognition

Recurrent neural networks (RNNs) have shown an ability to model temporal dependencies. However, the problem of exploding or vanishing gradients has limited their application. In recent years, long short-term m...

Authors: Jian Kang, Wei-Qiang Zhang, Wei-Wei Liu, Jia Liu and Michael T. Johnson

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:6

Content type: Research

Published on: 17 July 2018
A parametric prosody coding approach for Mandarin speech using a hierarchical prosodic model

In this paper, a novel parametric prosody coding approach for Mandarin speech is proposed. It employs a hierarchical prosodic model (HPM) as a prosody-generating model in the encoder to analyze the speech pros...

Authors: Chen-Yu Chiang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:5

Content type: Research

Published on: 11 July 2018
Learning long-term filter banks for audio source separation and audio scene classification

Filter banks on short-time Fourier transform (STFT) spectrograms have long been studied to analyze and process audio. The frame shift in the STFT procedure determines the temporal resolution. However, in many discr...

Authors: Teng Zhang and Ji Wu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:4

Content type: Research

Published on: 30 May 2018
Speech intelligibility improvement in noisy reverberant environments based on speech enhancement and inverse filtering

The speech intelligibility of indoor public address systems is degraded by reverberation and background noise. This paper proposes a preprocessing method that combines speech enhancement and inverse filtering ...

Authors: Huan-Yu Dong and Chang-Myung Lee

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:3

Content type: Research

Published on: 23 May 2018
ALBAYZIN Query-by-example Spoken Term Detection 2016 evaluation

Query-by-example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given an acoustic (spoken) query containing the term of interest as the input. This paper presents the systems su...

Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Jorge Proença, Fernando Perdigão, Fernando García-Granada, Emilio Sanchis, Anna Pompili and Alberto Abad

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:2

Content type: Research

Published on: 13 April 2018
Automatic segmentation of infant cry signals using hidden Markov models

Automatic extraction of acoustic regions of interest from recordings captured in realistic clinical environments is a necessary preprocessing step in any cry analysis system. In this study, we propose a hidden...

Authors: Gaurav Naithani, Jaana Kivinummi, Tuomas Virtanen, Outi Tammela, Mikko J. Peltola and Jukka M. Leppänen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2018 2018:1

Content type: Research

Published on: 26 January 2018
Clustering algorithm for audio signals based on the sequential Psim matrix and Tabu Search

Audio signals are a type of high-dimensional data, and their clustering is critical. However, distance calculation failures, inefficient index trees, and cluster overlaps, derived from the equidistance, redund...

Authors: Wenfa Li, Gongming Wang and Ke Li

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:26

Content type: Research

Published on: 4 December 2017
Robust noise power spectral density estimation for binaural speech enhancement in time-varying diffuse noise field

In speech enhancement, noise power spectral density (PSD) estimation plays a key role in determining appropriate denoising gains. In this paper, we propose a robust noise PSD estimator for binaural speech enha...

Authors: Youna Ji, Yonghyun Baek and Young-cheol Park

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:25

Content type: Research

Published on: 29 November 2017
Classification-based spoken text selection for LVCSR language modeling

Large vocabulary continuous speech recognition (LVCSR) is naturally in demand for transcribing daily conversations, but developing spoken text data to train LVCSR is costly and time-consuming. In this p...

Authors: Vataya Chunwijitra and Chai Wutiwiwatchai

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:24

Content type: Research

Published on: 17 October 2017
A robust polynomial regression-based voice activity detector for speaker verification

Robustness against background noise is a major research area for speech-related applications such as speech recognition and speaker recognition. One of the many solutions for this problem is to detect speech-d...

Authors: Gökay Dişken, Zekeriya Tüfekci and Ulus Çevik

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:23

Content type: Research

Published on: 11 October 2017
ALBAYZIN 2016 spoken term detection evaluation: an international open competitive evaluation in Spanish

Within search-on-speech, Spoken Term Detection (STD) aims to retrieve data from a speech repository given a textual representation of a search term. This paper presents an international open evaluation for sea...

Authors: Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero, Laura Docio-Fernandez, Luis Serrano, Inma Hernaez, Alejandro Coucheiro-Limeres, Javier Ferreiros, Julia Olcoz and Jorge Llombart

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:22

Content type: Research

Published on: 29 September 2017
Integration of evolutionary computation algorithms and new AUTO-TLBO technique in the speaker clustering stage for speaker diarization of broadcast news

The task of speaker diarization is to answer the question "who spoke when?" In this paper, we present different clustering approaches which consist of Evolutionary Computation Algorithms (ECAs) such as Genetic...

Authors: Karim Dabbabi, Salah Hajji and Adnen Cherif

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:21

Content type: Research

Published on: 19 September 2017
Speech encryption using chaotic shift keying for secured speech communication

This paper presents a chaotic shift keying-based speech encryption and decryption method. In this method, the input speech signals are sampled and their values are segmented into four levels, namely L ...

Authors: P. Sathiyamurthi and S. Ramakrishnan

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:20

Content type: Research

Published on: 7 September 2017
On the perception of “segmental intonation”: F0 context effects on sibilant identification in German

In normal modally voiced utterances, voiceless fricatives like [s], [ʃ], [f], and [x] vary such that their aperiodic pitch impressions mirror the pitch level of the adjacent F0 contour. For instance, if the F0...

Authors: Oliver Niebuhr

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:19

Content type: Research

Published on: 2 August 2017
Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform

An artificial neural network is an important model for training features of voice conversion (VC) tasks. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as Mel Cepstr...

Authors: Zhaojie Luo, Jinhui Chen, Tetsuya Takiguchi and Yasuo Ariki

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:18

Content type: Research

Published on: 1 August 2017
Efficient music identification using ORB descriptors of the spectrogram image

Audio fingerprinting has been an active research field typically used for music identification. Robust audio fingerprinting technology is used to successfully perform content-based audio identification regardl...

Authors: Dominic Williams, Akash Pooransingh and Jesse Saitoo

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:17

Content type: Research

Published on: 11 July 2017
Speaker-adaptive-trainable Boltzmann machine and its application to non-parallel voice conversion

In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. Voice conversion is a technique where only speaker-specific information in the source spee...

Authors: Toru Nakashika and Yasuhiro Minami

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:16

Content type: Research

Published on: 29 June 2017
Interactive user correction of automatically detected onsets: approach and evaluation

Onset detection still has room for improvement, especially when dealing with polyphonic music signals. For certain purposes in which the correctness of the result is a must, user intervention is hence required...

Authors: Jose J. Valero-Mas and José M. Iñesta

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:15

Content type: Research

Published on: 27 June 2017
Reducing over-smoothness in HMM-based speech synthesis using exemplar-based voice conversion

Speech synthesis has been used in many kinds of practical applications. Currently, state-of-the-art speech synthesis uses statistical methods based on hidden Markov models (HMMs). Speech synthesized by statis...

Authors: Gia-Nhu Nguyen and Trung-Nghia Phung

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:14

Content type: Research

Published on: 24 June 2017
Autocorrelation-based noise subtraction method with smoothing, overestimation, energy, and cepstral mean and variance normalization for noisy speech recognition

The autocorrelation domain is well suited to separating the clean speech signal from noise. In this paper, a method is proposed to decrease the effects of noise on the clean speech signal, autocorrelation-based noise ...

Authors: Gholamreza Farahani

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2017 2017:13

Content type: Research

Published on: 21 June 2017