Articles

Page 8 of 11

Music selection interface for car audio system using SOM with personal distance function

Devices such as smart phones and tablet PCs of various sizes have become increasingly popular, finding new applications, including in-car audio systems. This paper proposes a new car audio system. In the archi...

Authors: Ning-Han Liu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:20

Content type: Research Published on: 10 July 2013
- View Full Text
- View PDF
On the use of speech parameter contours for emotion recognition

Many features have been proposed for speech-based emotion recognition, and a majority of them are frame based or statistics estimated from frame-based features. Temporal information is typically modelled on a ...

Authors: Vidhyasaharan Sethu, Eliathamby Ambikairajah and Julien Epps

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:19

Content type: Research Published on: 10 July 2013
- View Full Text
- View PDF
Bayesian group sparse learning for music source separation

Nonnegative matrix factorization (NMF) is developed for parts-based representation of nonnegative signals with the sparseness constraint. The signals are adequately represented by a set of basis vectors and th...

Authors: Jen-Tzung Chien and Hsin-Lung Hsieh

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:18

Content type: Research Published on: 5 July 2013
- View Full Text
- View PDF
Classification of speech under stress based on modeling of the vocal folds and vocal tract

In this study, we focus on the classification of neutral and stressed speech based on a physical model. In order to represent the characteristics of the vocal folds and vocal tract during the process of speech...

Authors: Xiao Yao, Takatoshi Jitsuhiro, Chiyomi Miyajima, Norihide Kitaoka and Kazuya Takeda

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:17

Content type: Research Published on: 5 July 2013
- View Full Text
- View PDF
Acoustic-visual synthesis technique using bimodal unit-selection

This paper presents a bimodal acoustic-visual synthesis technique that concurrently generates the acoustic speech signal and a 3D animation of the speaker’s outer face. This is done by concatenating bimodal di...

Authors: Slim Ouni, Vincent Colotte, Utpala Musti, Asterios Toutios, Brigitte Wrobel-Dautcourt, Marie-Odile Berger and Caroline Lavecchia

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:16

Content type: Research Published on: 27 June 2013
- View Full Text
- View PDF
Reassigned spectrum-based feature extraction for GMM-based automatic chord recognition

Most existing automatic chord recognition systems use a chromagram in front-end processing and some sort of classifier (e.g., hidden Markov model, Gaussian mixture model (GMM), support vector machine, or other...

Authors: Maksim Khadkevich and Maurizio Omologo

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:15

Content type: Research Published on: 27 June 2013
- View Full Text
- View PDF
An iterative model-based approach to cochannel speech separation

Cochannel speech separation aims to separate two speech signals from a single mixture. In a supervised scenario, the identities of two speakers are given, and current methods use pre-trained speaker models for...

Authors: Ke Hu and DeLiang Wang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:14

Content type: Research Published on: 26 June 2013
- View Full Text
- View PDF
Music classification by low-rank semantic mappings

A challenging open question in music classification is which music representation (i.e., audio features) and which machine learning algorithm is appropriate for a specific music classification task. To address...

Authors: Yannis Panagakis and Constantine Kotropoulos

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:13

Content type: Research Published on: 24 June 2013
- View Full Text
- View PDF
Compensation of SNR and noise type mismatch using an environmental sniffing based speech recognition solution

Multiple-model based speech recognition (MMSR) has been shown to be quite successful in noisy speech recognition. Since it employs multiple hidden Markov model (HMM) sets that correspond to various noise types...

Authors: Yongjoo Chung and John HL Hansen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:12

Content type: Research Published on: 20 June 2013
- View Full Text
- View PDF
Music content authentication based on beat segmentation and fuzzy classification

Digital audio has been ubiquitous over the past decade. Since it can be easily modified by editing tools, there has been a strong need to protect its content for secure multimedia applications. Previous audio ...

Authors: Wei Li, Xiu Zhang and Zhurong Wang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:11

Content type: Research Published on: 18 June 2013
- View Full Text
- View PDF
An audio watermark-based speech bandwidth extension method

A novel speech bandwidth extension method based on audio watermark is presented in this paper. The time-domain and frequency-domain envelope parameters are extracted from the high-frequency components of speec...

Authors: Zhe Chen, Chengyong Zhao, Guosheng Geng and Fuliang Yin

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:10

Content type: Research Published on: 6 June 2013
- View Full Text
- View PDF
Context-based adaptive arithmetic coding in time and frequency domain for the lossless compression of audio coding parameters at variable rate

This paper presents a novel lossless compression technique of the context-based adaptive arithmetic coding which can be used to further compress the quantized parameters in audio codec. The key feature of the ...

Authors: Jing Wang, Xuan Ji, Shenghui Zhao, Xiang Xie and Jingming Kuang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:9

Content type: Research Published on: 21 May 2013
- View Full Text
- View PDF
Evaluation of influence of spectral and prosodic features on GMM classification of Czech and Slovak emotional speech

This article analyzes and compares influence of different types of spectral and prosodic features for Czech and Slovak emotional speech classification based on Gaussian mixture models (GMM). Influence of initi...

Authors: Jiří Přibil and Anna Přibilová

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:8

Content type: Research Published on: 24 April 2013
- View Full Text
- View PDF
Speaker adaptation in the maximum a posteriori framework based on the probabilistic 2-mode analysis of training models

In this article, we describe a speaker adaptation method based on the probabilistic 2-mode analysis of training models. Probabilistic 2-mode analysis is a probabilistic extension of multilinear analysis. We ap...

Authors: Yongwon Jeong

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:7

Content type: Research Published on: 11 April 2013
- View Full Text
- View PDF
High level feature extraction for the self-taught learning algorithm

Availability of large amounts of raw unlabeled data has sparked the recent surge in semi-supervised learning research. In most works, however, it is assumed that labeled and unlabeled data come from the same d...

Authors: Konstantin Markov and Tomoko Matsui

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:6

Content type: Research Published on: 9 April 2013
- View Full Text
- View PDF
A comprehensive system for facial animation of generic 3D head models driven by speech

A comprehensive system for facial animation of generic 3D head models driven by speech is presented in this article. In the training stage, audio-visual information is extracted from audio-visual training data...

Authors: Lucas D Terissi, Mauricio Cerda, Juan C Gómez, Nancy Hitschfeld-Kahler and Bernard Girau

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:5

Content type: Research Published on: 1 February 2013
- View Full Text
- View PDF
Nonparametric Bayesian sparse factor analysis for frequency domain blind source separation without permutation ambiguity

Blind source separation (BSS) and sound activity detection (SAD) from a sound source mixture with minimum prior information are two major requirements for computational auditory scene analysis that recognizes ...

Authors: Kohei Nagira, Takuma Otsuka and Hiroshi G Okuno

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:4

Content type: Research Published on: 22 January 2013
- View Full Text
- View PDF
An efficient solution to sparse linear prediction analysis of speech

We propose an efficient solution to the problem of sparse linear prediction analysis of the speech signal. Our method is based on minimization of a weighted l₂-norm of the prediction error. The weighting function...

Authors: Vahid Khanagha and Khalid Daoudi

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:3

Content type: Research Published on: 22 January 2013
- View Full Text
- View PDF
Improved monaural speech segregation based on computational auditory scene analysis

A lot of effort has been made in Computational Auditory Scene Analysis (CASA) to segregate target speech from monaural mixtures. Based on the principle of CASA, this article proposes an improved algorithm for ...

Authors: Wang Yu, Lin Jiajun, Chen Ning and Yuan Wenhao

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:2

Content type: Research Published on: 12 January 2013
- View Full Text
- View PDF
Context-dependent sound event detection

The work presented in this article studies how the context information can be used in the automatic sound event detection process, and how the detection system can benefit from such information. Humans are usi...

Authors: Toni Heittola, Annamaria Mesaros, Antti Eronen and Tuomas Virtanen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2013 2013:1

Content type: Research Published on: 9 January 2013
- View Full Text
- View PDF
A perceptual masking approach for noise robust speech recognition

This article describes a modified technique for enhancing noisy speech to improve automatic speech recognition (ASR) performance. The proposed approach improves the widely used spectral subtraction which inher...

Authors: Hari Krishna Maganti and Marco Matassoni

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:29

Content type: Research Published on: 22 December 2012
- View Full Text
- View PDF
A study on the consistency analysis of energy parameter for Mandarin speech

In this study, a consistency analysis of energy parameter for Mandarin speech is presented. Identified as a result of inspection of the human pronunciation process, the consistency can be interpreted as a high...

Authors: Li-Te Shen, Cheng-Yu Yeh and Shaw-Hwa Hwang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:28

Content type: Research Published on: 17 December 2012
- View Full Text
- View PDF
Estimation and quantization of ICC-dependent phase parameters for parametric stereo audio coding

Conventional parametric stereo (PS) audio coding employs inter-channel phase difference and overall phase difference as phase parameters. In this article, it is shown that those parameters cannot correctly rep...

Authors: Dong-il Hyun, Young-cheol Park and Dae Hee Youn

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:27

Content type: Research Published on: 16 November 2012
- View Full Text
- View PDF
Stereophonic hands-free communication system based on microphone array fixed beamforming: real-time implementation and evaluation

In this article, the authors propose an optimally designed fixed beamformer (BF) for stereophonic acoustic echo cancelation (SAEC) in real hands-free communication applications. Several contributions related t...

Authors: Matteo Pirro, Stefano Squartini, Laura Romoli and Francesco Piazza

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:26

Content type: Research Published on: 22 October 2012
- View Full Text
- View PDF
Comparative study of digital audio steganography techniques

The rapid spread in digital data usage in many real life applications have urged new and effective ways to ensure their security. Efficient secrecy can be achieved, at least in part, by implementing steganogra...

Authors: Fatiha Djebbar, Beghdad Ayad, Karim Abed Meraim and Habib Hamam

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:25

Content type: Review Published on: 9 October 2012
- View Full Text
- View PDF
Expressed music mood classification compared with valence and arousal ratings

Mood is an important aspect of music and knowledge of mood can be used as a basic feature in music recommender and retrieval systems. A listening experiment was carried out establishing ratings for various moo...

Authors: Bert den Brinker, Ralph van Dinther and Janto Skowronek

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:24

Content type: Research Published on: 3 October 2012
- View Full Text
- View PDF
An evolutionary feature synthesis approach for content-based audio retrieval

A vast amount of audio features have been proposed in the literature to characterize the content of audio signals. In order to overcome specific problems related to the existing features (such as lack of discrimi...

Authors: Toni Mäkinen, Serkan Kiranyaz, Jenni Raitoharju and Moncef Gabbouj

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:23

Content type: Research Published on: 11 September 2012
- View Full Text
- View PDF
Biomimetic multi-resolution analysis for robust speaker recognition

Humans exhibit a remarkable ability to reliably classify sound sources in the environment even in presence of high levels of noise. In contrast, most engineering systems suffer a drastic drop in performance wh...

Authors: Sridhar Krishna Nemala, Dmitry N Zotkin, Ramani Duraiswami and Mounya Elhilali

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:22

Content type: Research Published on: 7 September 2012
- View Full Text
- View PDF
Speaker-dependent model interpolation for statistical emotional speech synthesis

In this article, we propose a speaker-dependent model interpolation method for statistical emotional speech synthesis. The basic idea is to combine the neutral model set of the target speaker and an emotional ...

Authors: Chih-Yu Hsu and Chia-Ping Chen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:21

Content type: Research Published on: 16 August 2012
- View Full Text
- View PDF
Speech steganography using wavelet and Fourier transforms

A new method to secure speech communication using the discrete wavelet transforms (DWT) and the fast Fourier transform is presented in this article. In the first phase of the hiding technique, we separate the ...

Authors: Siwar Rekik, Driss Guerchi, Sid-Ahmed Selouani and Habib Hamam

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:20

Content type: Research Published on: 8 August 2012
- View Full Text
- View PDF
Speaker diarization of broadcast news in Albayzin 2010 evaluation campaign

In this article, we present the evaluation results for the task of speaker diarization of broadcast news, which was part of the Albayzin 2010 evaluation campaign of language and speech technologies. The evalua...

Authors: Martin Zelenák, Henrik Schulz and Javier Hernando

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:19

Content type: Research Published on: 31 July 2012
- View Full Text
- View PDF
A parameterizable spatiotemporal representation of popular dance styles for humanoid dancing characters

Dance movements are a complex class of human behavior which convey forms of non-verbal and subjective communication that are performed as cultural vocabularies in all human cultures. The singularity of dance f...

Authors: João Lobato Oliveira, Luiz Naveda, Fabien Gouyon, Luis Paulo Reis, Paulo Sousa and Marc Leman

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:18

Content type: Research Published on: 19 June 2012
- View Full Text
- View PDF
Multi-candidate missing data imputation for robust speech recognition

The application of Missing Data Techniques (MDT) to increase the noise robustness of HMM/GMM-based large vocabulary speech recognizers is hampered by a large computational burden. The likelihood evaluations im...

Authors: Yujun Wang and Hugo Van hamme

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:17

Content type: Research Published on: 29 May 2012
- View Full Text
- View PDF
Perceptual audio features for emotion detection

In this article, we propose a new set of acoustic features for automatic emotion recognition from audio. The features are based on the perceptual quality metrics that are given in perceptual evaluation of audi...

Authors: Mehmet Cenk Sezgin, Bilge Gunsel and Gunes Karabulut Kurt

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:16

Content type: Research Published on: 4 May 2012
- View Full Text
- View PDF
Phone lattice reconstruction for embedded language recognition in LVCSR

An increasing number of multilingual applications require language recognition (LRE) as a frontend, but desire low additional computational cost. This article demonstrates a novel architecture for embedding ph...

Authors: Yuxiang Shan, Yan Deng, Jia Liu and Michael T Johnson

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:15

Content type: Research Published on: 13 April 2012
- View Full Text
- View PDF
Using information theoretic distance measures for solving the permutation problem of blind source separation of speech signals

The problem of blind source separation (BSS) of convolved acoustic signals is of great interest for many classes of applications. Due to the convolutive mixing process, the source separation is performed in th...

Authors: Eugen Hoffmann, Dorothea Kolossa, Bert-Uwe Köhler and Reinhold Orglmeister

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:14

Content type: Research Published on: 3 April 2012
- View Full Text
- View PDF
Robust dialogue act detection based on partial sentence tree, derivation rule, and spectral clustering algorithm

A novel approach for robust dialogue act detection in a spoken dialogue system is proposed. Shallow representation named partial sentence trees are employed to represent automatic speech recognition outputs. P...

Authors: Chia-Ping Chen, Chung-Hsien Wu and Wei-Bin Liang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:13

Content type: Research Published on: 3 March 2012
- View Full Text
- View PDF
A role of multi-modal rhythms in physical interaction and cooperation

As fundamental research for human-robot interaction, this paper addresses the rhythmic reference of a human while turning a rope with another human. We hypothyzed that when interpreting rhythm cues to make a r...

Authors: Kenta Yonekura, Chyon Hae Kim, Kazuhiro Nakadai, Hiroshi Tsujino and Shigeki Sugano

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:12

Content type: Research Published on: 2 March 2012
- View Full Text
- View PDF
Transcribing Bach chorales: Limitations and potentials of non-negative matrix factorisation

This article discusses our research on polyphonic music transcription using non-negative matrix factorisation (NMF). The application of NMF in polyphonic transcription offers an alternative approach in which obse...

Authors: Somnuk Phon-Amnuaisuk

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:11

Content type: Research Published on: 27 February 2012
- View Full Text
- View PDF
Decision tree-based acoustic models for speech recognition

This article proposes a new acoustic model using decision trees (DTs) as replacements for Gaussian mixture models (GMM) to compute the observation likelihoods for a given hidden Markov model state in a speech ...

Authors: Masami Akamine and Jitendra Ajmera

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:10

Content type: Research Published on: 17 February 2012
- View Full Text
- View PDF
Force-feedback interaction with a neural oscillator model: for shared human-robot control of a virtual percussion instrument

A study on force-feedback interaction with a model of a neural oscillator provides insight into enhanced human-robot interactions for controlling musical sound. We provide differential equations and discrete-t...

Authors: Edgar Berdahl, Claude Cadoz and Nicolas Castagné

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:9

Content type: Research Published on: 8 February 2012
- View Full Text
- View PDF
Combined perception and control for timing in robotic music performances

Interaction with human musicians is a challenging task for robots as it involves online perception and precise synchronization. In this paper, we present a consistent and theoretically sound framework for comb...

Authors: Umut Şimşekli, Orhan Sönmez, Barış Kurt Kurt and Ali Taylan Cemgil

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:8

Content type: Research Published on: 3 February 2012
- View Full Text
- View PDF
DWT and LPC based feature extraction methods for isolated word recognition

In this article, new feature extraction methods, which utilize wavelet decomposition and reduced order linear predictive coding (LPC) coefficients, have been proposed for speech recognition. The coefficients h...

Authors: Navnath S Nehe and Raghunath S Holambe

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:7

Content type: Research Published on: 30 January 2012
- View Full Text
- View PDF
A multimodal tempo and beat-tracking system based on audiovisual information from live guitar performances

The aim of this paper is to improve beat-tracking for live guitar performances. Beat-tracking is a function to estimate musical measurements, for example musical tempo and phase. This method is critical to ach...

Authors: Tatsuhiko Itohara, Takuma Otsuka, Takeshi Mizumoto, Angelica Lim, Tetsuya Ogata and Hiroshi G Okuno

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:6

Content type: Research Published on: 20 January 2012
- View Full Text
- View PDF
Music-aided affective interaction between human and service robot

This study proposes a music-aided framework for affective interaction of service robots with humans. The framework consists of three systems, respectively, for perception, memory, and expression on the basis o...

Authors: Jeong-Sik Park, Gil-Jin Jang and Yong-Ho Seo

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:5

Content type: Research Published on: 19 January 2012
- View Full Text
- View PDF
Performance of the audio signals transmission over wireless networks with the channel interleaving considerations

This article studies a vital issue in wireless communications, which is the transmission of audio signals over wireless networks. It presents a novel interleaver scheme for protection against error bursts and ...

Authors: Mohsen Ahmed Mahmoud Mohamed El-Bendary, Atef E Abou-El-azm, Nawal A El-Fishawy, Farid Shawki, Fathi E Abd-ElSamie, Mostafa Ali Refai El-Tokhy and Hassan B Kazemian

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:4

Content type: Research Published on: 17 January 2012
- View Full Text
- View PDF
Towards expressive musical robots: a cross-modal framework for emotional gesture, voice and music

It has been long speculated that expression of emotions from different modalities have the same underlying 'code', whether it be a dance step, musical phrase, or tone of voice. This is the first attempt to imp...

Authors: Angelica Lim, Tetsuya Ogata and Hiroshi G Okuno

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:3

Content type: Research Published on: 17 January 2012
- View Full Text
- View PDF
Music expression with a robot manipulator used as a bidirectional tangible interface

The availability of haptic interfaces in music content processing offers interesting possibilities of performer-instrument interaction for musical expression. These new musical instruments can precisely modulate ...

Authors: Victor Zappi, Antonio Pistillo, Sylvain Calinon, Andrea Brogni and Darwin Caldwell

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:2

Content type: Research Published on: 13 January 2012
- View Full Text
- View PDF
A novel voice activity detection based on phoneme recognition using statistical model

In this article, a novel voice activity detection (VAD) approach based on phoneme recognition using Gaussian Mixture Model based Hidden Markov Model (HMM/GMM) is proposed. Some sophisticated speech features su...

Authors: Xulei Bao and Jie Zhu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:1

Content type: Research Published on: 9 January 2012
- View Full Text
- View PDF
Voice activity detection based on conjugate subspace matching pursuit and likelihood ratio test

Most of voice activity detection (VAD) schemes are operated in the discrete Fourier transform (DFT) domain by classifying each sound frame into speech or noise based on the DFT coefficients. These coefficients...

Authors: Shiwen Deng and Jiqing Han

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:12

Content type: Research Published on: 21 December 2011
- View Full Text
- View PDF

How was your experience today?

Rating Please select one rating

Awful

Bad

Good

Great

Thank you for your feedback.

Tell us why (opens in a new tab)

Articles

Follow

Affiliated with

Annual Journal Metrics

Funding your APC

​​​​​​​​​