Articles

Robust dialogue act detection based on partial sentence tree, derivation rule, and spectral clustering algorithm

A novel approach for robust dialogue act detection in a spoken dialogue system is proposed. Shallow representations named partial sentence trees are employed to represent automatic speech recognition outputs. P...

Authors: Chia-Ping Chen, Chung-Hsien Wu and Wei-Bin Liang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:13

Content type: Research Published on: 3 March 2012
A role of multi-modal rhythms in physical interaction and cooperation

As fundamental research for human-robot interaction, this paper addresses the rhythmic reference of a human while turning a rope with another human. We hypothesized that when interpreting rhythm cues to make a r...

Authors: Kenta Yonekura, Chyon Hae Kim, Kazuhiro Nakadai, Hiroshi Tsujino and Shigeki Sugano

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:12

Content type: Research Published on: 2 March 2012
Transcribing Bach chorales: Limitations and potentials of non-negative matrix factorisation

This article discusses our research on polyphonic music transcription using non-negative matrix factorisation (NMF). The application of NMF in polyphonic transcription offers an alternative approach in which obse...

Authors: Somnuk Phon-Amnuaisuk

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:11

Content type: Research Published on: 27 February 2012
Decision tree-based acoustic models for speech recognition

This article proposes a new acoustic model using decision trees (DTs) as replacements for Gaussian mixture models (GMM) to compute the observation likelihoods for a given hidden Markov model state in a speech ...

Authors: Masami Akamine and Jitendra Ajmera

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:10

Content type: Research Published on: 17 February 2012
Force-feedback interaction with a neural oscillator model: for shared human-robot control of a virtual percussion instrument

A study on force-feedback interaction with a model of a neural oscillator provides insight into enhanced human-robot interactions for controlling musical sound. We provide differential equations and discrete-t...

Authors: Edgar Berdahl, Claude Cadoz and Nicolas Castagné

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:9

Content type: Research Published on: 8 February 2012
Combined perception and control for timing in robotic music performances

Interaction with human musicians is a challenging task for robots as it involves online perception and precise synchronization. In this paper, we present a consistent and theoretically sound framework for comb...

Authors: Umut Şimşekli, Orhan Sönmez, Barış Kurt and Ali Taylan Cemgil

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:8

Content type: Research Published on: 3 February 2012
DWT and LPC based feature extraction methods for isolated word recognition

In this article, new feature extraction methods, which utilize wavelet decomposition and reduced order linear predictive coding (LPC) coefficients, have been proposed for speech recognition. The coefficients h...

Authors: Navnath S Nehe and Raghunath S Holambe

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:7

Content type: Research Published on: 30 January 2012
A multimodal tempo and beat-tracking system based on audiovisual information from live guitar performances

The aim of this paper is to improve beat-tracking for live guitar performances. Beat-tracking is a function to estimate musical measurements, for example musical tempo and phase. This method is critical to ach...

Authors: Tatsuhiko Itohara, Takuma Otsuka, Takeshi Mizumoto, Angelica Lim, Tetsuya Ogata and Hiroshi G Okuno

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:6

Content type: Research Published on: 20 January 2012
Music-aided affective interaction between human and service robot

This study proposes a music-aided framework for affective interaction of service robots with humans. The framework consists of three systems, respectively, for perception, memory, and expression on the basis o...

Authors: Jeong-Sik Park, Gil-Jin Jang and Yong-Ho Seo

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:5

Content type: Research Published on: 19 January 2012
Performance of the audio signals transmission over wireless networks with the channel interleaving considerations

This article studies a vital issue in wireless communications, which is the transmission of audio signals over wireless networks. It presents a novel interleaver scheme for protection against error bursts and ...

Authors: Mohsen Ahmed Mahmoud Mohamed El-Bendary, Atef E Abou-El-azm, Nawal A El-Fishawy, Farid Shawki, Fathi E Abd-ElSamie, Mostafa Ali Refai El-Tokhy and Hassan B Kazemian

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:4

Content type: Research Published on: 17 January 2012
Towards expressive musical robots: a cross-modal framework for emotional gesture, voice and music

It has long been speculated that the expression of emotions from different modalities has the same underlying 'code', whether it be a dance step, musical phrase, or tone of voice. This is the first attempt to imp...

Authors: Angelica Lim, Tetsuya Ogata and Hiroshi G Okuno

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:3

Content type: Research Published on: 17 January 2012
Music expression with a robot manipulator used as a bidirectional tangible interface

The availability of haptic interfaces in music content processing offers interesting possibilities of performer-instrument interaction for musical expression. These new musical instruments can precisely modulate ...

Authors: Victor Zappi, Antonio Pistillo, Sylvain Calinon, Andrea Brogni and Darwin Caldwell

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:2

Content type: Research Published on: 13 January 2012
A novel voice activity detection based on phoneme recognition using statistical model

In this article, a novel voice activity detection (VAD) approach based on phoneme recognition using Gaussian Mixture Model based Hidden Markov Model (HMM/GMM) is proposed. Some sophisticated speech features su...

Authors: Xulei Bao and Jie Zhu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2012 2012:1

Content type: Research Published on: 9 January 2012
Voice activity detection based on conjugate subspace matching pursuit and likelihood ratio test

Most voice activity detection (VAD) schemes operate in the discrete Fourier transform (DFT) domain, classifying each sound frame as speech or noise based on the DFT coefficients. These coefficients...

Authors: Shiwen Deng and Jiqing Han

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:12

Content type: Research Published on: 21 December 2011
Semantic structures of timbre emerging from social and acoustic descriptions of music

The perceptual attributes of timbre have inspired a considerable amount of multidisciplinary research, but because of the complexity of the phenomena, the approach has traditionally been confined to laboratory...

Authors: Rafael Ferrer and Tuomas Eerola

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:11

Content type: Research Published on: 7 December 2011
System for fast lexical and phonetic spoken term detection in a Czech cultural heritage archive

The main objective of the work presented in this paper was to develop a complete system that would accomplish the original visions of the MALACH project. Those goals were to employ automatic speech recognition...

Authors: Josef Psutka, Jan Švec, Josef V Psutka, Jan Vaněk, Aleš Pražák, Luboš Šmídl and Pavel Ircing

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:10

Content type: Research Published on: 5 December 2011
Noise-robust speech feature processing with empirical mode decomposition

In this article, a novel technique based on the empirical mode decomposition methodology for processing speech features is proposed and investigated. The empirical mode decomposition generalizes the Fourier an...

Authors: Kuo-Hau Wu, Chia-Ping Chen and Bing-Feng Yeh

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:9

Content type: Research Published on: 15 November 2011
Correlation analysis of the speech multiscale product for the open quotient estimation

This article proposes a multiscale product (MP)-based method for estimating the open quotient (OQ) from the speech waveform. The MP is operated by calculating the wavelet transform coefficients of the speech s...

Authors: Wafa Saidi, Aicha Bouzid and Noureddine Ellouze

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:8

Content type: Research Published on: 10 November 2011
An improved adaptive gain equalizer for noise reduction with low speech distortion

In high-quality conferencing systems, it is desirable to perform noise reduction with as little speech distortion as possible. Previous work, based on time-varying amplification controlled by signal-to-noise ra...

Authors: Markus Borgh, Magnus Berggren, Christian Schüldt, Fredric Lindström and Ingvar Claesson

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:7

Content type: Research Published on: 26 October 2011
A large vocabulary continuous speech recognition system for Persian language

The first large vocabulary speech recognition system for the Persian language is introduced in this paper. This continuous speech recognition system uses most standard and state-of-the-art speech and language ...

Authors: Hossein Sameti, Hadi Veisi, Mohammad Bahrani, Bagher Babaali and Khosro Hosseinzadeh

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:6

Content type: Research Published on: 5 October 2011
Noise reduction for periodic signals using high-resolution frequency analysis

The spectrum subtraction method is one of the most common methods by which to remove noise from a spectrum. Like many noise reduction methods, the spectrum subtraction method uses discrete Fourier transform (D...

Authors: Toshio Yoshizawa, Shigeki Hirobayashi and Tadanobu Misawa

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:5

Content type: Research Published on: 21 September 2011
Multi-label classification of music by emotion

This work studies the task of automatic emotion detection in music. Music may evoke more than one different emotion at the same time. Single-label classification and regression cannot model this multiplicity. ...

Authors: Konstantinos Trohidis, Grigorios Tsoumakas, George Kalliris and Ioannis Vlahavas

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:4

Content type: Research Published on: 18 September 2011
Robust time delay estimation for speech signals using information theory: A comparison study

Time delay estimation (TDE) is a fundamental subsystem for a speaker localization and tracking system. Most of the traditional TDE methods are based on second-order statistics (SOS) under Gaussian assumption f...

Authors: Fei Wen and Qun Wan

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:3

Content type: Research Published on: 29 July 2011
Semitone frequency mapping to improve music representation for nucleus cochlear implants

The frequency-to-channel mapping for cochlear implant (CI) signal processors was originally designed to optimize speech perception and generally does not preserve the harmonic structure of music sounds. An alg...

Authors: Sherif Abdellatif Omran, Waikong Lai, Michael Büchler and Norbert Dillier

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:2

Content type: Research Published on: 21 June 2011
Audio segmentation of broadcast news in the Albayzin-2010 evaluation: overview, results, and discussion

Recently, audio segmentation has attracted research interest because of its usefulness in several applications like audio indexing and retrieval, subtitling, monitoring of acoustic scenes, etc. Moreover, a pre...

Authors: Taras Butko and Climent Nadeu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:1

Content type: Research Published on: 17 June 2011
Scalable Audio-Content Analysis

Authors: Bhiksha Raj, Paris Smaragdis, Malcolm Slaney, Chung-Hsien Wu, Liming Chen and Hyoung-Gook Kim

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2010:467278

Content type: Editorial Published on: 13 March 2011
Environmental Sound Synthesis, Processing, and Retrieval

Authors: Andrea Valle

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2010:178164

Content type: Editorial Published on: 16 February 2011
Phoneme and Sentence-Level Ensembles for Speech Recognition

We address the question of whether and how boosting and bagging can be used for speech recognition. In order to do this, we compare two different boosting schemes, one at the phoneme level and one at the utter...

Authors: Christos Dimitrakakis and Samy Bengio

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2011:426792

Content type: Research Published on: 7 February 2011
Pitch Ranking, Melody Contour and Instrument Recognition Tests Using Two Semitone Frequency Maps for Nucleus Cochlear Implants

To overcome harmonic structure distortions of complex tones in the low frequency range due to the frequency to electrode mapping function used in Nucleus cochlear implants, two modified frequency maps based on...

Authors: Sherif A. Omran, Waikong Lai and Norbert Dillier

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2010:948565

Content type: Research Article Published on: 10 January 2011
Physically Motivated Environmental Sound Synthesis for Virtual Worlds

A system is described for simulating environmental sound in interactive virtual worlds, using the physical state of objects as control parameters. It contains a unified framework for integration with physics s...

Authors: Dylan Menzies

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2011 2010:137878

Content type: Research Article Published on: 2 January 2011
Multiple Source Localization Based on Acoustic Map De-Emphasis

This paper describes a novel approach for localization of multiple sources overlapping in time. The proposed algorithm relies on acoustic maps computed in multi-microphone settings, which are descriptions of t...

Authors: Alessio Brutti, Maurizio Omologo and Piergiorgio Svaizer

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:147495

Content type: Research Article Published on: 26 December 2010
Monaural Voiced Speech Segregation Based on Dynamic Harmonic Function

The correlogram is an important representation for periodic signals. It is widely used in pitch estimation and source separation. For these applications, the major problems of the correlogram are its low resolution and re...

Authors: Xueliang Zhang, Wenju Liu and Bo Xu

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:252374

Content type: Research Article Published on: 12 December 2010
Ecological Acoustics Perspective for Content-Based Retrieval of Environmental Sounds

In this paper we present a method to search for environmental sounds in large unstructured databases of user-submitted audio, using a general sound events taxonomy from ecological acoustics. We discuss the use...

Authors: Gerard Roma, Jordi Janer, Stefan Kersten, Mattia Schirosa, Perfecto Herrera and Xavier Serra

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:960863

Content type: Research Article Published on: 5 December 2010
A Novel MPEG Audio Degrouping Algorithm and Its Architecture Design

Degrouping is the key component in MPEG Layer II audio decoding. It mainly consists of the arithmetic operations of division and modulo. So far, no dedicated degrouping algorithm and architecture has been well realized....

Authors: Tsung-Han Tsai

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:737450

Content type: Research Article Published on: 25 November 2010
An Ontological Framework for Retrieving Environmental Sounds Using Semantics and Acoustic Content

Organizing a database of user-contributed environmental sound recordings allows sound files to be linked not only by the semantic tags and labels applied to them, but also to other sounds with similar acoustic...

Authors: Gordon Wichern, Brandon Mechtley, Alex Fink, Harvey Thornburg and Andreas Spanias

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:192363

Content type: Research Article Published on: 21 October 2010
The Effect of a Voice Activity Detector on the Speech Enhancement Performance of the Binaural Multichannel Wiener Filter

A multimicrophone speech enhancement algorithm for binaural hearing aids that preserves interaural time delays was proposed recently. The algorithm is based on multichannel Wiener filtering and relies on a voi...

Authors: Jasmina Catic, Torsten Dau, Jörg M Buchholz and Fredrik Gran

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:840294

Content type: Research Article Published on: 13 October 2010
Evaluating Environmental Sounds from a Presence Perspective for Virtual Reality Applications

We propose a methodology to design and evaluate environmental sounds for virtual environments. We propose to combine physically modeled sound events with recorded soundscapes. Physical models are used to provi...

Authors: Rolf Nordahl

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:426937

Content type: Research Article Published on: 11 October 2010
Instrumental Estimation of E-Model Parameters for Wideband Speech Codecs

A method is described for quantifying the quality of wideband speech codecs. Two parameters are derived from signal-based speech quality model estimations: (i) a wideband equipment impairment factor...

Authors: Sebastian Möller, Nicolas Côté, Valérie Gautier-Turbin, Nobuhiko Kitawaki and Akira Takahashi

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:782731

Content type: Research Article Published on: 5 October 2010
Optimizing the Directivity of Multiway Loudspeaker Systems

In multiway loudspeaker systems, digital signal processing techniques have been used to correct the frequency response, the propagation time, and the lobing errors. These solutions are mainly based on correct...

Authors: Hmaied Shaiek and Jean-Marc Boucher

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:928439

Content type: Research Article Published on: 23 August 2010
Comparisons of Auditory Impressions and Auditory Imagery Associated with Onomatopoeic Representation for Environmental Sounds

Humans represent sounds to others and receive information about sounds from others using onomatopoeia. Such representation is useful for obtaining and reporting the acoustic features and impressions of actual ...

Authors: Masayuki Takada, Nozomu Fujisawa, Fumino Obata and Shin-ichiro Iwamiya

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:674248

Content type: Research Article Published on: 11 August 2010
On the Characterization of Slowly Varying Sinusoids

We give a brief discussion on the amplitude and frequency variation rates of the sinusoid representation of signals. In particular, we derive three inequalities that show that these rates are upper bounded by ...

Authors: Xue Wen and Mark Sandler

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:941732

Content type: Research Article Published on: 26 July 2010
Correlation-Based Amplitude Estimation of Coincident Partials in Monaural Musical Signals

This paper presents a method for estimating the amplitude of coincident partials generated by harmonic musical sources (instruments and vocals). It was developed as an alternative to the commonly used interpol...

Authors: Jayme Garcia Arnal Barbedo and George Tzanetakis

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:523791

Content type: Research Article Published on: 20 July 2010
Efficient Advertisement Discovery for Audio Podcast Content Using Candidate Segmentation

Nowadays, audio podcasting has been widely used by many online sites such as newspapers, web portals, journals, and so forth, to deliver audio content to users through download or subscription. Within 1 to 30 ...

Authors: MN Nguyen, Qi Tian and Ping Xue

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:572571

Content type: Research Article Published on: 14 July 2010
Combining Superdirective Beamforming and Frequency-Domain Blind Source Separation for Highly Reverberant Signals

Frequency-domain blind source separation (BSS) performs poorly in high reverberation because the independence assumption collapses at each frequency bin when the number of bins increases. To improve the separ...

Authors: Lin Wang, Heping Ding and Fuliang Yin

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:797962

Content type: Research Article Published on: 24 June 2010
Employing Second-Order Circular Suprasegmental Hidden Markov Models to Enhance Speaker Identification Performance in Shouted Talking Environments

Speaker identification performance is almost perfect in neutral talking environments. However, performance deteriorates significantly in shouted talking environments. This work is devoted to proposing, ...

Authors: Ismail Shahin

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:862138

Content type: Research Article Published on: 16 June 2010
Development of the Database for Environmental Sound Research and Application (DESRA): Design, Functionality, and Retrieval Considerations

Theoretical and applied environmental sounds research is gaining prominence but progress has been hampered by the lack of a comprehensive, high quality, accessible database of environmental sounds. An ongoing ...

Authors: Brian Gygi and Valeriy Shafiro

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:654914

Content type: Research Article Published on: 14 June 2010
Adaptive Long-Term Coding of LSF Parameters Trajectories for Large-Delay/Very- to Ultra-Low Bit-Rate Speech Coding

This paper presents a model-based method for coding the LSF parameters of LPC speech coders on a "long-term" basis, that is, beyond the usual 20–30 ms frame duration. The objective is to provide efficient LSF ...

Authors: Laurent Girin

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:597039

Content type: Research Article Published on: 2 June 2010
Atypical Speech

Authors: Georg Stemmer, Elmar Nöth and Vijay Parsa

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:835974

Content type: Editorial Published on: 11 May 2010
Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions

When a number of speakers are simultaneously active, for example in meetings or noisy public places, the sources of interest need to be separated from interfering speakers and from each other in order to be ro...

Authors: Dorothea Kolossa, Ramon Fernandez Astudillo, Eugen Hoffmann and Reinhold Orglmeister

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:651420

Content type: Research Article Published on: 10 May 2010
Environmental Sound Perception: Metadescription and Modeling Based on Independent Primary Studies

The aim of the study is to transpose and extend to a set of environmental sounds the notion of sound descriptors usually used for musical sounds. Four separate primary studies dealing with interior car sounds,...

Authors: Nicolas Misdariis, Antoine Minard, Patrick Susini, Guillaume Lemaitre, Stephen McAdams and Etienne Parizet

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:362013

Content type: Research Article Published on: 10 May 2010
