Articles

Drum Sound Detection in Polyphonic Music with Hidden Markov Models

This paper proposes a method for transcribing drums from polyphonic music using a network of connected hidden Markov models (HMMs). The task is to detect the temporal locations of unpitched percussive sounds (...

Authors: Jouni Paulus and Anssi Klapuri

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:497292

Content type: Research Article
Published on: 14 December 2009

Compact Acoustic Models for Embedded Speech Recognition

Speech recognition applications are known to require a significant amount of resources. However, embedded speech recognition only authorizes few KB of memory, few MIPS, and small amount of training data. In or...

Authors: Christophe Lévy, Georges Linarès and Jean-François Bonastre

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:806186

Content type: Research Article
Published on: 13 December 2009

SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support

This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animat...

Authors: Giampiero Salvi, Jonas Beskow, Samer Al Moubayed and Björn Granström

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:191940

Content type: Research Article
Published on: 16 November 2009

Lip-Synching Using Speaker-Specific Articulation, Shape and Appearance Models

We describe here the control, shape and appearance models that are built using an original photogrammetric method to capture characteristics of speaker-specific facial articulation, anatomy, and texture. Two o...

Authors: Gérard Bailly, Oxana Govokhina, Frédéric Elisei and Gaspard Breton

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:769494

Content type: Research Article
Published on: 15 November 2009

Model-Based Synthesis of Visual Speech Movements from 3D Video

We describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. Speech lip movements are captured using a 3D stereo face capture system and split up into p...

Authors: James D Edge, Adrian Hilton and Philip Jackson

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:597267

Content type: Research Article
Published on: 15 November 2009

Optimizing Automatic Speech Recognition for Low-Proficient Non-Native Speakers

Computer-Assisted Language Learning (CALL) applications for improving the oral skills of low-proficient learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-na...

Authors: Joost van Doremalen, Catia Cucchiarini and Helmer Strik

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2010:973954

Content type: Research Article
Published on: 1 November 2009

An Adaptive Framework for Acoustic Monitoring of Potential Hazards

Robust recognition of general audio events constitutes a topic of intensive research in the signal processing community. This work presents an efficient methodology for acoustic surveillance of atypical situat...

Authors: Stavros Ntalampiras, Ilyas Potamitis and Nikos Fakotakis

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:594103

Content type: Research Article
Published on: 20 October 2009

Performance Study of Objective Speech Quality Measurement for Modern Wireless-VoIP Communications

Wireless-VoIP communications introduce perceptual degradations that are not present with traditional VoIP communications. This paper investigates the effects of such degradations on the performance of three st...

Authors: Tiago H Falk and Wai-Yip Chan

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:104382

Content type: Research Article
Published on: 18 October 2009

Optimization of an Image-Based Talking Head System

This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a ...

Authors: Kang Liu and Joern Ostermann

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:174192

Content type: Research Article
Published on: 30 September 2009

On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech

Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either...

Authors: Wesley Mattheyses, Lukas Latacz and Werner Verhelst

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:169819

Content type: Research Article
Published on: 22 September 2009

Adaptive V/UV Speech Detection Based on Characterization of Background Noise

The paper presents an adaptive system for Voiced/Unvoiced (V/UV) speech detection in the presence of background noise. Genetic algorithms were used to select the features that offer the best V/UV detection acc...

Authors: F Beritelli, S Casale, A Russo and S Serrano

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:965436

Content type: Research Article
Published on: 9 September 2009

Signal Processing Implementation and Comparison of Automotive Spatial Sound Rendering Strategies

Design and implementation strategies of spatial sound rendering are investigated in this paper for automotive scenarios. Six design methods are implemented for various rendering modes with different number of ...

Authors: Mingsian R Bai and Jhih-Ren Hong

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:876297

Content type: Research Article
Published on: 24 August 2009

Automatic Speech Recognition Systems for the Evaluation of Voice and Speech Disorders in Head and Neck Cancer

In patients suffering from head and neck cancer, speech intelligibility is often restricted. For assessment and outcome measurements, automatic speech recognition systems have previously been shown to be appro...

Authors: Andreas Maier, Tino Haderlein, Florian Stelzle, Elmar Nöth, Emeka Nkenke, Frank Rosanowski, Anne Schützenberger and Maria Schuster

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2010:926951

Content type: Research Article
Published on: 19 August 2009

Musical Sound Separation Based on Binary Time-Frequency Masking

The problem of overlapping harmonics is particularly acute in musical sound separation and has not been addressed adequately. We propose a monaural system based on binary time-frequency masking with an emphasi...

Authors: Yipeng Li and DeLiang Wang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:130567

Content type: Research Article
Published on: 19 July 2009

Analysis of Salient Feature Jitter in the Cochlea for Objective Prediction of Temporally Localized Distortion in Synthesized Speech

Temporally localized distortions account for the highest variance in subjective evaluation of coded speech signals (Sen (2001) and Hall (2001)). The ability to discern and decompose perceptually relevant tempor...

Authors: Wenliang Lu and D Sen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:865723

Content type: Research Article
Published on: 14 July 2009

Tracking Intermittently Speaking Multiple Speakers Using a Particle Filter

The problem of tracking multiple intermittently speaking speakers is difficult as some distinct problems must be addressed. The number of active speakers must be estimated, these active speakers must be identi...

Authors: Angela Quinlan, Mitsuru Kawamoto, Yosuke Matsusaka, Hideki Asoh and Futoshi Asano

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:673202

Content type: Research Article
Published on: 23 June 2009

A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation to our study is consumer audio applications, where various real-time enhancements are often appl...

Authors: Yizhar Lavner and Dima Ruinskiy

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:239892

Content type: Research Article
Published on: 17 June 2009

Integrated Phoneme Subspace Method for Speech Feature Extraction

Speech feature extraction has been a key focus in robust speech recognition research. In this work, we discuss data-driven linear feature transformations applied to feature vectors in the logarithmic mel-frequ...

Authors: Hyunsin Park, Tetsuya Takiguchi and Yasuo Ariki

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:690451

Content type: Research Article
Published on: 16 June 2009

Analysis of Damped Mass-Spring Systems for Sound Synthesis

There are many ways of synthesizing sound on a computer. The method that we consider, called a mass-spring system, synthesizes sound by simulating the vibrations of a network of interconnected masses, springs, an...

Authors: Don Morgan and Sanzheng Qiao

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:947823

Content type: Research Article
Published on: 8 June 2009

An Overview of the Coding Standard MPEG-4 Audio Amendments 1 and 2: HE-AAC, SSC, and HE-AAC v2

In 2003 and 2004, the ISO/IEC MPEG standardization committee added two amendments to their MPEG-4 audio coding standard. These amendments concern parametric coding techniques and encompass Spectral Band Replic...

Authors: AC den Brinker, J Breebaart, P Ekstrand, J Engdegård, F Henn, K Kjörling, W Oomen and H Purnhagen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:468971

Content type: Review Article
Published on: 3 June 2009

Recognition of Noisy Speech: A Comparative Survey of Robust Model Architecture and Feature Enhancement

Performance of speech recognition systems strongly degrades in the presence of background noise, like the driving noise inside a car. In contrast to existing works, we aim to improve noise robustness focusing ...

Authors: Björn Schuller, Martin Wöllmer, Tobias Moosmayr and Gerhard Rigoll

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:942617

Content type: Research Article
Published on: 24 May 2009

Analytical Features: A Knowledge-Based Approach to Audio Feature Generation

We present a feature generation system designed to create audio features for supervised classification tasks. The main contribution to feature generation studies is the notion of analytical features (AFs), a cons...

Authors: François Pachet and Pierre Roy

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:153017

Content type: Research Article
Published on: 8 April 2009

Comparison of Linear Prediction Models for Audio Signals

While linear prediction (LP) has become immensely popular in speech modeling, it does not seem to provide a good approach for modeling audio signals. This is somewhat surprising, since a tonal signal consistin...

Authors: Toon van Waterschoot and Marc Moonen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2008:706935

Content type: Research Article
Published on: 18 March 2009

Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages

Text corpus size is an important issue when building a language model (LM). This is a particularly important issue for languages where little data is available. This paper introduces an LM adaptation technique...

Authors: Arnar Thor Jensson, Koji Iwano and Sadaoki Furui

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2008:573832

Content type: Research Article
Published on: 27 January 2009

Using SVM as Back-End Classifier for Language Identification

Robust automatic language identification (LID) is a task of identifying the language from a short utterance spoken by an unknown speaker. One of the mainstream approaches named parallel phone recognition langu...

Authors: Hongbin Suo, Ming Li, Ping Lu and Yonghong Yan

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:674859

Content type: Research Article
Published on: 10 November 2008

Intelligent Audio, Speech, and Music Processing Applications

Authors: Woon S Gan, Sen M Kuo and John HL Hansen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:854716

Content type: Editorial
Published on: 5 November 2008

Auditory Sparse Representation for Robust Speaker Recognition Based on Tensor Structure

This paper investigates the problem of speaker recognition in noisy conditions. A new approach called nonnegative tensor principal component analysis (NTPCA) with sparse constraint is proposed for speech featu...

Authors: Qiang Wu and Liqing Zhang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:578612

Content type: Research Article
Published on: 2 November 2008

Beamforming under Quantization Errors in Wireless Binaural Hearing Aids

Improving the intelligibility of speech in different environments is one of the main objectives of hearing aid signal processing algorithms. Hearing aids typically employ beamforming techniques using multiple ...

Authors: Sriram Srinivasan, Ashish Pandharipande and Kees Janse

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:824797

Content type: Research Article
Published on: 6 July 2008

Online Personalization of Hearing Instruments

Online personalization of hearing instruments refers to learning preferred tuning parameter values from user feedback through a control wheel (or remote control), during normal operation of the hearing aid. We...

Authors: Alexander Ypma, Job Geurts, Serkan Özer, Erik van der Werf and Bert de Vries

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:183456

Content type: Research Article
Published on: 25 June 2008

Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

A proven method for achieving effective automatic speech recognition (ASR) due to speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed ...

Authors: Umit H. Yapanel and John H.L. Hansen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:148967

Content type: Research Article
Published on: 19 June 2008

Real-Time Perceptual Simulation of Moving Sources: Application to the Leslie Cabinet and 3D Sound Immersion

Perception of moving sound sources obeys different brain processes from those mediating the localization of static sound events. In view of these specificities, a preprocessing model was designed, based on the...

Authors: R Kronland-Martinet and T Voinier

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:849696

Content type: Research Article
Published on: 15 June 2008

Automatic Music Boundary Detection Using Short Segmental Acoustic Similarity in a Music Piece

The present paper proposes a new approach for detecting music boundaries, such as the boundary between music pieces or the boundary between a music piece and a speech section for automatic segmentation of musi...

Authors: Yoshiaki Itoh, Akira Iwabuchi, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka and Shi-Wook Lee

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:480786

Content type: Research Article
Published on: 11 June 2008

Quality Enhancement of Compressed Audio Based on Statistical Conversion

Most audio compression formats are based on the idea of low bit rate transparent encoding. As these types of audio signals are starting to migrate from portable players with inexpensive headphones to higher qu...

Authors: Demetrios Cantzos, Athanasios Mouchtaris and Chris Kyriakakis

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:462830

Content type: Research Article
Published on: 5 June 2008

Fast Noise Compensation and Adaptive Enhancement for Speech Separation

We propose a novel approach to improve adaptive decorrelation filtering- (ADF-) based speech source separation in diffuse noise. The effects of noise on system adaptation and separation outputs are handled sep...

Authors: Rong Hu and Yunxin Zhao

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:349214

Content type: Research Article
Published on: 5 June 2008

On a Method for Improving Impulsive Sounds Localization in Hearing Defenders

This paper proposes a new algorithm for a directional aid with hearing defenders. Users of existing hearing defenders experience distorted information, or in worst cases, directional information may not be per...

Authors: Benny Sällberg, Farook Sattar and Ingvar Claesson

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:274684

Content type: Research Article
Published on: 25 May 2008

Frequency-Domain Adaptive Algorithm for Network Echo Cancellation in VoIP

We propose a new low complexity, low delay, and fast converging frequency-domain adaptive algorithm for network echo cancellation in VoIP exploiting MMax and sparse partial (SP) tap-selection criteria in the f...

Authors: Xiang (Shawn) Lin, Andy W.H. Khong, Miloš Doroslovački and Patrick A. Naylor

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:156960

Content type: Research Article
Published on: 22 April 2008

Estimation of Interchannel Time Difference in Frequency Subbands Based on Nonuniform Discrete Fourier Transform

Binaural cue coding (BCC) is an efficient technique for spatial audio rendering by using the side information such as interchannel level difference (ICLD), interchannel time difference (ICTD), and interchannel...

Authors: Bo Qiu, Yong Xu, Yadong Lu and Jun Yang

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:618104

Content type: Research Article
Published on: 13 April 2008

Measurement Combination for Acoustic Source Localization in a Room Environment

The behavior of time delay estimation (TDE) is well understood and therefore attractive to apply in acoustic source localization (ASL). A time delay between microphones maps into a hyperbola. Furthermore, the ...

Authors: Pasi Pertilä, Teemu Korhonen and Ari Visa

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:278185

Content type: Research Article
Published on: 7 April 2008

Tango or Waltz?: Putting Ballroom Dance Style into Tempo Detection

Rhythmic information plays an important role in Music Information Retrieval. Example applications include automatically annotating large databases by genre, meter, ballroom dance style or tempo, fully automate...

Authors: Björn Schuller, Florian Eyben and Gerhard Rigoll

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:846135

Content type: Research Article
Published on: 1 April 2008

Phasor Representation for Narrowband Active Noise Control Systems

The phasor representation is introduced to identify the characteristic of the active noise control (ANC) systems. The conventional representation, transfer function, cannot explain the fact that the performanc...

Authors: Fu-Kun Chen, Ding-Horng Chen and Yue-Dar Jou

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:126859

Content type: Research Article
Published on: 31 March 2008

Multiresolution Source/Filter Model for Low Bitrate Coding of Spot Microphone Signals

A multiresolution source/filter model for coding of audio source signals (spot recordings) is proposed. Spot recordings are a subset of the multimicrophone recordings of a music performance, before the mixing ...

Authors: Athanasios Mouchtaris, Kiki Karadimou and Panagiotis Tsakalides

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:624321

Content type: Research Article
Published on: 23 March 2008

Experiments on Automatic Recognition of Nonnative Arabic Speech

The automatic recognition of foreign-accented Arabic speech is a challenging task since it involves a large number of nonnative accents. As well, the nonnative speech data available for training are generally ...

Authors: Yousef Ajami Alotaibi, Sid-Ahmed Selouani and Douglas O'Shaughnessy

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:679831

Content type: Research Article
Published on: 24 February 2008

Practical Gammatone-Like Filters for Auditory Processing

This paper deals with continuous-time filter transfer functions that resemble tuning curves at particular set of places on the basilar membrane of the biological cochlea and that are suitable for practical VLS...

Authors: AG Katsiamis, EM Drakakis and RF Lyon

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2007 2007:063685

Content type: Research Article
Published on: 13 December 2007

Perceptual Models for Speech, Audio, and Music Processing

Authors: Jont B Allen, Wai-Yip Geoffrey Chan and Stephen Voran

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2007 2007:012687

Content type: Editorial
Published on: 13 December 2007

Electrophysiological Study of Algorithmically Processed Metric/Rhythmic Variations in Language and Music

This work is the result of an interdisciplinary collaboration between scientists from the fields of audio signal processing, phonetics and cognitive neuroscience aiming at studying the perception of modificati...

Authors: Sølvi Ystad, Cyrille Magne, Snorre Farner, Gregory Pallone, Mitsuko Aramaki, Mireille Besson and Richard Kronland-Martinet

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2007 2007:030194

Content type: Research Article
Published on: 10 December 2007

Multiple-Description Multistage Vector Quantization

Multistage vector quantization (MSVQ) is a technique for low complexity implementation of high-dimensional quantizers, which has found applications within speech, audio, and image coding. In this paper, a mult...

Authors: Pradeepa Yahampath

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2007 2007:067146

Content type: Research Article
Published on: 3 December 2007

The Effect of Listener Accent Background on Accent Perception and Comprehension

Variability of speaker accent is a challenge for effective human communication as well as speech technology including automatic speech recognition and accent identification. The motivation of this study is to ...

Authors: Ayako Ikeno and John HL Hansen

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2007 2007:076030

Content type: Research Article
Published on: 15 November 2007

Denoising in the Domain of Spectrotemporal Modulations

A noise suppression algorithm is proposed based on filtering the spectrotemporal modulations of noisy signals. The modulations are estimated from a multiscale representation of the signal spectrogram generated...

Authors: Nima Mesgarani and Shihab Shamma

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2007 2007:042357

Content type: Research Article
Published on: 15 November 2007

Voice-to-Phoneme Conversion Algorithms for Voice-Tag Applications in Embedded Platforms

We describe two voice-to-phoneme conversion algorithms for speaker-independent voice-tag creation specifically targeted at applications on embedded platforms. These algorithms (batch mode and sequential) are comp...

Authors: YanMing Cheng, Changxue Ma and Lynette Melnar

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2007 2008:568737

Content type: Research Article
Published on: 4 October 2007

Perceptual Continuity and Naturalness of Expressive Strength in Singing Voices Based on Speech Morphing

This paper experimentally shows the importance of perceptual continuity of the expressive strength in vocal timbre for natural change in vocal expression. In order to synthesize various and continuous expressi...

Authors: Tomoko Yonezawa, Noriko Suzuki, Shinji Abe, Kenji Mase and Kiyoshi Kogure

Citation: EURASIP Journal on Audio, Speech, and Music Processing 2007 2007:023807

Content type: Research Article
Published on: 1 October 2007
