
Articles


  1. Mood of Music is among the most relevant and commercially promising, yet challenging, attributes for retrieval in large music collections. In this respect, this article first provides a short overview of methods...

    Authors: Björn Schuller, Johannes Dorfner and Gerhard Rigoll
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:735854
  2. This work explores the effect of mismatches between adults' and children's speech due to differences in various acoustic correlates on the automatic speech recognition performance under mismatched conditions. ...

    Authors: Shweta Ghai and Rohit Sinha
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:318785
  3. Human communication about entities and events is primarily linguistic in nature. While visual representations of information are shown to be highly effective as well, relatively little is known about the commu...

    Authors: Xiaojuan Ma, Christiane Fellbaum and Perry Cook
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:404860
  4. The paper considers the task of recognizing phonemes and words from a singing input by using a phonetic hidden Markov model recognizer. The system is targeted to both monophonic singing and singing in polyphon...

    Authors: Annamaria Mesaros and Tuomas Virtanen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:546047
  5. With ageing, human voices undergo several changes which are typically characterized by increased hoarseness and changes in articulation patterns. In this study, we have examined the effect on Automatic Speech ...

    Authors: Ravichander Vipperla, Steve Renals and Joe Frankel
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:525783
  6. We revisit an original concept of speech coding in which the signal is separated into the carrier modulated by the signal envelope. A recently developed technique, called frequency-domain linear prediction (FD...

    Authors: Petr Motlicek, Sriram Ganapathy, Hynek Hermansky and Harinath Garudadri
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:856280
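As a rough sketch of the underlying idea (not the authors' codec), frequency-domain linear prediction applies ordinary linear prediction to the cosine transform of a signal segment; the resulting all-pole model traces the segment's temporal (Hilbert) envelope. A minimal illustration assuming NumPy/SciPy; `fdlp_envelope` is a hypothetical helper name:

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_envelope(segment, order=20):
    """All-pole (LP) fit to the DCT of a segment; the model's spectral
    shape approximates the segment's temporal envelope."""
    c = dct(segment, type=2, norm='ortho')            # frequency-domain view
    r = np.correlate(c, c, mode='full')[c.size - 1:]  # autocorrelation of DCT
    r = r / r[0]
    a = solve_toeplitz(r[:order], r[1:order + 1])     # Yule-Walker solve
    # evaluate 1/|A(e^{jw})| on a grid; for the DCT sequence this grid
    # maps onto the segment's time axis
    w = np.linspace(0, np.pi, segment.size)
    A = 1 - np.exp(-1j * np.outer(w, np.arange(1, order + 1))) @ a
    return 1.0 / np.maximum(np.abs(A), 1e-9)

# amplitude-modulated tone: the envelope estimate should follow the modulation
t = np.linspace(0, 1, 4000, endpoint=False)
x = (1 + 0.8 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 200 * t)
env = fdlp_envelope(x)
```

The interesting property, exploited by FDLP coders, is that the predictor compactly encodes the slow modulation of the signal rather than its fine structure.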
  7. Spoken utterance retrieval has been widely studied in recent decades, with the purpose of indexing large audio databases or of detecting keywords in continuous speech streams. While the indexing of closed corpor...

    Authors: Mickael Rouvier, Georges Linarès and Benjamin Lecouteux
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:326578
  8. Breathy and whispery voices are nonmodal phonations produced by an air escape through the glottis and may carry important linguistic or paralinguistic information (intentions, attitudes, and emotions), dependi...

    Authors: Carlos Toshinori Ishi, Hiroshi Ishiguro and Norihiro Hagita
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:528193
  9. The automatic recognition of children's speech is well known to be a challenge, and so is the influence of affect that is believed to downgrade performance of a speech recogniser. In this contribution, we inve...

    Authors: Stefan Steidl, Anton Batliner, Dino Seppi and Björn Schuller
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:783954
  10. Fractional Fourier transform (FrFT) has been proposed to improve the time-frequency resolution in signal analysis and processing. However, selecting the FrFT transform order for the proper analysis of multicom...

    Authors: Hui Yin, Climent Nadeu and Volker Hohmann
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2009:304579
  11. This paper proposes a query by example system for generic audio. We estimate the similarity of the example signal and the samples in the queried database by calculating the distance between the probability den...

    Authors: Marko Helén and Tuomas Virtanen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2010:179303
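To illustrate this kind of query-by-example matching (a simplification, not the paper's exact measure), one can fit a single diagonal Gaussian to each sample's frame-level features and rank database items by a symmetrised Kullback-Leibler distance; the helper names below are hypothetical:

```python
import numpy as np

def gauss_kl(mu0, var0, mu1, var1):
    """Closed-form KL divergence between two diagonal Gaussians."""
    return 0.5 * np.sum(np.log(var1 / var0)
                        + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def sample_distance(feats_a, feats_b):
    """Symmetrised KL between single diagonal Gaussians fitted to each
    sample's frame-level feature matrix (rows = frames)."""
    mu_a, va = feats_a.mean(0), feats_a.var(0) + 1e-9
    mu_b, vb = feats_b.mean(0), feats_b.var(0) + 1e-9
    return gauss_kl(mu_a, va, mu_b, vb) + gauss_kl(mu_b, vb, mu_a, va)

# a query should score closer to a similar sample than to a dissimilar one
rng = np.random.default_rng(0)
query = rng.normal(0.0, 1.0, (200, 4))
near = rng.normal(0.1, 1.0, (200, 4))
far = rng.normal(3.0, 1.0, (200, 4))
```

In practice the frame-level features would be something like MFCCs, and richer density models (e.g. GMMs) need the distance to be approximated rather than computed in closed form.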
  12. This paper proposes a method for transcribing drums from polyphonic music using a network of connected hidden Markov models (HMMs). The task is to detect the temporal locations of unpitched percussive sounds (...

    Authors: Jouni Paulus and Anssi Klapuri
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:497292
  13. We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves We...

    Authors: Akinori Ito, Yasutomo Kajiura, Motoyuki Suzuki and Shozo Makino
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:140575
  14. Speech recognition applications are known to require a significant amount of resources. However, embedded speech recognition only allows a few KB of memory, a few MIPS, and a small amount of training data. In or...

    Authors: Christophe Lévy, Georges Linarès and Jean-François Bonastre
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:806186
  15. This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animat...

    Authors: Giampiero Salvi, Jonas Beskow, Samer Al Moubayed and Björn Granström
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:191940
  16. We describe here the control, shape and appearance models that are built using an original photogrammetric method to capture characteristics of speaker-specific facial articulation, anatomy, and texture. Two o...

    Authors: Gérard Bailly, Oxana Govokhina, Frédéric Elisei and Gaspard Breton
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:769494
  17. We describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. Speech lip movements are captured using a 3D stereo face capture system and split up into p...

    Authors: James D. Edge, Adrian Hilton and Philip Jackson
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:597267
  18. Computer-Assisted Language Learning (CALL) applications for improving the oral skills of low-proficient learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-na...

    Authors: Joost van Doremalen, Catia Cucchiarini and Helmer Strik
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2010:973954
  19. Robust recognition of general audio events constitutes a topic of intensive research in the signal processing community. This work presents an efficient methodology for acoustic surveillance of atypical situat...

    Authors: Stavros Ntalampiras, Ilyas Potamitis and Nikos Fakotakis
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:594103
  20. Wireless-VoIP communications introduce perceptual degradations that are not present with traditional VoIP communications. This paper investigates the effects of such degradations on the performance of three st...

    Authors: Tiago H. Falk and Wai-Yip Chan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:104382
  21. This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a ...

    Authors: Kang Liu and Joern Ostermann
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:174192
  22. Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either...

    Authors: Wesley Mattheyses, Lukas Latacz and Werner Verhelst
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:169819
  23. The paper presents an adaptive system for Voiced/Unvoiced (V/UV) speech detection in the presence of background noise. Genetic algorithms were used to select the features that offer the best V/UV detection acc...

    Authors: F Beritelli, S Casale, A Russo and S Serrano
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:965436
  24. Design and implementation strategies of spatial sound rendering are investigated in this paper for automotive scenarios. Six design methods are implemented for various rendering modes with different numbers of ...

    Authors: Mingsian R. Bai and Jhih-Ren Hong
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:876297
  25. In patients suffering from head and neck cancer, speech intelligibility is often restricted. For assessment and outcome measurements, automatic speech recognition systems have previously been shown to be appro...

    Authors: Andreas Maier, Tino Haderlein, Florian Stelzle, Elmar Nöth, Emeka Nkenke, Frank Rosanowski, Anne Schützenberger and Maria Schuster
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2010:926951
  26. The problem of overlapping harmonics is particularly acute in musical sound separation and has not been addressed adequately. We propose a monaural system based on binary time-frequency masking with an emphasi...

    Authors: Yipeng Li and DeLiang Wang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:130567
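For context, the binary time-frequency masking such systems build on can be sketched as an oracle "ideal binary mask": each spectrogram cell of the mixture is kept only where the target source dominates. A toy SciPy illustration (it requires the clean sources, so it is an upper bound, not a practical separator, and says nothing about the overlapping-harmonics case the paper addresses):

```python
import numpy as np
from scipy.signal import stft, istft

def ibm_separate(target, interference, nperseg=512):
    """Ideal binary mask: keep mixture T-F cells where |target| > |interference|."""
    _, _, T = stft(target, nperseg=nperseg)
    _, _, I = stft(interference, nperseg=nperseg)
    _, _, M = stft(target + interference, nperseg=nperseg)
    mask = (np.abs(T) > np.abs(I)).astype(float)
    _, est = istft(mask * M, nperseg=nperseg)
    return est

# two spectrally disjoint tones: the mask should recover mostly the target
t = np.arange(16000) / 16000.0
target = np.sin(2 * np.pi * 440 * t)
interference = np.sin(2 * np.pi * 3000 * t)
est = ibm_separate(target, interference)
```

When harmonics of two instruments land in the same T-F cells, this binary keep/discard decision is exactly what breaks down, which motivates the paper's emphasis on resolving overlapped harmonics.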
  27. Temporally localized distortions account for the highest variance in subjective evaluation of coded speech signals (Sen (2001) and Hall (2001)). The ability to discern and decompose perceptually relevant tempor...

    Authors: Wenliang Lu and D Sen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:865723
  28. The problem of tracking multiple intermittently speaking speakers is difficult, as several distinct problems must be addressed. The number of active speakers must be estimated, these active speakers must be identi...

    Authors: Angela Quinlan, Mitsuru Kawamoto, Yosuke Matsusaka, Hideki Asoh and Futoshi Asano
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:673202
  29. Speech feature extraction has been a key focus in robust speech recognition research. In this work, we discuss data-driven linear feature transformations applied to feature vectors in the logarithmic mel-frequ...

    Authors: Hyunsin Park, Tetsuya Takiguchi and Yasuo Ariki
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:690451
  30. There are many ways of synthesizing sound on a computer. The method that we consider, called a mass-spring system, synthesizes sound by simulating the vibrations of a network of interconnected masses, springs, an...

    Authors: Don Morgan and Sanzheng Qiao
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:947823
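To make the idea concrete (a toy sketch, not the authors' system): even a single damped unit-mass spring, stepped with semi-implicit Euler integration, produces a decaying tone; a mass-spring synthesizer couples many such elements into a network.

```python
import numpy as np

def mass_spring_tone(freq_hz=440.0, damping=3.0, sr=44100, dur=1.0):
    """One damped unit-mass spring oscillator, stepped with semi-implicit
    Euler; the displacement trace is the output waveform."""
    k = (2.0 * np.pi * freq_hz) ** 2      # spring constant setting the pitch
    dt = 1.0 / sr
    x, v = 1.0, 0.0                       # "pluck": initial displacement
    out = np.empty(int(sr * dur))
    for n in range(out.size):
        v += (-k * x - damping * v) * dt  # spring force + viscous damping
        x += v * dt
        out[n] = x
    return out

tone = mass_spring_tone()                 # 1 s decaying 440 Hz tone
```

Numerical concerns of exactly this kind (stability and accuracy of the integration scheme at audio rates) are what such papers analyse.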
  31. In 2003 and 2004, the ISO/IEC MPEG standardization committee added two amendments to their MPEG-4 audio coding standard. These amendments concern parametric coding techniques and encompass Spectral Band Replic...

    Authors: AC den Brinker, J Breebaart, P Ekstrand, J Engdegård, F Henn, K Kjörling, W Oomen and H Purnhagen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:468971
  32. Performance of speech recognition systems strongly degrades in the presence of background noise, like the driving noise inside a car. In contrast to existing works, we aim to improve noise robustness focusing ...

    Authors: Björn Schuller, Martin Wöllmer, Tobias Moosmayr and Gerhard Rigoll
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:942617
  33. While linear prediction (LP) has become immensely popular in speech modeling, it does not seem to provide a good approach for modeling audio signals. This is somewhat surprising, since a tonal signal consistin...

    Authors: Toon van Waterschoot and Marc Moonen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2008:706935
  34. Text corpus size is an important issue when building a language model (LM). This is a particularly important issue for languages where little data is available. This paper introduces an LM adaptation technique...

    Authors: Arnar Thor Jensson, Koji Iwano and Sadaoki Furui
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2008:573832
  35. Robust automatic language identification (LID) is a task of identifying the language from a short utterance spoken by an unknown speaker. One of the mainstream approaches named parallel phone recognition langu...

    Authors: Hongbin Suo, Ming Li, Ping Lu and Yonghong Yan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:674859
  36. This paper investigates the problem of speaker recognition in noisy conditions. A new approach called nonnegative tensor principal component analysis (NTPCA) with sparse constraint is proposed for speech featu...

    Authors: Qiang Wu and Liqing Zhang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:578612
  37. Improving the intelligibility of speech in different environments is one of the main objectives of hearing aid signal processing algorithms. Hearing aids typically employ beamforming techniques using multiple ...

    Authors: Sriram Srinivasan, Ashish Pandharipande and Kees Janse
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:824797
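As background on the beamforming mentioned above (a generic delay-and-sum sketch, not the hearing-aid algorithms under study): each microphone signal is advanced by its steering delay so that sound arriving from the look direction adds coherently while off-axis sound partially cancels.

```python
import numpy as np

def delay_and_sum(mics, delays):
    """Delay-and-sum beamformer: advance each microphone signal by its
    steering delay (circular, via an FFT phase shift) and average, so that
    sound from the look direction adds coherently."""
    n = mics.shape[1]
    freqs = np.fft.rfftfreq(n)                 # cycles per sample
    out = np.zeros(n)
    for sig, d in zip(mics, delays):
        spec = np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * d)
        out += np.fft.irfft(spec, n)
    return out / mics.shape[0]

# two mics, the second hearing the source 3 samples later (circular toy signal)
n = 1024
s = np.sin(2 * np.pi * 5 * np.arange(n) / n)
mics = np.stack([s, np.roll(s, 3)])
aligned = delay_and_sum(mics, [0, 3])          # steered at the source
```

Real hearing-aid beamformers work on short overlapping frames and adapt the weights, but the coherent-addition principle is the same.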
  38. Online personalization of hearing instruments refers to learning preferred tuning parameter values from user feedback through a control wheel (or remote control), during normal operation of the hearing aid. We...

    Authors: Alexander Ypma, Job Geurts, Serkan Özer, Erik van der Werf and Bert de Vries
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:183456
  39. A proven method for achieving effective automatic speech recognition (ASR) in the presence of speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed ...

    Authors: Umit H. Yapanel and John H.L. Hansen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:148967
  40. Perception of moving sound sources obeys different brain processes from those mediating the localization of static sound events. In view of these specificities, a preprocessing model was designed, based on the...

    Authors: R Kronland-Martinet and T Voinier
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:849696
  41. The present paper proposes a new approach for detecting music boundaries, such as the boundary between music pieces or the boundary between a music piece and a speech section for automatic segmentation of musi...

    Authors: Yoshiaki Itoh, Akira Iwabuchi, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka and Shi-Wook Lee
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:480786
  42. Most audio compression formats are based on the idea of low bit rate transparent encoding. As these types of audio signals are starting to migrate from portable players with inexpensive headphones to higher qu...

    Authors: Demetrios Cantzos, Athanasios Mouchtaris and Chris Kyriakakis
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:462830
  43. We propose a novel approach to improve adaptive decorrelation filtering- (ADF-) based speech source separation in diffuse noise. The effects of noise on system adaptation and separation outputs are handled sep...

    Authors: Rong Hu and Yunxin Zhao
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:349214
  44. This paper proposes a new algorithm for a directional aid with hearing defenders. Users of existing hearing defenders experience distorted information, or in worst cases, directional information may not be per...

    Authors: Benny Sällberg, Farook Sattar and Ingvar Claesson
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:274684
  45. We propose a new low complexity, low delay, and fast converging frequency-domain adaptive algorithm for network echo cancellation in VoIP exploiting MMax and sparse partial (SP) tap-selection criteria in the f...

    Authors: Xiang (Shawn) Lin, Andy W.H. Khong, Miloš Doroslovački and Patrick A. Naylor
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:156960
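To illustrate the MMax tap-selection idea in its simplest setting (a time-domain NLMS sketch; the paper's algorithm is frequency-domain and also uses sparse partial updates), only the M filter taps whose current input samples are largest in magnitude are updated at each step, cutting the update cost:

```python
import numpy as np

def mmax_nlms(x, d, taps=8, m=4, mu=0.5, eps=1e-6):
    """NLMS echo canceller that updates only the m taps whose input-vector
    entries currently have the largest magnitude (MMax partial update)."""
    w = np.zeros(taps)
    e = np.zeros(x.size)
    for n in range(taps - 1, x.size):
        u = x[n - taps + 1:n + 1][::-1]      # newest sample first
        e[n] = d[n] - w @ u                  # a-priori error
        sel = np.argsort(np.abs(u))[-m:]     # MMax: pick the m largest |u|
        w[sel] += mu * e[n] * u[sel] / (u @ u + eps)
    return w, e

# identify a short echo path from white-noise far-end input
rng = np.random.default_rng(1)
x = rng.standard_normal(4000)
h = np.array([0.5, -0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0])
d = np.convolve(x, h)[:x.size]
w, e = mmax_nlms(x, d)
```

Updating half the taps roughly halves the per-sample multiply count at the cost of somewhat slower convergence, which is the trade-off such partial-update schemes tune.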
  46. Binaural cue coding (BCC) is an efficient technique for spatial audio rendering by using the side information such as interchannel level difference (ICLD), interchannel time difference (ICTD), and interchannel...

    Authors: Bo Qiu, Yong Xu, Yadong Lu and Jun Yang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:618104
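To ground the terminology (a hypothetical single-frame, full-band helper; BCC actually computes these cues per critical band): ICLD is the interchannel energy ratio in dB, and ICTD is the lag of the cross-correlation peak.

```python
import numpy as np

def interchannel_cues(left, right, eps=1e-12):
    """ICLD (level difference, dB) and ICTD (delay, samples) for one frame.
    ICTD is positive when the left channel lags the right."""
    icld = 10.0 * np.log10((np.sum(left ** 2) + eps)
                           / (np.sum(right ** 2) + eps))
    lag = np.argmax(np.correlate(left, right, mode='full')) - (right.size - 1)
    return icld, lag

# left: the same click, twice as loud and 3 samples later than the right
right = np.zeros(64); right[10] = 1.0
left = np.zeros(64);  left[13] = 2.0
icld, ictd = interchannel_cues(left, right)
```

A BCC encoder transmits a mono downmix plus these few cue values per band, and the decoder reimposes them to reconstruct the spatial image.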


Annual Journal Metrics

  • Citation Impact 2023
    Journal Impact Factor: 1.7
    5-year Journal Impact Factor: 1.6
    Source Normalized Impact per Paper (SNIP): 1.051
    SCImago Journal Rank (SJR): 0.414

    Speed 2023
    Submission to first editorial decision (median days): 17
    Submission to acceptance (median days): 154

    Usage 2023
    Downloads: 368,607
    Altmetric mentions: 70

Funding your APC

Open access funding and policy support by SpringerOpen

We offer a free open access support service to make it easier for you to discover and apply for article-processing charge (APC) funding.