
Articles


  1. The automatic recognition of children's speech is well known to be a challenge, and so is the influence of affect, which is believed to degrade the performance of a speech recogniser. In this contribution, we inve...

    Authors: Stefan Steidl, Anton Batliner, Dino Seppi and Björn Schuller
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:783954
  2. This paper proposes a query by example system for generic audio. We estimate the similarity of the example signal and the samples in the queried database by calculating the distance between the probability den...

    Authors: Marko Helén and Tuomas Virtanen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2010:179303
  3. This paper proposes a method for transcribing drums from polyphonic music using a network of connected hidden Markov models (HMMs). The task is to detect the temporal locations of unpitched percussive sounds (...

    Authors: Jouni Paulus and Anssi Klapuri
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:497292
  4. We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves We...

    Authors: Akinori Ito, Yasutomo Kajiura, Motoyuki Suzuki and Shozo Makino
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:140575
  5. Speech recognition applications are known to require a significant amount of resources. However, embedded speech recognition allows only a few KB of memory, a few MIPS, and a small amount of training data. In or...

    Authors: Christophe Lévy, Georges Linarès and Jean-François Bonastre
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:806186
  6. This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animat...

    Authors: Giampiero Salvi, Jonas Beskow, Samer Al Moubayed and Björn Granström
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:191940
  7. We describe here the control, shape and appearance models that are built using an original photogrammetric method to capture characteristics of speaker-specific facial articulation, anatomy, and texture. Two o...

    Authors: Gérard Bailly, Oxana Govokhina, Frédéric Elisei and Gaspard Breton
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:769494
  8. We describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. Speech lip movements are captured using a 3D stereo face capture system and split up into p...

    Authors: James D Edge, Adrian Hilton and Philip Jackson
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:597267
  9. Computer-Assisted Language Learning (CALL) applications for improving the oral skills of low-proficient learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-na...

    Authors: Joost van Doremalen, Catia Cucchiarini and Helmer Strik
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2010:973954
  10. Robust recognition of general audio events constitutes a topic of intensive research in the signal processing community. This work presents an efficient methodology for acoustic surveillance of atypical situat...

    Authors: Stavros Ntalampiras, Ilyas Potamitis and Nikos Fakotakis
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:594103
  11. Wireless-VoIP communications introduce perceptual degradations that are not present with traditional VoIP communications. This paper investigates the effects of such degradations on the performance of three st...

    Authors: Tiago H Falk and Wai-Yip Chan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:104382
  12. This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a ...

    Authors: Kang Liu and Joern Ostermann
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:174192
  13. Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either...

    Authors: Wesley Mattheyses, Lukas Latacz and Werner Verhelst
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:169819
  14. The paper presents an adaptive system for Voiced/Unvoiced (V/UV) speech detection in the presence of background noise. Genetic algorithms were used to select the features that offer the best V/UV detection acc...

    Authors: F Beritelli, S Casale, A Russo and S Serrano
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:965436
  15. Design and implementation strategies of spatial sound rendering are investigated in this paper for automotive scenarios. Six design methods are implemented for various rendering modes with different number of ...

    Authors: Mingsian R Bai and Jhih-Ren Hong
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:876297
  16. In patients suffering from head and neck cancer, speech intelligibility is often restricted. For assessment and outcome measurements, automatic speech recognition systems have previously been shown to be appro...

    Authors: Andreas Maier, Tino Haderlein, Florian Stelzle, Elmar Nöth, Emeka Nkenke, Frank Rosanowski, Anne Schützenberger and Maria Schuster
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2010:926951
  17. The problem of overlapping harmonics is particularly acute in musical sound separation and has not been addressed adequately. We propose a monaural system based on binary time-frequency masking with an emphasi...

    Authors: Yipeng Li and DeLiang Wang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:130567
  18. Temporally localized distortions account for the highest variance in subjective evaluation of coded speech signals (Sen (2001) and Hall (2001)). The ability to discern and decompose perceptually relevant tempor...

    Authors: Wenliang Lu and D Sen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:865723
  19. The problem of tracking multiple intermittently speaking speakers is difficult as some distinct problems must be addressed. The number of active speakers must be estimated, these active speakers must be identi...

    Authors: Angela Quinlan, Mitsuru Kawamoto, Yosuke Matsusaka, Hideki Asoh and Futoshi Asano
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:673202
  20. Speech feature extraction has been a key focus in robust speech recognition research. In this work, we discuss data-driven linear feature transformations applied to feature vectors in the logarithmic mel-frequ...

    Authors: Hyunsin Park, Tetsuya Takiguchi and Yasuo Ariki
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:690451
  21. There are many ways of synthesizing sound on a computer. The method that we consider, called a mass-spring system, synthesizes sound by simulating the vibrations of a network of interconnected masses, springs, an...

    Authors: Don Morgan and Sanzheng Qiao
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:947823
  22. In 2003 and 2004, the ISO/IEC MPEG standardization committee added two amendments to their MPEG-4 audio coding standard. These amendments concern parametric coding techniques and encompass Spectral Band Replic...

    Authors: AC den Brinker, J Breebaart, P Ekstrand, J Engdegård, F Henn, K Kjörling, W Oomen and H Purnhagen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:468971
  23. Performance of speech recognition systems strongly degrades in the presence of background noise, like the driving noise inside a car. In contrast to existing works, we aim to improve noise robustness focusing ...

    Authors: Björn Schuller, Martin Wöllmer, Tobias Moosmayr and Gerhard Rigoll
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:942617
  24. While linear prediction (LP) has become immensely popular in speech modeling, it does not seem to provide a good approach for modeling audio signals. This is somewhat surprising, since a tonal signal consistin...

    Authors: Toon van Waterschoot and Marc Moonen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2008:706935
  25. Text corpus size is an important issue when building a language model (LM). This is a particularly important issue for languages where little data is available. This paper introduces an LM adaptation technique...

    Authors: Arnar Thor Jensson, Koji Iwano and Sadaoki Furui
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2008:573832
  26. Robust automatic language identification (LID) is the task of identifying the language from a short utterance spoken by an unknown speaker. One of the mainstream approaches, named parallel phone recognition langu...

    Authors: Hongbin Suo, Ming Li, Ping Lu and Yonghong Yan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:674859
  27. This paper investigates the problem of speaker recognition in noisy conditions. A new approach called nonnegative tensor principal component analysis (NTPCA) with sparse constraint is proposed for speech featu...

    Authors: Qiang Wu and Liqing Zhang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:578612
  28. Improving the intelligibility of speech in different environments is one of the main objectives of hearing aid signal processing algorithms. Hearing aids typically employ beamforming techniques using multiple ...

    Authors: Sriram Srinivasan, Ashish Pandharipande and Kees Janse
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:824797
  29. Online personalization of hearing instruments refers to learning preferred tuning parameter values from user feedback through a control wheel (or remote control), during normal operation of the hearing aid. We...

    Authors: Alexander Ypma, Job Geurts, Serkan Özer, Erik van der Werf and Bert de Vries
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:183456
  30. A proven method for achieving effective automatic speech recognition (ASR) despite speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed ...

    Authors: Umit H. Yapanel and John H.L. Hansen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:148967
  31. Perception of moving sound sources obeys different brain processes from those mediating the localization of static sound events. In view of these specificities, a preprocessing model was designed, based on the...

    Authors: R Kronland-Martinet and T Voinier
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:849696
  32. The present paper proposes a new approach for detecting music boundaries, such as the boundary between music pieces or the boundary between a music piece and a speech section for automatic segmentation of musi...

    Authors: Yoshiaki Itoh, Akira Iwabuchi, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka and Shi-Wook Lee
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:480786
  33. Most audio compression formats are based on the idea of low bit rate transparent encoding. As these types of audio signals are starting to migrate from portable players with inexpensive headphones to higher qu...

    Authors: Demetrios Cantzos, Athanasios Mouchtaris and Chris Kyriakakis
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:462830
  34. We propose a novel approach to improve adaptive decorrelation filtering- (ADF-) based speech source separation in diffuse noise. The effects of noise on system adaptation and separation outputs are handled sep...

    Authors: Rong Hu and Yunxin Zhao
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:349214
  35. This paper proposes a new algorithm for a directional aid with hearing defenders. Users of existing hearing defenders experience distorted information, or, in the worst cases, directional information may not be per...

    Authors: Benny Sällberg, Farook Sattar and Ingvar Claesson
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:274684
  36. We propose a new low complexity, low delay, and fast converging frequency-domain adaptive algorithm for network echo cancellation in VoIP exploiting MMax and sparse partial (SP) tap-selection criteria in the f...

    Authors: Xiang (Shawn) Lin, Andy W.H. Khong, Miloš Doroslovački and Patrick A. Naylor
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:156960
  37. Binaural cue coding (BCC) is an efficient technique for spatial audio rendering by using the side information such as interchannel level difference (ICLD), interchannel time difference (ICTD), and interchannel...

    Authors: Bo Qiu, Yong Xu, Yadong Lu and Jun Yang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:618104
  38. The behavior of time delay estimation (TDE) is well understood and therefore attractive to apply in acoustic source localization (ASL). A time delay between microphones maps into a hyperbola. Furthermore, the ...

    Authors: Pasi Pertilä, Teemu Korhonen and Ari Visa
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:278185
  39. Rhythmic information plays an important role in Music Information Retrieval. Example applications include automatically annotating large databases by genre, meter, ballroom dance style or tempo, fully automate...

    Authors: Björn Schuller, Florian Eyben and Gerhard Rigoll
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:846135
  40. The phasor representation is introduced to identify the characteristics of active noise control (ANC) systems. The conventional representation, the transfer function, cannot explain the fact that the performanc...

    Authors: Fu-Kun Chen, Ding-Horng Chen and Yue-Dar Jou
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:126859
  41. A multiresolution source/filter model for coding of audio source signals (spot recordings) is proposed. Spot recordings are a subset of the multimicrophone recordings of a music performance, before the mixing ...

    Authors: Athanasios Mouchtaris, Kiki Karadimou and Panagiotis Tsakalides
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:624321
  42. The automatic recognition of foreign-accented Arabic speech is a challenging task since it involves a large number of nonnative accents. In addition, the nonnative speech data available for training are generally ...

    Authors: Yousef Ajami Alotaibi, Sid-Ahmed Selouani and Douglas O'Shaughnessy
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:679831
  43. This paper deals with continuous-time filter transfer functions that resemble tuning curves at a particular set of places on the basilar membrane of the biological cochlea and that are suitable for practical VLS...

    Authors: AG Katsiamis, EM Drakakis and RF Lyon
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2007 2007:063685
  44. This work is the result of an interdisciplinary collaboration between scientists from the fields of audio signal processing, phonetics and cognitive neuroscience aiming at studying the perception of modificati...

    Authors: Sølvi Ystad, Cyrille Magne, Snorre Farner, Gregory Pallone, Mitsuko Aramaki, Mireille Besson and Richard Kronland-Martinet
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2007 2007:030194
  45. Multistage vector quantization (MSVQ) is a technique for low complexity implementation of high-dimensional quantizers, which has found applications within speech, audio, and image coding. In this paper, a mult...

    Authors: Pradeepa Yahampath
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2007 2007:067146
  46. A noise suppression algorithm is proposed based on filtering the spectrotemporal modulations of noisy signals. The modulations are estimated from a multiscale representation of the signal spectrogram generated...

    Authors: Nima Mesgarani and Shihab Shamma
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2007 2007:042357


