
Articles


  1. Mood of Music is among the most relevant and commercially promising, yet challenging, attributes for retrieval in large music collections. In this respect, this article first provides a short overview of methods...

    Authors: Björn Schuller, Johannes Dorfner and Gerhard Rigoll
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:735854
  2. This work explores the effect of mismatches between adults' and children's speech due to differences in various acoustic correlates on the automatic speech recognition performance under mismatched conditions. ...

    Authors: Shweta Ghai and Rohit Sinha
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:318785
  3. Human communication about entities and events is primarily linguistic in nature. While visual representations of information are shown to be highly effective as well, relatively little is known about the commu...

    Authors: Xiaojuan Ma, Christiane Fellbaum and Perry Cook
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:404860
  4. The paper considers the task of recognizing phonemes and words from a singing input by using a phonetic hidden Markov model recognizer. The system is targeted to both monophonic singing and singing in polyphon...

    Authors: Annamaria Mesaros and Tuomas Virtanen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:546047
  5. With ageing, human voices undergo several changes which are typically characterized by increased hoarseness and changes in articulation patterns. In this study, we have examined the effect on Automatic Speech ...

    Authors: Ravichander Vipperla, Steve Renals and Joe Frankel
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:525783
  6. We revisit an original concept of speech coding in which the signal is separated into the carrier modulated by the signal envelope. A recently developed technique, called frequency-domain linear prediction (FD...

    Authors: Petr Motlicek, Sriram Ganapathy, Hynek Hermansky and Harinath Garudadri
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:856280
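As a rough sketch of the underlying idea (not the authors' codec), frequency-domain linear prediction applies ordinary linear prediction to the cosine transform of a signal segment; the resulting all-pole model traces the segment's temporal (Hilbert) envelope. A minimal illustration assuming NumPy/SciPy; `fdlp_envelope` is a hypothetical helper name:

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_envelope(segment, order=20):
    """All-pole (LP) fit to the DCT of a segment; the model's spectral
    shape approximates the segment's temporal envelope."""
    c = dct(segment, type=2, norm='ortho')            # frequency-domain view
    r = np.correlate(c, c, mode='full')[c.size - 1:]  # autocorrelation of DCT
    r = r / r[0]
    a = solve_toeplitz(r[:order], r[1:order + 1])     # Yule-Walker solve
    # evaluate 1/|A(e^{jw})| on a grid; for the DCT sequence this grid
    # maps onto the segment's time axis
    w = np.linspace(0, np.pi, segment.size)
    A = 1 - np.exp(-1j * np.outer(w, np.arange(1, order + 1))) @ a
    return 1.0 / np.maximum(np.abs(A), 1e-9)

# amplitude-modulated tone: the envelope estimate should follow the modulation
t = np.linspace(0, 1, 4000, endpoint=False)
x = (1 + 0.8 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 200 * t)
env = fdlp_envelope(x)
```

The interesting property, exploited by FDLP coders, is that the predictor compactly encodes the slow modulation of the signal rather than its fine structure.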
  7. Spoken utterance retrieval has been widely studied in recent decades, with the purpose of indexing large audio databases or of detecting keywords in continuous speech streams. While the indexing of closed corpor...

    Authors: Mickael Rouvier, Georges Linarès and Benjamin Lecouteux
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:326578
  8. Breathy and whispery voices are nonmodal phonations produced by an air escape through the glottis and may carry important linguistic or paralinguistic information (intentions, attitudes, and emotions), dependi...

    Authors: Carlos Toshinori Ishi, Hiroshi Ishiguro and Norihiro Hagita
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:528193
  9. The automatic recognition of children's speech is well known to be a challenge, and so is the influence of affect that is believed to downgrade performance of a speech recogniser. In this contribution, we inve...

    Authors: Stefan Steidl, Anton Batliner, Dino Seppi and Björn Schuller
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2010:783954
  10. Fractional Fourier transform (FrFT) has been proposed to improve the time-frequency resolution in signal analysis and processing. However, selecting the FrFT transform order for the proper analysis of multicom...

    Authors: Hui Yin, Climent Nadeu and Volker Hohmann
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2010 2009:304579
  11. This paper proposes a query by example system for generic audio. We estimate the similarity of the example signal and the samples in the queried database by calculating the distance between the probability den...

    Authors: Marko Helén and Tuomas Virtanen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2010:179303
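To illustrate this kind of query-by-example matching (a simplification, not the paper's exact measure), one can fit a single diagonal Gaussian to each sample's frame-level features and rank database items by a symmetrised Kullback-Leibler distance; the helper names below are hypothetical:

```python
import numpy as np

def gauss_kl(mu0, var0, mu1, var1):
    """Closed-form KL divergence between two diagonal Gaussians."""
    return 0.5 * np.sum(np.log(var1 / var0)
                        + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def sample_distance(feats_a, feats_b):
    """Symmetrised KL between single diagonal Gaussians fitted to each
    sample's frame-level feature matrix (rows = frames)."""
    mu_a, va = feats_a.mean(0), feats_a.var(0) + 1e-9
    mu_b, vb = feats_b.mean(0), feats_b.var(0) + 1e-9
    return gauss_kl(mu_a, va, mu_b, vb) + gauss_kl(mu_b, vb, mu_a, va)

# a query should score closer to a similar sample than to a dissimilar one
rng = np.random.default_rng(0)
query = rng.normal(0.0, 1.0, (200, 4))
near = rng.normal(0.1, 1.0, (200, 4))
far = rng.normal(3.0, 1.0, (200, 4))
```

In practice the frame-level features would be something like MFCCs, and richer density models (e.g. GMMs) need the distance to be approximated rather than computed in closed form.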
  12. This paper proposes a method for transcribing drums from polyphonic music using a network of connected hidden Markov models (HMMs). The task is to detect the temporal locations of unpitched percussive sounds (...

    Authors: Jouni Paulus and Anssi Klapuri
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:497292
  13. We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves We...

    Authors: Akinori Ito, Yasutomo Kajiura, Motoyuki Suzuki and Shozo Makino
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:140575
  14. Speech recognition applications are known to require a significant amount of resources. However, embedded speech recognition only allows a few KB of memory, a few MIPS, and a small amount of training data. In or...

    Authors: Christophe Lévy, Georges Linarès and Jean-François Bonastre
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:806186
  15. This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animat...

    Authors: Giampiero Salvi, Jonas Beskow, Samer Al Moubayed and Björn Granström
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:191940
  16. We describe here the control, shape and appearance models that are built using an original photogrammetric method to capture characteristics of speaker-specific facial articulation, anatomy, and texture. Two o...

    Authors: Gérard Bailly, Oxana Govokhina, Frédéric Elisei and Gaspard Breton
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:769494
  17. We describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. Speech lip movements are captured using a 3D stereo face capture system and split up into p...

    Authors: James D. Edge, Adrian Hilton and Philip Jackson
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:597267
  18. Computer-Assisted Language Learning (CALL) applications for improving the oral skills of low-proficient learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-na...

    Authors: Joost van Doremalen, Catia Cucchiarini and Helmer Strik
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2010:973954
  19. Robust recognition of general audio events constitutes a topic of intensive research in the signal processing community. This work presents an efficient methodology for acoustic surveillance of atypical situat...

    Authors: Stavros Ntalampiras, Ilyas Potamitis and Nikos Fakotakis
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:594103
  20. Wireless-VoIP communications introduce perceptual degradations that are not present with traditional VoIP communications. This paper investigates the effects of such degradations on the performance of three st...

    Authors: Tiago H. Falk and Wai-Yip Chan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:104382
  21. This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a ...

    Authors: Kang Liu and Joern Ostermann
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:174192
  22. Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either...

    Authors: Wesley Mattheyses, Lukas Latacz and Werner Verhelst
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:169819
  23. The paper presents an adaptive system for Voiced/Unvoiced (V/UV) speech detection in the presence of background noise. Genetic algorithms were used to select the features that offer the best V/UV detection acc...

    Authors: F Beritelli, S Casale, A Russo and S Serrano
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:965436
  24. Design and implementation strategies of spatial sound rendering are investigated in this paper for automotive scenarios. Six design methods are implemented for various rendering modes with different numbers of ...

    Authors: Mingsian R. Bai and Jhih-Ren Hong
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:876297
  25. In patients suffering from head and neck cancer, speech intelligibility is often restricted. For assessment and outcome measurements, automatic speech recognition systems have previously been shown to be appro...

    Authors: Andreas Maier, Tino Haderlein, Florian Stelzle, Elmar Nöth, Emeka Nkenke, Frank Rosanowski, Anne Schützenberger and Maria Schuster
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2010:926951
  26. The problem of overlapping harmonics is particularly acute in musical sound separation and has not been addressed adequately. We propose a monaural system based on binary time-frequency masking with an emphasi...

    Authors: Yipeng Li and DeLiang Wang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:130567
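For context, the binary time-frequency masking such systems build on can be sketched as an oracle "ideal binary mask": each spectrogram cell of the mixture is kept only where the target source dominates. A toy SciPy illustration (it requires the clean sources, so it is an upper bound, not a practical separator, and says nothing about the overlapping-harmonics case the paper addresses):

```python
import numpy as np
from scipy.signal import stft, istft

def ibm_separate(target, interference, nperseg=512):
    """Ideal binary mask: keep mixture T-F cells where |target| > |interference|."""
    _, _, T = stft(target, nperseg=nperseg)
    _, _, I = stft(interference, nperseg=nperseg)
    _, _, M = stft(target + interference, nperseg=nperseg)
    mask = (np.abs(T) > np.abs(I)).astype(float)
    _, est = istft(mask * M, nperseg=nperseg)
    return est

# two spectrally disjoint tones: the mask should recover mostly the target
t = np.arange(16000) / 16000.0
target = np.sin(2 * np.pi * 440 * t)
interference = np.sin(2 * np.pi * 3000 * t)
est = ibm_separate(target, interference)
```

When harmonics of two instruments land in the same T-F cells, this binary keep/discard decision is exactly what breaks down, which motivates the paper's emphasis on resolving overlapped harmonics.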
  27. Temporally localized distortions account for the highest variance in subjective evaluation of coded speech signals (Sen (2001) and Hall (2001)). The ability to discern and decompose perceptually relevant tempor...

    Authors: Wenliang Lu and D Sen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:865723
  28. The problem of tracking multiple intermittently speaking speakers is difficult, as several distinct problems must be addressed. The number of active speakers must be estimated, these active speakers must be identi...

    Authors: Angela Quinlan, Mitsuru Kawamoto, Yosuke Matsusaka, Hideki Asoh and Futoshi Asano
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:673202
  29. Speech feature extraction has been a key focus in robust speech recognition research. In this work, we discuss data-driven linear feature transformations applied to feature vectors in the logarithmic mel-frequ...

    Authors: Hyunsin Park, Tetsuya Takiguchi and Yasuo Ariki
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:690451
  30. There are many ways of synthesizing sound on a computer. The method that we consider, called a mass-spring system, synthesizes sound by simulating the vibrations of a network of interconnected masses, springs, an...

    Authors: Don Morgan and Sanzheng Qiao
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:947823
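To make the idea concrete (a toy sketch, not the authors' system): even a single damped unit-mass spring, stepped with semi-implicit Euler integration, produces a decaying tone; a mass-spring synthesizer couples many such elements into a network.

```python
import numpy as np

def mass_spring_tone(freq_hz=440.0, damping=3.0, sr=44100, dur=1.0):
    """One damped unit-mass spring oscillator, stepped with semi-implicit
    Euler; the displacement trace is the output waveform."""
    k = (2.0 * np.pi * freq_hz) ** 2      # spring constant setting the pitch
    dt = 1.0 / sr
    x, v = 1.0, 0.0                       # "pluck": initial displacement
    out = np.empty(int(sr * dur))
    for n in range(out.size):
        v += (-k * x - damping * v) * dt  # spring force + viscous damping
        x += v * dt
        out[n] = x
    return out

tone = mass_spring_tone()                 # 1 s decaying 440 Hz tone
```

Numerical concerns of exactly this kind (stability and accuracy of the integration scheme at audio rates) are what such papers analyse.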
  31. In 2003 and 2004, the ISO/IEC MPEG standardization committee added two amendments to their MPEG-4 audio coding standard. These amendments concern parametric coding techniques and encompass Spectral Band Replic...

    Authors: AC den Brinker, J Breebaart, P Ekstrand, J Engdegård, F Henn, K Kjörling, W Oomen and H Purnhagen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:468971
  32. Performance of speech recognition systems strongly degrades in the presence of background noise, like the driving noise inside a car. In contrast to existing works, we aim to improve noise robustness focusing ...

    Authors: Björn Schuller, Martin Wöllmer, Tobias Moosmayr and Gerhard Rigoll
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2009:942617
  33. While linear prediction (LP) has become immensely popular in speech modeling, it does not seem to provide a good approach for modeling audio signals. This is somewhat surprising, since a tonal signal consistin...

    Authors: Toon van Waterschoot and Marc Moonen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2008:706935
  34. Text corpus size is an important issue when building a language model (LM). This is a particularly important issue for languages where little data is available. This paper introduces an LM adaptation technique...

    Authors: Arnar Thor Jensson, Koji Iwano and Sadaoki Furui
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2009 2008:573832
  35. Robust automatic language identification (LID) is a task of identifying the language from a short utterance spoken by an unknown speaker. One of the mainstream approaches named parallel phone recognition langu...

    Authors: Hongbin Suo, Ming Li, Ping Lu and Yonghong Yan
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:674859
  36. This paper investigates the problem of speaker recognition in noisy conditions. A new approach called nonnegative tensor principal component analysis (NTPCA) with sparse constraint is proposed for speech featu...

    Authors: Qiang Wu and Liqing Zhang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:578612
  37. Improving the intelligibility of speech in different environments is one of the main objectives of hearing aid signal processing algorithms. Hearing aids typically employ beamforming techniques using multiple ...

    Authors: Sriram Srinivasan, Ashish Pandharipande and Kees Janse
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:824797
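As background on the beamforming mentioned above (a generic delay-and-sum sketch, not the hearing-aid algorithms under study): each microphone signal is advanced by its steering delay so that sound arriving from the look direction adds coherently while off-axis sound partially cancels.

```python
import numpy as np

def delay_and_sum(mics, delays):
    """Delay-and-sum beamformer: advance each microphone signal by its
    steering delay (circular, via an FFT phase shift) and average, so that
    sound from the look direction adds coherently."""
    n = mics.shape[1]
    freqs = np.fft.rfftfreq(n)                 # cycles per sample
    out = np.zeros(n)
    for sig, d in zip(mics, delays):
        spec = np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * d)
        out += np.fft.irfft(spec, n)
    return out / mics.shape[0]

# two mics, the second hearing the source 3 samples later (circular toy signal)
n = 1024
s = np.sin(2 * np.pi * 5 * np.arange(n) / n)
mics = np.stack([s, np.roll(s, 3)])
aligned = delay_and_sum(mics, [0, 3])          # steered at the source
```

Real hearing-aid beamformers work on short overlapping frames and adapt the weights, but the coherent-addition principle is the same.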
  38. Online personalization of hearing instruments refers to learning preferred tuning parameter values from user feedback through a control wheel (or remote control), during normal operation of the hearing aid. We...

    Authors: Alexander Ypma, Job Geurts, Serkan Özer, Erik van der Werf and Bert de Vries
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:183456
  39. A proven method for achieving effective automatic speech recognition (ASR) in the presence of speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed ...

    Authors: Umit H. Yapanel and John H.L. Hansen
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:148967
  40. Perception of moving sound sources obeys different brain processes from those mediating the localization of static sound events. In view of these specificities, a preprocessing model was designed, based on the...

    Authors: R Kronland-Martinet and T Voinier
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:849696
  41. The present paper proposes a new approach for detecting music boundaries, such as the boundary between music pieces or the boundary between a music piece and a speech section for automatic segmentation of musi...

    Authors: Yoshiaki Itoh, Akira Iwabuchi, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka and Shi-Wook Lee
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:480786
  42. Most audio compression formats are based on the idea of low bit rate transparent encoding. As these types of audio signals are starting to migrate from portable players with inexpensive headphones to higher qu...

    Authors: Demetrios Cantzos, Athanasios Mouchtaris and Chris Kyriakakis
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:462830
  43. We propose a novel approach to improve adaptive decorrelation filtering- (ADF-) based speech source separation in diffuse noise. The effects of noise on system adaptation and separation outputs are handled sep...

    Authors: Rong Hu and Yunxin Zhao
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:349214
  44. This paper proposes a new algorithm for a directional aid with hearing defenders. Users of existing hearing defenders experience distorted information, or in worst cases, directional information may not be per...

    Authors: Benny Sällberg, Farook Sattar and Ingvar Claesson
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:274684
  45. We propose a new low complexity, low delay, and fast converging frequency-domain adaptive algorithm for network echo cancellation in VoIP exploiting MMax and sparse partial (SP) tap-selection criteria in the f...

    Authors: Xiang (Shawn) Lin, Andy W.H. Khong, Miloš Doroslovački and Patrick A. Naylor
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:156960
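To illustrate the MMax tap-selection idea in its simplest setting (a time-domain NLMS sketch; the paper's algorithm is frequency-domain and also uses sparse partial updates), only the M filter taps whose current input samples are largest in magnitude are updated at each step, cutting the update cost:

```python
import numpy as np

def mmax_nlms(x, d, taps=8, m=4, mu=0.5, eps=1e-6):
    """NLMS echo canceller that updates only the m taps whose input-vector
    entries currently have the largest magnitude (MMax partial update)."""
    w = np.zeros(taps)
    e = np.zeros(x.size)
    for n in range(taps - 1, x.size):
        u = x[n - taps + 1:n + 1][::-1]      # newest sample first
        e[n] = d[n] - w @ u                  # a-priori error
        sel = np.argsort(np.abs(u))[-m:]     # MMax: pick the m largest |u|
        w[sel] += mu * e[n] * u[sel] / (u @ u + eps)
    return w, e

# identify a short echo path from white-noise far-end input
rng = np.random.default_rng(1)
x = rng.standard_normal(4000)
h = np.array([0.5, -0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0])
d = np.convolve(x, h)[:x.size]
w, e = mmax_nlms(x, d)
```

Updating half the taps roughly halves the per-sample multiply count at the cost of somewhat slower convergence, which is the trade-off such partial-update schemes tune.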
  46. Binaural cue coding (BCC) is an efficient technique for spatial audio rendering by using the side information such as interchannel level difference (ICLD), interchannel time difference (ICTD), and interchannel...

    Authors: Bo Qiu, Yong Xu, Yadong Lu and Jun Yang
    Citation: EURASIP Journal on Audio, Speech, and Music Processing 2008 2008:618104
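To ground the terminology (a hypothetical single-frame, full-band helper; BCC actually computes these cues per critical band): ICLD is the interchannel energy ratio in dB, and ICTD is the lag of the cross-correlation peak.

```python
import numpy as np

def interchannel_cues(left, right, eps=1e-12):
    """ICLD (level difference, dB) and ICTD (delay, samples) for one frame.
    ICTD is positive when the left channel lags the right."""
    icld = 10.0 * np.log10((np.sum(left ** 2) + eps)
                           / (np.sum(right ** 2) + eps))
    lag = np.argmax(np.correlate(left, right, mode='full')) - (right.size - 1)
    return icld, lag

# left: the same click, twice as loud and 3 samples later than the right
right = np.zeros(64); right[10] = 1.0
left = np.zeros(64);  left[13] = 2.0
icld, ictd = interchannel_cues(left, right)
```

A BCC encoder transmits a mono downmix plus these few cue values per band, and the decoder reimposes them to reconstruct the spatial image.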


Annual Journal Metrics

  • Citation Impact 2023
    Journal Impact Factor: 1.7
    5-year Journal Impact Factor: 1.6
    Source Normalized Impact per Paper (SNIP): 1.051
    SCImago Journal Rank (SJR): 0.414

    Speed 2023
    Submission to first editorial decision (median days): 17
    Submission to acceptance (median days): 154

    Usage 2023
    Downloads: 368,607
    Altmetric mentions: 70

Funding your APC

Open access funding and policy support by SpringerOpen

We offer a free open access support service to make it easier for you to discover and apply for article-processing charge (APC) funding.