Perceptual Continuity and Naturalness of Expressive Strength in Singing Voices Based on Speech Morphing

Yonezawa, Tomoko; Suzuki, Noriko; Abe, Shinji; Mase, Kenji; Kogure, Kiyoshi

doi:10.1155/2007/23807

Research Article
Open access
Published: 01 October 2007


Tomoko Yonezawa^1,2,3,
Noriko Suzuki^4,
Shinji Abe^1,3,
Kenji Mase^2,1 &
Kiyoshi Kogure^5

EURASIP Journal on Audio, Speech, and Music Processing volume 2007, Article number: 023807 (2007)


Abstract

This paper experimentally demonstrates that perceptual continuity of expressive strength in vocal timbre is important for natural changes in vocal expression. To synthesize a range of continuously varying expressive strengths in vocal timbre, we investigated gradually changing expressions by applying the STRAIGHT speech morphing algorithm to singing voices. A singing voice without expression is used as the base of the morphing, and singing voices with three different expressions are used as the targets. Statistical analyses of perceptual evaluations confirmed that the proposed morphing algorithm provides perceptual continuity of vocal timbre. Specifically, the results showed (i) gradual changes in strength in absolute evaluations and (ii) a perceptually linear strength obtained by correcting the intervals of the morph ratio with the inverse function of an equation that approximates the perceived strength. Finally, the results showed that (iii) our gradual transformation method performs well in terms of perceived naturalness, from which we conclude that continuity is highly effective for achieving perceptual naturalness.
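Item (ii) above describes inverting a fitted perceptual-strength curve so that unevenly spaced morph ratios yield evenly spaced perceived strengths. The minimal Python sketch below illustrates that idea under stated assumptions: the square-root curve `example_strength` is a hypothetical placeholder, not the equation fitted in the paper; the inverse is found numerically by bisection; and `morph_frame` replaces STRAIGHT morphing with plain linear interpolation of already time-aligned parameter frames. All function names are illustrative only.

```python
from typing import Callable, List, Sequence


def morph_frame(base: Sequence[float], target: Sequence[float], ratio: float) -> List[float]:
    """Blend one time-aligned parameter frame (e.g. a log-spectrum).

    ratio = 0.0 gives the base (unexpressive) voice, ratio = 1.0 the expressive target.
    Simplification (assumption): STRAIGHT-based morphing also aligns time and morphs
    F0 and aperiodicity; this only shows the role of the morph ratio.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return [(1.0 - ratio) * b + ratio * t for b, t in zip(base, target)]


def invert_increasing(f: Callable[[float], float], y: float,
                      lo: float = 0.0, hi: float = 1.0, tol: float = 1e-9) -> float:
    """Solve f(r) = y on [lo, hi] by bisection, assuming f is increasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return 0.5 * (lo + hi)


def corrected_ratios(strength: Callable[[float], float], steps: int) -> List[float]:
    """Morph ratios whose predicted perceived strengths are equally spaced."""
    s0, s1 = strength(0.0), strength(1.0)
    return [invert_increasing(strength, s0 + (s1 - s0) * k / steps)
            for k in range(steps + 1)]


def example_strength(ratio: float) -> float:
    """Hypothetical perceptual-strength curve (placeholder, not the paper's fit)."""
    return ratio ** 0.5


print(morph_frame([0.0, 2.0], [1.0, 4.0], 0.25))  # [0.25, 2.5]
print([round(r, 4) for r in corrected_ratios(example_strength, 4)])
# [0.0, 0.0625, 0.25, 0.5625, 1.0]
```

Bisection only requires the fitted curve to increase with the morph ratio, so any such approximation could be substituted for the placeholder; a closed-form curve could instead be inverted analytically.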


References

Duffy BR: Anthropomorphism and the social robot. Robotics and Autonomous Systems 2003,42(3-4):177-190. 10.1016/S0921-8890(02)00374-3
Minato T, MacDorman KF, Shimada M, Itakura S, Lee K, Ishiguro H: Evaluating humanlikeness by comparing responses elicited by an android and a person. Proceedings of the 2nd International Workshop on Man-Machine Symbiotic Systems, November 2004, Kyoto, Japan 373-383.
Hanson D: Exploring the aesthetic range for humanoid robots. Proceedings of the 28th Annual Conference of the Cognitive Science Society in Cooperation with the 5th International Conference on Cognitive Science (CogSci/ICCS '06), July 2006, Vancouver, BC, Canada 16-20.
Kawahara H, Matsui H: Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), April 2003, Hong Kong 1: 256-259.
Kawahara H, Masuda-Katsuse I, de Cheveigné A: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Communication 1999,27(3-4):187-207. 10.1016/S0167-6393(98)00085-5
Schröder M: Emotional speech synthesis: a review. Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech '01), September 2001, Aalborg, Denmark 1: 561-564.
Erickson D: Expressive speech: production, perception and application to speech synthesis. Acoustical Science and Technology 2005,26(4):317-325. 10.1250/ast.26.317
Iida A, Iga S, Higuchi F, Campbell N, Yasumura M: A speech synthesis system with emotion for assisting communication. Proceedings of the ISCA Workshop on Speech and Emotion, September 2000, Belfast, Northern Ireland, UK 167-172.
Campbell N: Developments in corpus-based speech synthesis: approaching natural conversational speech. IEICE Transactions on Information and Systems 2005,E88-D(3):376-383. 10.1093/ietisy/e88-d.3.376
Saitou T, Unoki M, Akagi M: Development of an F0 control model based on F0 dynamic characteristics for singing-voice synthesis. Speech Communication 2005,46(3-4):405-417. 10.1016/j.specom.2005.01.010
Saitou T, Tsuji N, Unoki M, Akagi M: Analysis of acoustic features affecting "singing-ness" and its application to singing-voice synthesis from speaking-voice. Proceedings of the 8th International Conference on Spoken Language Processing (ICSLP '04), October 2004, Jeju, Korea 3: 1929-1932.
Cano P, Loscos A, Bonada J, Boer M, Serra X: Voice morphing system for impersonating in karaoke applications. Proceedings of the International Computer Music Conference (ICMC '00), August 2000, Berlin, Germany 109-112.
Sogabe Y, Kakehi K, Kawahara H: Psychological evaluation of emotional speech using a new morphing method. Proceedings of the 4th Joint International Conference on Cognitive Science (ICCS/ASCS '03), July 2003, Sydney, Australia
Matsui H, Kawahara H: Investigation of emotionally morphed speech perception and its structure using a high quality speech manipulation system. Proceedings of the 8th European Conference on Speech Communication and Technology (Eurospeech '03), September 2003, Geneva, Switzerland 2113-2116.
Mareüil PB, Célérier P, Toen J: Generation of emotions by a morphing technique in English, French and Spanish. Proceedings of Speech Prosody, April 2002, Aix-en-Provence, France 187-190.
Yonezawa T, Suzuki N, Mase K, Kogure K: Gradually changing expression of singing voice based on morphing. Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech '05), September 2005, Lisbon, Portugal 541-544.
Yonezawa T, Suzuki N, Mase K, Kogure K: Handysinger: expressive singing voice morphing using personified hand-puppet interface. Proceedings of the 5th International Conference on New Interfaces for Musical Expression (NIME '05), May 2005, Vancouver, Canada 121-126.


Author information

Authors and Affiliations

ATR Intelligent Robotics and Communication Laboratories, 2-2-2 Hikaridai, Keihanna Science City, Kyoto, 619-0288, Japan
Tomoko Yonezawa, Shinji Abe & Kenji Mase
Nagoya University, Furo-cho, Chikusa, Nagoya, 464-8601, Japan
Tomoko Yonezawa & Kenji Mase
ATR Media Information Science Laboratories, 2-2-2 Hikaridai, Keihanna Science City, Kyoto, 619-0288, Japan
Tomoko Yonezawa & Shinji Abe
National Institute of Information and Communication Technology/ATR Cognitive Information Science Laboratories, 2-2-2 Hikaridai, Keihanna Science City, Kyoto, 619-0288, Japan
Noriko Suzuki
ATR Knowledge Science Laboratories, 2-2-2 Hikaridai, Keihanna Science City, Kyoto, 619-0288, Japan
Kiyoshi Kogure


Corresponding author

Correspondence to Tomoko Yonezawa.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Yonezawa, T., Suzuki, N., Abe, S. et al. Perceptual Continuity and Naturalness of Expressive Strength in Singing Voices Based on Speech Morphing. J AUDIO SPEECH MUSIC PROC. 2007, 023807 (2007). https://doi.org/10.1155/2007/23807


Received: 30 November 2006
Revised: 23 April 2007
Accepted: 17 August 2007
Published: 01 October 2007
DOI: https://doi.org/10.1155/2007/23807
