Prosody and Wavelets: Towards a natural speaking style conversion
Sanchez Gasulla, Gerard (2014)
Tieto- ja sähkötekniikan tiedekunta - Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Acceptance date
2014-04-09
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tty-201405211178
Abstract
Speech is the basis of human communication: in everyday life we automatically decode speech into language regardless of who speaks. Similarly, we can recognize different speakers regardless of the linguistic content of their speech. In addition to the individuality of the speaker's voice, the prosody of speech carries information about the identity, age, social group or economic status of the speaker, helping us identify the person we are talking to without seeing them.
Voice conversion systems modify a speech signal so that it sounds as if it had been uttered by another speaker. A considerable amount of work has addressed the conversion of voice timbre, that is, the spectral features. In contrast, the conversion of pitch and of the way it evolves over time, which models speaker-dependent prosody, is mostly achieved by just controlling the pitch level and range.
This thesis focuses on prosody conversion and proposes an approach based on a wavelet transformation of the pitch contours. The wavelet domain is studied to separate prosodic events according to their different time scales, which allows them to be modeled more accurately. Prosody conversion is then carried out in the wavelet domain, using regression techniques originally developed for spectral feature conversion in voice conversion systems.
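For orientation, the level-and-range control that the abstract describes as the usual way of converting pitch can be sketched as follows. This is only an illustrative sketch, not code from the thesis: it assumes the common formulation in which the mean and standard deviation of log-F0 are matched to those of the target speaker. The function names and the toy contours are hypothetical.

```python
import math
from statistics import mean, pstdev


def log_f0_stats(f0_hz):
    """Mean and standard deviation of log-F0 over voiced frames (f0 > 0)."""
    voiced = [math.log(f) for f in f0_hz if f > 0]
    return mean(voiced), pstdev(voiced)


def convert_level_and_range(f0_hz, src_stats, tgt_stats):
    """Shift and scale log-F0 so its level and range match the target's.

    Assumption: unvoiced frames are marked with 0 and are passed through
    as 0.0 without modification.
    """
    src_mean, src_std = src_stats
    tgt_mean, tgt_std = tgt_stats
    out = []
    for f in f0_hz:
        if f <= 0:
            out.append(0.0)
        else:
            # Normalize with the source statistics, then rescale to the target.
            z = (math.log(f) - src_mean) / src_std
            out.append(math.exp(tgt_mean + tgt_std * z))
    return out


# Hypothetical toy contours in Hz; 0 marks an unvoiced frame.
source_f0 = [110.0, 120.0, 0.0, 130.0, 115.0]
target_f0 = [210.0, 250.0, 0.0, 230.0, 190.0]

converted = convert_level_and_range(
    source_f0, log_f0_stats(source_f0), log_f0_stats(target_f0)
)
```

Because this mapping only shifts and scales the contour, the converted pitch keeps the shape of the source contour. That is the limitation the abstract points to when it says that the temporal evolution of pitch is usually not modeled.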