Contributions to Multilingual Low-Footprint TTS System for Hand-Held Devices
Moberg, Marko (2007)
Moberg, Marko
Tampere University of Technology
2007
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tty-200810021118
Abstract
Speech technology in the form of automatic speech recognition (ASR) and speech synthesis (text-to-speech or TTS) has become common in everyday use. Applications such as in-car navigation, hands-free control of devices, aids for visually impaired people, telephone-based schedule and reservation services, military applications and even some dictation applications can be found on the market today. The advancement of technology has also made it possible to provide voice-based applications on smaller, hand-held devices.
This thesis describes the challenges and solutions in optimizing a multilingual text-to-speech (TTS) system for hand-held devices. The development challenges arise from the mismatch between the application requirements and the available resources. The requirements include application features, speech quality, language support and portability. The main resources are memory size, performance, development time and cost. The trade-off between requirements and resources is especially challenging in cost-optimized embedded devices targeted at the consumer market.
In this thesis, a multilingual TTS framework is designed and optimized to meet the application requirements according to the availability of various resources. The TTS system uses a Klatt88 formant synthesizer or unit selection synthesis depending on the configuration. Novel approaches and improvements are applied to text normalization, text-to-phoneme conversion, control of synthesis parameters, system framework, and development tools.
It is shown that commercially viable multilingual TTS-based applications can be created by the following four main methods. First, the limitations of the TTS technology can be hidden or alleviated by limiting the scope of the application. Second, optimization in memory consumption and performance makes the TTS technology more attractive for hand-held devices. Third, multilingualism and rapid development of new synthesis languages are enabled through system design and proper development methods and tools. Finally, separating the TTS engine software from the language-dependent data makes it possible to hide the software engineering details by providing language developers with an interface at a higher level of abstraction.
Collections
- Doctoral dissertations [4908]