
Implementing a text-to-speech module for a Deep Speaking Avatar

Granat, Väinö-Waltteri (2021)

 

Tieto- ja sähkötekniikan kandidaattiohjelma - Bachelor's Programme in Computing and Electrical Engineering
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Acceptance date
2021-06-03
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202105305566
Abstract
The aim of this project was to implement an end-to-end speech synthesis deep neural network in the Deep Speaking Avatar project. Deep Speaking Avatar (DSA) is a character built from multiple deep neural networks (DNNs) that can hold a natural-seeming conversation with its user. The network must take text as its input and turn it into audible speech. The Deep Speaking Avatar project places some constraints on the speech synthesis module: it must produce natural-sounding speech with a low memory footprint and a short passthrough time.

The first part of this thesis consists of a general discussion of neural networks and how they are trained, followed by a closer look at specific components relevant to this work, such as long short-term memory (LSTM) units and convolutional neural networks. Next, neural networks specific to speech synthesis are examined; the encoder-decoder structure and attention layers turn out to be especially important. Then topics related to building the module are discussed, with a focus on the Tacotron 2 architecture, which is used to build the module. Finally, the conclusions of this work are presented.

With the Tacotron 2 architecture and a pretrained model, a working speech synthesis module for Deep Speaking Avatar was built. The module works with the other components in the project, but no real-world testing has occurred yet.
Collections
  • Kandidaatintutkielmat (Bachelor's theses) [9041]
Kalevantie 5
PL 617
33014 Tampereen yliopisto
oa[@]tuni.fi | Privacy | Accessibility statement
 

 
