Autoregressive model based on a deep convolutional neural network for audio generation

Cabello Piqueras, Laura (2017)

 
Information Technology
Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2017-04-05
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tty-201703221212
Abstract
The main objective of this work is to investigate how a deep convolutional neural network (CNN) performs in audio generation tasks. The final architecture we study is an autoregressive model built on a deep CNN that operates directly at the waveform level.

First, we study different options for tackling the task of audio generation. We find that the best approach is to treat it as a classification task with one-hot encoded data. Generation is based on sequential predictions: after the next sample of an input sequence is predicted, it is fed back into the network to predict the following sample.
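The feedback loop described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: `predict_fn` is a hypothetical stand-in for the trained network, `toy_predict` is a made-up rule used only so the loop can run, and the greedy `argmax` choice is an assumption (sampling from the predicted distribution is an equally valid option).

```python
import numpy as np

def one_hot(indices, num_classes):
    """Return a (len(indices), num_classes) one-hot matrix."""
    out = np.zeros((len(indices), num_classes), dtype=np.float32)
    out[np.arange(len(indices)), indices] = 1.0
    return out

def generate(predict_fn, seed, n_new, num_classes):
    """Autoregressive generation: predict one sample, append it, repeat.

    predict_fn is a hypothetical stand-in for the trained network. It takes
    a one-hot encoded sequence and returns class probabilities for the
    next sample.
    """
    samples = list(seed)
    for _ in range(n_new):
        probs = predict_fn(one_hot(np.array(samples), num_classes))
        samples.append(int(np.argmax(probs)))  # greedy pick (an assumption)
    return samples

def toy_predict(x):
    """Toy model (not a trained CNN): always predicts (last class + 1) mod K."""
    last = int(np.argmax(x[-1]))
    probs = np.zeros(x.shape[1])
    probs[(last + 1) % x.shape[1]] = 1.0
    return probs

result = generate(toy_predict, seed=[0, 1], n_new=4, num_classes=4)
```

Each predicted class index is appended to the sequence before the next call, so every new prediction is conditioned on the model's own earlier outputs.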

We present the basics of the preferred architecture for generation, adapted from the WaveNet model proposed by DeepMind. It is based on dilated causal convolutions, which allow the receptive field size to grow exponentially with the depth of the network. Larger receptive fields are desirable when dealing with temporal sequences, since they increase the model's capacity to capture temporal correlations at longer timescales.
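The exponential growth can be checked with a small numerical sketch. The kernel size of 2 and the doubling dilation schedule 1, 2, 4, 8 are assumptions borrowed from the usual WaveNet description; the abstract does not state the settings used in the thesis. For a stack of causal convolutions with kernel size k and dilations d_i, the receptive field is 1 + sum((k - 1) * d_i).

```python
import numpy as np

def receptive_field(dilations, kernel_size=2):
    """Receptive field (in samples) of a stack of dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation], with zeros before t = 0."""
    y = np.zeros_like(x, dtype=float)
    for k, w in enumerate(weights):
        shift = k * dilation
        if shift == 0:
            y += w * x
        else:
            y[shift:] += w * x[:-shift]
    return y

dilated = [1, 2, 4, 8]    # dilation doubles at every layer (assumed schedule)
plain = [1, 1, 1, 1]      # same depth without dilation

rf_dilated = receptive_field(dilated)
rf_plain = receptive_field(plain)

# Empirical check: push a unit impulse through the dilated stack and count
# how many output positions it reaches.
signal = np.zeros(32)
signal[0] = 1.0
for d in dilated:
    signal = causal_dilated_conv(signal, [1.0, 1.0], d)
reach = int(np.count_nonzero(signal))
```

With four layers the dilated stack covers 16 samples against 5 for the undilated one, and the impulse reaches exactly as many positions as the formula predicts.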

Due to the lack of an objective method to assess the quality of newly synthesized signals, we first test a wide range of network settings with pure tones, checking whether the network is capable of predicting the same sequences. In order to overcome the difficulties of training a deep network and to speed up the research within our computational resources, we constrain the input database to a mixture of two sinusoids within an audible range of frequencies. In the generation phase, we acknowledge the key role of training a network with a large receptive field and long input sequences. Likewise, the number of examples we feed to the network in every training epoch exerts a decisive influence in every approach studied.
Collections
  • Master's theses [42015]
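A training signal of the kind described above could be built as in the sketch below. All numeric choices here are illustrative assumptions, not values from the thesis: the 8 kHz sampling rate, the 440 Hz and 880 Hz tones, the equal amplitudes, and the uniform 256-level quantization that turns amplitudes into class indices for the classification setup.

```python
import numpy as np

def two_tone(f1, f2, n_samples=800, sample_rate=8000):
    """Sum of two equal-amplitude sinusoids, bounded to [-1, 1]."""
    t = np.arange(n_samples) / sample_rate
    return 0.5 * np.sin(2 * np.pi * f1 * t) + 0.5 * np.sin(2 * np.pi * f2 * t)

def quantize(signal, num_classes=256):
    """Uniformly map amplitudes in [-1, 1] to class indices 0..num_classes-1."""
    scaled = (np.clip(signal, -1.0, 1.0) + 1.0) / 2.0 * (num_classes - 1)
    return np.round(scaled).astype(np.int64)

wave = two_tone(440.0, 880.0)   # hypothetical pair of audible frequencies
classes = quantize(wave)        # integer targets, ready for one-hot encoding
```

Because the signal is fully determined by two frequencies, a generated continuation can be compared directly against the known waveform, which partly compensates for the missing objective quality measure.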
Kalevantie 5
P.O. Box 617
33014 Tampereen yliopisto
oa[@]tuni.fi | Privacy | Accessibility statement
 

 
