
Sequence Temporal Sub-Sampling for Automated Audio Captioning

Nguyen, Khoa (2020)

 




Bachelor's Programme in Science and Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2020-11-12
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202011137957
Abstract
Audio captioning is a recent machine learning task: generating a textual description of an audio signal. For example, an audio captioning method must be able to generate descriptions such as “two people talking about football” or “college clock striking” from the corresponding audio signals. Audio captioning is one of the tasks in the Detection and Classification of Acoustic Scenes and Events 2020 (DCASE2020) challenge. Most audio captioning methods use an encoder-decoder deep neural network architecture to map features extracted from the input audio sequence to the output caption. However, an output caption is considerably shorter than its input audio sequence, for example 10 words versus 2000 audio feature vectors. This thesis reports an attempt to exploit this difference in length by employing temporal sub-sampling in the encoder-decoder neural network. The method is evaluated using the Clotho audio captioning dataset and the DCASE2020 evaluation metrics. Experimental results show that temporal sub-sampling of the sequence improves all considered metrics and reduces the memory and time required for training and for computing predictions.
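To make the idea of temporal sub-sampling concrete, the following is a minimal sketch and not the thesis implementation. It assumes that sub-sampling means keeping every `factor`-th time step of a feature sequence and that it is applied between consecutive encoder layers. The function names, the factor of 2, the three-layer encoder and the 4-dimensional dummy features are all illustrative assumptions; only the figure of 2000 input feature vectors comes from the abstract.

```python
from typing import List, Sequence

FeatureVector = List[float]


def temporal_subsample(sequence: Sequence[FeatureVector], factor: int) -> List[FeatureVector]:
    """Keep every `factor`-th time step of a sequence of feature vectors."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return list(sequence[::factor])


def encoder_lengths(num_frames: int, factor: int, num_layers: int) -> List[int]:
    """Sequence length seen by each (hypothetical) encoder layer when
    sub-sampling is applied between consecutive layers."""
    lengths = [num_frames]
    for _ in range(num_layers - 1):
        # sequence[::factor] has ceil(len / factor) elements.
        lengths.append(-(-lengths[-1] // factor))
    return lengths


# 2000 time steps as in the abstract; the 4-dimensional features are dummies.
frames = [[float(t)] * 4 for t in range(2000)]

# Assumed setup: sub-sample by 2 after the first and second encoder layers.
reduced = temporal_subsample(temporal_subsample(frames, 2), 2)

print(len(frames), len(reduced))
print(encoder_lengths(2000, 2, 3))
```

Under these assumptions the last encoder layer processes a quarter of the original time steps, which illustrates why memory use and computation time can drop when the input sequence is much longer than the output caption.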
Collections
  • Bachelor's theses [9897]
Kalevantie 5
P.O. Box 617
33014 Tampere University
oa[@]tuni.fi | Privacy | Accessibility statement
 

 
