Continual Learning In Automated Audio Captioning
Berg, Jan (2021)
Master's Programme in Computer Science
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2021-11-15
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202111168433
Abstract
Teaching neural network models to classify new tasks, or old tasks on new domains, is a process in which a common problem is the forgetting of previous tasks and/or domains. This problem is referred to as catastrophic forgetting. Continual Learning, sometimes also called Lifelong or Incremental Learning, is a research field that aims to find solutions to catastrophic forgetting in order to create models that can learn new tasks sequentially without the need for full retraining. As neural networks, and machine learning in general, have gained plenty of interest during the last decade, the need for such continual learning methods has become apparent to many developers and researchers. Thus, continual learning has become a very active topic within neural network research, and many new methods have been introduced during the past five years.
Audio Captioning is a problem domain where the goal is to generate a textual representation of audio data, effectively describing what is heard in the audio. Most current state-of-the-art Audio Captioning models are based on encoder-decoder structures, with more recent models building on the attention-based Transformer, which has gained huge popularity and achieved state-of-the-art results in many different task domains, such as Neural Machine Translation.
This thesis introduces the basic concepts of neural networks as well as of Continual Learning. Different types of Continual Learning methods are presented, with a short overview of the most popular methods.
Furthermore, this thesis presents the first study of Continual Learning in Audio Captioning, using the Learning Without Forgetting approach. Applying the approach alleviated catastrophic forgetting to some degree while further training the Audio Captioning model WaveTransformer.
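The Learning Without Forgetting idea mentioned above combines the usual loss on the new task with a knowledge-distillation term that keeps the updated model's outputs close to those of a frozen copy of the original model. The following is a minimal, generic sketch of that combined objective; the function names, the balancing weight `lam`, and the temperature `T` are illustrative assumptions, not the thesis's actual implementation.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lwf_loss(new_logits, old_logits, target_idx, lam=1.0, T=2.0):
    """Learning Without Forgetting objective (sketch):
    cross-entropy on the new-task label, plus a distillation term
    penalizing drift from the frozen old model's softened outputs."""
    # Standard cross-entropy against the new-task label.
    p_new = softmax(new_logits)
    ce = -math.log(p_new[target_idx])
    # Distillation: cross-entropy between the old model's softened
    # outputs (teacher) and the current model's softened outputs.
    p_old_soft = softmax(old_logits, T)
    p_new_soft = softmax(new_logits, T)
    kd = -sum(po * math.log(pn) for po, pn in zip(p_old_soft, p_new_soft))
    return ce + lam * kd
```

During sequential training, `old_logits` come from a frozen snapshot of the model taken before the new task is introduced, so minimizing the distillation term discourages the parameters from drifting away from behavior learned on earlier data.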