One-shot Learning with Siamese Networks for Environmental Audio
Honka, Tapio (2019)
Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Acceptance date
2019-05-07
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tty-201905061491
Abstract
In recent years, deep learning-based approaches have dominated many types of classification problems. These approaches usually require large amounts of training data to train a model capable of generalizing to unseen data of the same type. However, in some applications it can be difficult to gather training data efficiently, and it would be beneficial to classify new samples using only a few training examples, or even a single one.
For humans, knowledge from previously learned concepts is relatively easy to transfer to unfamiliar concepts, and many researchers have therefore experimented with this idea in machine learning classification tasks. The idea of using only a single labelled example to classify unseen data is known as one-shot learning, and it has been especially successful in the field of computer vision. Many modern approaches to one-shot learning use a special neural network architecture called a siamese network. This architecture can be trained to predict similarities between inputs and can be used for a metric-based approach to one-shot learning. Siamese networks have been used for various audio-related tasks before; however, their use in one-shot learning for audio classification has received less attention than in computer vision.
The purpose of this thesis is to extend the idea of one-shot learning to environmental audio classification and to determine whether this approach is feasible. The proposed system was trained and evaluated on the ESC dataset, which consists of 50 different environmental audio categories. The final one-shot evaluation was performed on 5 completely unseen classes, using only a single example of each class when performing the classification. The results show that convolutional siamese networks are indeed a valid approach to the difficult one-shot classification task for environmental audio.
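The metric-based decision rule described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the `embed` function is a hypothetical stand-in for the trained convolutional branch (here it only L2-normalises a feature vector), the distance-to-similarity mapping is an assumption, and the feature vectors and class labels are made up. The sketch only shows the general principle that both inputs pass through the same encoder and that a query is assigned to the class whose single support example scores the highest similarity.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def embed(x: Vector) -> List[float]:
    """Hypothetical stand-in for the shared encoder branch.

    A real siamese network would apply trained convolutional layers here;
    this placeholder only L2-normalises the input features.
    """
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def similarity(a: Vector, b: Vector) -> float:
    """Score a pair by embedding both inputs with the SAME function
    and mapping their Euclidean distance into the range (0, 1]."""
    ea, eb = embed(a), embed(b)
    dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(ea, eb)))
    return 1.0 / (1.0 + dist)


def one_shot_classify(
    query: Vector,
    support: Dict[str, Vector],
    score: Callable[[Vector, Vector], float] = similarity,
) -> str:
    """Return the label whose single support example is most similar to the query."""
    return max(support, key=lambda label: score(query, support[label]))


# Toy 5-way one-shot trial: one made-up feature vector per unseen class.
support_set = {
    "class_0": [1.0, 0.0, 0.0, 0.0, 0.0],
    "class_1": [0.0, 1.0, 0.0, 0.0, 0.0],
    "class_2": [0.0, 0.0, 1.0, 0.0, 0.0],
    "class_3": [0.0, 0.0, 0.0, 1.0, 0.0],
    "class_4": [0.0, 0.0, 0.0, 0.0, 1.0],
}
query_features = [0.1, 0.8, 0.1, 0.0, 0.0]
predicted_label = one_shot_classify(query_features, support_set)
```

In a trained system, the similarity would be learned from pairs drawn from the training classes, so that the same rule can be applied to classes never seen during training.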
Collections
- Bachelor's theses [8324]