Audio Similarity with Siamese Networks
Lepistö, Joni (2023)
Bachelor's Programme in Computing and Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2023-06-06
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202302172489
Abstract
The aim of this thesis was to study the application of Siamese neural networks to the problem of audio similarity measurement. A selection of Siamese networks with different architectures was presented, and the results attained by these networks were compared to a group of baseline methods consisting of classical statistical methods as well as non-Siamese networks. The goal was to find out how the Siamese solutions performed in general audio similarity measurement compared to these methods, and whether they delivered more generalizable results when dealing with a vast selection of audio samples.
All of the systems were trained and tested on a dataset of 2000 samples from different kinds of environments, such as cities, nature and domestic settings. The best performers in audio similarity measurement on this dataset were deep-learning-based methods, including the presented Siamese networks. Siamese networks showed great potential in their ability to generalize to the vast selection of audio classes; however, the best overall results were reached with a non-Siamese solution using a convolutional neural network. In light of these promising findings, Siamese networks should be studied further in audio processing, measurement and evaluation tasks, since the amount of existing research was rather limited.
Collections
- Bachelor's theses [8452]