Building and Visualizing a Percussion Dataset Using Deep Audio Embeddings and Dimensionality Reduction
Nieminen, Elias (2020)
Tieto- ja sähkötekniikan kandidaattiohjelma - Degree Programme in Computing and Electrical Engineering, BSc (Tech)
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
This bachelor’s thesis builds a dataset of drum and percussion samples and describes a method for visualizing it. It gives insight into the process of planning, recording and labeling an audio dataset, and then applies a visualization technique based on neural network embeddings to the dataset. The visualization method consists of two parts. First, high-dimensional embeddings are obtained from OpenL3, a pre-trained deep convolutional neural network model. Then, the embeddings are projected onto a 2-dimensional plane with the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm. The results are promising: the method separates different drum types and articulations into distinct clusters in the 2D plane, and it also captures dynamic variations within the classes and arranges the samples accordingly. Additionally, Python programs were written to assist in the dataset labeling and visualization tasks, and the code and the dataset, along with pre-computed embeddings, are published online.
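The two-step method described in the abstract can be illustrated with a minimal sketch. This is not the thesis code. It assumes the `openl3`, `soundfile`, `numpy` and `scikit-learn` packages; the function names `embed_file` and `project_2d` are hypothetical, the OpenL3 settings (`content_type`, `input_repr`, `embedding_size`) are illustrative choices, and averaging the per-frame embeddings into one vector per sample is an assumption rather than a detail taken from the abstract.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_file(path):
    """Return one fixed-length OpenL3 vector for an audio file.

    Assumption: the per-frame embeddings are mean-pooled over time.
    """
    # Imported lazily so the projection step can run without OpenL3 installed.
    import openl3
    import soundfile as sf

    audio, sr = sf.read(path)
    # Illustrative settings, not values taken from the thesis.
    frames, _timestamps = openl3.get_audio_embedding(
        audio, sr, content_type="music", input_repr="mel256", embedding_size=512
    )
    return frames.mean(axis=0)


def project_2d(embeddings, perplexity=30.0, seed=0):
    """Project an (n_samples, n_dims) array onto a 2D plane with t-SNE."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    # scikit-learn requires the perplexity to be smaller than n_samples.
    perplexity = min(perplexity, len(embeddings) - 1)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(embeddings)
```

With a list of sample paths, `project_2d([embed_file(p) for p in paths])` would give one 2D point per sample, which can then be drawn as a scatter plot colored by drum type. The t-SNE layout depends on the perplexity and the random seed, so cluster positions vary between runs unless both are fixed.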