Filterbank Learning for Deep Neural Network Based Polyphonic Sound Event Detection

Cakir, Emre; Ozan, Ezgi Can; Virtanen, Tuomas

Publisher: IEEE

Date: 03.11.2016

Published in: 2016 International Joint Conference on Neural Networks (IJCNN)

This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.

doi:10.1109/IJCNN.2016.7727634

Permanent address of the publication: https://urn.fi/URN:NBN:fi:tty-201803011329

Description

Peer reviewed

Abstract

Deep learning techniques such as deep feedforward neural networks and deep convolutional neural networks have recently been shown to improve sound event detection performance compared to traditional methods such as Gaussian mixture models. One of the key factors behind this improvement is the ability of deep architectures to automatically learn increasingly high-level acoustic features layer by layer. In this work, we aim to combine the feature learning capabilities of deep architectures with empirical knowledge of human perception. We use the first layer of a deep neural network to learn a mapping from a high-resolution magnitude spectrum to a smaller number of frequency bands, which effectively learns a filterbank for the sound event detection task. We initialize the first hidden layer weights to match the magnitude response of the perceptually motivated mel filterbank. We also integrate this initialization scheme with context windowing by using an appropriately constrained deep convolutional neural network. The proposed method not only results in better detection accuracy but also provides insight into the frequencies deemed essential for discriminating the given sound events.
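The abstract describes the first hidden layer as a learnable linear map from a high-resolution magnitude spectrum to fewer frequency bands, initialized to the mel filterbank magnitude response. A minimal NumPy sketch of that idea follows. It rests on assumptions the abstract does not state: the HTK mel formula, triangular filters, a purely linear first layer, and a plain SGD update. `FilterbankLayer`, `sgd_step` and `conv_full_frequency` are hypothetical names, and `conv_full_frequency` illustrates only one possible reading of "appropriately constrained" CNN (a kernel spanning all frequency bins and a single frame). This is not the authors' implementation.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_bins, sample_rate, n_bands, fmin=0.0, fmax=None):
    """Triangular mel filter magnitude responses, shape (n_bins, n_bands)."""
    fmax = sample_rate / 2.0 if fmax is None else fmax
    # Frequency of each magnitude-spectrum bin, from 0 Hz to Nyquist
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_bands + 2 edge frequencies, equally spaced on the mel scale
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_bands + 2))
    weights = np.zeros((n_bins, n_bands))
    for b in range(n_bands):
        lo, centre, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (bin_freqs - lo) / (centre - lo)
        falling = (hi - bin_freqs) / (hi - centre)
        # Triangle: rises from lo to centre, falls from centre to hi, 0 elsewhere
        weights[:, b] = np.maximum(0.0, np.minimum(rising, falling))
    return weights


class FilterbankLayer:
    """Hypothetical first hidden layer: a learnable linear map n_bins -> n_bands."""

    def __init__(self, n_bins, sample_rate, n_bands):
        # Initialize the weights to the mel filterbank magnitude response
        self.W = mel_filterbank(n_bins, sample_rate, n_bands)

    def forward(self, spectra):
        # (..., n_bins) @ (n_bins, n_bands) -> (..., n_bands)
        return spectra @ self.W

    def sgd_step(self, spectra, grad_out, lr=1e-3):
        # spectra: (frames, n_bins); grad_out: dLoss/dOutput, (frames, n_bands).
        # dLoss/dW = spectra^T @ grad_out; plain SGD moves W away from mel.
        self.W -= lr * (spectra.T @ grad_out)


def conv_full_frequency(window, W):
    """Explicit convolution over time with a kernel spanning all frequency
    bins and one frame (a hypothetical reading of the constrained CNN)."""
    n_frames, n_bands = window.shape[0], W.shape[1]
    out = np.empty((n_frames, n_bands))
    for t in range(n_frames):  # slide the kernel one frame at a time
        for b in range(n_bands):  # one kernel (filter) per output band
            out[t, b] = np.sum(window[t, :] * W[:, b])
    return out


rng = np.random.default_rng(0)
layer = FilterbankLayer(n_bins=1025, sample_rate=44100, n_bands=40)
window = np.abs(rng.standard_normal((5, 1025)))  # synthetic 5-frame context window
bands = layer.forward(window)
print(bands.shape)  # (5, 40)
print(np.allclose(bands, conv_full_frequency(window, layer.W)))  # True
```

Because each kernel in this sketch covers the whole frequency axis and a single frame, the convolution reduces to the same matrix product applied frame by frame, so the mel initialization carries over unchanged to a context window; training then adjusts `W` and the learned columns can be inspected as filter responses.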

Collections

TUNICRIS publications