A Comparative Analysis of Residual Block Alternatives for End-to-End Audio Classification

Naranjo-Alcazar, Javier; Perez-Castanos, Sergi; Martín-Morató, Irene; Zuccarello, Pedro; Ferri, Francesc J.; Cobos, Maximo

2020

doi:10.1109/ACCESS.2020.3031685

The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202101141299

Description

Peer reviewed

Abstract

Residual learning is a framework that facilitates the training of very deep neural networks. Residual blocks, or units, consist of a set of stacked layers whose inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are implemented by means of so-called skip or shortcut connections. However, multiple implementation alternatives arise depending on where such skip connections are applied within the stacked layers making up a residual block. While residual networks for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, their adoption in 1D end-to-end architectures is still scarce in the audio domain. Thus, the suitability of different residual block designs for raw audio classification is partly unknown. The purpose of this article is to compare, analyze and discuss the performance of several residual block implementations, namely those most commonly used in image classification problems, within a state-of-the-art CNN-based architecture for end-to-end audio classification using raw audio waveforms. In-depth statistical analyses of six different residual block alternatives are conducted, considering two well-known datasets and common input normalization choices. The results show that, while some significant differences in performance are observed among architectures using different residual block designs, the selection of the most suitable residual block can be highly dependent on the input data.
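
To make the idea of "where the skip connection is applied" concrete, the following is a minimal sketch of two residual block layouts on a 1D signal, written in plain Python. It is not the article's implementation: the two layouts shown (activation after the addition, and activation before each convolution) are well-known designs from the image classification literature and are used here only as illustrations, not as a statement of which six alternatives the article evaluates. All names (`conv1d_same`, `post_activation_block`, `pre_activation_block`) and the single-channel, odd-length-kernel setup are assumptions made for brevity.

```python
from typing import List

Signal = List[float]


def conv1d_same(x: Signal, kernel: Signal) -> Signal:
    """Zero-padded 1D cross-correlation whose output has the same length as x.

    Assumes an odd-length kernel so that it is centred on each sample.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):  # samples outside the signal count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def relu(x: Signal) -> Signal:
    return [max(0.0, v) for v in x]


def add(a: Signal, b: Signal) -> Signal:
    return [u + v for u, v in zip(a, b)]


def post_activation_block(x: Signal, k1: Signal, k2: Signal) -> Signal:
    """conv -> ReLU -> conv, add the input, then ReLU after the addition."""
    f = conv1d_same(relu(conv1d_same(x, k1)), k2)
    return relu(add(f, x))


def pre_activation_block(x: Signal, k1: Signal, k2: Signal) -> Signal:
    """ReLU -> conv -> ReLU -> conv, then add the input; nothing follows the addition."""
    f = conv1d_same(relu(conv1d_same(relu(x), k1)), k2)
    return add(f, x)
```

With all-zero kernels the stacked layers contribute nothing, so the difference between the two layouts becomes visible: `pre_activation_block` returns its input unchanged (an exact identity mapping), whereas `post_activation_block` still clips negative samples to zero because the ReLU sits after the addition.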
