Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs
Diaz-Guerra Aparicio, David; Miguel, Antonio; Beltran, Jose R. (2023)
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202311079440
Description
Peer reviewed
Abstract
In this paper, we present a new model for Direction of Arrival (DOA) estimation of sound sources based on an Icosahedral Convolutional Neural Network (CNN) applied over SRP-PHAT power maps computed from the signals received by a microphone array. This icosahedral CNN is equivariant to the 60 rotational symmetries of the icosahedron, which represent a good approximation of the continuous space of spherical rotations, and can be implemented using standard 2D convolutional layers, having a lower computational cost than most of the spherical CNNs. In addition, instead of using fully connected layers after the icosahedral convolutions, we propose a new soft-argmax function that can be seen as a differentiable version of the argmax function and allows us to solve the DOA estimation as a regression problem interpreting the output of the convolutional layers as a probability distribution. We prove that using models that fit the equivariances of the problem allows us to outperform other state-of-the-art models with a lower computational cost and more robustness, obtaining root mean square localization errors lower than 10° even in scenarios with a reverberation time T60 of 1.5 s.
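The soft-argmax idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the version below assumes one common construction, in which a softmax turns the per-direction scores into a probability distribution and the DOA is the normalized expected value of the grid's unit vectors. The function name `soft_argmax_doa`, the sharpness parameter `beta`, and the six-direction grid are hypothetical choices made for illustration.

```python
import numpy as np


def soft_argmax_doa(scores, directions, beta=1.0):
    """Differentiable stand-in for argmax over a grid of directions.

    scores:     (N,) array, one score per grid direction (assumed to be
                the output of the last convolutional layer).
    directions: (N, 3) array of unit vectors, one per grid point.
    beta:       hypothetical sharpness parameter; larger values make the
                result approach the hard argmax.
    Returns a unit vector pointing at the estimated DOA.
    """
    scores = np.asarray(scores, dtype=float)
    directions = np.asarray(directions, dtype=float)

    # Softmax with the usual max subtraction for numerical stability.
    z = beta * (scores - scores.max())
    weights = np.exp(z)
    weights /= weights.sum()

    # Expected direction under the distribution, projected to the sphere.
    mean_vec = weights @ directions
    norm = np.linalg.norm(mean_vec)
    if norm == 0.0:
        raise ValueError("distribution is symmetric; DOA is undefined")
    return mean_vec / norm


# Toy grid (an assumption): the six axis-aligned directions.
grid = np.array([
    [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0],
])

# A sharp peak on +z: the estimate stays close to +z.
peaked = soft_argmax_doa([0.0, 0.0, 1.0, 0.0, 0.0, 0.0], grid, beta=50.0)

# Equal peaks on +x and +y: the estimate falls between them,
# which a hard argmax over grid points could not express.
between = soft_argmax_doa([1.0, 1.0, 0.0, 0.0, 0.0, 0.0], grid)
```

Because the output is a weighted average rather than a grid index, the estimate is continuous in the scores and can be trained with a regression loss, which matches the motivation given in the abstract.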
Collections
- TUNICRIS publications [19273]