Robust Direction Estimation with Convolutional Neural Networks-based Steered Response Power
Pertilä, Pasi; Cakir, Emre (2017)
Pertilä, Pasi
Cakir, Emre
IEEE
2017
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202002262384
Description
Peer reviewed
Abstract
The steered response power (SRP) methods can be used to build a map of sound direction likelihood. In the presence of interference and reverberation, the map will exhibit multiple peaks with heights related to the corresponding sound's spectral content. Often in realistic use cases, the target of interest (such as speech) can exhibit a lower peak compared to an interference source. This will corrupt any direction dependent method, such as beamforming. Regression has been used to predict time-frequency (TF) regions corrupted by reverberation, and static broadband noise can be efficiently estimated for TF points. TF regions dominated by noise or reverberation can then be de-emphasized to obtain more reliable source direction estimates. In this work, we propose the use of convolutional neural networks (CNNs) for the prediction of a TF mask for emphasizing the direct path speech signal in time-varying interference. SRP with phase transform (SRP-PHAT) combined with the CNN-based masking is shown to be capable of reducing the impact of time-varying interference for speaker direction estimation using real speech sources in reverberation.
Collections
- TUNICRIS publications [19288]