Trepo

Binaural Audio for Multi-task Acoustic Scene Analysis

Krause, Daniel Aleksander (2025)

File: 978-952-03-4164-0.pdf (4.475 MB)

Krause, Daniel Aleksander
Tampere University
2025

Doctoral Programme in Computing and Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of defence
2025-10-24
Permanent address of the publication:
https://urn.fi/URN:ISBN:978-952-03-4164-0
Abstract
The proliferation of intelligent systems and smart devices has intensified the need for machines to interpret and respond to complex real-world environments. Among the key sensory modalities, auditory perception plays a crucial role in enabling systems to analyze the content of our surroundings, facilitating a wide range of applications from autonomous robotics to assistive technologies. This thesis advances the field of computational auditory scene analysis (CASA) by investigating data-driven methods for multi-task acoustic scene analysis using binaural audio. Binaural audio is chosen because it mimics the human perspective and offers a simple, flexible recording setup. Unlike microphone arrays, which require careful geometric calibration and fixed configurations, binaural systems allow the receiver to change position and orientation freely, supporting dynamic perception and mobile sensing scenarios.

The main objective of this work is to exploit the spatial cues inherent in binaural recordings to enhance machine listening across several audio tasks: sound event detection, acoustic scene classification, direction-of-arrival estimation, and sound source distance estimation. These tasks are examined both individually and in various joint configurations, with the aim of exploring the information shared across tasks. The proposed solutions are based on deep neural networks and use different model architectures to learn multiple audio tasks concurrently.
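
A common way to learn several tasks concurrently is hard parameter sharing: one network trunk feeds a separate output head per task, and training minimizes a weighted sum of the per-task losses. The minimal sketch below assumes that formulation, with binary cross-entropy for multi-label sound event detection, softmax cross-entropy for acoustic scene classification, and mean squared error for a regression target such as a DOA vector. The loss choices, weights, and function names are illustrative assumptions, not the losses or architectures used in the thesis.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over per-class event activity (multi-label SED)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for single-label acoustic scene classification."""
    m = max(logits)  # shift by the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def mean_squared_error(pred, target):
    """Mean squared error, e.g. for a DOA unit vector or a distance in metres."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(task_losses, weights):
    """Weighted sum of per-task losses; the weights are tuning hyperparameters."""
    return sum(weights[name] * loss for name, loss in task_losses.items())


if __name__ == "__main__":
    # Hypothetical outputs of three task heads for one training example
    losses = {
        "sed": binary_cross_entropy([0.9, 0.2, 0.1], [1, 0, 0]),
        "asc": cross_entropy([2.0, 0.5, -1.0], target_index=0),
        "doa": mean_squared_error([0.70, 0.70, 0.10], [0.707, 0.707, 0.0]),
    }
    print(losses, multitask_loss(losses, {"sed": 1.0, "asc": 1.0, "doa": 0.5}))
```

The task weights matter in practice: a task whose loss is numerically larger can dominate the gradients reaching the shared trunk, so such weights are usually tuned or learned.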

This thesis demonstrates that binaural audio features can significantly improve the performance of deep learning models, especially when the tasks are performed jointly. In one study, binaural features outperform standard monaural features for sound event detection and acoustic scene classification. Multi-task learning is further explored for simplified, classification-based formulations of direction-of-arrival (DOA) and sound source distance estimation, referred to as direction and proximity estimation. Finally, a novel task of 3D sound event detection and localization is introduced; it combines sound event detection, DOA estimation, and distance estimation in a unified setup, allowing the positions of detected sound events to be estimated precisely.
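
Binaural features are typically derived from differences between the two ear signals, most commonly the interaural level difference (ILD) and the interaural phase difference (IPD); a monaural representation discards both. The sketch below computes these two cues per frequency bin for a single frame, using a naive DFT to stay dependency-free. It assumes ILD and IPD as representative features; the thesis's exact feature set may differ, and a practical front end would use a windowed short-time Fourier transform.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) DFT of a real frame, returning bins 0..N/2."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def binaural_features(left, right, eps=1e-12):
    """Per-bin ILD (dB) and IPD (radians) for one frame of a two-channel signal."""
    spec_l, spec_r = dft(left), dft(right)
    ild, ipd = [], []
    for xl, xr in zip(spec_l, spec_r):
        # ILD: left-to-right power ratio in decibels (eps guards silent bins)
        ild.append(10 * math.log10((abs(xl) ** 2 + eps) / (abs(xr) ** 2 + eps)))
        # IPD: phase of the cross-spectrum X_L * conj(X_R), in [-pi, pi]
        ipd.append(cmath.phase(xl * xr.conjugate()))
    return ild, ipd


if __name__ == "__main__":
    n, k = 32, 4  # 32-sample frame holding exactly 4 cycles of a sinusoid
    left = [math.sin(2 * math.pi * k * t / n) for t in range(n)]
    # Right ear: half the amplitude and one sample later (source on the left)
    right = [0.5 * math.sin(2 * math.pi * k * (t - 1) / n) for t in range(n)]
    ild, ipd = binaural_features(left, right)
    print(round(ild[k], 2), round(ipd[k], 4))  # 6.02 0.7854
```

In this toy case, halving the amplitude gives an ILD of 10·log10(4) ≈ 6.02 dB, and a one-sample delay at bin 4 of a 32-point frame gives an IPD of 2π·4/32 = π/4 rad.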

Based on these results, this work identifies several inherent limitations of the binaural format, such as front-back ambiguity and the cone-of-confusion effect. To address these challenges, listener motion cues are proposed, mimicking the strategies humans use to disambiguate source locations. A new approach is proposed for integrating head rotation data to improve binaural localization. Finally, translation cues are used to represent the movement of the listener and to enhance sound source distance and direction-of-arrival estimation, leading to significant performance gains. The results underscore the potential of dynamic auditory perception and movement-aware systems for overcoming the limitations of binaural audio.
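
The front-back ambiguity, and why head rotation resolves it, can be illustrated with a simplified interaural time difference (ITD) model, ITD(θ) = (d/c)·sin θ, which ignores head shadowing and elevation. Under this model a source at 30° and its mirror image at 150° produce identical ITDs, but turning the head shifts their ITDs in opposite directions. The second function shows the translation counterpart: if the listener steps forward by a known distance and the bearing to a static source changes, the law of sines gives the source distance. The ear spacing, angle conventions, and function names are hypothetical; this is a geometric illustration of the cues, not the approach proposed in the thesis.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at about 20 degrees C
EAR_SPACING = 0.18      # m, hypothetical inter-ear distance


def itd(azimuth_deg):
    """ITD in seconds under a two-point, free-field model (no head shadow).

    Azimuth is measured from straight ahead, positive towards the left ear,
    so a positive ITD means the sound reaches the left ear first.
    """
    return EAR_SPACING / SPEED_OF_SOUND * math.sin(math.radians(azimuth_deg))


def resolve_front_back(itd_before, itd_after, rotation_deg, tol=1e-12):
    """Classify a source as 'front' or 'back' from the ITD change after a head turn.

    Turning the head by +delta degrees (to the left) changes the head-relative
    azimuth from theta to theta - delta, so for a small turn the ITD changes by
    roughly -(d / c) * cos(theta) * delta: it falls for front sources
    (cos > 0) and rises for back sources (cos < 0).
    """
    change = (itd_after - itd_before) * rotation_deg
    if abs(change) < tol:
        return "ambiguous"
    return "front" if change < 0 else "back"


def distance_from_translation(bearing_before_deg, bearing_after_deg, step_m):
    """Source distance (m) from the listener's new position after stepping forward.

    Assumes a static source, a listener facing the walking direction, and
    bearings (same convention as itd) already free of front-back ambiguity.
    Law of sines: r_after = step * sin(alpha) / sin(beta - alpha).
    """
    alpha = math.radians(bearing_before_deg)
    beta = math.radians(bearing_after_deg)
    parallax = math.sin(beta - alpha)
    if abs(parallax) < 1e-9:
        raise ValueError("no parallax: source too far away or on the walking line")
    return step_m * math.sin(alpha) / parallax


if __name__ == "__main__":
    front, back, turn = 30.0, 150.0, 10.0
    print(math.isclose(itd(front), itd(back)))  # True: identical ITDs
    for name, az in (("front", front), ("back", back)):
        print(name, resolve_front_back(itd(az), itd(az - turn), turn))
    # Source 1 m to the left; a 1 m step forward moves its bearing from 90 to 135 deg
    print(round(distance_from_translation(90.0, 135.0, 1.0), 3))  # 1.414
```

The head-turn rule only holds while the turn is small enough that the source does not cross the interaural axis, and the triangulation degrades as the bearing change shrinks, which is why distant sources give weak translation cues.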

The experiments involved numerous novel acoustic scenarios, extending state-of-the-art research on binaural acoustic scene analysis to a continuous range of sound source positions, diverse reverberant conditions, and scenarios involving a moving listener. To support this investigation, several new datasets were synthesized using carefully designed simulations that replicate realistic acoustic environments. The datasets were made publicly available to enable reproducibility and foster further research.
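
Binaural datasets of this kind are commonly synthesized by convolving dry (anechoic) source signals with binaural room impulse responses (BRIRs), one per ear, for each source position and room, then summing the rendered sources. The dependency-free sketch below shows only that rendering step; the toy impulse responses are made-up numbers, and the function names are hypothetical rather than taken from the thesis's simulation pipeline.

```python
def convolve(signal, impulse_response):
    """Full linear convolution: output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def spatialize(dry, brir_left, brir_right):
    """Render a mono source at one position by convolving it with that position's BRIR pair."""
    return convolve(dry, brir_left), convolve(dry, brir_right)


def mix(renders):
    """Sum (left, right) renders of several sources into one binaural scene."""
    length = max(max(len(left), len(right)) for left, right in renders)
    left_mix, right_mix = [0.0] * length, [0.0] * length
    for left, right in renders:
        for i, v in enumerate(left):
            left_mix[i] += v
        for i, v in enumerate(right):
            right_mix[i] += v
    return left_mix, right_mix


if __name__ == "__main__":
    dry = [1.0, -1.0]
    # Toy BRIRs (made-up numbers): direct path plus one reflection per ear;
    # the right ear hears the source later and quieter, as for a source on the left.
    brir_l = [1.0, 0.0, 0.3]
    brir_r = [0.0, 0.5, 0.0, 0.15]
    left, right = spatialize(dry, brir_l, brir_r)
    print(left)   # [1.0, -1.0, 0.3, -0.3]
    print(right)  # [0.0, 0.5, -0.5, 0.15, -0.15]
    # A second hypothetical source rendered with its own BRIR pair, then mixed in
    print(mix([(left, right), spatialize([0.5], [0.0, 1.0], [1.0])]))
```

For realistic signal and BRIR lengths, FFT-based convolution would replace the nested loops, which run in O(N·M) time.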
Collections
  • Doctoral dissertations
Kalevantie 5
P.O. Box 617
33014 Tampere University
oa[@]tuni.fi | Privacy | Accessibility statement