Headset Removal for Gaze Contact in XR Applications

Ghorbani Lohesara, Fatemeh (2025)

 
Open file: 978-952-03-4010-0.pdf (43.83 MB)



Ghorbani Lohesara, Fatemeh
Self-published
2025

Doctoral Programme in Plenoptic Imaging
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of defence
2025-07-18
The permanent address of the publication is
https://urn.fi/URN:ISBN:978-952-03-4010-0

Description

Cotutelle joint doctoral dissertation
Abstract
The increasing integration of immersive technologies, particularly in Extended Reality (XR) applications, is transforming the way humans interact in virtual environments. However, achieving truly immersive experiences in XR, especially in social applications such as virtual meetings, teleconferencing, and collaborative environments, presents significant technical challenges. Among these, the occlusion of facial features by head-mounted displays (HMDs) is one of the most pressing obstacles. HMDs, while essential for XR environments, obstruct key facial regions, including the eyes and upper face, which are crucial for conveying emotions and facilitating natural communication in virtual spaces. This occlusion reduces the realism and effectiveness of social interactions in XR, where facial expressions and eye contact play a crucial role.

To address these challenges, this thesis contributes to the field in multiple ways. First, a comprehensive multimodal dataset, HEADSET, is introduced, capturing human facial expressions, body movements, and interactions using an advanced volumetric capture studio. The dataset includes diverse volumetric data from 27 participants displaying a range of emotions, as well as recordings from 11 participants wearing HMDs. The dataset integrates textured meshes, point clouds, multi-view RGB-D data, and light field (LF) data, making it an invaluable resource for researchers developing XR algorithms focused on facial expression recognition, 3D reconstruction, and volumetric video processing.
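
As a rough sketch of how one such multimodal sample might be organized for algorithm development, the snippet below defines a hypothetical Python container. The field names, directory layout, and example values are assumptions for illustration only and do not reflect the dataset's actual schema.

from dataclasses import dataclass, field
from typing import List

@dataclass
class HeadsetSample:
    """Hypothetical container for one HEADSET capture (illustration only)."""
    participant_id: int                     # 27 participants in total, 11 also recorded with HMDs
    emotion_label: str                      # expression displayed during the capture
    wears_hmd: bool                         # True for the HMD recordings
    textured_mesh_dir: str                  # per-frame textured meshes
    point_cloud_dir: str                    # volumetric point-cloud sequence
    rgbd_view_dirs: List[str] = field(default_factory=list)  # multi-view RGB-D streams
    light_field_dir: str = ""               # light field (LF) capture

# Example instantiation with made-up paths:
sample = HeadsetSample(
    participant_id=3,
    emotion_label="surprised",
    wears_hmd=False,
    textured_mesh_dir="p03/surprised/mesh",
    point_cloud_dir="p03/surprised/pcd",
    rgbd_view_dirs=["p03/surprised/rgbd/cam0", "p03/surprised/rgbd/cam1"],
)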

Building on this dataset, the core contribution of this thesis lies in the development of a new inpainting method for HMD removal without requiring additional hardware or complex calibrations to the XR setup. This is achieved through the introduction of an expression-aware video inpainting network, EVI-HRnet, based on generative adversarial networks (GANs). The network is designed to reconstruct occluded facial regions by leveraging both facial landmarks and a single occlusion-free reference image of the user. Unlike traditional methods that rely on external depth sensors, the proposed framework operates solely on RGB video input, making it lightweight and practical for real-world XR applications.
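
A minimal sketch of the input/output relationship described above is given below, assuming a PyTorch-style generator that receives the occluded frame, an occlusion mask, a landmark map, and the occlusion-free reference image. The layer sizes, channel layout, and module names are illustrative placeholders, not the actual EVI-HRnet architecture.

import torch
import torch.nn as nn

class InpaintingGenerator(nn.Module):
    """Toy stand-in for a GAN-based, expression-aware inpainting generator."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # Occluded RGB frame (3) + occlusion mask (1) + landmark heatmap (1)
        # + occlusion-free reference image (3) = 8 input channels.
        self.net = nn.Sequential(
            nn.Conv2d(8, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame, mask, landmarks, reference):
        x = torch.cat([frame, mask, landmarks, reference], dim=1)
        completed = self.net(x)
        # Keep unoccluded pixels from the input; synthesize only the masked region.
        return frame * (1 - mask) + completed * mask

# Example shapes: one 256x256 RGB frame with its mask, landmark map, and reference.
gen = InpaintingGenerator()
frame = torch.rand(1, 3, 256, 256)
mask = torch.zeros(1, 1, 256, 256); mask[..., 64:160, 48:208] = 1.0  # assumed HMD region
landmarks = torch.rand(1, 1, 256, 256)
reference = torch.rand(1, 3, 256, 256)
out = gen(frame, mask, landmarks, reference)   # (1, 3, 256, 256)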

The EVI-HRnet framework ensures that the reconstructed facial regions maintain both temporal coherence across video frames and the emotional consistency of the subject’s expressions. A novel facial expression recognition (FER) loss function is also introduced to enhance the preservation of subtle emotional cues in the inpainted frames, thus enabling a more immersive and emotionally expressive user experience. This method is particularly advantageous for teleconferencing and other collaborative XR applications, where realistic facial features are crucial for effective communication. Beyond HMD removal, the thesis further evaluates the effectiveness of the proposed inpainting method for the broader task of facial video occlusion removal across various scenarios.
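
The role of such an expression-preservation term can be illustrated with a hypothetical loss combination, assuming a frozen FER network is used as a feature extractor. The specific network, feature comparison, and weights below are assumptions for illustration, not the loss defined in the thesis.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Frozen stand-in for a pretrained facial expression recognition (FER) network;
# in practice this would be a real FER model whose weights are not updated.
expr_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in expr_net.parameters():
    p.requires_grad_(False)

def fer_loss(inpainted, ground_truth):
    """Penalize expression drift between inpainted and unoccluded frames."""
    return F.l1_loss(expr_net(inpainted), expr_net(ground_truth))

def generator_loss(inpainted, ground_truth, adversarial_term, fer_weight=0.1):
    # Hypothetical weighting: reconstruction + adversarial + expression-preservation terms.
    reconstruction = F.l1_loss(inpainted, ground_truth)
    return reconstruction + adversarial_term + fer_weight * fer_loss(inpainted, ground_truth)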

Furthermore, this thesis extends the RGB inpainting framework by addressing the joint challenge of HMD occlusion removal and 3D facial reconstruction. In addition to the EVI-HRnet model for video inpainting, the research integrates 3D Morphable Models (3DMM) with a dense facial landmark backbone, enabling the reconstruction of detailed 3D facial geometry from inpainted RGB video frames. The proposed pipeline combines GAN-based inpainting with state-of-the-art 3D reconstruction techniques, allowing for photorealistic 3D face modeling from a single RGB view. By utilizing dense 3D landmarks, the framework optimizes the alignment between the inpainted 2D facial regions and the 3D facial geometry, achieving notable accuracy and visual fidelity in the reconstructed models. This integrated approach enhances the realism of social XR applications, where participants’ faces are not only visible but also represented as 3D models, contributing to a more immersive experience.
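
The overall flow of this joint pipeline could look roughly like the following sketch, in which the inpainting model, dense-landmark detector, and 3DMM fitter are passed in as placeholder callables; the function signatures are hypothetical and stand in for the actual components.

def reconstruct_3d_face(frames, masks, landmarks, reference,
                        inpaint_fn, detect_dense_landmarks, fit_3dmm):
    """Hypothetical end-to-end pipeline: 2D inpainting followed by 3DMM fitting."""
    meshes = []
    for frame, mask, lmk in zip(frames, masks, landmarks):
        # 1. Remove the HMD occlusion in the RGB frame.
        completed = inpaint_fn(frame, mask, lmk, reference)
        # 2. Predict dense 3D landmarks on the completed face.
        dense_lmk = detect_dense_landmarks(completed)
        # 3. Fit 3DMM identity/expression parameters to the dense landmarks,
        #    aligning the 3D geometry with the inpainted 2D face.
        meshes.append(fit_3dmm(completed, dense_lmk))
    return meshes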

In summary, this thesis makes considerable contributions to the field of XR by introducing a new multimodal dataset and an advanced framework for HMD occlusion removal. By addressing the critical issue of occlusion in XR environments without the need for additional hardware, this research extends the usability and accessibility of teleconferencing, virtual collaboration, and social XR applications.
Collections
  • Doctoral dissertations [5229]
Kalevantie 5
P.O. Box 617
33014 Tampere University
oa[@]tuni.fi | Privacy | Accessibility statement