Efficient Video Compression and Streaming Solutions for Immersive Virtual Reality
Zare, Alireza (2026)
Zare, Alireza
Tampere University
2026
Doctoral Programme in Computing and Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of public defence
2026-03-27
The permanent address of the publication is
https://urn.fi/URN:ISBN:978-952-03-4463-4
Abstract
Delivering high-fidelity 360° video is essential for enabling immersive virtual reality (VR) experiences. However, conventional streaming methods face significant challenges in this domain, including the high bitrate demands of 360° content, latency constraints, and the limited decoding capabilities of many consumer devices. This thesis addresses these challenges by proposing innovative video compression and streaming solutions to efficiently support the delivery of immersive 360° video.
Modern VR systems utilize head-mounted displays (HMDs) with pose-tracking capabilities, allowing users to navigate within virtual environments. When viewing 360° video through an HMD, the content displayed is limited to the field of view (FoV) of the HMD, which corresponds to the user’s current viewing orientation. Streaming the entire 360° content at the highest fidelity using conventional methods would lead to excessive network bandwidth consumption and increased computational load on end-user devices. To address this, viewport-adaptive streaming (VAS) techniques partition a video picture into regions and prioritize high-fidelity streaming for the user’s current viewing orientation while reducing fidelity for out-of-view regions, optimizing bandwidth and computational efficiency.
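The region prioritization described above can be illustrated with a minimal sketch. It assumes an equirectangular picture split into a uniform tile grid and approximates the viewport as a yaw/pitch rectangle; the names `Tile`, `make_grid`, `yaw_overlaps`, and `select_qualities` are hypothetical and are not taken from the thesis, whose actual tile selection may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    col: int
    row: int
    yaw_min: float    # degrees, yaw range starts at -180
    yaw_max: float
    pitch_min: float  # degrees, pitch range is [-90, 90]
    pitch_max: float


def make_grid(cols, rows):
    """Split an equirectangular picture into a uniform cols x rows tile grid."""
    yaw_step = 360.0 / cols
    pitch_step = 180.0 / rows
    return [
        Tile(c, r,
             -180.0 + c * yaw_step, -180.0 + (c + 1) * yaw_step,
             -90.0 + r * pitch_step, -90.0 + (r + 1) * pitch_step)
        for r in range(rows)
        for c in range(cols)
    ]


def yaw_overlaps(tile_min, tile_max, center, half_width):
    """True if the tile's yaw interval intersects the viewport's, with wraparound."""
    tile_center = (tile_min + tile_max) / 2.0
    tile_half = (tile_max - tile_min) / 2.0
    # Shortest angular distance between the two interval centers.
    diff = abs((tile_center - center + 180.0) % 360.0 - 180.0)
    return diff < tile_half + half_width


def select_qualities(tiles, yaw, pitch, hfov, vfov):
    """Assign "high" to tiles overlapping the viewport and "low" to the rest."""
    qualities = {}
    for t in tiles:
        in_yaw = yaw_overlaps(t.yaw_min, t.yaw_max, yaw, hfov / 2.0)
        in_pitch = (t.pitch_min < pitch + vfov / 2.0
                    and t.pitch_max > pitch - vfov / 2.0)
        qualities[(t.col, t.row)] = "high" if in_yaw and in_pitch else "low"
    return qualities


if __name__ == "__main__":
    grid = make_grid(cols=8, rows=4)
    chosen = select_qualities(grid, yaw=0.0, pitch=0.0, hfov=90.0, vfov=90.0)
    print(sorted(k for k, v in chosen.items() if v == "high"))
```

A real client would typically also fetch a margin of tiles around the viewport to cover head motion during the request round trip; that margin is omitted here for brevity.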
This thesis develops and evaluates various VAS techniques for streaming monoscopic and stereoscopic video content, enhancing the visual quality of VR experiences. These approaches are implemented using the High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC) standards. In HEVC, motion-constrained tile sets (MCTS) are developed to partition a video sequence into isolated tile sets within the compressed domain for efficient VAS. The evaluation demonstrates that MCTS-based VAS can reduce streaming bitrate by several tens of percent compared to conventional viewport-independent streaming, depending on tile granularity and configuration. However, this reduction comes with a slight compromise in storage efficiency due to the compression overhead introduced by MCTS. Furthermore, the thesis proposes a constrained inter-view prediction (CIVP) technique based on HEVC to enable the decoding of stereoscopic VR video content using a single-layer standard HEVC decoder. This capability is of practical importance, as optimized multiview HEVC decoder implementations are far less common in the consumer market than single-layer HEVC decoders.
In VVC, the newly introduced subpicture partitioning tool serves a purpose similar to that of MCTS but with several enhancements. Additionally, VVC incorporates advanced coding tools such as the Adaptive Loop Filter (ALF), whose use is restricted in subpicture-based VAS scenarios. This thesis presents a novel method to enable the use of ALF within subpicture-based VAS while ensuring compliance with the VVC standard. Experimental results show that ALF can be applied in a constrained way while preserving most of its performance advantages. The proposed technique has been integrated into the VVC Test Model (VTM) software.
Recognizing the 4K decoding limitations of most end-user devices, the thesis introduces mixed-resolution packing strategies for 6K and 8K video content delivery. These strategies deliver high-resolution VR video content while adhering to 4K decoding constraints, balancing visual quality against device compatibility. The packing schemes ensure the viewer’s FoV is rendered in high resolution while the remaining regions are displayed at lower resolutions. The proposed 6K- and 8K-effective packing layouts reduce the decoded resolution of 360° video to 47% and 51% of the source content, respectively, and achieve average bitrate savings of 31% and 35% compared to the alternative 4K mixed-quality VAS approach. The 6K-effective packing is incorporated in OMAF Annex D and is also recommended in the VR Industry Forum Guidelines.
To evaluate the quality of experience (QoE) in VR applications, this thesis proposes a novel assessment methodology called Zonal-Cubic PSNR. This method enables a comprehensive analysis of streaming quality across various VR viewing scenarios, encompassing both stable and transitional conditions. By bridging the gap between objective and subjective evaluations, the metric provides valuable insights for designing VR streaming systems optimized for specific application requirements.
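The abstract does not define Zonal-Cubic PSNR, so the following is only a sketch of the conventional PSNR that the name suggests it builds on, not the proposed metric itself. The function name `psnr` and the 8-bit default peak value are assumptions made for illustration.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Conventional PSNR in dB between two equal-length sample sequences.

    This is plain PSNR only; any zonal or cube-face weighting used by the
    thesis's metric is not modeled here.
    """
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [100, 100, 100, 100]
    dist = [110, 110, 110, 110]
    print(f"{psnr(ref, dist):.2f} dB")
```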
Together, these contributions advance the efficiency, quality, and accessibility of VR video streaming technologies, supporting the development of more immersive and resource-efficient VR experiences.
Collections
- Doctoral dissertations [5298]
