Real-Time Video Denoising in GStreamer Pipeline
Sjögren, Simo (2024)
Master's Programme in Automation Engineering
Faculty of Engineering and Natural Sciences
Acceptance date
2024-12-19
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-2024121911446
Abstract
This work investigates solutions for efficient real-time denoising within Nokia RXRM, a commercial low-latency 360° video streaming platform. The platform's video processing pipeline is built on the open-source GStreamer multimedia framework, in which glshaders render viewport regions from a 360° environment in real time. This environment has a strong presence of complex, structured high-ISO noise.
Integrating a video denoising algorithm as a secondary feature alongside the core processing pipeline requires minimizing computing resource usage and completely avoiding any increase in latency. This work explains, with supporting evidence, why state-of-the-art deep learning and temporal-domain methods, such as BSVD and RTE-VD, are excluded from consideration. This narrows the focus of the thesis to easily parallelizable methods that depend on a single frame only.
Single-frame methods can be classified as either transform-domain or spatial-domain methods. For these two classes, a detailed quality comparison was conducted from two perspectives: compression efficiency, measured using modified SI/TI, and referenceless objective quality, measured with VQMTK. Wavelet-based transform-domain methods excelled in compression efficiency without compromising subjectively observable features, highlighting the structural nature of the noise. However, spatial-domain methods performed better in the objective quality evaluations, since transform-domain methods tended to introduce more artifacts. Despite this, wavelet-based methods were chosen for implementation with glshaders in the GStreamer environment because of their better compression capabilities for the underlying noise.
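For readers unfamiliar with SI/TI, the following is a minimal sketch of the standard spatial and temporal information measures as commonly defined in ITU-T P.910. The abstract refers to a modified SI/TI whose details are not given here, so this shows only the unmodified baseline. The function names, the list-of-rows frame layout, and the use of the population standard deviation are assumptions made for illustration.

```python
import math

# Illustrative sketch of standard SI/TI (ITU-T P.910 style), NOT the
# modified variant used in the thesis. Frames are hypothetical luma
# images stored as lists of rows of numbers.

def _std(values):
    """Population standard deviation of a flat list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def sobel_magnitude(frame):
    """Sobel gradient magnitude for every interior pixel (borders skipped)."""
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            out.append(math.hypot(gx, gy))
    return out

def spatial_information(frames):
    """SI: maximum over time of the spatial std of the Sobel-filtered frame."""
    return max(_std(sobel_magnitude(f)) for f in frames)

def temporal_information(frames):
    """TI: maximum over time of the spatial std of successive frame differences."""
    return max(
        _std([b - a for row_a, row_b in zip(prev, cur) for a, b in zip(row_a, row_b)])
        for prev, cur in zip(frames, frames[1:])
    )
```

Both measures are often used as rough indicators of how demanding content is to encode, which is presumably why a denoiser that lowers them can be read as improving compression efficiency.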
Implementing a discrete wavelet transform across multiple resolution levels with glshaders inside the precompiled GStreamer pipeline is challenging. The thesis demonstrates that the denoising algorithm can be only partially implemented in this environment, processing one color channel instead of three. From a performance standpoint, the one-channel implementation is viable and leaves substantial headroom for increased computational complexity in an expanded approach. The work further shows why extending this implementation to all channels is infeasible with GStreamer's current precompiled glshader elements. Achieving efficient real-time denoising in the GStreamer environment would require alternative approaches, such as using GStreamer probe elements combined with more flexible parallelization frameworks.
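To make the idea of single-channel wavelet denoising concrete, here is a minimal CPU-side sketch: a single-level 2D Haar transform, soft thresholding of the detail coefficients, and the inverse transform. It is not the thesis implementation, which runs in glshaders and spans multiple resolution levels. The choice of the Haar wavelet, the soft-threshold rule, the threshold value, and all function names are assumptions for illustration. The channel must have even width and height.

```python
# Illustrative sketch only: single-level 2D Haar DWT with soft thresholding
# on one channel. Wavelet choice, threshold rule, and names are assumptions,
# not details taken from the thesis.

def haar_1d(row):
    """One Haar analysis step: pairwise averages, then pairwise differences."""
    half = len(row) // 2
    avg = [(row[2 * i] + row[2 * i + 1]) / 2.0 for i in range(half)]
    diff = [(row[2 * i] - row[2 * i + 1]) / 2.0 for i in range(half)]
    return avg + diff

def inverse_haar_1d(coeffs):
    """Invert haar_1d: rebuild each sample pair from its average and difference."""
    half = len(coeffs) // 2
    out = []
    for i in range(half):
        a, d = coeffs[i], coeffs[half + i]
        out.extend([a + d, a - d])
    return out

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def haar_2d(channel):
    """Apply the 1D step to every row, then to every column."""
    rows = [haar_1d(r) for r in channel]
    cols = [haar_1d(c) for c in transpose(rows)]
    return transpose(cols)

def inverse_haar_2d(coeffs):
    """Undo haar_2d: invert the column pass first, then the row pass."""
    cols = [inverse_haar_1d(c) for c in transpose(coeffs)]
    return [inverse_haar_1d(r) for r in transpose(cols)]

def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero; small coefficients become zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def denoise_channel(channel, threshold):
    """Threshold the three detail subbands and keep the approximation intact."""
    coeffs = haar_2d(channel)
    half_h, half_w = len(coeffs) // 2, len(coeffs[0]) // 2
    for y in range(len(coeffs)):
        for x in range(len(coeffs[0])):
            if y >= half_h or x >= half_w:  # outside the top-left approximation
                coeffs[y][x] = soft_threshold(coeffs[y][x], threshold)
    return inverse_haar_2d(coeffs)
```

Every output pixel depends only on a small, fixed neighbourhood of the same frame, which is the property that makes this family of methods easy to parallelize. A multi-level version would reapply the transform to the top-left approximation subband.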