Smart Saliency-Based Object Tracking Techniques for Low-Cost Video Communication
Hoang, Minh (2024)
Master's Programme in Computing Sciences and Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
Date of approval
2024-12-02
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-2024112710566
Abstract
With the rapid spread of technology, video has become the dominant media form. As viewers demand higher image quality, video resolutions have advanced to 4K and 8K, and visual quality has improved through higher color bit depth and high dynamic range (HDR). However, these enhancements require higher bitrates and more bandwidth for a seamless streaming experience. Video coding addresses this problem by compressing data through the removal of redundant information, reducing the required bitrate while maintaining visual quality. Advanced video coding standards and techniques typically apply uniform compression across each video frame, so lowering the bitrate often also reduces the quality of experience (QoE).
Saliency-guided video coding is an approach that enhances video coding efficiency by applying variable compression based on the saliency of different frame areas. Areas that are considered regions of interest (ROIs) are preserved with higher quality in the output. This approach can reduce bitrate while maintaining the same level of QoE for human viewers, or improve video coding for machine consumption. However, identifying ROIs is computationally expensive. Conventionally, machine-learning-based object detection algorithms are applied frame by frame, which substantially increases computational complexity and energy consumption.
This thesis introduces a novel solution that significantly reduces the computational complexity and energy consumption of the saliency-guided video coding approach. The computational complexity of the proposed method is kept low by employing a machine-learning-based ROI tracking method that uses motion-vector data generated during video encoding to estimate the movements of ROIs. Experimental results show that the proposed ROI tracking method achieves 84-94% accuracy and yields energy savings of 86-95% compared with frame-by-frame detection. Furthermore, integrating this ROI tracking method into the saliency-guided video coding framework reduces CPU energy consumption by 79-82% with less than 5% video coding degradation. These findings indicate the potential of this solution for low-cost and low-power streaming media applications.
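The idea of reusing encoder motion vectors to follow an ROI between detections can be illustrated with a minimal sketch. This is not the thesis's machine-learning-based tracker: it swaps in a simple median-shift heuristic, and the names `Box`, `track_roi`, and `block_size`, as well as the motion-vector layout, are hypothetical. It also assumes the vectors are already expressed as forward displacements in pixels (real codecs usually store vectors pointing from the current block to its reference, so a sign flip may be needed).

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Box:
    """Axis-aligned ROI in pixels (hypothetical helper type)."""
    x: float
    y: float
    w: float
    h: float


def track_roi(box, motion_vectors, block_size=16):
    """Shift an ROI by the median motion of the blocks it covers.

    motion_vectors maps (block_col, block_row) -> (dx, dy), assumed to be the
    forward displacement in pixels of that block between consecutive frames.
    """
    dxs, dys = [], []
    for (col, row), (dx, dy) in motion_vectors.items():
        # Use the block centre to decide whether the block belongs to the ROI.
        cx = col * block_size + block_size / 2
        cy = row * block_size + block_size / 2
        if box.x <= cx < box.x + box.w and box.y <= cy < box.y + box.h:
            dxs.append(dx)
            dys.append(dy)
    if not dxs:
        # No motion data inside the ROI: keep the previous position.
        return box
    # The median is less sensitive than the mean to a few outlier vectors.
    return Box(box.x + median(dxs), box.y + median(dys), box.w, box.h)


roi = Box(32, 32, 32, 32)
vectors = {(2, 2): (4, -2), (3, 2): (4, -2), (2, 3): (6, -2), (0, 0): (-50, 50)}
moved = track_roi(roi, vectors)
```

In a pipeline of this kind, a full detector would presumably run only occasionally to re-anchor the ROI, with the cheap motion-vector update applied on the frames in between; the re-detection interval is a design choice this sketch does not address.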