Robust Visual Perception and Decision Making for Autonomous Systems
Wang, Vivienne Huiling (2025)
Tampere University
Doctoral Programme in Computing and Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
Date of defence
2025-03-28
The permanent address of the publication is
https://urn.fi/URN:ISBN:978-952-03-3859-6
Abstract
Autonomous systems, which combine visual perception with decision-making capabilities, are poised to transform a wide array of sectors, from transportation and manufacturing to healthcare. Visual perception, the ability to interpret visual data and make sense of the environment, provides the foundation these systems need to operate effectively in diverse real-world scenarios. Decision making complements it by enabling these systems to select optimal actions based on the perceived information.
The first part of this thesis concentrates on visual perception through the lens of video object segmentation. As a crucial facet of computer vision, video object segmentation enables autonomous systems to distinguish and track objects within dynamic video streams. Despite considerable advances in the field, challenges remain, particularly in segmenting objects robustly and coherently in complex, real-world scenarios. This work introduces three contributions to address them. First, we present an efficient graph transduction learning approach for improved primary video object segmentation. Second, we propose a semi-supervised adaptation technique that harnesses pretrained deep convolutional neural networks for semantic video object segmentation. Third, we introduce a hierarchical graphical model that fuses bottom-up and top-down cues with long-term object relations and spatiotemporal context for superior performance in semantic video object segmentation.
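To make the graph transduction idea concrete, the minimal sketch below shows generic label propagation over an affinity graph of frame regions (e.g. superpixels), where scores from a few confidently labeled nodes diffuse to the rest of the graph. It illustrates only the general mechanism: the function name, parameters, and update rule are a standard textbook formulation assumed for illustration, not the thesis's exact algorithm.

import numpy as np

def propagate_labels(W, Y, alpha=0.9, iters=50):
    """W: (n, n) symmetric affinity matrix between graph nodes,
    Y: (n, c) initial label scores (zeros for unlabeled nodes).
    Returns (n, c) propagated label scores."""
    d = W.sum(axis=1)
    # Symmetrically normalized affinity S = D^{-1/2} W D^{-1/2}
    S = W / np.sqrt(np.outer(d, d) + 1e-12)
    F = Y.copy()
    for _ in range(iters):
        # Each node blends its neighbours' scores with its own initial label.
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F

A node's final label is then the class with the highest propagated score, so confident labels spread along strong affinities while weak edges attenuate the influence of distant regions.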
The latter half of this thesis turns to decision making, specifically hierarchical reinforcement learning (HRL). Reinforcement learning, a fundamental paradigm in decision making, enables autonomous systems to learn from interaction with their environment, but its efficacy can be undermined by non-stationarity in off-policy training. In HRL this non-stationarity arises because the policies at different levels of the hierarchy are adjusted continually throughout learning, so experience collected under an earlier low-level policy no longer reflects current behavior. To counter this, we propose a novel adversarially guided subgoal generation framework for HRL. This adversarial learning technique mitigates the shift in data distribution between relabeled experiences and the current high-level policy's behavior, improving learning efficiency and stability.
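The sketch below isolates the adversarial ingredient under assumed shapes and module names: a small discriminator learns to tell relabeled subgoals apart from subgoals consistent with the current high-level policy, and its output can then down-weight relabeled samples that have drifted from current behavior. It is an illustrative approximation of the general idea, not the framework's actual architecture or loss functions.

import torch
import torch.nn as nn

state_dim, subgoal_dim = 16, 4  # illustrative dimensions
disc = nn.Sequential(nn.Linear(state_dim + subgoal_dim, 64),
                     nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(states, relabeled_g, policy_g):
    """One update: subgoals from the current high-level policy are
    labeled 1, relabeled subgoals from old experience are labeled 0."""
    real = disc(torch.cat([states, policy_g], dim=-1))
    fake = disc(torch.cat([states, relabeled_g], dim=-1))
    loss = bce(real, torch.ones_like(real)) + bce(fake, torch.zeros_like(fake))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def relabel_weight(states, relabeled_g):
    """Score in (0, 1) used to down-weight relabeled subgoals that the
    current high-level policy would be unlikely to produce."""
    with torch.no_grad():
        return torch.sigmoid(disc(torch.cat([states, relabeled_g], dim=-1)))

Weighting off-policy updates by this score keeps the replayed subgoal distribution close to what the current high-level policy actually emits, which is the source of the stability gains described above.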
In sum, this thesis advances the visual perception and decision-making capabilities of autonomous systems. Through its contributions to video object segmentation and hierarchical reinforcement learning, it lays the groundwork for more robust and dependable autonomous systems, paving the way for their wider and safer adoption in society.
Collections
- Doctoral dissertations [4967]