Multi-Task Networks and Anomaly Detection in Computer Vision
Lagos Benitez, Juan Pablo (2025)
Tampere University
Doctoral Programme in Computing and Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
Date of defence
2025-05-28
The permanent address of the publication is
https://urn.fi/URN:ISBN:978-952-03-3950-0
Abstract
In this dissertation, we address key challenges in computer vision, focusing on multi-task learning, unstructured environments, and the use of heterogeneous datasets for image anomaly detection. We propose novel methods and datasets across four core studies, yielding quantitative improvements across tasks.
We first present a multi-task convolutional neural network (CNN) that jointly performs semantic segmentation and depth completion, demonstrating a significant improvement in performance compared to single-task networks. When evaluated on the Virtual KITTI 2 dataset, our approach achieved a notable increase in both depth and segmentation accuracy, underscoring the benefits of joint training.
Next, we extend the multi-task approach to panoptic segmentation and depth completion, again using Virtual KITTI 2. Our model processes RGB images and sparse depth maps to deliver dense depth maps, along with semantic, instance, and panoptic segmentation. Despite handling multiple tasks, the model maintained high accuracy without a significant increase in computational cost.
For real-world applications, we introduce the FinnWoodlands dataset, containing 4,226 manually annotated objects for instance, semantic, and panoptic segmentation, with 60.6% of the annotations corresponding to three tree species ("Spruce," "Birch," and "Pine"). We benchmarked three state-of-the-art models, revealing the challenges posed by unstructured forest environments and the need for more robust models for such scenarios.
Finally, we present two novel datasets, CARS-AD and ROADS-AD, for unsupervised anomaly detection (AD). These datasets introduce diverse anomalies across thousands of samples, with pixel-wise ground truth annotations. Our benchmarks highlight the limitations of existing AD models: the results of even the best-performing methods (Csflow and U-Flow on CARS-AD, and Reverse Distillation on ROADS-AD) underscore the complexity of these new datasets.
Our results demonstrate the effectiveness of multi-task networks in holistic scene understanding, cost-effective data collection for complex environments, and the critical role of heterogeneous datasets in advancing image anomaly detection. This research paves the way for future work in both structured and unstructured settings, pushing the boundaries of state-of-the-art techniques.