Object detection and sim-to-real 6D pose estimation
Sharma, Gaurang (2023)
Master's Programme in Computing Sciences
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2023-05-04
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202304264610
Abstract
Deep learning has led to significant advances in computer vision, making perception an important component in fields such as robotics, medicine, agriculture, and remote sensing. Object detection has been a major part of computer vision research and has enabled further tasks such as object pose, grasp, and depth estimation. However, object detectors also suffer from a lack of data, which calls for a well-defined data pipeline that first labels and then augments the data. According to the review conducted in this work, no available labeling tool supports export to the benchmark COCO format for multi-label ground truth, and no augmentation library supports transformations for the combination of polygon segmentation, bounding boxes, and keypoints. Given this need for an updated data pipeline, this project presents a novel approach that spans from labeling to augmentation and includes data visualization, manipulation, and cleaning. In addition, the work focuses on the use of object detectors in an industrial use case and applies multitask learning to develop a state-of-the-art multitask architecture. The pipeline and the architecture are then used to infer the pose of industrial objects in the world coordinate frame. Finally, after a comparison of multiple object detectors and pose estimators, a multitask architecture combined with a pose estimation methodology is considered better for the industrial use case.