Visual metric-semantic 3D reconstruction
Raivio, Leevi (2021)
Master's Programme in Automation Engineering
Faculty of Engineering and Natural Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Approval date
2021-05-17
Permanent address of the publication
https://urn.fi/URN:NBN:fi:tuni-202105175105
Abstract
While people and animals understand their surroundings almost effortlessly, the same task is very hard for machines. Comprehensive understanding of an environment requires capturing both the spatial relations and the semantic meaning of its contents and incorporating them into a coherent model of the environment. Applying this knowledge further requires relating newly sensed information to the model and updating the model accordingly. Although there is no complete solution to the problem, parts of it can already be solved, and research on related subjects appears to be accelerating. With recent advances in these research areas, machines are able to generate increasingly general representations of their environment.
This master's thesis studies metric-semantic reconstruction from the perspective of visual data, using dense reconstruction of indoor environments as an example use case. Related background information and theory are studied, and a baseline end-to-end three-dimensional metric-semantic reconstruction system is designed and evaluated. The purpose is to create a platform for future research and to identify interesting topics to study.
Simultaneous localisation and mapping (SLAM) methods are used in this work to track the device pose, which makes reconstruction possible also in environments where no localisation infrastructure or pre-existing maps are available. Panoptic segmentation, in turn, is applied to incorporate rich semantic meaning into the metric reconstructions. A view-based segmentation approach is chosen to make the system more robust to the uncertainties associated with visual data. The RTAB-Map library is used for globally consistent three-dimensional metric SLAM, and EfficientPS is chosen as the panoptic segmentation method. The individual components of the system are evaluated quantitatively, after which end-to-end results are generated from data captured in two indoor campus environments and analysed qualitatively.
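In a view-based approach, each 2D image is segmented first, and the per-pixel labels are then lifted into the 3D map using the depth and the camera pose estimated by SLAM. The sketch below illustrates one generic way to do this: it back-projects labelled pixels through a pinhole camera model, transforms them into the world frame, and accumulates label votes per voxel. It is a minimal sketch under stated assumptions, not the fusion scheme used in the thesis and not the RTAB-Map or EfficientPS API. All function names are hypothetical, and it assumes a pinhole model with intrinsics `(fx, fy, cx, cy)`, a 4x4 camera-to-world pose, and simple majority voting.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical types for illustration only.
Voxel = Tuple[int, int, int]
Pose = Sequence[Sequence[float]]  # 4x4 camera-to-world transform, row-major
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed pinhole


def new_vote_grid() -> Dict[Voxel, Counter]:
    """Sparse voxel grid mapping each voxel index to a Counter of label votes."""
    return defaultdict(Counter)


def backproject(u: int, v: int, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Lift pixel (u, v) with metric depth into the camera frame (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def transform(pose: Pose, p: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Apply the rotation and translation of a 4x4 homogeneous pose to a 3D point."""
    x, y, z = p
    return tuple(pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
                 for r in range(3))


def fuse_view(votes: Dict[Voxel, Counter],
              depth: List[List[float]],
              labels: List[List[Hashable]],
              pose: Pose,
              intrinsics: Intrinsics,
              voxel_size: float) -> None:
    """Add one segmented RGB-D view to the vote grid, using the SLAM pose."""
    fx, fy, cx, cy = intrinsics
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0.0:  # skip pixels without a valid depth measurement
                continue
            p_world = transform(pose, backproject(u, v, d, fx, fy, cx, cy))
            key = tuple(int(math.floor(c / voxel_size)) for c in p_world)
            votes[key][labels[v][u]] += 1


def voxel_labels(votes: Dict[Voxel, Counter]) -> Dict[Voxel, Hashable]:
    """Assign each voxel the label that received the most votes across views."""
    return {k: c.most_common(1)[0][0] for k, c in votes.items()}
```

Voting across many views lets an occasional misclassification in a single image be outvoted by consistent labels from other viewpoints, which is one reason view-based fusion can tolerate noisy 2D segmentation. A probabilistic (e.g. Bayesian) label update is a common alternative to plain counting.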
The essential building blocks of a metric-semantic reconstruction system are specified, and the design choices for each are compared. Based on the results, possible performance bottlenecks are identified. Improvements to existing methods are discussed, and possible future research topics are assessed. Although the approach is fairly simple and in some respects cannot match the most recent work in the field, it provides a strong baseline for future research.