Modeling environment using multi-view stereo
Mäkinen, Tatu (2019)
Degree Programme in Information Technology (Tietotekniikan DI-ohjelma)
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
Date of approval
2019-10-11
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-201906212170
Abstract
In this work, we study the potential of a two-camera system in building an understanding of the environment. We investigate whether a stereo camera, as the sole sensor, can be trusted for real-time environment analysis and modeling to enable movement and interaction in a general setting.
We propose a complete pipeline from the sensor setup to the final environment model, evaluate currently available algorithms for each step, and implement the pipeline ourselves. To assess real-world performance, we record our own stereo dataset in a laboratory environment under good lighting conditions. The dataset contains stereo recordings with different camera angles relative to the direction of movement, as well as ground truth for the environment model and the camera trajectory recorded with external sensors.
The steps of our proposed pipeline are as follows. 1) We calibrate the two cameras using the de facto standard method to form the stereo camera system. 2) We compute depth from the stereo images by finding dense correspondences with semi-global block matching and compare the results to a recent data-driven convolutional neural network algorithm. 3) We estimate the camera trajectory using temporal feature tracking. 4) We form a global point cloud from the depth maps and the camera poses, and analyze drivability in indoor and outdoor environments by fitting a plane or a spline model, respectively, to the global cloud. 5) We segment objects based on connectivity in the drivability model and mesh rough object models on top of the segmented clouds. 6) We refine the object models by picking keyframes containing the object, re-estimating camera poses using structure from motion, and building an accurate dense cloud using multi-view stereo. We use a patch-based algorithm that optimizes the photo-consistency of the patches in the visible cameras. Illustrative sketches of steps 2 and 4 are given below.
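For step 2, the following is a minimal sketch of disparity and depth computation, assuming OpenCV's StereoSGBM implementation of semi-global block matching. The file names, matcher parameters, focal length, and baseline are placeholders, not the values used in this work.

```python
import cv2
import numpy as np

# Load a rectified stereo pair (placeholder file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; parameters below are illustrative defaults.
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,      # must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,            # penalty for small disparity changes
    P2=32 * 5 * 5,           # penalty for large disparity changes
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2,
)
# compute() returns fixed-point disparities scaled by 16.
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0

# Triangulate depth: Z = f * B / d. The focal length f (pixels) and
# baseline B (meters) are assumed values; in practice both come from
# the calibration of step 1.
f, B = 700.0, 0.12
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f * B / disparity[valid]
```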
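For the indoor drivability analysis of step 4, a plane is fitted to the global cloud. Below is a minimal sketch assuming Open3D's RANSAC plane segmentation rather than the implementation used in this work; the input path and thresholds are illustrative.

```python
import open3d as o3d

# Load the accumulated global point cloud (placeholder path).
pcd = o3d.io.read_point_cloud("global_cloud.ply")

# Fit a dominant plane (e.g., the floor) with RANSAC.
plane, inliers = pcd.segment_plane(distance_threshold=0.02,
                                   ransac_n=3,
                                   num_iterations=1000)
a, b, c, d = plane  # plane equation: a*x + b*y + c*z + d = 0

# Points near the plane form the candidate drivable surface;
# the remaining points are treated as obstacles.
drivable = pcd.select_by_index(inliers)
obstacles = pcd.select_by_index(inliers, invert=True)
```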
We conclude that, with current state-of-the-art algorithms, a stereo camera system is capable of reliably estimating drivability in real time and can be used as the sole sensor to enable autonomous movement. Building accurate object models for interaction purposes is more challenging and requires substantial view coverage and computation with current multi-view algorithms.
Our pipeline has limitations in long-term modeling: drift accumulates, which can be mitigated by implementing loop closure and by using external information such as GPS. In terms of data, we inefficiently retain the complete information; storing compressed representations such as octrees, or only the built model, could be considered instead, as sketched below. Finally, environments with insufficient texture and lighting are problematic for camera-based systems and require complementary solutions.
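As an illustration of the compressed representations mentioned above, here is a short sketch assuming the Open3D library; the input path and resolution parameters are placeholders, not values evaluated in this work.

```python
import open3d as o3d

# Load the accumulated global point cloud (placeholder path).
pcd = o3d.io.read_point_cloud("global_cloud.ply")

# Option 1: voxel downsampling bounds memory by merging points per voxel.
downsampled = pcd.voxel_down_sample(voxel_size=0.05)  # 5 cm cells

# Option 2: an octree stores the cloud hierarchically; max_depth controls
# the finest cell size relative to the cloud's bounding box.
octree = o3d.geometry.Octree(max_depth=8)
octree.convert_from_point_cloud(pcd, size_expand=0.01)
```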