Simultaneous Localization and Mapping with Apple ARKit
Kaarre, Christian (2021)
Bachelor's Programme in Computing and Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. Only for your own personal use. Commercial use is prohibited.
Acceptance date
2021-02-25
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202011077843
Abstract
Simultaneous Localization and Mapping (SLAM) methods aim to map the environment with a moving sensor while keeping track of the sensor's location within that map. Many types of SLAM systems based on different sensors have been proposed, one of them being visual SLAM, which utilizes visual information obtained by a camera. The objectives of this thesis are to research the main stages and limitations of visual SLAM methods and to examine the functionality and applications of the Apple ARKit API with respect to SLAM. The thesis is divided into three parts: research on visual SLAM methods, research on Apple ARKit's functionality, and experiments with ARKit.
The first part of the thesis examines the essential components of feature-based monocular SLAM. The method estimates the camera's relative rotation and translation from images taken of the environment. The main stages are feature extraction and matching, estimation of the camera's extrinsic parameters from the matched features, triangulation to obtain depth from the environment, and loop closure. Loop closure occurs when the camera returns to a previously visited location, which triggers a global optimization of the created map. The performance of visual SLAM methods decreases in dynamic or repetitive environments, in environments with few features, and when external disturbances affect the sensor. Localization accuracy can be improved by adding supporting sensors to the SLAM implementation, e.g. an inertial measurement unit (IMU). Sensor fusion between the pose estimated from visual data and the IMU measurements can be performed, for example, with an extended Kalman filter.
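The pose-estimation stage described above is commonly formulated through the epipolar constraint; the following is a standard textbook formulation rather than a quotation from the thesis. For a feature matched at normalized image coordinates x and x' in two views related by rotation R and translation t,

```latex
% Epipolar constraint between two views of the same scene point.
% x, x' : matched feature locations in normalized image coordinates
% R, t  : relative rotation and translation between the two views
\[
    \mathbf{x}'^{\top} E \, \mathbf{x} = 0,
    \qquad
    E = [\mathbf{t}]_{\times} R
\]
```

where E is the essential matrix and [t]_× is the skew-symmetric matrix of t. E is estimated from the feature correspondences (e.g. with the five-point algorithm) and decomposed into R and t, after which the matched points are triangulated to recover depth; in the monocular case the translation, and hence the map, is recovered only up to scale.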
In the second part of the thesis, the Apple ARKit API and the functionality of its world tracking (the ARWorldTrackingConfiguration) are presented. The experiments with apps utilizing ARKit are presented in the third part of the thesis. The first app focuses on ARKit's visual-inertial odometry and position estimation. Based on the research, ARKit had difficulties with local loop closure in a repetitive environment, and the estimation of vertical position began to drift in large environments with external disturbance. The second part of the experiments examines the possibilities of scene reconstruction with ARKit. Two approaches to reconstruction are presented: obtaining the images and camera parameters with ARKit, and visualizing the features and their positions in an augmented reality scene. The images and camera parameters could be used for scene reconstruction with Structure from Motion. A sparse model of an object can be obtained from the features and positions recognized by ARKit.
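The kind of world-tracking loop the experiments rely on can be sketched in Swift as follows. This is a minimal illustration, not the thesis's app code; TrackingLogger is a hypothetical name, while the ARKit types and properties used (ARWorldTrackingConfiguration, ARFrame.camera.transform, ARFrame.rawFeaturePoints, ARCamera.intrinsics) are part of the public API.

```swift
import ARKit

// Hypothetical helper: runs world tracking and logs, per frame, the
// estimated camera pose, the sparse feature points and the intrinsics.
final class TrackingLogger: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        let configuration = ARWorldTrackingConfiguration()
        session.delegate = self
        session.run(configuration)
    }

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // Camera pose as a 4x4 world transform; column 3 holds the
        // translation, i.e. the estimated device position examined
        // in the first experiment.
        let position = frame.camera.transform.columns.3
        print("position:", position.x, position.y, position.z)

        // Sparse feature points ARKit currently tracks, in world
        // coordinates; collected over time, these yield a sparse model.
        if let cloud = frame.rawFeaturePoints {
            print("feature count:", cloud.points.count)
        }

        // Pinhole intrinsics of the capturing camera; together with the
        // captured image, these are the inputs for Structure from Motion.
        let intrinsics = frame.camera.intrinsics
        print("focal lengths:", intrinsics[0][0], intrinsics[1][1])
    }
}
```

Saving frame.capturedImage alongside the pose and intrinsics provides the image-plus-camera-parameters data the abstract mentions as input for offline Structure from Motion.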