Computer vision methods for augmented reality
Kaatrasalo, Valtteri (2022)
Master's Programme in Science and Engineering
Faculty of Engineering and Natural Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2022-04-04
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202204042972
Abstract
Recent advances in computational capacity and computer vision methods have brought augmented reality (AR) applications to practically all modern mobile devices with a camera and a low-cost inertial measurement unit (IMU). A key feature of these AR systems is the ability to track the location and orientation of the mobile device in real time. Visual inertial odometry (VIO), which fuses the visual data obtained from the captured images with IMU measurements, has proven to be the most practical approach for sequentially estimating the device pose. The reason is that the camera and the IMU complement each other well: the camera enables accurate long-term tracking, whereas the IMU provides robustness in visually challenging areas and against temporary occlusions and motion blur. A related technique, simultaneous localization and mapping (SLAM), estimates the device pose and builds a map of the unknown environment at the same time. The SLAM map can be used to minimize the drift that VIO inevitably accumulates because of its sequential nature. In addition, the SLAM map can be densely reconstructed, and the resulting 3D model has many uses in AR and other fields.
A limitation of typical mobile AR games is that they are confined to small areas and usually involve only a single player. In this thesis, a SLAM pipeline is devised that enables large-scale, multiplayer AR applications that can be deployed to new environments using only a mobile phone. The SLAM system is then integrated into both a 3D mapping/reconstruction application and a multiplayer AR game. To showcase the results, a ~1000 m² building at Tampere University is mapped with the 3D reconstruction application. In addition, this thesis gives a comprehensive introduction to the computer vision topics needed to understand the VIO and SLAM problems.
The 3D point cloud produced by the 3D reconstruction application is evaluated both qualitatively and quantitatively. The results show that the point cloud is not distorted and thus captures the true structure of the building well. The point cloud also contains many small details and is visually adequate for many applications. The localization accuracy of the SLAM system is also evaluated quantitatively and deemed to be largely sufficient for AR games.