Depth assisted composition of synthetic and real 3d scenes
Cortes Reina, Santiago (2015)
Master's Degree Programme in Information Technology
Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Acceptance date
2015-12-09
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tty-201511251796
Abstract
In media production, previsualization is an important step. It allows the director and the production crew to see an estimate of the final product during the filmmaking process. This work focuses on a previsualization system for composite shots that involve both real and virtual content. It shows the camera operator a perspective-correct view of how the real objects in front of the camera look when placed in a virtual space. The aim is to simplify the workflow, reduce production time and allow more direct control of the end result.
The real scene is shot with a 3D scene capture device, which combines an RGB color camera with a time-of-flight depth camera. The device’s pose is tracked using a motion capture system. Depth-based segmentation is applied to remove the background and any content outside the desired volume. The captured geometry is then aligned with the stream from the RGB color camera, and a dynamic point cloud of the remaining real scene contents is created. The virtual objects are also transformed into the coordinate space of the tracked camera, and the resulting composite view is rendered accordingly. The prototype camera system is implemented as a self-contained unit with local processing.
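The segmentation and coordinate transformation steps described above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: it assumes a pinhole camera model, a depth map in meters, and a tracked pose given as a camera-to-world rotation matrix and a camera position. The function names and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
import numpy as np

def segment_and_backproject(depth_m, fx, fy, cx, cy, near, far):
    """Keep pixels whose depth lies in [near, far] and return them as an
    (N, 3) point cloud in the depth camera frame (assumed pinhole model)."""
    depth_m = np.asarray(depth_m, dtype=np.float64)
    h, w = depth_m.shape
    v, u = np.mgrid[0:h, 0:w]                    # pixel row and column indices
    mask = (depth_m >= near) & (depth_m <= far)  # depth-based segmentation
    z = depth_m[mask]
    x = (u[mask] - cx) * z / fx                  # back-project along x
    y = (v[mask] - cy) * z / fy                  # back-project along y
    return np.stack([x, y, z], axis=1)

def world_to_camera(points_world, R_cam_to_world, t_cam_in_world):
    """Express world-space points (for example virtual objects) in the frame
    of the tracked camera: p_cam = R^T (p_world - t)."""
    points_world = np.asarray(points_world, dtype=np.float64)
    # For row vectors, R^T (p - t) is written as (p - t) @ R.
    return (points_world - t_cam_in_world) @ R_cam_to_world
```

Under these assumptions, the real point cloud and the virtual geometry end up in the same camera frame, so a single renderer can draw both with correct occlusion.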
A prototype system was constructed from a Microsoft Kinect v2, which provides depth and color information of the real scene, and a Microsoft Surface Pro 3, which serves as the processing and display device. Both instruments were attached to a camera shoulder mount, with optical markers fixed to the body of the camera. The pose of the camera in 3D space is tracked with a Natural Point OptiTrack motion capture system, which streams the location information to the Surface device over a wireless 802.11n channel.
In its current state, the system runs at 15 frames per second with a resolution of 1024x768. Subjectively, the frame rate is already smooth enough for the operator to feel as if using a regular camera. Further improvements target the processing speed and the image quality provided by the system. The image suffers from some depth-capture artifacts that affect the depth segmentation, and therefore adaptive filtering methods based on edge-aware bilateral filtering have been investigated. The tested filtering improved the quality significantly, but more work is needed to implement it efficiently.
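To illustrate what edge-aware filtering of a depth map means, the following is a brute-force sketch of a plain bilateral filter. It is not the adaptive variant investigated in the thesis, and it is not optimized. The function name and the parameters `radius`, `sigma_s` and `sigma_r` are assumptions chosen for illustration.

```python
import numpy as np

def bilateral_filter(depth, radius=2, sigma_s=1.5, sigma_r=0.05):
    """Smooth a depth map while preserving depth discontinuities.

    Each output pixel is a weighted mean of its neighbors, where the weight
    falls off with spatial distance (sigma_s) and with depth difference
    (sigma_r), so neighbors across an edge contribute almost nothing.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    padded = np.pad(depth, radius, mode="edge")
    acc = np.zeros_like(depth)
    norm = np.zeros_like(depth)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor image shifted by (dy, dx), cropped to the original size.
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            w_spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            w_range = np.exp(-((shifted - depth) ** 2) / (2.0 * sigma_r ** 2))
            weight = w_spatial * w_range
            acc += weight * shifted
            norm += weight
    # The center pixel always has weight 1, so norm is never zero.
    return acc / norm
```

The nested loop over the window is the costly part, which gives a sense of why an efficient implementation matters for a real-time system.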