Human Key Point Detection Using Depth and Color Images
Hartikainen, Noora (2019)
Degree Programme in Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. It is intended for your own personal use only; commercial use is prohibited.
Date of acceptance
2019-09-25
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-201909133289
Abstract
In this thesis, two human key point detectors are implemented. The detectors are intended to be used in a medical imaging device for the human body. A full body medical imaging device captures medical images of the entire human body, and the desired imaging area can be selected. Patient positioning is a step in a medical imaging sequence in which the patient is placed in the correct position relative to the medical imaging device. In a full body medical imaging device, the patient is usually placed horizontally on a patient couch, which is then driven to the correct position. The imaging process can be enhanced by automating manual patient positioning: if the patient's pose and body parts are automatically identified from common color images or depth images, the medical imaging device could automatically drive to the desired position and begin a medical imaging sequence. Another application for the key point detectors is to verify that the orientation and positioning of the patient correspond to the imaging parameters of the selected medical imaging program. Deep learning has recently gained popularity in the computer vision community because of its success in computer vision applications, and in this thesis a deep learning based method was therefore selected to detect human key points from images.
In this thesis, a neural network algorithm is used to detect thirteen human key points from color and depth images. The algorithm applied for the detection is a Fully Convolutional Network (FCN), namely You Only Look Once (YOLO). Two separate human key point detectors were trained: one to process color images and the other depth images. The performance of the two models is evaluated with Mean Average Precision (mAP). Promising results were obtained with the test data, suggesting that it is possible to identify the different body parts, and ultimately the pose of the patient, from color and depth images using the YOLOv3 object detector. The mAP@0.50 of the detector trained for depth images is 59.26 %, whereas it is 95.68 % for the detector trained for color images. The detector based on color images thus performs significantly better than the detector which takes depth maps as input. This is probably because the model was pretrained on a very large dataset consisting of three-channel color images, so the pretrained feature extractor extracts features from color images more efficiently than from depth images.
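For context, mAP@0.50 counts a detection as correct when its predicted bounding box overlaps a ground-truth box of the same key point class with an intersection over union (IoU) of at least 0.50; the average precision (AP) of a class is the area under the precision-recall curve obtained by sweeping the detections in order of confidence, and mAP is the AP averaged over the thirteen key point classes. The following Python sketch illustrates the AP@0.50 computation for a single class; the box format and example values are illustrative assumptions, not the evaluation code used in the thesis.

# Minimal sketch of AP@0.50 for one key point class.
# Boxes are (x1, y1, x2, y2); detections carry a confidence score.
# Illustrative only -- not the thesis' evaluation pipeline.

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truths, iou_thr=0.5):
    """detections: list of (image_id, score, box);
    ground_truths: dict image_id -> list of boxes."""
    # Process detections in order of descending confidence.
    detections = sorted(detections, key=lambda d: -d[1])
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    n_gt = sum(len(b) for b in ground_truths.values())
    tp, fp, precisions, recalls = 0, 0, [], []
    for img, score, box in detections:
        # Greedily match to the best still-unmatched ground-truth box.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths.get(img, [])):
            o = iou(box, gt)
            if o > best_iou and not matched[img][j]:
                best_iou, best_j = o, j
        if best_iou >= iou_thr:
            matched[img][best_j] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # Area under the precision-recall curve (step integration).
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap

# Tiny worked example: two images, one ground-truth box each.
gts = {"img1": [(10, 10, 30, 30)], "img2": [(50, 50, 70, 70)]}
dets = [("img1", 0.9, (12, 11, 31, 29)),   # good overlap -> true positive
        ("img2", 0.8, (0, 0, 15, 15))]     # no overlap  -> false positive
print(f"AP@0.50 = {average_precision(dets, gts):.2f}")  # prints 0.50

Averaging this per-class AP over the thirteen key point classes yields the mAP@0.50 figures quoted above.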
Considering the small dataset used, the results are promising. Both depth and color images contain enough features for the model to extract and use in the detection task. A similar method could be utilized in the development of an automatic patient positioning system and in verifying that the patient is correctly positioned. In the future, the model could be further developed by improving the quality of the training data and expanding the diversity of the captured body types. In addition, the patient pose estimator could utilize temporal information in the estimation process, and the correctness of the detected body parts could be verified by fitting a generative model of a human body to the detections.