Human Keypoint Detection in Underground Mining Environment
Juntti, Raafael (2023)
Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2023-05-22
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202305165823
Abstract
Human pose estimation (HPE) is a widely researched field of computer vision. General HPE methods, which predict 2D keypoints representing joints of the human body, are useful for developing solutions for more specific pose-related tasks such as activity detection or motion capture. In recent years, convolutional neural network based approaches have been dominating public HPE benchmarks in single-person and multi-person estimation tasks.
There has been interest in applying HPE methods in the mining industry, which is already seeing an increasing amount of advanced computing and automation deployed in working environments. The underground working environments of industrial mines pose many challenges for computer vision, most notably difficult lighting conditions. Additionally, deep-learning solutions pre-trained on generic data might struggle with the characteristic visual features of mining operations, which differ greatly from more common imaging scenarios.
In this thesis, we investigate existing HPE solutions and whether they could be used in a challenging underground working environment. To this end, 13 different pre-trained multi-person human pose estimation models are compared on test data collected from an industrial mine. Additionally, one of these models, Detectron 2's Keypoint R-CNN R50, is further fine-tuned using simulated training data mimicking the mine environment. All of the selected models are freely available online with pre-trained weights at the time of writing.
Performance is measured with two distinct tests: keypoint localization and human pose prediction. In the pose prediction test, a simple pose classifier based on random forests predicts poses from the keypoints produced by each HPE method, and the methods are scored by the number of correct and false poses predicted. Keypoint localization is scored according to the distance between predicted points and ground truth points. The best-performing off-the-shelf model in the tests is found to be Detectron 2's Keypoint R-CNN X101, but other (and lighter) top-scoring models reach promising accuracies as well.
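As a rough illustration of the two scoring ideas described above, the following Python sketch computes a distance-based keypoint score and tallies correct and false pose predictions. The abstract does not state the exact metrics used in the thesis, so this is only an assumption: the function names, the pixel threshold, and the pose labels are hypothetical, and the keypoint score shown is a simple "fraction of keypoints within a distance threshold" measure rather than the thesis's own formula.

```python
import math


def keypoint_accuracy(predicted, ground_truth, threshold):
    """Fraction of predicted keypoints within `threshold` of the ground truth.

    Hypothetical scoring rule (assumption): a keypoint counts as correct when
    its Euclidean distance to the matching ground truth point is <= threshold.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have equal length")
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for p, g in zip(predicted, ground_truth)
        if math.dist(p, g) <= threshold
    )
    return hits / len(ground_truth)


def count_pose_predictions(predicted_labels, true_labels):
    """Return (correct, false) counts for predicted pose labels."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("label lists must have equal length")
    correct = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return correct, len(true_labels) - correct


# Illustrative data only; not taken from the thesis.
predicted_points = [(10, 10), (20, 20), (30, 30), (100, 100)]
true_points = [(10, 12), (20, 20), (33, 34), (50, 50)]
score = keypoint_accuracy(predicted_points, true_points, threshold=5)

predicted_poses = ["standing", "sitting", "standing"]
true_poses = ["standing", "standing", "standing"]
correct, false = count_pose_predictions(predicted_poses, true_poses)

print(score)           # 0.75
print(correct, false)  # 2 1
```

In practice, a fixed pixel threshold is often replaced by one scaled to the size of the person in the image, so that near and distant workers are scored comparably.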
Fine-tuning Keypoint R-CNN R50 with simulated data improves performance on the test set: scores in both the keypoint localization and pose prediction tests increase after fine-tuning. These encouraging results suggest that cost-effective domain transfer of existing methods to the mining environment is feasible.