Human Pose Estimation Using Computer Vision
Kuang, Rui (2020)
Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted and is for your own personal use only. Commercial use is prohibited.
Acceptance date
2020-12-03
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202012028423
Abstract
Human pose estimation (HPE) is a classical task in computer vision. Applications built on HPE bring entertainment and convenience to daily life, for example popular software for teaching dance and for producing animation. With the development of convolutional neural networks (CNNs), many high-performance deep network architectures have been applied to HPE to achieve better accuracy. However, some of these networks are so large that they cannot run on devices with limited resources. Human pose estimation with small networks is therefore worth studying.
This thesis studies simplified lightweight networks for human pose estimation. The baseline is the method proposed by Xiao et al. [47], which uses ResNet-50 [56] as the backbone for downsampling. To obtain heat maps for the joints, it combines deconvolution with batch normalization. It uses L2 loss as the loss function and Adam as the optimizer. Inspired by the baseline and SENet [55], this thesis adopts SE-ResNet50 for downsampling. This structure reduces the loss of important feature information during downsampling by integrating information from the residual module and the SE module. Compared with the baseline, accuracy improves to 72.1% on COCO 2017 [43] and 90.3% on MPII [34]. In addition, compared with the baseline implemented in PyTorch, the version implemented in PaddlePaddle achieves higher performance, and its FLOPs are reduced by 0.18 G.
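To make the role of the SE module more concrete, the following is a minimal sketch of squeeze-and-excitation channel reweighting as it is generally described for SENet. It is not the thesis's PaddlePaddle implementation. The function name `se_reweight`, the toy weights, and the omission of biases are assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import math


def se_reweight(feature_map, w1, w2):
    """Apply squeeze-and-excitation to a feature map (illustrative sketch).

    feature_map: list of C channels, each an H x W list of lists of floats.
    w1: (C // r) x C weight matrix for the bottleneck layer (biases omitted).
    w2: C x (C // r) weight matrix for the expansion layer (biases omitted).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expansion layer followed by a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]


# Toy input: 2 channels of size 2 x 2, reduction ratio r = 2.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w1 = [[1.0, 1.0]]    # 1 x 2 bottleneck weights (hypothetical values)
w2 = [[0.0], [0.0]]  # 2 x 1 expansion weights; zeros make every gate 0.5
reweighted = se_reweight(features, w1, w2)
```

In an SE-ResNet block, the reweighted output of the residual branch is then added to the shortcut connection, which is how information from the residual module and the SE module is combined.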