Human Pose Estimation Using Computer Vision

Kuang, Rui

File: KuangRui.pdf (3.492 MB)

The author has not given permission to publish the thesis online. The thesis can be read only at the thesis points of the Tampere University libraries.

Kuang, Rui

2020

Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences

This publication is copyrighted. It is for personal use only. Commercial use is prohibited.

Date of approval

2020-12-03

Permanent address of the publication:
https://urn.fi/URN:NBN:fi:tuni-202012028423

Abstract

Human pose estimation (HPE) is a classical task in computer vision. Applications built on HPE bring entertainment and convenience to daily life; popular examples include software for teaching dance and for producing animation. With the development of convolutional neural networks (CNNs), many high-performance deep network architectures have been applied to HPE to achieve better performance. However, some of these networks are so large that they cannot run on devices with limited resources. Human pose estimation with small networks is therefore worth studying.
This thesis studies simplified, lightweight networks for human pose estimation. The baseline is the method proposed by Xiao et al. [47], which uses ResNet-50 [56] as the backbone for downsampling. To obtain heat maps for the joints, the baseline combines deconvolution with batch normalization. It uses L2 loss as the loss function and Adam as the optimizer. Inspired by the baseline and SENet [55], this thesis uses SE-ResNet50 for downsampling. By integrating information from the residual module and the SE module, this network structure can reduce the loss of important feature information during downsampling. Compared with the baseline, accuracy improves to 72.1% on COCO 2017 [43] and 90.3% on MPII [34]. In addition, compared with the baseline implemented in PyTorch, the version implemented in PaddlePaddle achieves higher performance, and its FLOPs are reduced by 0.18 G.
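
To illustrate the SE module mentioned in the abstract, the following is a minimal sketch of squeeze-and-excitation channel reweighting in plain Python. It is not the thesis's implementation: the function name `se_recalibrate`, the toy tensor shape, the reduction ratio of 2, and the random weights are all assumptions chosen for illustration, and biases are omitted.

```python
import math
import random


def se_recalibrate(feature_map, w1, w2):
    """Apply squeeze-and-excitation channel reweighting.

    feature_map: C channels, each a list of H rows with W floats.
    w1: (C // r) x C weight matrix; w2: C x (C // r) weight matrix.
    Biases are omitted to keep the sketch short (an assumption).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w1
    ]
    # Excitation, step 2: fully connected layer followed by a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Hypothetical toy input: 4 channels of 3x3 activations, reduction ratio r = 2.
rng = random.Random(0)
channels, height, width, reduction = 4, 3, 3, 2
features = [
    [[rng.uniform(-1.0, 1.0) for _ in range(width)] for _ in range(height)]
    for _ in range(channels)
]
w1 = [[rng.uniform(-1.0, 1.0) for _ in range(channels)]
      for _ in range(channels // reduction)]
w2 = [[rng.uniform(-1.0, 1.0) for _ in range(channels // reduction)]
      for _ in range(channels)]

scaled, gates = se_recalibrate(features, w1, w2)
print([round(g, 3) for g in gates])
```

Each gate lies strictly between 0 and 1, so a channel is attenuated rather than removed. In an SE-ResNet block, this reweighting is typically applied to the output of the residual branch before it is added to the identity shortcut.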

Collections

Master's theses (Limited access)