Pointing Gestures in Human-Robot Collaboration
Sassali, Noora (2025)
Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
Date of acceptance
2025-05-20
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202505195761
Abstract
Recent advances in machine cognition pave the way towards the fifth industrial revolution, Industry 5.0, which envisions a future where robots work alongside humans in synergy. Interaction modalities play a key role in improving communication fluency between robots and their human operators. The complementary use of modalities makes communication more robust, as the systems are less dependent on individual input channels. Pointing gestures can serve as an effective tool in collaborative applications when interpreting speech commands is difficult due to noise.
This thesis presents a tool for localizing pointing gestures using an RGB-D stream and 2D pose estimation. The foundation is laid through a literature review, which introduces the core concepts and the role of multimodality in Human-Robot Collaboration and surveys related work on pointing gestures. The developed modules for gesture localization and target selection are presented alongside the relevant mathematical frameworks and methodology. The performance of the tool was evaluated with quantitative and qualitative tests, and a proof-of-concept integration into a collaborative application comprising object detection, speech interpretation, and feedback modules was implemented.
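To make the pipeline concrete, the following is a minimal sketch, not the thesis's implementation, of how a pointing ray could be localized from 2D pose keypoints and an aligned depth image: the shoulder and wrist keypoints are back-projected into 3D using the camera intrinsics, and the resulting ray is intersected with the planar workspace. The function names and intrinsic values are illustrative assumptions.

```python
import numpy as np

# Hypothetical camera intrinsics (fx, fy, cx, cy); real values come from calibration.
FX, FY, CX, CY = 615.0, 615.0, 320.0, 240.0

def backproject(u, v, depth):
    """Lift a 2D pixel (u, v) with metric depth (m) into the camera frame."""
    x = (u - CX) * depth / FX
    y = (v - CY) * depth / FY
    return np.array([x, y, depth])

def pointing_target(shoulder_px, wrist_px, depth_map, plane_point, plane_normal):
    """Estimate where a shoulder-to-wrist pointing ray meets the workspace plane.

    shoulder_px, wrist_px: integer (u, v) keypoints from a 2D pose estimator.
    depth_map: aligned depth image in metres.
    plane_point, plane_normal: the planar workspace, given in the camera frame.
    """
    s = backproject(*shoulder_px, depth_map[shoulder_px[1], shoulder_px[0]])
    w = backproject(*wrist_px, depth_map[wrist_px[1], wrist_px[0]])
    d = w - s
    d /= np.linalg.norm(d)              # unit pointing direction
    denom = np.dot(plane_normal, d)
    if abs(denom) < 1e-6:
        return None                     # ray is parallel to the workspace
    t = np.dot(plane_normal, plane_point - w) / denom
    if t < 0:
        return None                     # workspace plane is behind the hand
    return w + t * d                    # 3D pointed location on the plane
```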
The gesturing tool reached an average accuracy of 3.0–3.3 cm for the dominant hand and 6.4–6.7 cm for the non-dominant hand when pointing at 10 targets on a planar 80 × 60 cm workspace. Qualitative results revealed characteristics of the tool, such as area-wise performance differences. Integration tests demonstrated the complementary use of gestures in decision-making, but also revealed limitations of the chosen model in comparison to other existing methodologies.
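For illustration, target selection and the accuracy figures above can be read as nearest-target matching with Euclidean error on the workspace plane. This is a hedged sketch under the assumption that the localized point and the known target positions share the same camera frame; it is not the thesis's evaluation code.

```python
import numpy as np

def select_target(pointed_xyz, targets):
    """Pick the known target closest to the localized pointing location.

    targets: list of 3D target positions (metres) on the workspace plane.
    Returns the selected index and the residual Euclidean error, the quantity
    that the cm-level accuracies above would correspond to.
    """
    errors = [np.linalg.norm(pointed_xyz - t) for t in targets]
    best = int(np.argmin(errors))
    return best, errors[best]
```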
In the future, the tool can be improved by examining the qualitative results further and optimizing the method based on them. Algorithmic improvements could also enable the use of both hands in target selection, improving the tool's usability. As the tests were conducted in a controlled laboratory environment by a single right-handed user, further testing is required to assess the tool's usability in real-world industrial environments.