Reward Learning from Demonstrations for Autonomous Earthmoving
Dewundara Liyanage, Ishira Uthkarshini
2020
Master's Programme in Automation Engineering
Faculty of Engineering and Natural Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2020-10-26
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202010257460
Abstract
With the increasing complexity of automation tasks, engineers are turning to machine learning methods as an alternative to approaches that require laborious task specification. Imitation learning methods have had varying degrees of success in the past; their main drawback is an inability to adapt to new and changing problems, which limits their flexibility. Reinforcement learning addresses this by learning from a reward mechanism instead. However, a reward function must be defined before reinforcement learning can be carried out. A range of methods have been developed to derive such reward functions; they are collectively referred to as inverse reinforcement learning methods.
The objective of this research is to find a reward function for the autonomous earthmoving of a GIM Machine. In this study, different inverse reinforcement learning implementations were explored. Unsupervised perceptual rewards was selected because it is a sample-efficient method that is easy to apply on a machine for which no good simulation of the environment and its interactions exists. In this method, the task is broken down into stages, and demonstration data together with stage labels are used to train a stage classifier. Each new observation is classified into one of the stages, and the reward is computed as a function of the distance to the next stage.
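As an illustration of this staged-reward idea, the sketch below trains a simple stage classifier on labelled demonstration features and scores a new observation by the classifier's confidence in the next stage. The function names, the choice of logistic regression, and the use of next-stage probability as a proxy for "distance to the next stage" are assumptions made for this example, not the implementation used in the thesis.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_stage_classifier(demo_features: np.ndarray, stage_labels: np.ndarray):
    # Fit a classifier mapping demonstration features to task stages.
    # Stage labels are assumed to be consecutive integers 0..K-1.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(demo_features, stage_labels)
    return clf

def perceptual_reward(clf, observation: np.ndarray, current_stage: int) -> float:
    # Reward grows as the observation looks more like the *next* stage.
    # Here "distance to the next stage" is proxied by the classifier's
    # predicted probability of that stage, purely for illustration.
    probs = clf.predict_proba(observation.reshape(1, -1))[0]
    next_stage = min(current_stage + 1, len(clf.classes_) - 1)
    return float(probs[next_stage])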
Unsupervised perceptual rewards is first used to obtain a reward function for the OpenAI Gym mountain car problem, and Q-learning is then used to confirm that reinforcement learning can be applied effectively with the reward function obtained this way. The method is then applied to demonstrations from the GIM Machine, using both low-level sensor data and image features to calculate the reward. This research confirms the feasibility of using unsupervised perceptual rewards for reward function calculation and tests the method's robustness to changes in weather and lighting.
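The sketch below indicates how such an experiment can be wired up: tabular Q-learning on MountainCar-v0 in which the environment's own reward is discarded and a learned reward is used in its place. The learned_reward stand-in, the state discretisation, and the hyperparameters are assumptions for illustration only; the code uses the classic 4-tuple Gym step API (pre-0.26).

import numpy as np
import gym

env = gym.make("MountainCar-v0")
n_bins = 40
low, high = env.observation_space.low, env.observation_space.high
q_table = np.zeros((n_bins, n_bins, env.action_space.n))

def discretise(obs):
    # Map a continuous (position, velocity) observation to grid indices.
    ratios = (obs - low) / (high - low)
    return tuple(np.clip((ratios * n_bins).astype(int), 0, n_bins - 1))

def learned_reward(obs):
    # Placeholder for the perceptual reward model: here simply progress
    # toward the goal position (0.5), purely for illustration.
    return obs[0] - 0.5

alpha, gamma, epsilon = 0.1, 0.99, 0.1
for episode in range(2000):
    state = discretise(env.reset())
    done = False
    while not done:
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        obs, _, done, _ = env.step(action)  # environment reward is ignored
        reward = learned_reward(obs)        # learned reward drives learning
        next_state = discretise(obs)
        td_target = reward + gamma * np.max(q_table[next_state])
        q_table[state + (action,)] += alpha * (td_target - q_table[state + (action,)])
        state = next_state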