Trepo
Generalizing Pareto optimal policies in multi-objective reinforcement learning: An empirical study of hypernetworks

Heiskanen, Santeri (2024)

Open file: HeiskanenSanteri.pdf (7.301 MB)

Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2024-10-04
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:tuni-202409058563
Abstract
Multi-objective reinforcement learning is a branch of reinforcement learning that considers multiple conflicting objectives simultaneously. Many existing multi-objective approaches suffer from poor sample efficiency, often because they cannot reuse already learned skills when a new trade-off between the objectives is considered. Motivated by the recent success of hypernetworks, this work couples an existing multi-objective reinforcement learning algorithm with a hypernetwork to quantify how soft information sharing affects sample efficiency.
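The notion of a trade-off between objectives can be made concrete with linear scalarisation, a standard device in multi-objective reinforcement learning: a preference vector weights the per-objective returns, and different preferences make different Pareto-optimal policies the best choice. The sketch below uses hypothetical vector returns for three candidate policies; the numbers and the two objectives are illustrative assumptions, not results from the thesis.

```python
import numpy as np

# Hypothetical vector returns of three candidate policies over two
# conflicting objectives (e.g. speed vs. energy efficiency).
returns = np.array([[1.0, 0.1],   # policy 0: strong on objective 1
                    [0.6, 0.6],   # policy 1: balanced
                    [0.1, 1.0]])  # policy 2: strong on objective 2

def best_policy(pref):
    """Index of the policy maximising the linearly scalarised return."""
    return int(np.argmax(returns @ pref))

print(best_policy(np.array([0.9, 0.1])))  # favours objective 1 -> 0
print(best_policy(np.array([0.5, 0.5])))  # balanced preference  -> 1
print(best_policy(np.array([0.1, 0.9])))  # favours objective 2 -> 2
```

Each preference vector selects a different member of the Pareto set, which is why a single fixed policy cannot serve all trade-offs and why generalising across preferences matters for sample efficiency.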

The theoretical section of the thesis explains the basics of multi-objective reinforcement learning and hypernetworks. The underlying assumptions and the algorithm used, together with the relevant background theory, are described in detail. An actor-critic model is explored in which the critic is modeled as a contextual bandit using the hypernetwork. The hypernetwork is conditioned on the state and the trade-off between objectives, while two critic network configurations with different inputs are considered to understand the information flow in the system.
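The core mechanism described above, a hypernetwork that generates the critic's weights from the state and the preference over objectives, can be sketched as follows. This is a minimal illustration under assumed dimensions and a single linear critic head; the layer sizes, the `hyper_critic` helper, and the scalarisation by the preference vector are assumptions for exposition, not the thesis's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, PREF_DIM, HIDDEN = 4, 2, 16      # assumed dimensions
CRITIC_IN, N_OBJ = 4, 2                     # critic input, objectives

# Hypernetwork parameters: map (state, preference) to the weights
# of a small linear critic head (one Q-value per objective).
W_h = rng.normal(0.0, 0.1, (HIDDEN, STATE_DIM + PREF_DIM))
b_h = np.zeros(HIDDEN)
W_out = rng.normal(0.0, 0.1, (CRITIC_IN * N_OBJ + N_OBJ, HIDDEN))

def hyper_critic(state, pref, action_feat):
    """Generate critic weights from (state, preference), then evaluate."""
    z = np.tanh(W_h @ np.concatenate([state, pref]) + b_h)
    theta = W_out @ z                        # flat critic parameters
    W_c = theta[:CRITIC_IN * N_OBJ].reshape(N_OBJ, CRITIC_IN)
    b_c = theta[CRITIC_IN * N_OBJ:]
    q_vec = W_c @ action_feat + b_c          # vector-valued Q estimate
    return float(pref @ q_vec)               # scalarised by the trade-off

state = rng.normal(size=STATE_DIM)
pref = np.array([0.7, 0.3])                  # example trade-off
action = rng.normal(size=CRITIC_IN)
q = hyper_critic(state, pref, action)
```

Because the critic's parameters are themselves a function of the preference, knowledge learned for one trade-off is shared softly with nearby trade-offs through the hypernetwork's weights, which is the information-sharing effect the thesis sets out to quantify.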

The proposed method is evaluated in three robot control tasks of varying difficulty. The results indicate that while the hypernetworks can improve the agents' convergence speed, the final performance is unaffected. A trade-off of the approach is increased run-to-run fluctuation, caused by the agent's tendency to concentrate on the easiest objective while disregarding the others. A straightforward proof-of-concept experiment showed that this can be alleviated by accounting for the unbalanced objectives during training. The development of a general algorithm that accounts for variance in objective difficulty is left for future research. Overall, the hypernetworks showed promising performance, with evident flaws that require attention going forward.
Collections
  • Theses - higher university degree [41809]
Kalevantie 5
PL 617
33014 Tampereen yliopisto
oa[@]tuni.fi | Privacy notice | Accessibility statement