Deep Counterfactual Regret Minimization in Continuous Action Space
Kattainen, Emil (2022)
Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2022-05-04
Permanent address of the publication
https://urn.fi/URN:NBN:fi:tuni-202205014203
Abstract
Algorithms based on counterfactual regret minimization are the state of the art for a variety of problems in imperfect-information games. Deep learning has found a wide range of uses in recent years, and it has recently been combined with counterfactual regret minimization to make these algorithms more general.
This thesis proposes a new way to make counterfactual regret minimization algorithms even more general by expanding the role of neural networks. In addition, a new sampling method is introduced to reduce the variance that the use of neural networks causes.
The proposed modifications were compared against baseline algorithms. The variance-reduction method improved the performance of counterfactual regret minimization, whereas the method for increasing generality fell short, especially when scaling up the baseline model. Possible reasons for this are discussed, and ideas for future research are offered.