Aircraft Control with Deep Reinforcement Learning in Real-time Simulations
Seppälä, Jaakko (2021)
Seppälä, Jaakko
2021
Automaatiotekniikan DI-ohjelma - Master's Programme in Automation Engineering
Tekniikan ja luonnontieteiden tiedekunta - Faculty of Engineering and Natural Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Acceptance date
2021-03-02
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202102021883
Abstract
In this thesis, reinforcement learning (RL) with deep neural networks is applied to controlling a simulated aircraft. The aim of the control is to maneuver the aircraft to a given target while minimizing input changes, fuel consumption, and non-level flight. Multiple models with varying parameter counts and layer depths are tested for both control and real-time performance.
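A control objective of this kind is typically expressed as a weighted sum of per-step cost terms. The following sketch illustrates one possible formulation; the weights and the state fields (target position, control inputs, fuel flow, roll and pitch angles) are illustrative assumptions, not the exact cost function of the thesis.

# Illustrative per-step cost: distance to target plus penalties for control
# changes, fuel consumption and non-level flight (all weights assumed).
def step_cost(state, action, prev_action,
              w_dist=1.0, w_input=0.1, w_fuel=0.05, w_level=0.2):
    distance = ((state["x"] - state["target_x"]) ** 2 +
                (state["y"] - state["target_y"]) ** 2 +
                (state["z"] - state["target_z"]) ** 2) ** 0.5
    input_change = sum(abs(a - b) for a, b in zip(action, prev_action))
    non_level = abs(state["roll"]) + abs(state["pitch"])
    return (w_dist * distance + w_input * input_change
            + w_fuel * state["fuel_flow"] + w_level * non_level)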
The first question is how well the models learn to control the aircraft, i.e., the control performance. The models' control performance is compared to a model predictive control style optimization method that serves as an upper-bound reference. This is done by running various benchmark scenarios and comparing the total costs. The benchmark scenarios are also run with varying simulation parameters to see how well the models generalize to similar systems.
The second question is how well the neural network models perform in real time. Computation times are measured for all the model candidates on a reference test platform. From these measurements, an estimate of the maximum number of aircraft that can be controlled simultaneously in real-time simulations is calculated.
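The capacity estimate follows from the per-step time budget: at 100 Hz every simulation step must finish within 10 ms, so the number of aircraft that fit is that budget divided by the measured per-aircraft computation time. A minimal sketch of the calculation, with the inference time as an assumed example measurement:

def max_aircraft(inference_time_s, sim_frequency_hz=100):
    # At 100 Hz each simulation step has a 10 ms budget; the number of
    # aircraft is how many sequential inference calls fit in that budget.
    step_budget_s = 1.0 / sim_frequency_hz
    return int(step_budget_s / inference_time_s)

print(max_aircraft(0.00019))  # assumed 0.19 ms per call -> 52, i.e. about 50 aircraft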
For constructing the RL environment, an aircraft physics model was provided by the thesis commissioner, Insta DefSec Oy. The simulation model is lightweight, so it is easily integrated into an RL system, yet realistic enough, featuring nonlinear and nondifferentiable dynamics. The RL environment is implemented in Python using the OpenAI Gym interface for compatibility with machine learning libraries such as Keras and TensorFlow.
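The OpenAI Gym interface reduces the environment to declared observation and action spaces plus reset and step methods, which is what makes it directly usable with libraries such as Keras and TensorFlow. The sketch below shows the shape of such an environment; the space sizes and the placeholder dynamics are assumptions, as the real state update comes from the commissioned physics model.

import numpy as np
import gym
from gym import spaces

class AircraftEnv(gym.Env):
    """Sketch of the Gym-style environment; the placeholder dynamics below
    stand in for the commissioned aircraft physics model."""

    def __init__(self, dt=0.01):
        super().__init__()
        self.dt = dt
        # Normalized control inputs (e.g. pitch, roll and throttle commands).
        self.action_space = spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)
        # Aircraft state plus relative target position (assumed size).
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(12,),
                                            dtype=np.float32)
        self.state = np.zeros(12, dtype=np.float32)

    def reset(self):
        self.state = np.zeros(12, dtype=np.float32)
        return self.state

    def step(self, action):
        # Placeholder update; the thesis integrates the physics model here.
        self.state[:3] += self.dt * np.asarray(action, dtype=np.float32)
        reward = -float(np.linalg.norm(self.state[:3]))  # illustrative cost
        done = False
        return self.state, reward, done, {}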
The selected RL algorithm is deep deterministic policy gradient (DDPG), which is used in a three-phase optimization scheme for the learning rate, the critic model, and the actor model. The results indicated that the deepest and largest networks in terms of parameter count worked best as the critic model. However, all the actor model candidates achieved quite similar performance with the optimal critic model, with the best performer having around 100 000 parameters.
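In DDPG the actor maps an observation directly to continuous control outputs and the critic scores observation-action pairs, so both are ordinary feed-forward networks. The following Keras sketch shows how actor and critic models of this kind can be built; the input sizes and layer widths are illustrative assumptions, not the architectures selected in the thesis.

import tensorflow as tf
from tensorflow.keras import layers

obs_dim, act_dim = 12, 3   # assumed observation and action sizes

def build_actor():
    # Actor: observation -> tanh-bounded control commands.
    obs = layers.Input(shape=(obs_dim,))
    x = layers.Dense(256, activation="relu")(obs)
    x = layers.Dense(256, activation="relu")(x)
    act = layers.Dense(act_dim, activation="tanh")(x)
    return tf.keras.Model(obs, act)

def build_critic():
    # Critic: (observation, action) -> scalar Q-value estimate.
    obs = layers.Input(shape=(obs_dim,))
    act = layers.Input(shape=(act_dim,))
    x = layers.Concatenate()([obs, act])
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(256, activation="relu")(x)
    q = layers.Dense(1)(x)
    return tf.keras.Model([obs, act], q)

actor, critic = build_actor(), build_critic()
actor.summary()   # roughly 70 000 parameters with these illustrative widths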
Overall, the control performance of the best actor model was almost as good as the model optimization in fine-tuning scenarios and better than the model optimization in longer-distance scenarios. However, the model optimization was run with a rather short prediction horizon of 21 seconds, which is the likely cause of its worse performance in the longer scenarios. The actor also performed well when the simulation parameters were changed, degrading only with extremely unrealistic parameters.
For the best-performing control model, the real-time measurements showed that the maximum number of aircraft that can be controlled with the central processing unit (CPU) is about 50 for 100 Hz simulations. Using batching optimizations raised this number to thousands for the CPU and to tens of thousands when a high-end graphics card was also used.
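Batching amortizes the per-call overhead of the inference framework by evaluating the actor for all aircraft in a single forward pass instead of once per aircraft; on a graphics card the same batched call also exploits the hardware parallelism. A minimal sketch of the idea, assuming an actor model like the one above:

import numpy as np

# Naive loop: one model call per aircraft, dominated by per-call overhead.
def control_all_looped(actor, observations):
    return np.stack([actor(obs[None, :]).numpy()[0] for obs in observations])

# Batched: one forward pass produces control outputs for every aircraft.
def control_all_batched(actor, observations):
    return actor(np.stack(observations)).numpy()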