Machine Learning Methods for Aerosol Condensation Modelling: Suitability of the Methods for Sensitivity Analysis
Halonen, Onni Hermanni (2024)
Master's Programme in Science and Engineering
Faculty of Engineering and Natural Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2024-09-04
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202406307445
Abstract
Aerosol research still has many gaps. In particular, no comprehensive report exists on which approximations can be made when studying aerosol condensation. The purpose of this thesis was to find out how aerosol condensation, and its sensitivity to different parameters, can be studied efficiently.
The research was done using numerical modelling and Machine Learning (ML) methods. The investigated parameters were the vapour diffusion coefficient, mass accommodation coefficient, vapour mass concentration, particle number concentration, particle size distribution characteristics and particle shape. A numerical model was developed to generate a large data set, and various ML models were trained on this data. The models were compared to assess the quality of their results. In addition, the thesis investigated how a sensitivity analysis could be implemented with such models.
A good method for sensitivity analysis is to calculate SHapley Additive exPlanations (SHAP) values, which describe how strongly a change in each parameter affects the end result. More intuitive results can be obtained from the mean absolute SHAP values, which describe the average effect of each parameter on the end result.
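To make the idea of mean absolute SHAP values concrete, the following is a minimal sketch and not the method used in the thesis. It computes exact Shapley values by enumerating all feature coalitions, under the assumption that a missing feature is replaced by a single fixed baseline value (SHAP libraries typically average over a background data set instead). The names `shapley_values`, `mean_abs_shap` and `toy_model`, as well as the sample data, are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Assumption: features outside a coalition take their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        # Evaluate the model with only the coalition's features "switched on".
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(model, samples, baseline):
    """Average the absolute Shapley value of each feature over all samples."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, p in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(p)
    return [t / len(samples) for t in totals]


def toy_model(z):
    # Hypothetical stand-in for a trained surrogate model:
    # one additive term and one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


baseline = [0.0, 0.0, 0.0]
samples = [[1.0, 1.0, 1.0], [-1.0, 2.0, 0.5], [0.5, -1.0, 2.0]]

phi_first = shapley_values(toy_model, samples[0], baseline)
importance = mean_abs_shap(toy_model, samples, baseline)
print("Shapley values for first sample:", phi_first)
print("Mean absolute Shapley values:", importance)
```

For each sample, the Shapley values sum to the difference between the model output at that sample and at the baseline, and the interaction term is split equally between the two features involved. Averaging the absolute values over the samples gives one importance score per parameter, which is what makes the mean absolute values easier to read than the per-sample ones.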
The computational efficiency of different types of ML models was found to differ significantly. Random forests were slow and heavy in this case. Gradient boosting machines were more efficient, but their hyperparameters must be optimized very carefully, which takes a lot of time. Polynomial regression is a simpler, fast and efficient method, but it makes significant assumptions about the system and gives rough estimates rather than exact results.
Feedforward Neural Networks (FNNs) have generally been found to be suitable for a wide range of situations. However, neural networks, like many other ML methods, require substantial computational resources. To save time, either the training data had to be limited to a fairly small size or the hyperparameter optimization had to be done with a small sample of hyperparameter combinations. Even so, the results obtained with the FNN were promising.
The results of the work, while preliminary, can be used in similar ML modelling. In particular, the work shows how the condensation of aerosols can be modelled effectively with numerical methods, how suitable different types of ML methods are for simulating such models, and how a sensitivity analysis for different parameters can be done with the models.