# Bioprocess optimization using machine learning methods

##### Hassan, Syeda Sakira (2013)

Master's Degree Programme in Information Technology

Faculty of Natural Sciences; Faculty of Computing and Electrical Engineering

This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.

##### Acceptance date

2013-10-09

**The permanent address of the publication is**

http://urn.fi/URN:NBN:fi:tty-201310241387

##### Abstract

In bioprocess development, optimization aims to improve both productivity and product quality. This involves gaining an overview of the datasets associated with different process runs, identifying the primary control parameters, and determining a useful control direction. Using several data analysis approaches to explore optimization possibilities can therefore be very valuable in bioprocess development.

In this thesis, multiple linear regression, Lasso regression, and artificial neural networks were used to model a bioprocess dataset. As a case study, we used data from a statistical culture media optimization experiment for microbial hydrogen production. In addition to the linear models, the dataset was transformed to build quadratic multiple linear regression and Lasso models. Two-layer and three-layer artificial neural network models were also developed. To predict the maximum achievable hydrogen production yield, a genetic algorithm was used to optimize the parameters of the developed models. The prediction accuracy and the maximum achievable hydrogen yield of the Lasso and artificial neural network models were benchmarked against those of multiple linear regression.

All three methods were capable of providing a significant model for culture media optimization. However, the quadratic multiple linear regression did not fit the examined data adequately: the correlation between the observed and predicted yield was 0.37. Modeling was still successful with the quadratic Lasso model (0.82). The two artificial neural network models outperformed the others, with correlations between the observed and predicted yield of 0.92 for the two-layer model and 0.91 for the three-layer model. Using the genetic algorithm, the maximum achievable hydrogen yield was 2.24 mol-H₂/mol-glycerol consumed for the linear multiple linear regression model. In contrast, the results obtained from the Lasso and artificial neural network models were closer to the highest experimental observation. Thus, we found that both Lasso regression and artificial neural networks were well suited to this kind of bioprocess data.
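As a rough illustration of the final optimization step, the sketch below runs a small genetic algorithm over the inputs of a fitted quadratic model to find the predicted maximum yield. It is not the thesis implementation: the abstract does not give the model coefficients, variable ranges, or genetic algorithm settings, so `predicted_yield`, `BOUNDS`, and every numeric value here are hypothetical, and the selection, crossover, and mutation operators are simply common textbook choices.

```python
import random


def predicted_yield(x):
    """Hypothetical quadratic surrogate for yield; coefficients are invented."""
    a, b = x
    return 1.0 + 0.8 * a + 0.6 * b - 0.4 * a * a - 0.3 * b * b + 0.1 * a * b


# Assumed ranges of two coded media components (not taken from the thesis).
BOUNDS = [(-2.0, 2.0), (-2.0, 2.0)]


def clip(value, low, high):
    """Keep a gene inside its allowed range."""
    return max(low, min(high, value))


def genetic_maximize(fitness, bounds, pop_size=40, generations=60,
                     mutation_scale=0.2, seed=0):
    """Maximize `fitness` over box-bounded real vectors with a simple GA."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0][:]]           # elitism: carry the best over unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            w = rng.random()
            # Blend crossover followed by Gaussian mutation, clipped to bounds.
            child = [w * u + (1.0 - w) * v for u, v in zip(p1, p2)]
            child = [clip(g + rng.gauss(0.0, mutation_scale), lo, hi)
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)


best_x, best_y = genetic_maximize(predicted_yield, BOUNDS)
print("best inputs:", best_x, "predicted yield:", best_y)
```

In practice, `predicted_yield` would be replaced by the prediction function of whichever fitted model is being optimized (linear, quadratic Lasso, or neural network), and the bounds by the experimental ranges of the media components.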
