Applying Machine Learning Algorithms to Psychiatric Patient Data
Nikkanen, Tommi (2021)
Master's Programme in Computer Science
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2021-11-22
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202111228585
Abstract
The purpose of this work is to classify data in the field of psychiatry and neurology by applying different supervised machine learning algorithms.
The work is divided into two parts. The methodological part presents the methods used in the project part. Evaluation measures are given for balanced and imbalanced datasets so that the performance of the different models can be compared. In addition, the SMOTE (Synthetic Minority Over-sampling Technique) algorithm is used to help with imbalanced datasets. The project part consists of two classification tasks: classifying applications into six predefined categories, and a binary classification of applications for compensation into two groups, accepted and declined. Different supervised machine learning algorithms were applied to the data. Random forest gave slightly better results than the other classifiers in the first classification task. Random forest was also used in the second classification task. The results improved in both classification tasks when the SMOTE algorithm was used to generate synthetic samples that balance the categories in the dataset.
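The abstract does not say how SMOTE was implemented in the thesis, so the following is only a minimal sketch of the general idea: a synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration; this simplified variant also picks the base sample at random instead of iterating over every minority sample.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; not the thesis code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, sorted by distance to the base
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies at a random position on the segment base -> neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values)
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it always stays inside the region already spanned by that class. In practice a maintained library implementation would normally be used, and oversampling would be applied to the training split only so that the evaluation data contains no synthetic samples.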