Comparison of machine learning methods in the early identification of vasculitides, myositides and glomerulonephritides
Ryyppö, Rasmus; Häyrynen, Sergei; Joutsijoki, Henry; Juhola, Martti; Seppänen, Mikko R. J. (2024-01)
Ryyppö, Rasmus
Häyrynen, Sergei
Joutsijoki, Henry
Juhola, Martti
Seppänen, Mikko R. J.
01 / 2024
107917
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202311109561
Description
Peer reviewed
Abstract
Background: Rare disease diagnoses are often delayed by years, during which patients go through multiple doctor visits and may receive imprecise or incorrect diagnoses before the correct one. Machine learning could help address this problem by flagging potential patients whom doctors should examine more closely.

Methods: To make the prediction setting as close as possible to the real situation, we tested different masking sizes. In the masking phase, data were removed: masking was applied to all data points following the first rare disease diagnosis, including the day the diagnosis was received, and additionally to a selected number of days before the initial diagnosis. The performance of the machine learning models was compared using positive predictive value (PPV), negative predictive value (NPV), prevalence PPV (pPPV), prevalence NPV (pNPV), accuracy (ACC) and area under the receiver operating characteristic curve (AUC).

Results: XGBoost had PPVs over 90 % in all masking settings; InceptionVasGloMyotides had most of its PPVs over 90 %, but not as consistently. When the prevalence of the diseases was considered, XGBoost achieved its highest value of 8.8 % in binary classification with 30 days of masking, and InceptionVasGloMyotides achieved its best value of 6 %, also in binary classification, but with 2160 and 4320 days of masking. ACC varied between 89 % and 98 % for XGBoost and between 79 % and 94 % for InceptionVasGloMyotides. AUC varied between 72.6 % and 94.5 % for InceptionVasGloMyotides and between 69.9 % and 96.4 % for XGBoost.

Conclusions: XGBoost and InceptionVasGloMyotides could successfully predict rare diseases for patients at least 30 days prior to the initial rare disease diagnosis. In addition, we managed to build a well-performing custom deep learning model.
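The masking step and the prevalence-based metrics described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the (date, code) event representation, and the strict cutoff are hypothetical. The abstract also does not define pPPV and pNPV, so the sketch assumes the standard Bayes-rule adjustment from sensitivity, specificity and prevalence.

```python
from datetime import date, timedelta


def mask_events(events, first_diagnosis, masking_days):
    """Drop events from the masked window onward (hypothetical helper).

    Removes the diagnosis day, everything after it, and the
    `masking_days` days before it, as the abstract describes.
    `events` is an assumed list of (date, code) tuples.
    """
    cutoff = first_diagnosis - timedelta(days=masking_days)
    return [(day, code) for day, code in events if day < cutoff]


def prevalence_ppv(sensitivity, specificity, prevalence):
    """PPV at a given prevalence via Bayes' rule (assumed definition of pPPV)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    total = true_pos + false_pos
    return true_pos / total if total > 0 else 0.0


def prevalence_npv(sensitivity, specificity, prevalence):
    """NPV at a given prevalence via Bayes' rule (assumed definition of pNPV)."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    total = true_neg + false_neg
    return true_neg / total if total > 0 else 0.0


if __name__ == "__main__":
    # Illustrative patient history; dates and codes are made up.
    history = [
        (date(2020, 1, 10), "visit"),
        (date(2020, 5, 30), "lab"),
        (date(2020, 5, 31), "visit"),
        (date(2020, 6, 30), "rare_disease_dx"),
        (date(2020, 7, 5), "follow_up"),
    ]
    print(mask_events(history, date(2020, 6, 30), masking_days=30))
    print(prevalence_ppv(0.9, 0.9, 0.01))
```

Under this assumed definition, a classifier with high PPV on a balanced evaluation set can still have a low PPV at the real-world prevalence of a rare disease. This matches the gap between the PPVs above 90 % and the single-digit prevalence-based values reported in the abstract.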
Collections
- TUNICRIS publications [19273]