Transformer model for Biomarkers prediction : Evaluating the impact of ECDF normalization on model robustness in clinical data
Shavliuk, Mikhail (2025)
Master's Programme in Computing Sciences
Faculty of Information Technology and Communication Sciences
Date of approval
2025-02-18
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-2024111810270
Abstract
This thesis addresses a central challenge in healthcare predictive analytics: the robustness of machine learning models to outliers and noise in clinical data. Using a transformer-based model and the MIMIC-III dataset, we evaluate empirical cumulative distribution function (ECDF) normalization as a robust alternative to conventional Z-score normalization. Our findings show that ECDF normalization significantly enhances model performance: compared to Z-score normalization, it retains up to 196% more ROC-AUC for mortality prediction under high uniform noise and reduces mean squared error (MSE) by 31% for percentile-based forecasting of time-series values. These results suggest that ECDF normalization is a robust approach for predictive analytics in healthcare, reducing the need for manual outlier removal and thus improving scalability to larger datasets. In addition to these methodological advances, computational optimizations yielded a 30x improvement in training speed, which underscores the model's scalability for real-world applications.
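The contrast between the two normalizations can be illustrated with a minimal sketch. This is not the thesis's implementation, which this record does not show. It assumes NumPy, a synthetic stand-in for a single clinical variable, and two hypothetical helper names (`zscore_normalize`, `ecdf_normalize`). ECDF normalization maps each value to the fraction of reference values less than or equal to it, so the output is bounded in [0, 1] regardless of how extreme the input is.

```python
import numpy as np

def zscore_normalize(train, x):
    """Z-score: subtract the training mean, divide by the training std."""
    return (x - train.mean()) / train.std()

def ecdf_normalize(train, x):
    """ECDF: fraction of training values <= x, always in [0, 1]."""
    sorted_train = np.sort(train)
    # side="right" counts training values less than or equal to each x
    counts = np.searchsorted(sorted_train, x, side="right")
    return counts / sorted_train.size

# Synthetic stand-in for one measured variable (assumption, not MIMIC-III data)
rng = np.random.default_rng(0)
train = rng.normal(loc=100.0, scale=15.0, size=1000)

# A grossly erroneous reading, e.g. a unit or data-entry error
outlier = np.array([1e6])

z_out = zscore_normalize(train, outlier)   # unbounded, grows with the outlier
e_out = ecdf_normalize(train, outlier)     # saturates at the top of the range

print("z-score of outlier:", z_out[0])
print("ECDF value of outlier:", e_out[0])
```

Under Z-score normalization the erroneous reading becomes an input of very large magnitude, whereas under ECDF normalization it is simply mapped to the top of the range. This bounded behavior is one plausible reason such a transform would reduce the need for manual outlier removal.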