Interpretability of Classification & Regression Ensemble Models
Bangalore Muralidhara, Chethan Kashyap (2024)
Master's Programme in Computing Sciences and Electrical Engineering
Faculty of Information Technology and Communication Sciences
Date of approval
2024-12-06
The persistent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-2024112610545
Abstract
This thesis investigates the interpretability of machine learning models, with an emphasis on Random Forests applied to the classification of cardiovascular disease and the prediction of California housing prices. While Random Forests are highly effective ensemble models, their complexity often makes it difficult to understand how individual predictions are reached. To address this, we applied several interpretation techniques (LIME, SHAP, Gini Impurity, and Permutation Importance) to clarify how the model uses data to make predictions. Each method was evaluated on runtime, interpretability, model flexibility, and its ability to capture feature interactions. SHAP emerged as the most comprehensive, offering both local and global insights. However, methods such as LIME and Gini Impurity provided efficient alternatives where quick, instance-specific explanations or global feature importance were sufficient. Our findings suggest that while interpretation methods do not fully reveal the inner workings of complex models, they help make predictions more transparent and understandable. This study highlights the importance of selecting appropriate interpretation methods to balance performance and interpretability, ultimately fostering trust in machine learning applications across domains such as healthcare and real estate.
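To illustrate one of the techniques discussed, the sketch below implements permutation importance from scratch on a toy dataset. This is not code from the thesis: the stand-in `predict` function, the synthetic two-feature data, and the repeat count are all illustrative assumptions, but the mechanic is the standard one: shuffle one feature column at a time and measure how much the model's accuracy drops.

```python
import random

# Toy stand-in "model": predicts class 1 when feature 0 exceeds 0.5.
# In the thesis this role is played by a trained Random Forest;
# here feature 1 is deliberately pure noise.
def predict(row):
    return 1 if row[0] > 0.5 else 0

def accuracy(X, y):
    return sum(predict(r) == t for r, t in zip(X, y)) / len(y)

def permutation_importance(X, y, feature, n_repeats=30, seed=0):
    """Mean accuracy drop when `feature` is shuffled across rows."""
    rng = random.Random(seed)
    base = accuracy(X, y)
    drops = []
    for _ in range(n_repeats):
        col = [row[feature] for row in X]
        rng.shuffle(col)
        # Rebuild the dataset with only this one column permuted.
        X_perm = [row[:feature] + [v] + row[feature + 1:]
                  for row, v in zip(X, col)]
        drops.append(base - accuracy(X_perm, y))
    return sum(drops) / n_repeats

# Synthetic data: the label depends only on feature 0.
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

print(permutation_importance(X, y, 0))  # large drop: informative feature
print(permutation_importance(X, y, 1))  # zero drop: noise feature
```

Shuffling the informative feature destroys the association with the label and accuracy collapses toward chance, while shuffling the noise feature leaves predictions unchanged; the gap between the two scores is the feature-importance signal the thesis evaluates against SHAP, LIME, and Gini Impurity.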