Shapley values as a generic approach to interpretable feature selection
Trotskii, Igor (2023)
Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Acceptance date
2023-11-17
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202310198939
Abstract
The Shapley value is one of the most popular frameworks for explaining black-box machine learning models, originating from cooperative game theory. The Shapley value of a feature is the average of its marginal contributions over all possible feature coalitions. Consequently, Shapley values provide a ranking of features based on their average contribution to the model's output, which is frequently used for feature selection in practical work. Despite the significant amount of literature analyzing the performance of such feature selection, two major points are missing: a detailed analysis of how Shapley-values-based feature selection performs relative to other, more traditional feature selection techniques across various datasets and domains, and a discussion of whether using Shapley values for feature selection is valid in the first place. This thesis compares the performance of Shapley-values-based feature selection with other well-established methods across various domains, including the state-of-the-art minimal-redundancy-maximal-relevance (mRMR) algorithm. Furthermore, it explores the implications of the Shapley value axioms for feature selection.
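For reference, a standard formulation of this averaging (general game-theoretic notation, not necessarily the thesis's own) gives the Shapley value of feature i, for a value function v defined on subsets S of the feature set N, as the weighted average of its marginal contributions:

\[
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr),
\]

where, in the feature-attribution setting, v(S) is typically the model's expected output when only the features in S are known.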
In this thesis work, Shapley-values-based feature selection is compared to other methods on multiple datasets: binary and multiclass text classification tasks, to gauge the method's capabilities on high-dimensional text data, as well as low- and high-dimensional numerical datasets. The Shapley-values-based method emerged as one of the top performers according to the evaluation metrics used, i.e., F1-score, precision, and recall. However, it did not consistently outperform domain-specific methods. The Shapley-values-based method turned out to be a fast "wrapper-like" feature selection technique that, unlike fast filter methods, considers feature interactions in its feature ranking. Yet it does not guarantee an optimal feature subset, nor can it handle redundancy by itself, due to the limitations of the Shapley value definition. More sophisticated algorithms based on Shapley values, such as Interaction Shapley Values, can mitigate these disadvantages, but they are neither as fast nor as memory-efficient.
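A minimal sketch of the kind of Shapley-values-based feature selection discussed above, assuming the shap and scikit-learn packages: features are ranked by their mean absolute SHAP value and only the top k are kept. The dataset, model, and value of k are illustrative choices, not the thesis's experimental setup.

```python
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Illustrative binary classification data and model (not the thesis datasets).
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# TreeExplainer computes Shapley values efficiently for tree ensembles;
# for a binary GBM it yields one value per sample and feature (log-odds space).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train)      # shape: (n_samples, n_features)

# Rank features by their average absolute contribution to the model output.
mean_abs_shap = np.abs(shap_values).mean(axis=0)
ranking = np.argsort(mean_abs_shap)[::-1]

# Keep the k highest-ranked features and retrain on the reduced feature set.
k = 10
selected = X.columns[ranking[:k]]
reduced = GradientBoostingClassifier(random_state=0).fit(X_train[selected], y_train)

print("selected features:", list(selected))
print("F1 on held-out data:", f1_score(y_test, reduced.predict(X_test[selected])))
```

Because the ranking reflects only the magnitude of average attributions, two highly correlated features can both receive high ranks; this is the redundancy limitation of plain Shapley-values-based selection noted in the abstract.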