A cluster-assisted differential evolution-based hybrid oversampling method for imbalanced datasets
Karabiyik, Muhammed Abdulhamid; Turkoglu, Bahaeddin; Asuroglu, Tunc (2025)
PeerJ Computer Science
e3177
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-2025110410378
Description
Peer reviewed
Abstract
Class imbalance remains a significant challenge in machine learning, leading to biased models that favor the majority class while failing to accurately classify minority instances. Traditional oversampling methods, such as Synthetic Minority Over-sampling Technique (SMOTE) and its variants, often struggle with class overlap, poor decision boundary representation, and noise accumulation. To address these limitations, this study introduces ClusterDEBO, a novel hybrid oversampling method that integrates K-Means clustering with differential evolution (DE) to generate synthetic samples in a more structured and adaptive manner. The proposed method first partitions the minority class into clusters using the silhouette score to determine the optimal number of clusters. Within each cluster, DE-based mutation and crossover operations are applied to generate diverse and well-distributed synthetic samples while preserving the underlying data distribution. Additionally, a selective sampling and noise reduction mechanism is employed to filter out low-impact synthetic samples based on their contribution to classification performance. The effectiveness of ClusterDEBO is evaluated on 44 benchmark datasets using k-Nearest Neighbors (kNN), decision tree (DT), and support vector machines (SVM) as classifiers. The results demonstrate that ClusterDEBO consistently outperforms existing oversampling techniques, leading to improved class separability and enhanced classifier robustness. Moreover, statistical validation using the Friedman test confirms the significance of the improvements, indicating that the observed gains are unlikely to be due to random variation. The findings highlight the potential of cluster-assisted differential evolution as a powerful strategy for handling imbalanced datasets.
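The within-cluster generation step described in the abstract can be illustrated with a minimal sketch. It assumes DE/rand/1 mutation with binomial crossover, since the abstract does not state which DE variant is used, and it takes the cluster assignment as given. The function name `de_oversample` and the values F = 0.5 and CR = 0.9 are illustrative assumptions, not the authors' settings. The silhouette-guided K-Means step and the selective sampling filter are omitted.

```python
import random


def de_oversample(cluster, n_new, f=0.5, cr=0.9, seed=0):
    """Generate n_new synthetic points from one minority-class cluster.

    Assumes DE/rand/1 mutation and binomial crossover. The defaults for
    f (scale factor) and cr (crossover rate) are illustrative only.
    """
    if len(cluster) < 4:
        raise ValueError("need at least 4 points: one target plus three donors")
    rng = random.Random(seed)
    dim = len(cluster[0])
    synthetic = []
    for _ in range(n_new):
        # Draw a target and three donors, all at distinct positions in the cluster.
        target, a, b, c = rng.sample(cluster, 4)
        # Mutation: shift donor a by the scaled difference of donors b and c.
        mutant = [a[j] + f * (b[j] - c[j]) for j in range(dim)]
        # Binomial crossover: each coordinate comes from the mutant with
        # probability cr; index j_rand always comes from the mutant.
        j_rand = rng.randrange(dim)
        trial = [
            mutant[j] if (rng.random() < cr or j == j_rand) else target[j]
            for j in range(dim)
        ]
        synthetic.append(trial)
    return synthetic


# Hypothetical two-dimensional minority cluster, for illustration only.
cluster = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [1.0, 1.8]]
new_points = de_oversample(cluster, n_new=3)
```

Because donors and targets are drawn only from the same cluster, each synthetic point is built from local differences, which is one plausible reading of how clustering keeps generated samples close to the minority-class structure. Mutant coordinates can still fall slightly outside the cluster's range, so a filtering step like the one the abstract mentions would follow in a full pipeline.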
Collections
- TUNICRIS publications [22195]
