Random Walk Oversampling Technique for Minority Class Classification
Samad, Syed Abdul (2013)
Master's Degree Programme in Information Technology
Faculty of Natural Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Acceptance date
2013-05-08
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tty-201305241178
Abstract
Learning classifiers from imbalanced (skewed) datasets is an important topic that arises often in practical classification problems. In such problems, almost all instances are labeled as one class, while very few are labeled as the other, which is usually the more important class. Traditional classifiers, which try to achieve high accuracy over the full range of instances, are poorly suited to imbalanced learning tasks: they tend to classify all the data into the majority class, which is usually the less important one. Researchers have already proposed many solutions to this problem at both the data level and the algorithmic level.
This thesis presents a new data-level approach to dealing with imbalanced datasets. The approach is an oversampling technique that generates new samples for the minority class by making a random walk in the dataset. The new samples are generated by a Markov chain Monte Carlo (MCMC) algorithm and are then added to the existing dataset in order to balance the ratio between majority and minority class samples.
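The general idea of random walk oversampling can be illustrated with a minimal sketch. The abstract does not say which MCMC algorithm or target distribution the thesis uses, so everything specific here is an assumption made for illustration: a random-walk Metropolis sampler with Gaussian proposals, a Gaussian kernel density estimate over the minority samples as the target, a binary class setting, and the hypothetical names `log_kde`, `random_walk_oversample`, `balance`, `bandwidth`, `step` and `thin`. It should not be read as the method of the thesis itself.

```python
import numpy as np


def log_kde(x, data, bandwidth):
    """Log of an unnormalized Gaussian kernel density estimate at point x.

    Assumption: the target density is a KDE over the minority samples.
    """
    sq_dist = np.sum((data - x) ** 2, axis=1)
    z = -0.5 * sq_dist / bandwidth ** 2
    m = z.max()  # log-sum-exp trick for numerical stability
    return m + np.log(np.sum(np.exp(z - m)))


def random_walk_oversample(X_min, n_new, bandwidth=0.5, step=0.5, thin=5, seed=0):
    """Draw n_new synthetic minority samples with random-walk Metropolis."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Start the walk at a randomly chosen real minority sample.
    current = X_min[rng.integers(len(X_min))].copy()
    current_lp = log_kde(current, X_min, bandwidth)
    samples = []
    while len(samples) < n_new:
        # Take `thin` Metropolis steps between kept samples to reduce correlation.
        for _ in range(thin):
            proposal = current + step * rng.standard_normal(current.shape)
            proposal_lp = log_kde(proposal, X_min, bandwidth)
            # Symmetric proposal: accept with probability min(1, p(new) / p(old)).
            if np.log(rng.random()) < proposal_lp - current_lp:
                current, current_lp = proposal, proposal_lp
        samples.append(current.copy())
    return np.array(samples)


def balance(X, y, minority_label, **kwargs):
    """Append synthetic minority samples until both classes have equal size.

    Assumption: a binary problem, so every other label is the majority class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_new = int(np.sum(y != minority_label)) - len(X_min)
    if n_new <= 0:
        return X, y  # already balanced, or the "minority" is not smaller
    X_new = random_walk_oversample(X_min, n_new, **kwargs)
    y_new = np.full(n_new, minority_label, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])
```

For a feature matrix `X` and label vector `y`, calling `balance(X, y, minority_label=1)` returns the original data followed by the synthetic minority samples. In this sketch `bandwidth` controls how closely the synthetic samples stay to the observed minority points and `step` controls how far each random-walk move can go; both would need tuning for a real dataset.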