Attribute weighting in k-nearest neighbor classification
Syed, Muhammad Ejazuddin (2014)
Tietojenkäsittelyoppi - Computer Science
Informaatiotieteiden yksikkö - School of Information Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2014-11-27
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:uta-201412032370
Abstract
Data mining is the process of extracting useful information by analyzing different kinds of data. Predictive data mining is used to predict some property of incoming data, for example its class. Among the many methods used for predictive data mining, K-nearest neighbor classification is one of the simplest and easiest to use. Because of its simplicity, small variations can be made to it in order to improve its predictive accuracy.
The aim of this thesis was to study attribute weighting techniques and to implement and test some weighting variants in K-nearest neighbor classification. The Heterogeneous Euclidean-Overlap Metric (HEOM) and three values of K (1, 4 and 5) were used in K-nearest neighbor classification. Twelve datasets were selected from the UCI Machine Learning Repository for the analysis. Chi-square attribute weighting was used to implement two weighting variants: simple attribute weighting and class-wise attribute weighting. The evaluation was done using the leave-one-out technique.
The conclusion that can be drawn from the results is that the structure of the dataset (the number and distribution of the classes) and the value of K (the number of neighbors) affect both unweighted and attribute-weighted K-nearest neighbor classification. For some datasets weighting is very useful, especially for smaller classes, but for other datasets it does not improve the results.
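To make the ingredients named in the abstract concrete, the following is a minimal Python sketch of a weighted HEOM distance, a chi-square statistic for a nominal attribute, and leave-one-out evaluation of K-nearest neighbor classification. It is not the thesis implementation. The function names, the way weights enter the distance (each weight multiplies the squared per-attribute distance), the tie-breaking rule, and the use of the raw chi-square statistic as a weight are all assumptions made for illustration. Only the simple (one weight per attribute) variant is sketched; the class-wise variant and the handling of numeric attributes in the chi-square step (which would need discretization) are left out.

```python
import math
from collections import Counter


def heom(x, y, nominal, ranges, weights=None):
    """Weighted HEOM distance between two instances.

    x, y     -- sequences of attribute values (None marks a missing value)
    nominal  -- set of indices of nominal attributes
    ranges   -- per-attribute value range (max - min); ignored for nominal ones
    weights  -- optional per-attribute weights (assumed weighting scheme)
    """
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0                          # missing value: maximal distance
        elif a in nominal:
            d = 0.0 if xa == ya else 1.0     # overlap metric for nominal values
        else:
            # range-normalized difference for numeric values
            d = abs(xa - ya) / ranges[a] if ranges[a] else 0.0
        w = 1.0 if weights is None else weights[a]
        total += w * d * d
    return math.sqrt(total)


def chi_square(values, labels):
    """Chi-square statistic of one nominal attribute against the class labels."""
    n = len(values)
    value_counts = Counter(values)
    class_counts = Counter(labels)
    observed = Counter(zip(values, labels))
    stat = 0.0
    for v, nv in value_counts.items():
        for c, nc in class_counts.items():
            expected = nv * nc / n
            stat += (observed[(v, c)] - expected) ** 2 / expected
    return stat


def knn_predict(query, data, labels, k, nominal, ranges, weights=None):
    """Majority vote among the k nearest instances under (weighted) HEOM."""
    order = sorted(
        range(len(data)),
        key=lambda i: heom(query, data[i], nominal, ranges, weights),
    )
    # Votes are counted from nearest to farthest, so a tie in the vote
    # goes to the class that appears first among the nearest neighbors.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, labels, k, nominal, ranges, weights=None):
    """Classify each instance using all the other instances as training data."""
    hits = 0
    for i in range(len(data)):
        rest = data[:i] + data[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        predicted = knn_predict(data[i], rest, rest_labels, k,
                                nominal, ranges, weights)
        hits += predicted == labels[i]
    return hits / len(data)
```

In this sketch the weights are passed in from outside. One simple choice would be to compute `chi_square` for each nominal attribute and use the resulting statistics, possibly normalized, as `weights`. Whether the weights are computed once on the full dataset or recomputed inside each leave-one-out fold is a design decision the abstract does not specify.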