Evaluating k-means and its variant clustering methods
Heikkinen, Janne (2024)
Tietojenkäsittelyopin maisteriohjelma - Master's Programme in Computer Science
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of acceptance
2024-10-31
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202410299635
Abstract
Clustering is a widely used method for grouping data points into clusters. Each cluster consists of data points that are similar to each other and different from the data points in other clusters. k-means is a common clustering method, and k-means++, k-medians and Partitioning Around Medoids (PAM) are variants of it. The performance of these four methods is evaluated on three different datasets. The results of each clustering method are scored with the normalized Van Dongen criterion, the Rand Index and the normalized variation of information. The scores of the four methods are similar to each other, and each method found the correct number of clusters on every dataset except one, on which all of the methods failed. Even though the scores are similar, the methods differ in the kind of clusters they produce.
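The abstract names the three scores but not their formulas. The following Python sketch computes them from two flat label lists (for example a ground-truth partition and a clustering result) through their contingency table n_ij, the number of points with label i in one partition and label j in the other. The formulas are assumptions, using common definitions from the clustering literature, and may not match the exact normalizations used in the thesis: NVD = (2n - sum_i max_j n_ij - sum_j max_i n_ij) / (2n); RI is the fraction of point pairs on which the two partitions agree; NVI = VI / H(A,B) with VI = 2H(A,B) - H(A) - H(B). The function names are hypothetical.

```python
from collections import Counter
from math import comb, log


def contingency(a, b):
    """Return n_ij: the number of points with label i in `a` and label j in `b`."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    return Counter(zip(a, b))


def rand_index(a, b):
    """Fraction of point pairs on which the two partitions agree (1.0 = identical)."""
    n = len(a)
    table = contingency(a, b)
    total = comb(n, 2)
    if total == 0:
        return 1.0  # a single point: no pairs to disagree on
    same_both = sum(comb(c, 2) for c in table.values())
    same_a = sum(comb(c, 2) for c in Counter(a).values())
    same_b = sum(comb(c, 2) for c in Counter(b).values())
    # Inclusion-exclusion: pairs that are split apart in both partitions.
    apart_both = total - same_a - same_b + same_both
    return (same_both + apart_both) / total


def normalized_van_dongen(a, b):
    """(2n - sum_i max_j n_ij - sum_j max_i n_ij) / (2n); 0.0 = identical."""
    n = len(a)
    table = contingency(a, b)
    best_a, best_b = {}, {}
    for (i, j), c in table.items():
        best_a[i] = max(best_a.get(i, 0), c)  # largest overlap of cluster i of `a`
        best_b[j] = max(best_b.get(j, 0), c)  # largest overlap of cluster j of `b`
    return (2 * n - sum(best_a.values()) - sum(best_b.values())) / (2 * n)


def entropy(counts, n):
    """Shannon entropy (natural log) of the distribution given by counts summing to n."""
    return -sum((c / n) * log(c / n) for c in counts)


def normalized_variation_of_information(a, b):
    """VI(a, b) / H(a, b), where VI = 2H(a, b) - H(a) - H(b); 0.0 = identical."""
    n = len(a)
    h_ab = entropy(contingency(a, b).values(), n)
    if h_ab == 0:
        return 0.0  # both partitions put every point into a single cluster
    h_a = entropy(Counter(a).values(), n)
    h_b = entropy(Counter(b).values(), n)
    return (2 * h_ab - h_a - h_b) / h_ab


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    found = [0, 1, 0, 1]
    print(rand_index(truth, found))
    print(normalized_van_dongen(truth, found))
    print(normalized_variation_of_information(truth, found))
```

For NVD and NVI a score of 0 means the partitions are identical, while for RI the corresponding value is 1. All three depend only on the contingency table, so renaming the cluster labels does not change the score, which is what allows comparing a clustering result with ground truth.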