Processing and visualizing website browsing data for targeted content applications
Ruusuvuori, Ahti (2018)
Information Technology
Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2018-11-07
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tty-201810242434
Abstract
As the amount of information on the Internet grows rapidly, providing users with relevant content becomes increasingly important. To achieve this, websites may employ recommender systems to offer targeted content to their users. Because of the volume of information involved, recommender systems have also begun adopting machine learning algorithms to provide relevant content more efficiently and accurately.
Machine learning based approaches to recommender systems often involve "teaching" the system by feeding it pre-existing data about users and their preferences. This study aims to provide a machine learning based approach for a situation where a website has no specific data on user preferences. Instead, the browsing patterns of previous users of the website are observed and analyzed to estimate the possible interests of new users.
The K-means clustering algorithm is applied to anonymous session data from a website visitor tracking system to evaluate whether clustering such data is a valid basis for identifying types of users. The clustered data is plotted as a scatterplot, which is used to examine whether clusters exist. If the clustering is distinct, the clusters can be labeled as user groups, and subsequent visitors can quickly be assigned to one of the groups and served targeted content.
K-means clustering is shown to perform suboptimally due to limitations in the algorithm’s implementation as well as high levels of intracluster noise in the source data. However, the data exhibits areas of density and sparsity and could potentially yield meaningful results with a different clustering algorithm.
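To make the clustering and assignment steps described in the abstract concrete, the following is a minimal sketch and not the thesis's implementation. It runs a plain K-means (Lloyd's algorithm) on a handful of made-up sessions and then assigns a new visitor to the nearest cluster. The two features (pages viewed, minutes on site), the sample values, and the names `sessions`, `kmeans`, `nearest` and `new_visitor` are all assumptions chosen for illustration. The actual features of the tracking data are not given here, and the scatterplot step is omitted.

```python
import math
import random

# Hypothetical session features: (pages viewed, minutes on site).
# These values are invented for illustration only.
sessions = [
    (2, 1.0), (3, 1.5), (2, 0.5),       # short visits
    (12, 9.0), (14, 11.0), (13, 10.0),  # long visits
]


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct sessions
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each session to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid in place
        if updated == centroids:  # no centroid moved, so the clustering has converged
            break
        centroids = updated
    return centroids, labels


centroids, labels = kmeans(sessions, k=2)

# A later visitor is assigned to whichever existing cluster is closest.
new_visitor = (11, 8.5)
new_label = nearest(new_visitor, centroids)
print(labels, new_label)
```

Cluster indices depend on the random starting centroids, so only the grouping is meaningful, not which group is numbered 0 or 1. With real session data the features would likely need scaling first, since Euclidean distance lets the feature with the largest numeric range dominate.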