Analyzing cryptocurrency groups using topic modeling on Twitter posts
Rubio, Bruno (2019)
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Cryptocurrencies are decentralized digital coins that use cryptographic protocols to provide more secure financial transactions. The prices of these assets rose sharply in the last few years, which generated considerable interest in them. This thesis shifts the focus away from the best-known cryptocurrency, bitcoin, and from price aspects, and concentrates on other cryptocoins and their technical features. A total of 25 cryptocoins were selected and divided into 3 groups representing fundamental characteristics: Faster transactions, Smart Contracts and Privacy. Daily comments about these cryptocoins were then collected from Twitter for 4 months. The main objectives were to check whether the categorization fits each group well, to detect the prominent themes under discussion, and to perform a prediction task to see which themes may be discussed again in the future. Topic modeling, specifically Latent Dirichlet Allocation (LDA), was used to process the text data and find the topics that best represent each group. Coherence measures were applied to find the optimal number of topics, and the topics were later grouped into themes. Daily average topic probabilities, or topic weights, were treated as time series data, along with their theme representations. This made it possible to forecast theme weights using ARIMA and to assess the predictive ability of each theme by comparing the mean squared error (MSE) of the ARIMA and Naive methods. Overall, the cryptocoins seemed to be well represented, since every group contains at least one topic that directly refers to the meaning of the group. However, none of these group-defining topics was the most important one in its group.
The Faster transactions and Smart Contracts groups turned out to be similar: both had a Financial theme as the most notable one, a similar arrangement of their remaining themes, and low predictive ability. The Privacy group produced different results, with a Mixed theme as the best-positioned one and slightly better forecasting results.
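To make the forecasting comparison described above more concrete, the following is a minimal Python sketch rather than the thesis's actual implementation, which is not shown here. It assumes each tweet has already been assigned a topic probability vector (for example by an LDA model), averages those vectors per day to obtain topic weights, and then compares the MSE of a Naive one-step-ahead forecast against a model-based one. All function names are hypothetical, and a simple AR(1) model fitted by least squares is used as a stand-in for ARIMA to keep the example dependency-free.

```python
from collections import defaultdict
from statistics import fmean


def daily_topic_weights(docs):
    """Average per-tweet topic probabilities by day.

    docs: iterable of (day, topic_probabilities) pairs, one pair per tweet.
    Returns {day: [mean probability of each topic on that day]}.
    """
    by_day = defaultdict(list)
    for day, probs in docs:
        by_day[day].append(probs)
    return {
        day: [fmean(column) for column in zip(*rows)]
        for day, rows in sorted(by_day.items())
    }


def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    return fmean((a - p) ** 2 for a, p in zip(actual, predicted))


def naive_forecasts(series, n_test):
    """Naive one-step-ahead forecast: the next value equals the last observed one."""
    start = len(series) - n_test
    return [series[t - 1] for t in range(start, len(series))]


def ar1_forecasts(series, n_test):
    """One-step-ahead AR(1) forecasts (assumed stand-in for ARIMA).

    Intercept and slope are fitted by least squares on the training part only.
    """
    start = len(series) - n_test
    train = series[:start]
    if len(train) < 3:
        raise ValueError("need at least 3 training points")
    x, y = train[:-1], train[1:]
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = cov_xy / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return [intercept + slope * series[t - 1] for t in range(start, len(series))]


def compare_methods(series, n_test):
    """Return (naive_mse, model_mse) on the last n_test points of the series."""
    actual = series[len(series) - n_test:]
    return (
        mse(actual, naive_forecasts(series, n_test)),
        mse(actual, ar1_forecasts(series, n_test)),
    )


if __name__ == "__main__":
    # Toy data: made-up topic probabilities for a few tweets, not thesis data.
    tweets = [
        ("2019-01-01", [0.2, 0.8]),
        ("2019-01-01", [0.4, 0.6]),
        ("2019-01-02", [0.5, 0.5]),
    ]
    print(daily_topic_weights(tweets))

    # Toy theme-weight series; a theme is "predictable" in this sketch
    # when the model MSE is lower than the Naive MSE.
    theme_weights = [0.30, 0.32, 0.31, 0.35, 0.34, 0.36, 0.38, 0.37]
    print(compare_methods(theme_weights, n_test=3))
```

In this sketch a theme would be considered to have predictive ability when the model-based MSE falls below the Naive MSE on the held-out days. Replacing `ar1_forecasts` with a full ARIMA fit from a statistics library would bring the example closer to the method named in the abstract.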