Stock Prediction Based on Social Media Data via Sentiment Analysis: a Study on Reddit
Gui, Heng Jr (2019)
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval: 2019-11-27
Permanent address of the publication: https://urn.fi/URN:NBN:fi:tuni-201911246223
Abstract
With the development of the internet and information technology, online text data has become available and accessible for research in many fields, including stock prediction. Social media, one of the biggest content generators on the internet, is a rich data resource for text mining and stock prediction: it offers large volume, high data density, and fast information spread.
In this thesis, the relationship between stock-related text on social media (Reddit) and the price changes of the corresponding stocks is analyzed. First, sentiment analysis is applied to extract individual users' emotions and opinions about the stocks. The extracted features are then examined through descriptive statistics and through predictive analysis using the Pearson correlation coefficient and machine learning models. The predictive analysis tests the dependence between the social media text data and stock price changes by evaluating prediction performance with four indicators: “prediction accuracy on price change direction” and three indicators from simulated algorithmic trading experiments based on the prediction results, namely “total profit with trading strategy for single stock”, “daily profit efficiency of trading strategy” and “total profit with Portfolio trading strategy”. Compared with a Buy and Hold (B&H) baseline strategy, the predictions perform well in terms of “daily profit efficiency” and “total profit with Portfolio trading strategy”. The online forum text from Reddit is therefore shown to be correlated with future stock price changes, and incorporating its information into portfolio trading strategies might yield more profit than the B&H strategy.
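To make the evaluation pipeline described above more concrete, the following is a minimal, hypothetical sketch in Python. It assumes a single daily aggregated sentiment score per stock (the abstract does not specify how scores are aggregated) and, as a deliberate simplification, uses the sign of that score directly as the price-direction prediction instead of a trained machine learning model. It computes the Pearson correlation coefficient between sentiment on day t and the return from day t to t+1, the direction accuracy, and the profit of a simple long-only single-stock strategy next to a Buy and Hold (B&H) baseline. The helper names (`next_day_returns`, `strategy_profit`, and so on) and the toy data are illustrative assumptions, not the thesis's actual code, data, or indicator definitions; transaction costs are ignored.

```python
import math
from statistics import mean


def next_day_returns(prices):
    """Simple return from day t to day t+1 for each t (hypothetical helper)."""
    return [prices[t + 1] / prices[t] - 1 for t in range(len(prices) - 1)]


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def direction_accuracy(predicted, actual):
    """Share of days where predicted and actual changes agree in sign.

    Zero values are treated as "down" (an assumption of this sketch).
    """
    hits = sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))
    return hits / len(actual)


def strategy_profit(prices, signals, capital=1.0):
    """Long-only rule (assumed): hold from day t to t+1 if signals[t] > 0, else cash."""
    value = capital
    for t in range(len(prices) - 1):
        if signals[t] > 0:
            value *= prices[t + 1] / prices[t]
    return value - capital


def buy_and_hold_profit(prices, capital=1.0):
    """B&H baseline: buy on the first day, sell on the last day."""
    return capital * prices[-1] / prices[0] - capital


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    sentiment = [0.2, -0.1, 0.4, -0.3, 0.1]   # daily sentiment score, day t
    prices = [100, 102, 101, 104, 103, 105]   # closing prices, days 0..5
    returns = next_day_returns(prices)        # aligned with sentiment[t]
    print("Pearson r:", pearson(sentiment, returns))
    print("Direction accuracy:", direction_accuracy(sentiment, returns))
    print("Strategy profit:", strategy_profit(prices, sentiment))
    print("B&H profit:", buy_and_hold_profit(prices))
```

Aligning sentiment on day t with the return from t to t+1 is what makes the test predictive rather than merely contemporaneous. In the thesis itself, the predictions come from machine learning models, and the portfolio strategy and "daily profit efficiency" indicator are defined in the full text. This sketch does not reproduce them.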