Sentiment Analysis Performed on Forum Discussions Using RNNs
Syvänen, Aapo (2022)
Bachelor's Programme in Computing and Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2022-06-09
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202205255260
Abstract
Emotional discussion is prevalent in today's discourse, especially online. The rise of polarisation in online discussions motivated research into how emotions can be regulated, or implicitly affected, using natural language processing (NLP) techniques.
Many pre-trained sentiment analysis models are available online, but few, if any, were trained on data that yields good results when analysing messages from forum discussions. The aim of this thesis was to produce a model that labels emotional messages, so that the labels can be presented to the participants of an online discussion as affect labelling. A bidirectional recurrent neural network (RNN) model was trained on 27,000 sentences, each labelled with one of three labels corresponding to the polarity of the sentence: negative, neutral or positive. The model achieved 72.2 % accuracy during training.
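As a rough illustration of the kind of model described above, the following is a minimal sketch of the forward pass of a bidirectional RNN classifier with three polarity labels. It is not the thesis implementation: the cell type (a plain tanh RNN), the layer sizes, the whitespace tokeniser, the toy vocabulary and all names such as `TinyBiRNN` are assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random

LABELS = ["negative", "neutral", "positive"]


def init_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_pass(embedded, w_in, w_rec, hidden_size):
    """Run a simple tanh RNN over the sequence; return the final hidden state."""
    h = [0.0] * hidden_size
    for x in embedded:
        a = matvec(w_in, x)
        b = matvec(w_rec, h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
    return h


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyBiRNN:
    """Hypothetical, untrained bidirectional RNN used only to show the data flow."""

    def __init__(self, vocab, embed_size=8, hidden_size=6, seed=0):
        rng = random.Random(seed)
        self.vocab = {word: i for i, word in enumerate(vocab)}
        self.hidden_size = hidden_size
        # One extra embedding row is reserved for out-of-vocabulary words.
        self.embed = init_matrix(len(vocab) + 1, embed_size, rng)
        self.fw_in = init_matrix(hidden_size, embed_size, rng)
        self.fw_rec = init_matrix(hidden_size, hidden_size, rng)
        self.bw_in = init_matrix(hidden_size, embed_size, rng)
        self.bw_rec = init_matrix(hidden_size, hidden_size, rng)
        # The output layer sees the forward and backward states concatenated.
        self.out = init_matrix(len(LABELS), 2 * hidden_size, rng)

    def predict_proba(self, sentence):
        unknown = len(self.vocab)
        ids = [self.vocab.get(tok, unknown) for tok in sentence.lower().split()]
        embedded = [self.embed[i] for i in ids]
        h_fw = rnn_pass(embedded, self.fw_in, self.fw_rec, self.hidden_size)
        h_bw = rnn_pass(embedded[::-1], self.bw_in, self.bw_rec, self.hidden_size)
        return softmax(matvec(self.out, h_fw + h_bw))


model = TinyBiRNN(vocab=["this", "forum", "is", "great", "awful"])
probs = model.predict_proba("This forum is awful")
prediction = LABELS[max(range(len(LABELS)), key=probs.__getitem__)]
```

The largest value in `probs` plays the role of the confidence mentioned in the abstract. With random weights the predicted label is meaningless; the sketch only shows how reading the sentence in both directions feeds a single three-way softmax.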
During testing, the model performed well in some cases. Most notably, short, very negative sentences were labelled correctly with high confidence, but the model incorrectly predicted very long sentences as positive with high confidence. The most common classification was neutral, which over 75 % of all sentences received. This might be the result of a biased training data set, in which neutral was also the most common label, at a similar percentage.
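The suspected bias can be checked directly from the label counts. The sketch below is an illustration under stated assumptions, not part of the thesis: the toy label list is hypothetical, and inverse-frequency class weighting is only one common mitigation, which the thesis does not claim to have used. It computes the class distribution, the accuracy of a classifier that always predicts the majority class, and per-class weights that would counteract the imbalance.

```python
from collections import Counter


def class_distribution(labels):
    """Share of each label in the data set."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    return max(class_distribution(labels).values())


def inverse_frequency_weights(labels):
    """Weights proportional to 1 / class frequency; rarer classes weigh more."""
    counts = Counter(labels)
    total = len(labels)
    return {label: total / (len(counts) * count) for label, count in counts.items()}


# Hypothetical toy labels, NOT the thesis data: 6 neutral, 1 negative, 1 positive.
toy_labels = ["neutral"] * 6 + ["negative"] + ["positive"]

dist = class_distribution(toy_labels)
baseline = majority_baseline_accuracy(toy_labels)
weights = inverse_frequency_weights(toy_labels)
```

Comparing a model's accuracy with the majority baseline shows how much it has learned beyond the label distribution itself.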
The main purpose of the model is to enable experiments on affect labelling in a controlled setting. These experiments will be carried out in 2022, after which further results will be available.
Collections
- Bachelor's theses [8709]