Text Analytics Methods for Sentence-level Sentiment Analysis
Zou, Nannan (2019)
Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of acceptance
2019-05-24
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tty-201905211718
Abstract
Opinions have an important effect on decision making. With the explosion of text information on networks, sentiment analysis, which aims to predict people's opinions about specific entities, has become a popular tool for making sense of vast amounts of text. There are multiple approaches to sentence-level sentiment analysis, including machine-learning methods and lexicon-based methods. In this MSc thesis we study two typical sentiment analysis techniques, AFINN and RNTN, which represent lexicon-based and machine-learning methods, respectively.
A lexicon-based method assumes that the sum of the sentiment orientations of the individual words or phrases predicts the contextual sentiment polarity. AFINN is a word list with sentiment strengths ranging from -5 to +5, and it includes Internet slang and obscene words. With AFINN, we extract the sentiment words from a sentence and assign each of them its sentiment score. The sentiment of the sentence is then the sum of the scores of all its words.
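The summation rule described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: `TOY_LEXICON` is a hypothetical stand-in for the AFINN word list (its scores are placeholders in the -5 to +5 range, not values copied from AFINN), the regular-expression tokenizer is a simplification, and mapping the sign of the sum to a polarity label is an assumed convention rather than the procedure used in the thesis.

```python
import re

# Hypothetical mini-lexicon. The scores are illustrative placeholders in the
# AFINN range [-5, +5]; they are NOT copied from the real AFINN word list.
TOY_LEXICON = {
    "good": 3,
    "great": 3,
    "bad": -3,
    "awful": -3,
    "lol": 2,
}


def tokenize(sentence):
    """Lowercase the sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_score(sentence, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all tokens; unknown words contribute 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(sentence))


def polarity(score):
    """Map the summed score to a label by its sign (assumed convention)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["Great movie, lol", "A bad, bad film"]:
        score = sentence_score(text)
        print(text, "->", score, polarity(score))
```

Because the score is a plain sum, words of opposite orientation cancel out: a sentence containing one word scored +3 and one scored -3 ends up neutral regardless of word order or negation.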
The Stanford Sentiment Treebank is a corpus of labeled parse trees that allows the community to train compositional models with supervised machine learning techniques. Its labels fall into 5 categories: negative, somewhat negative, neutral, somewhat positive and positive. Compared to the standard recursive neural network (RNN) and the Matrix-Vector RNN, the Recursive Neural Tensor Network (RNTN) is a more powerful composition model for computing compositional vector representations of input sentences. Relying on the Stanford Sentiment Treebank, RNTN predicts the sentiment of an input sentence from its computed vector representations.
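To make the idea of a tensor-based composition more concrete, the following sketch computes a parent vector from two child vectors in the way the RNTN is usually formulated: each output dimension combines a bilinear term through one slice of a tensor `V` with a standard linear term through a matrix `W`, followed by `tanh`. It is a hedged illustration, not the implementation evaluated in the thesis: the function name `rntn_compose`, the pure-Python list representation, and the toy weights are all assumptions, and training as well as the sentiment classifier on top of the vectors are omitted.

```python
import math


def rntn_compose(left, right, V, W):
    """Compose two d-dimensional child vectors into one parent vector.

    x = [left; right] has length 2d.
    parent[k] = tanh(x^T V[k] x + (W x)[k]), where V[k] is a 2d x 2d slice
    of the tensor and W is a d x 2d matrix.
    """
    x = list(left) + list(right)  # concatenate the children
    d, n = len(left), len(left) + len(right)
    parent = []
    for k in range(d):
        # Bilinear (tensor) term: lets the children interact multiplicatively.
        bilinear = sum(x[i] * V[k][i][j] * x[j]
                       for i in range(n) for j in range(n))
        # Linear term: the standard recursive neural network composition.
        linear = sum(W[k][j] * x[j] for j in range(n))
        parent.append(math.tanh(bilinear + linear))
    return parent


def zeros_tensor(d):
    """Helper for the demo: an all-zero tensor with d slices of size 2d x 2d."""
    return [[[0.0] * (2 * d) for _ in range(2 * d)] for _ in range(d)]


if __name__ == "__main__":
    d = 2
    W = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0]]  # toy weights, chosen only for the demo
    V = zeros_tensor(d)
    V[0][0][2] = 1.0            # couples left[0] with right[0]
    print(rntn_compose([0.5, 0.0], [0.5, 0.0], V, W))
```

With `V` set to all zeros the function reduces to the standard recursive neural network composition, which is one way to see what the tensor term adds.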
Using benchmark datasets that cover diverse data sources, we carry out a thorough comparison between AFINN and RNTN. Our results show that although RNTN is much more complicated than AFINN, its performance is not better than that of AFINN. To some extent, AFINN is simpler, more generic and requires fewer computational resources than RNTN in sentiment analysis.