Sentiment classification with deep neural networks
Zhou, Yi (2019)
Degree Programme in Computer Sciences
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2019-05-27
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-201907172658
Abstract
Sentiment classification is an important task in Natural Language Processing (NLP), and deep neural networks have become the mainstream method for text sentiment classification. Two datasets are used in this thesis. The first is a hotel review dataset (the TripAdvisor dataset), collected from the TripAdvisor website using the Python Scrapy framework and then cleaned with a series of preprocessing steps. A record in the TripAdvisor dataset consists of a text review and its corresponding sentiment score, with five sentiment labels: very negative, negative, neutral, positive, and very positive. The second dataset is the Stanford Sentiment Treebank (SST), a public dataset commonly used for sentiment classification.
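As a rough illustration of the data collection step, the hotel review scraping could be sketched with Scrapy roughly as follows; the start URL and CSS selectors are hypothetical placeholders, not the actual TripAdvisor page structure used in the thesis.

# Minimal sketch of a Scrapy spider for collecting hotel reviews.
# The URL and selectors below are illustrative assumptions only.
import scrapy


class HotelReviewSpider(scrapy.Spider):
    name = "hotel_reviews"
    # Hypothetical listing page; the real review URLs are not given in the abstract.
    start_urls = ["https://www.tripadvisor.com/Hotel_Review-EXAMPLE.html"]

    def parse(self, response):
        # Each review block is assumed to expose the review text and a
        # 1-5 rating, which maps to the five sentiment labels.
        for review in response.css("div.review-container"):
            yield {
                "text": review.css("q.review-text::text").get(),
                "score": review.css("span.bubble-rating::attr(class)").get(),
            }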
Text Convolutional Neural Network (Text-CNN), Very Deep Convolutional Neural Network (VD-CNN), and Bidirectional Long Short-Term Memory neural network (BiLSTM) were chosen as the methods evaluated in the experiments. Text-CNN was the first work to apply a convolutional neural network architecture to text classification. VD-CNN applies deep stacks of convolutional layers, up to 29 layers, to perform text classification. BiLSTM exploits a bidirectional recurrent neural network with the long short-term memory cell mechanism. Word embedding techniques are also an important factor in sentiment classification, so GloVe and FastText were used to investigate the effect of word embedding initialization on the datasets. GloVe is an unsupervised word embedding learning algorithm. FastText uses a shallow neural network to generate word vectors; it converges quickly during training and is fast at inference.
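A minimal PyTorch sketch of a BiLSTM classifier whose embedding layer is initialized from pretrained word vectors (GloVe or FastText) could look like the following; the dimensions, layer sizes, and names are illustrative assumptions rather than the exact configuration used in the thesis.

# BiLSTM sentiment classifier with a pretrained embedding initialization (sketch).
import torch
import torch.nn as nn


class BiLSTMClassifier(nn.Module):
    def __init__(self, pretrained_vectors, hidden_dim=128, num_classes=5):
        super().__init__()
        # pretrained_vectors: FloatTensor of shape (vocab_size, embed_dim)
        # built from GloVe or FastText; freeze=False allows fine-tuning.
        self.embedding = nn.Embedding.from_pretrained(pretrained_vectors, freeze=False)
        embed_dim = pretrained_vectors.size(1)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        embedded = self.embedding(token_ids)
        _, (hidden, _) = self.lstm(embedded)
        # Concatenate the final forward and backward hidden states.
        features = torch.cat([hidden[-2], hidden[-1]], dim=1)
        return self.fc(features)


# Usage with a random stand-in for the pretrained embedding matrix.
vectors = torch.randn(10000, 300)                    # placeholder for GloVe/FastText vectors
model = BiLSTMClassifier(vectors)
logits = model(torch.randint(0, 10000, (8, 50)))     # (batch=8, seq_len=50) -> (8, 5)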
The experiments were implemented using the PyTorch framework. On the TripAdvisor dataset, BiLSTM with GloVe as the word vector initialization achieved the highest accuracy, 73.73%, while VD-CNN with FastText had the lowest, 71.95%; the BiLSTM model reached an F1-score of 0.68 and the VD-CNN model 0.67. On the SST dataset, BiLSTM with GloVe again achieved the highest accuracy, 36.35%, and an F1-score of 0.35, while VD-CNN with GloVe had the worst results in both accuracy and F1-score. In most cases the Text-CNN model performed better than the VD-CNN model even though VD-CNN has more layers.
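The accuracy and F1-score figures above could be computed, for example, with scikit-learn; macro averaging is assumed here, since the abstract does not state which averaging was used.

# Sketch of the evaluation metrics for the five-class predictions.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 4, 3, 2, 4, 1]   # gold sentiment labels (0 = very negative ... 4 = very positive)
y_pred = [0, 4, 2, 2, 3, 1]   # model predictions

print(f"accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"F1-score: {f1_score(y_true, y_pred, average='macro'):.4f}")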
An analysis of the reviews misclassified by the three deep neural networks on the TripAdvisor dataset shows that hotel reviews containing more contradictory sentiment words were more prone to misclassification than other reviews.