Video-Based Scene Classification Using Pretrained Neural Networks
Kallio, Veera (2021)
Bachelor's Programme in Computing and Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Acceptance date
2021-09-20
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202108176578
Abstract
A neural network is a widely used machine learning method in which a computer analyzes data and learns from it. Many neural networks exist for data classification, and using existing models makes classification faster and easier. This thesis investigates two well-known pretrained neural networks and their performance in video scene classification. The objective of the study is to find out how different models succeed in the scene classification task, whether there are differences between them, and which visual features are effective in classification.
The theoretical part of the study explains the principles of neural networks, in particular convolutional neural networks, and introduces the networks used in the project: VGGNet and ResNet (Residual Neural Network). In addition, previous studies on the classification topic are discussed. It was observed that even though there is a great deal of research on classification, video scene classification into event-type scenes specifically has been studied rather little, and no comparisons of different neural network models for this purpose were found. In the experimental part, VGGNet and ResNet are used to classify video material containing urban scenes from large European cities. Some of the material is used for training the models and some for testing, so that the final performance can be evaluated. In addition, suitable values for the models' parameters are searched for in order to obtain the best possible classification result.
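The train/test arrangement described above can be sketched roughly as follows. This is an illustrative stdlib-only example, not code from the thesis: the `split_frames` helper, the 80/20 fraction, and the four-class label scheme are all assumptions made here for demonstration.

```python
import random

def split_frames(samples, train_fraction=0.8, seed=0):
    """Randomly split labeled frames into training and test sets,
    mirroring how part of the video material is used for training
    and part for testing (hypothetical helper; fraction assumed)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Illustrative data: (frame path, scene-class index) pairs.
frames = [(f"frame_{i}.jpg", i % 4) for i in range(100)]
train, test = split_frames(frames)
print(len(train), len(test))  # 80 20
```

In practice the training portion would be fed to the pretrained network (e.g. VGGNet or ResNet with a replaced classification head) and the held-out portion used only for the final accuracy evaluation.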
The study shows that both VGGNet and ResNet perform well in video scene classification, and there are no large differences between them. ResNet succeeded slightly better in the task with 82.2 % accuracy, while the accuracy of VGGNet was 79.6 %. The results are consistent with previous studies. The networks had the most difficulty telling apart very similar scenes, such as the inside of a tram and the inside of a bus. Because of their differing features, the accuracies of individual scene classes could vary by as much as 40 percentage points. On the basis of the thesis it can be concluded that pretrained neural networks can be utilized in video scene classification, and it is likely that the results of models other than those used in this study would not differ from each other very much. However, the selection of an appropriate model slightly affects both the accuracy and the speed of the classification, and the accuracy can be further improved with parameter optimization.
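The per-class accuracy differences noted above (up to 40 percentage points between scene classes) come from evaluating each class separately rather than only the overall accuracy. A minimal sketch of that computation, with made-up labels in which "tram" and "bus" frames are frequently confused, as the abstract describes:

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly classified samples within each true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    return {c: correct[c] / total[c] for c in total}

# Illustrative labels only, not the thesis data: similar indoor-vehicle
# scenes get mixed up, while a visually distinct class is classified cleanly.
y_true = ["tram"] * 5 + ["bus"] * 5 + ["park"] * 5
y_pred = ["tram", "bus", "tram", "bus", "tram",
          "bus", "tram", "bus", "tram", "bus",
          "park", "park", "park", "park", "park"]
acc = per_class_accuracy(y_true, y_pred)
```

Here the "park" class reaches 100 % while "tram" and "bus" each stay at 60 %, illustrating how class-wise accuracies can diverge sharply even when the overall accuracy looks reasonable.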
Collections
- Bachelor's theses [8997]