Small Knife Detection in Visual Surveillance
Entok, Umut Cem (2023)
Master's Programme in Computing Sciences
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2023-12-01
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202311089489
Abstract
Object detectors have become central to detection problems in recent years. Illegal objects appearing in real-time surveillance footage can be detected by an object detector rather than by a human operator watching the cameras. This approach can automate surveillance systems and, given sufficiently accurate models, help reduce human misjudgments.
The main objectives of this thesis are to collect and annotate images and videos from surveillance cameras in which small knives are present, and to train a state-of-the-art (SOTA) object detection model to detect these illegal objects in real time. To build the datasets, videos are collected from a surveillance camera available in the SURE project. Frames are then extracted from the videos, and hands and knives are manually annotated. A hand detector is first trained on the annotated images. This hand detection model is then further trained on knife images so that the detector focuses on knives held in hands. The YOLOv5 detector is chosen because it offers the high accuracy and real-time speed that the project requires.
Because the camera module has a wide field of view (FoV), objects in the monitored frames are small relative to the frame size, which makes them hard to detect. To overcome this, several methods are proposed. One is image tiling, which increases the object-to-image ratio while preserving the number of pixels that represent the object. The others are using a high input resolution, combining datasets to increase the available data, and applying transfer learning to carry knowledge from a similar problem, such as hand detection, over to the illegal object detection task. The results show that image tiling and a high input resolution can improve the detection accuracy of an object detection model. Combining similar datasets and using transfer learning, on the other hand, might not provide an improvement, because there are not enough input images for these approaches to be effective.
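The tiling idea can be illustrated with a minimal sketch. It is not the thesis implementation: the 1920x1080 frame size, the 640-pixel tile size, the knife box coordinates, and the function names tile_origins, tile_boxes and object_to_image_ratio are all assumptions made for illustration. The sketch cuts a frame into fixed-size tiles, moves each annotation into the coordinates of the tile that contains it, and compares the object-to-image ratio before and after tiling. The box keeps its pixel size, so only the ratio changes.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def tile_origins(length: int, tile: int, overlap: int = 0) -> List[int]:
    """Start offsets of tiles along one axis so that the whole axis is covered."""
    if tile >= length:
        return [0]
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last tile sits flush with the border
    return origins


def tile_boxes(boxes: List[Box], x0: int, y0: int, tile: int,
               min_visible: float = 0.5) -> List[Box]:
    """Clip boxes to the tile at (x0, y0) and shift them into tile coordinates.

    Boxes with less than `min_visible` of their area inside the tile are dropped.
    """
    kept = []
    for x1, y1, x2, y2 in boxes:
        cx1, cy1 = max(x1, x0), max(y1, y0)
        cx2, cy2 = min(x2, x0 + tile), min(y2, y0 + tile)
        if cx2 <= cx1 or cy2 <= cy1:
            continue  # no overlap with this tile
        visible = (cx2 - cx1) * (cy2 - cy1) / ((x2 - x1) * (y2 - y1))
        if visible >= min_visible:
            kept.append((cx1 - x0, cy1 - y0, cx2 - x0, cy2 - y0))
    return kept


def object_to_image_ratio(box: Box, width: int, height: int) -> float:
    """Fraction of the image area covered by the box."""
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1) / (width * height)


# Assumed example values: a 1920x1080 frame and a 40x20 pixel knife box.
FRAME_W, FRAME_H, TILE = 1920, 1080, 640
knife = (700, 500, 740, 520)

for y0 in tile_origins(FRAME_H, TILE):
    for x0 in tile_origins(FRAME_W, TILE):
        for local in tile_boxes([knife], x0, y0, TILE):
            full = object_to_image_ratio(knife, FRAME_W, FRAME_H)
            tiled = object_to_image_ratio(local, TILE, TILE)
            print(f"tile ({x0},{y0}): box {local}, ratio x{tiled / full:.2f}")
```

With these assumed sizes, the knife box covers about five times more of a tile than of the full frame, while its width and height in pixels stay the same. A nonzero overlap argument can be passed to tile_origins so that an object lying on a tile border is fully visible in at least one tile.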