Anomaly Detection in Time-Series Data: Analysis on scalability for data-intensive requirements and implementation for industrial scenarios
Chowdhury, Fariha (2024)
Master's Programme in Computing Sciences
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2024-12-09
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-2024112210412
Abstract
In today’s industrial landscape, machines and systems produce vast amounts of time-based data from sensors and monitoring tools. This data is essential for smooth operations and for detecting problems early enough to prevent costly disruptions. Real-time anomaly detection is critical in industrial environments for maintaining safety and preventing breakdowns. However, implementing it presents several challenges: because of the scale of the data, the system must be robust enough to handle both data storage and processing.
The main goal of this thesis is to explore best practices for detecting anomalies in time-series data. The research is extended to analyse the scalability requirements of a data-intensive industrial scenario. Different machine learning models are trained and tested to find the best one for the specific industry domain. Lastly, a system is designed and implemented that detects anomalies in real time, with Grafana used to notify the user when an anomaly occurs in the system.
The findings of this thesis are multifaceted. Machine learning models such as Isolation Forest and K-means were determined to be the most suitable algorithms for anomaly detection in the specific domain. The step-by-step approach used to train the models (data collection, cleaning and preprocessing) is also described. The system design and implementation provide a tangible outcome of the whole process.
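As an illustration of how a K-means model can flag anomalies, the following is a minimal sketch and not the implementation used in the thesis. It assumes scalar sensor readings, a deterministic centroid initialisation, and a threshold of three standard deviations above the mean distance to the nearest centroid. The function names, the training values and the threshold rule are all hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's algorithm on scalar values (illustrative, not from the thesis)."""
    ordered = sorted(values)
    # Assumption: spread the initial centroids evenly over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def distance_to_nearest(value, centroids):
    """Distance from a reading to its closest cluster centre."""
    return min(abs(value - c) for c in centroids)


def fit_threshold(train, centroids, n_sigmas=3.0):
    """Assumed rule: mean training distance plus n_sigmas standard deviations."""
    distances = [distance_to_nearest(v, centroids) for v in train]
    return mean(distances) + n_sigmas * pstdev(distances)


def is_anomaly(value, centroids, threshold):
    """A reading is anomalous if it lies far from every learned cluster."""
    return distance_to_nearest(value, centroids) > threshold


# Hypothetical readings from two normal operating modes (around 20 and around 60).
train = [19.8, 20.1, 20.3, 19.9, 20.0, 60.2, 59.7, 60.1, 59.9, 60.0]
centroids = kmeans_1d(train, k=2)
threshold = fit_threshold(train, centroids)

for reading in (20.2, 40.0, 61.0):
    print(reading, "anomaly" if is_anomaly(reading, centroids, threshold) else "normal")
```

In a real-time setting, the centroids and threshold would be fitted offline on cleaned historical data, and only `is_anomaly` would run on each incoming reading. Isolation Forest, the other model named above, scores points by how easily random splits isolate them rather than by distance to a cluster centre.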