Monitoring of real-time data pipelines
Lehtinen, Saku (2026)
Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
Acceptance date
2026-01-11
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-2025123112285
Abstract
Real-time data pipelines are a central part of modern analytics systems, as organizations use continuously updated data for operational visibility, timely decision-making, and downstream analytics or machine-learning processes. Failures in these pipelines can lead to delayed insights, inaccurate analytics, or service outages. Because these pipelines operate continuously and at scale, they require effective monitoring to ensure that data flows correctly and that emerging issues are detected early. Without proper monitoring, failures may go unnoticed until they affect downstream systems or end users.
This thesis examines a software system used in a large Finnish IT organization that processes real-time 4G and 5G analytics data through multiple distributed data pipelines. Although the system included monitoring for individual components, it lacked a solution that provided an end-to-end view of how data moves through the pipeline.
The purpose of this thesis is to address that gap by analysing the requirements for monitoring real-time data pipelines and by developing a prototype solution that offers clear visibility into the flow of data across the system. The work first identifies functional requirements by examining existing monitoring practices, user needs, and operational constraints. It then designs and implements a prototype monitoring solution that provides an end-to-end overview of pipeline activity.
The evaluation shows that the solution meets all mandatory requirements and several optional ones. It provides a clear and practical overview of data flow across the system, supports verification after installations and upgrades, and offers an effective starting point for troubleshooting. While the solution does not yet provide fully modelled pipeline structures or include data-quality monitoring, it significantly improves operational visibility in a distributed real-time environment.
Overall, the thesis demonstrates that a dedicated dashboard providing an overview of pipeline status offers clear operational value in a distributed real-time environment. The solution helps users verify data flow, detect issues, and understand system behaviour more easily than with service-level monitoring alone.
