Smart Ticketing: Continuous learning system for document classification
Heinsuo, Leo (2020)
Degree Programme in Information Technology, MSc (Tech)
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2020-08-05
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202005155363
Abstract
The purpose of this thesis is to show how to build a continuous learning system for document classification. The thesis is built around a project at CGI NEAAS that aimed to create an automated classification cloud endpoint for predicting service ticket categories. The data used for prediction consisted mostly of human-typed text.
The theory and key concepts behind natural language processing, machine learning classification, cloud computing, and lifecycle management are explained first. Existing solution frameworks on cloud platforms are then examined, and a detailed solution architecture on the Azure cloud platform is proposed. The implementation of the devised system, which is based on pipeline automation, is described in detail. Python is used as the programming language for the implementation.
The resulting system uses a combination of Azure Pipelines for CI/CD and a machine learning-specific pipeline called ML pipeline. The solution puts MLOps principles and practices into action, focusing on adding continuous training (CT) functionality to the set of pipelines, alongside CI and CD. This allows new machine learning models to be trained and deployed automatically when ticket data changes and model performance degrades.
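The continuous training idea can be illustrated with a minimal Python sketch of a retraining trigger. It is not the implementation from the thesis: the function names, the `baseline_accuracy` and `max_drop` parameters, and the use of plain accuracy as the monitored metric are all assumptions made for illustration, since the abstract does not state which metric or threshold the system uses.

```python
from dataclasses import dataclass


@dataclass
class RetrainDecision:
    """Result of a hypothetical continuous training check."""
    retrain: bool
    reason: str


def accuracy(predicted, actual):
    """Fraction of tickets whose predicted category matches the true one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one labelled ticket is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def should_retrain(predicted, actual, baseline_accuracy, max_drop=0.05):
    """Decide whether a new training run should be started.

    baseline_accuracy and max_drop are assumed, illustrative parameters;
    the abstract does not give concrete thresholds.
    """
    current = accuracy(predicted, actual)
    if current < baseline_accuracy - max_drop:
        return RetrainDecision(True, f"accuracy fell to {current:.2f}")
    return RetrainDecision(False, f"accuracy {current:.2f} is within tolerance")


if __name__ == "__main__":
    # Made-up ticket categories, used only to exercise the check.
    predicted = ["network", "hardware", "network", "access"]
    actual = ["network", "hardware", "access", "access"]
    print(should_retrain(predicted, actual, baseline_accuracy=0.90))
```

In a pipeline-based setup, a check of this kind would typically run on recently labelled tickets, and a positive decision would start the training pipeline rather than train the model inline.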