Clustering for the automatic annotation of customer service chat messages
Derrar, Honain Mohib (2019)
Information Technology
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2019-02-21
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tty-201902011200
Abstract
The objective of this thesis work is to identify a clustering setting that provides human annotators with the support they need to perform topic annotation of customer service chat data.
Many of the customer service chat automation tools available involve the use of supervised machine learning techniques to learn how to answer a customer query based on historical conversations between a customer and a customer service agent. While this approach has provided satisfying results for many use cases, it still represents a challenge since annotation work incurs large costs.
In order to alleviate some of the challenges faced by the annotation team at ultimate.ai, we seek to provide a solution using clustering approaches that helps reduce the annotation workload by providing as many correct chat message annotations as possible automatically. At the same time, the approach needs to be easily usable and applicable to chat data from different languages and industries while not requiring immense computational resources.
The approach used in this work improves upon the clustering baseline previously used in the company and identifies a clustering evaluation metric that enables further internal research to continuously improve the clustering of customer service chat data. Finally, metric learning is explored in an effort to improve the obtained results even further.