A Query Assistant to Resolve Data Semantics Misunderstandings
Mummaadi, Raane (2024)
Master's Programme in Computing Sciences and Electrical Engineering
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2024-12-10
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-2024112610507
Abstract
This thesis proposes a Query Assistant that helps users avoid semantic misunderstandings when querying datasets in which numerical attributes have semantics that limit their participation in aggregation or arithmetic expressions. The project assumes that the original data curator has defined rules that make explicit the relevant properties of the data, which we refer to collectively as the semantic limitations. These are (1) units of measurement, (2) properties of numerical scales according to measurement theory, and (3) application-specific limitations on the data. Stating these semantic limitations explicitly helps the Query Assistant detect potential semantic errors in user queries (expressed in either SQL or natural language) and explain them to the Dataset User.
In this thesis, the Query Assistant has been trained and evaluated with a real-world weather dataset, in which all three categories of semantic limitations play key roles in determining how data should be treated in operations such as aggregation. Built by fine-tuning the Large Language Model GPT-4 with rule-based reasoning and prompting techniques, the resulting Query Assistant can improve the accuracy and reliability of natural language interactions with the data. This helps the Query Assistant capture the nuances of different measurement scales and, therefore, select an appropriate aggregation function or arithmetic operation for nominal, ordinal, interval, and ratio scales.
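To illustrate how a measurement scale can restrict the choice of aggregation function, the following is a minimal Python sketch. It is not the thesis implementation. The column names, the `COLUMN_SCALES` annotations, and the `check_aggregate` helper are hypothetical, and the mapping from aggregates to scales follows the conventional reading of measurement theory (for example, a sum is treated as meaningful only on a ratio scale) rather than rules taken from the thesis.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Measurement scales, ordered from least to most permissive."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Assumption: the weakest scale on which each aggregate is
# conventionally regarded as meaningful.
MIN_SCALE = {
    "COUNT": Scale.NOMINAL,
    "MODE": Scale.NOMINAL,
    "MIN": Scale.ORDINAL,
    "MAX": Scale.ORDINAL,
    "MEDIAN": Scale.ORDINAL,
    "AVG": Scale.INTERVAL,
    "SUM": Scale.RATIO,
}

# Hypothetical curator annotations for a weather table.
COLUMN_SCALES = {
    "station_id": Scale.NOMINAL,
    "beaufort_force": Scale.ORDINAL,
    "temperature_celsius": Scale.INTERVAL,
    "precipitation_mm": Scale.RATIO,
}


def check_aggregate(func, column):
    """Return None if the aggregate suits the column's scale,
    otherwise a short explanation for the user."""
    func = func.upper()
    scale = COLUMN_SCALES[column]
    required = MIN_SCALE[func]
    if scale >= required:
        return None
    return (
        f"{func}({column}) may be misleading: {column} is on a "
        f"{scale.name.lower()} scale, but {func} assumes at least a "
        f"{required.name.lower()} scale."
    )


if __name__ == "__main__":
    for func, column in [
        ("SUM", "temperature_celsius"),
        ("AVG", "temperature_celsius"),
        ("AVG", "station_id"),
    ]:
        print(func, column, "->", check_aggregate(func, column) or "ok")
```

In this sketch the ordering of the `Scale` values carries the rule: any aggregate permitted on a weaker scale is also permitted on a stronger one, so a single comparison suffices. Units of measurement and application-specific limitations would need separate checks.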
The aim of this research is to enhance the user's experience of querying a dataset by reducing the possibility of semantic errors that could produce responses that are incomprehensible or that display misleading data. The Query Assistant allows users to formulate more specific queries by giving immediate feedback on potential errors and explaining them, so that users arrive at more trustworthy and meaningful data analyses. This improves not only usability when dealing with complex datasets but also decision-making in fields that depend heavily on accurate data interpretation.