Transparent RDF Question Answering and Summarisation
Vilavuo (née Zimina), Elizaveta (2025)
Vilavuo (née Zimina), Elizaveta
Tampere University
2025
Doctoral Programme in Information and Systems
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of defence
2025-05-20
The permanent address of the publication is
https://urn.fi/URN:ISBN:978-952-03-3914-2
Abstract
The rapid expansion of information on the Web has led to the development of collections of structured data known as knowledge bases (KBs), which often store information as triples in the Resource Description Framework (RDF) format. Finding the required data in these KBs usually demands knowledge of their structure and of query languages such as SPARQL, which can be challenging for an inexperienced user. The dissertation aims to facilitate users’ communication with KBs, specifically by developing systems that perform question answering (QA) and summarisation over large RDF KBs, such as DBpedia and Wikidata, using natural language (NL).
The research process included the publication of four papers. The first describes the development of GQA, a system that uses the Grammatical Framework technology to convert complex NL questions into semantic grammar parses. The second focuses on MuGQA, a multilingual extension of GQA that answers questions in English, German, French, and Italian. The third is devoted to TraQuLA, the most advanced of the QA systems developed, which uses flexible pattern matching and lets users trace the system’s reasoning as it transforms NL questions into SPARQL queries, thus ensuring transparency. The fourth tackles the challenge of generating human-readable summaries for multiple RDF entities, whereas prior research has focused on summarising individual entities.
The dissertation contributes to the fields of QA over knowledge graphs and entity summarisation by creating transparent, multilingual, and user-accessible systems that bridge the gap between extensive knowledge bases and non-expert users. Testing demonstrated that a rule-based QA system (TraQuLA) can successfully compete with advanced machine learning techniques on popular QA datasets, while remaining easily interpretable for users and developers. While exploring the novel field of NL summarisation of multiple RDF entities, we designed an experimental framework with evaluation criteria to assess the quality of machine-generated summaries and their effectiveness in helping humans write their own summaries. Overall, the dissertation advances QA and summarisation over RDF data, tackling both technical challenges and user-focused aspects to enhance the accessibility of structured KBs.
Collections
- Doctoral dissertations [5009]