TREATS: Fairness-aware entity resolution over streaming data
Brasileiro Araújo, Tiago; Efthymiou, Vasilis; Christophides, Vassilis; Pitoura, Evaggelia; Stefanidis, Kostas (2025-03)
102506
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202501101278
Description
Peer reviewed
Abstract
The growing proliferation of information systems continuously generates large volumes of data from a variety of sources, such as web platforms, social networks, and multiple devices. These data often lack a defined schema and must be consolidated and cleansed before analysis and knowledge extraction can occur. In this context, Entity Resolution (ER) plays a crucial role: it facilitates the integration of knowledge bases by identifying similar entities across different sources. However, the traditional ER process is computationally expensive, and it becomes more complicated in the streaming context, where data arrive continuously. Moreover, few studies address fairness, understood as the absence of discrimination or bias, in ER. Fairness criteria aim to mitigate the implications of data bias in ER systems, which requires more than just optimizing accuracy, as is traditionally done. Against this background, this work presents TREATS, a schema-agnostic and fairness-aware ER workflow for incrementally managing streaming data. The proposed fairness-aware ER framework handles constraints across various groups of interest, offering a resilient and equitable solution to these challenges. In an experimental evaluation, the proposed techniques and heuristics are compared against state-of-the-art approaches over five real-world data source pairs; the results demonstrate significant improvements in fairness without degrading effectiveness or efficiency in the streaming environment. In summary, our contributions aim to advance the ER field by providing a workflow that addresses both technical challenges and ethical concerns.
Collections
- TUNICRIS publications [19716]