Managing Missing Data in Data Integration

Jokipii, Mervi (2023)

 

Master's Programme in Computer Science
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Acceptance date
2023-05-28
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202305155792
Abstract
The amount of data in the world is growing at an enormous pace, especially with the expansion of the internet. Data is stored in different formats in various source systems. The goal of data integration is to provide users with unified access to heterogeneous and independent data without requiring them to understand the logic of the source systems. Users submit queries against a mediated schema, which translates them into queries on the source systems. Integrated data is rarely complete: it may contain incorrect or entirely missing values. Such missing data can be managed and enriched using various methods.

The literature review of this thesis explores data integration and its challenges, as well as missing data mechanisms and strategies for dealing with missing data. The experimental section analyses these strategies in the context of online automotive dealerships. Cars are increasingly purchased directly online, or at least with the internet as strong support in the purchasing process. Incomplete car data can lead to issues such as the car not appearing in potential buyers' search results, and may even result in the car not being sold.

The results of this work show that finding a similar car in a dataset is crucial for managing missing car data, but doing so is not always straightforward. String matching is an essential part of finding a similar car, but it does not always give a perfectly accurate result. For this reason, the work presents a model for managing missing car data in which string matching is used only when necessary. According to the model, string matching can also be strengthened by comparing other values belonging to the same attribute group. External sources, such as existing commercial databases or a company's self-built database, should also be used, if needed, to find a similar car.
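
The idea described above can be illustrated with a minimal Python sketch. It is not the model from the thesis: the field names, the sample records, the 0.7/0.3 weighting, and the 0.6 threshold are all hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever string matching method the thesis actually uses. The sketch tries an exact match first, falls back to string matching only when needed, strengthens the string score with other values from an assumed attribute group, and consults an external source last.

```python
from difflib import SequenceMatcher

# Hypothetical reference dataset of complete car records (illustration only).
REFERENCE_CARS = [
    {"make": "Toyota", "model": "Corolla 1.8 Hybrid", "fuel": "hybrid",
     "body": "hatchback", "power_kw": 90},
    {"make": "Toyota", "model": "Corolla 1.2 Turbo", "fuel": "petrol",
     "body": "hatchback", "power_kw": 85},
    {"make": "Volvo", "model": "V60 D4", "fuel": "diesel",
     "body": "estate", "power_kw": 140},
]


def similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_similar_car(car, candidates, group=("fuel", "body"), threshold=0.6):
    """Return the most similar candidate record, or None if none is close enough."""
    # Step 1: exact match on make and model, so no string matching is needed.
    for cand in candidates:
        if cand["make"] == car.get("make") and cand["model"] == car.get("model"):
            return cand

    # Step 2: string matching on the model name, within the same make only.
    best, best_score = None, 0.0
    for cand in candidates:
        if cand["make"] != car.get("make"):
            continue
        score = similarity(car.get("model") or "", cand["model"])
        # Strengthen the string score with the known values of the
        # (assumed) attribute group; the weights are arbitrary.
        known = [key for key in group if car.get(key) is not None]
        if known:
            agreement = sum(car[key] == cand[key] for key in known) / len(known)
            score = 0.7 * score + 0.3 * agreement
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None


def fill_missing(car, candidates, external=()):
    """Fill None values in `car` from a similar car, using `external` as a fallback."""
    match = find_similar_car(car, candidates) or find_similar_car(car, external)
    filled = dict(car)
    if match is not None:
        for key, value in match.items():
            if filled.get(key) is None:
                filled[key] = value
    return filled


incomplete = {"make": "Toyota", "model": "Corolla Hybrid", "fuel": "hybrid",
              "body": None, "power_kw": None}
print(fill_missing(incomplete, REFERENCE_CARS))
```

Values already present in the incomplete record are never overwritten; only fields that are `None` are taken from the matched car. If neither the reference data nor the external source yields a match above the threshold, the record is returned unchanged.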
Kalevantie 5
PL 617
33014 Tampereen yliopisto
oa[@]tuni.fi | Data protection | Accessibility statement
 

 
