Improving data quality in data warehouse
Äikäs, Pasi (2018)
Information and Knowledge Management
Talouden ja rakentamisen tiedekunta - Faculty of Business and Built Environment
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2018-12-05
Permanent address of the publication
https://urn.fi/URN:NBN:fi:tty-201811142577
Abstract
Data quality plays a critical role in today's organizations. Data is the basis of decisions, and poor-quality data can have costly and harmful effects on an organization's operations and reputation. One technical way to improve data quality is data matching, a subprocess of data integration in which records referring to the same real-world entity are mapped and consolidated.
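To make the definition of data matching concrete, here is a minimal, hypothetical Python sketch of the idea. It is not the process defined in this thesis. It assumes records are dictionaries with illustrative `id`, `name`, `postal_code` and `phone` fields. It uses the postal code as a blocking key, `difflib.SequenceMatcher` as the name-similarity measure with an assumed threshold of 0.85, and a simple "first non-empty value wins" survivorship rule for consolidation.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    # Lowercase, drop punctuation and collapse whitespace before comparing.
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(records, threshold=0.85):
    """Map records that appear to refer to the same entity into clusters.

    Uses union-find so that transitive matches end up in one cluster.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            # Blocking (assumed rule): only compare records sharing a postal code.
            if records[i]["postal_code"] != records[j]["postal_code"]:
                continue
            if similarity(records[i]["name"], records[j]["name"]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, rec in enumerate(records):
        clusters.setdefault(find(i), []).append(rec)
    return list(clusters.values())


def consolidate(cluster):
    """Merge one cluster into a single record (hypothetical survivorship rule)."""
    merged = {"source_ids": [rec["id"] for rec in cluster]}
    for rec in cluster:
        for key, value in rec.items():
            # Keep the first non-empty value seen for each field.
            if key != "id" and value and not merged.get(key):
                merged[key] = value
    return merged
```

For example, the records "Nordic Freight Oy" and "NORDIC FREIGHT OY." with the same postal code normalize to the same string, so they would be mapped to one cluster and consolidated into a single record that keeps both source ids. Blocking keeps the number of pairwise comparisons manageable, at the cost of missing matches whose blocking key is itself wrong.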
The goal of this thesis was to define how data quality can be improved in a data warehouse environment. The identified process was applied to the case organization's data warehouse, which contains transportation-related data.
The research consists of a theoretical part and an empirical part. The theoretical part contains a literature review of data quality from different perspectives, a description of a basic data warehouse environment and, finally, a definition of the data matching process. The theoretical part provided the basis for the empirical part, in which the data matching process was applied and documented and the results were analyzed.
As a result, a data matching process for improving data quality in the case organization was defined. The effectiveness of the applied process, both as a whole and in its individual steps, was demonstrated.