Trepo

Enhancing Automatic Invoice Coding Performance with Unstructured Data : A feature extraction approach

Eerola, Teemu (2024)

 
EerolaTeemu.pdf (1.051 MB)




Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Approval date
2024-04-30
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:tuni-202404053320
Abstract
As part of invoice processing, companies document their spend for financial audits, tax compliance, and spend management in a process called invoice coding. For each coding dimension, which describes one aspect of the spend, a coding value is selected from a predefined list. However, manual coding is time-consuming, and traditional invoice automation processes still fail to automate coding completely.

The invoice coding problem can be formulated as a supervised classification task, in which a machine learning model is trained on historical invoices to predict coding values for new invoices. This study explores how to incorporate the textual data of invoices to improve the coding predictions. The baseline system has not used this data, because the lack of standardization across invoices makes it difficult to capture. This work studies feasible feature extraction methods and compares their performance against a baseline that uses only invoice header data.
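As an illustration only, the classification framing above can be sketched by training one independent classifier per coding dimension, with header fields encoded as `field=value` tokens and, optionally, words from the invoice text added as extra tokens. The choice of multinomial naive Bayes, the feature encoding, the field names (`supplier`, `account`) and the toy invoices are all hypothetical assumptions; they are not the model or data used in the thesis.

```python
import math
import re
from collections import Counter, defaultdict


def featurize(invoice, use_text=True):
    """Bag of string features for one invoice (hypothetical encoding).

    Header fields become 'field=value' tokens; with use_text=True the
    lower-cased words of the free-text content are added as 'w:<word>' tokens.
    """
    feats = [f"{k}={v}" for k, v in invoice["header"].items()]
    if use_text:
        feats += ["w:" + t for t in re.findall(r"[a-zåäö0-9]+", invoice["text"].lower())]
    return Counter(feats)


class NaiveBayesCoder:
    """Multinomial naive Bayes with add-one smoothing for ONE coding dimension."""

    def fit(self, bags, labels):
        self.n = len(labels)
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for bag, y in zip(bags, labels):
            self.feat_counts[y].update(bag)
        self.vocab = {f for bag in bags for f in bag}
        return self

    def predict(self, bag):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for y, n_y in self.class_counts.items():
            total = sum(self.feat_counts[y].values())
            score = math.log(n_y / self.n)  # log prior of coding value y
            for f, c in bag.items():
                if f in self.vocab:  # features unseen in training are skipped
                    score += c * math.log((self.feat_counts[y][f] + 1) / (total + v))
            if score > best_score:  # on a tie, the first class seen wins
                best, best_score = y, score
        return best


def train_coders(invoices, dimensions, use_text):
    """Fit one independent classifier per coding dimension."""
    bags = [featurize(inv, use_text) for inv in invoices]
    return {
        d: NaiveBayesCoder().fit(bags, [inv["coding"][d] for inv in invoices])
        for d in dimensions
    }


# Hypothetical historical invoices: supplier S1 is coded to two different accounts.
history = [
    {"header": {"supplier": "S1"}, "text": "Laptop and monitor", "coding": {"account": "IT equipment"}},
    {"header": {"supplier": "S1"}, "text": "Consulting hours March", "coding": {"account": "Services"}},
    {"header": {"supplier": "S2"}, "text": "Office chairs", "coding": {"account": "Furniture"}},
]
new_invoice = {"header": {"supplier": "S1"}, "text": "Consulting hours April"}

header_only = train_coders(history, ["account"], use_text=False)
with_text = train_coders(history, ["account"], use_text=True)
pred_header = header_only["account"].predict(featurize(new_invoice, use_text=False))
pred_text = with_text["account"].predict(featurize(new_invoice, use_text=True))
print(pred_header, "|", pred_text)
```

On this toy data the header-only model sees only `supplier=S1`, which ties between two accounts, and returns `IT equipment` by tie-breaking; adding the text tokens lets it pick `Services`. This mirrors, in miniature, why invoice text can carry information that header fields lack.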

Experiments are conducted on datasets from seven companies with varying invoice origins, processes, and languages. The coding dimensions exercise different aspects of coding, such as capturing related words or semantic meaning. The results show that incorporating textual data can improve accuracy by up to 21 percentage points over the baseline. The most robust improvement was obtained with latent semantic analysis (LSA) features.
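LSA is commonly computed as a truncated singular value decomposition (SVD) of a TF-IDF-weighted document-term matrix, keeping only the top-k singular directions as a low-dimensional "semantic" representation. The sketch below shows that computation with NumPy. The whitespace tokenization, the smoothed IDF formula, k = 2, the function names and the invoice texts are illustrative assumptions, not the configuration used in the thesis.

```python
import numpy as np


def build_vocab(docs):
    """Map each lower-cased whitespace token in the training texts to a column."""
    words = sorted({w for d in docs for w in d.lower().split()})
    return {w: i for i, w in enumerate(words)}


def tfidf(docs, vocab, idf=None):
    """Return L2-normalised TF-IDF rows; words outside the vocabulary are ignored."""
    tf = np.zeros((len(docs), len(vocab)))
    for r, d in enumerate(docs):
        for w in d.lower().split():
            if w in vocab:
                tf[r, vocab[w]] += 1
    if idf is None:  # fit IDF on the training documents only
        df = (tf > 0).sum(axis=0)
        idf = np.log((1 + len(docs)) / (1 + df)) + 1  # smoothed IDF (an assumption)
    x = tf * idf
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = np.divide(x, norms, out=np.zeros_like(x), where=norms > 0)
    return x, idf


def lsa_fit(x, k):
    """Truncated SVD: keep the top-k right singular vectors as LSA components."""
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:k]  # shape (k, n_terms)


def lsa_transform(x, components):
    """Project TF-IDF rows onto the k-dimensional latent space."""
    return x @ components.T


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical invoice line texts from historical invoices.
train_texts = [
    "laptop monitor keyboard",
    "laptop monitor",
    "consulting hours",
    "consulting hours report",
]
vocab = build_vocab(train_texts)
x_train, idf = tfidf(train_texts, vocab)
components = lsa_fit(x_train, k=2)
z_train = lsa_transform(x_train, components)  # one k-dim LSA vector per invoice

# A new invoice text is weighted with the training IDF, then projected.
x_new, _ = tfidf(["consulting report"], vocab, idf)
z_new = lsa_transform(x_new, components)[0]
sim_consulting = cosine(z_new, z_train[2])
sim_hardware = cosine(z_new, z_train[0])
print(round(sim_consulting, 3), round(sim_hardware, 3))
```

Here the new text falls in the same latent direction as the earlier consulting invoice and is orthogonal to the hardware invoices. One plausible use, assumed here rather than taken from the thesis, is to append the k LSA values to the header features as extra numeric inputs of the coding classifier.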

This thesis provides a practical pathway for implementing these enhancements in production and lays the groundwork for future work that leverages the textual content of invoices.
Collections
  • Theses - Master's degree [42164]
Kalevantie 5
PL 617
33014 Tampereen yliopisto
oa[@]tuni.fi | Privacy | Accessibility statement
 

 
