
A Re-examination of chatbot evaluation metrics

Duong, Kien (2022)

 

Master's Programme in Computing Sciences
Faculty of Information Technology and Communication Sciences (Informaatioteknologian ja viestinnän tiedekunta)
This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
Date of approval
2022-05-24
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202205064457
Abstract
Evaluation is one of the most important and most challenging parts of developing a chatbot, because judging a conversation depends on a number of complex factors. The objective of this thesis is to understand the characteristics of two types of automated metrics, trained and untrained, and to identify the metrics most suitable for dialog evaluation. In addition, experiments were conducted to study the weaknesses of word-overlap metrics in morphologically rich languages and to find solutions to that problem.

In particular, six evaluation metrics were used in the experiments: Kullback–Leibler divergence, Coherence, BLEU, Embedding, Entropy, and MaUdE. Three datasets in two languages (English and Finnish) were collected to study whether the language influences the quality of the metrics. Each metric is asked to discriminate between qualified and unqualified answers. The unqualified answers are generated by randomly sampling sentences from the database that are not relevant to the context. The results indicate that BLEU with 1-grams and greedy matching are the two most appropriate options for chatbot evaluation. For morphologically rich languages, one remedy was found: the performance of BLEU in Finnish can be improved by segmenting words into sub-words or morphemes.
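
The effect of segmentation on a word-overlap metric can be illustrated with a minimal sketch. It is not taken from the thesis: the abstract does not state which BLEU implementation or which segmenter was used, so `bleu1` below is a plain clipped unigram precision with the standard brevity penalty, and `SEGMENTS` is a hypothetical hand-written morpheme table that stands in for a real segmenter. The example sentences are also invented for illustration.

```python
import math
from collections import Counter


def bleu1(candidate, reference):
    """BLEU restricted to 1-grams: clipped unigram precision times brevity penalty."""
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # A candidate token is credited at most as often as it occurs in the reference.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    precision = overlap / len(candidate)
    # Brevity penalty: candidates shorter than the reference are penalised.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * precision


# Hypothetical morpheme table (assumption, for illustration only).
SEGMENTS = {
    "asun": ["asu", "n"],                 # "I live"
    "asuin": ["asu", "i", "n"],           # "I lived"
    "talossa": ["talo", "ssa"],           # "in a house"
    "talossani": ["talo", "ssa", "ni"],   # "in my house"
    "pidän": ["pidä", "n"],               # "I like"
    "kahvista": ["kahvi", "sta"],         # "(of) coffee"
}


def segment(tokens):
    """Replace each word by its morphemes; unknown words are kept whole."""
    out = []
    for tok in tokens:
        out.extend(SEGMENTS.get(tok, [tok]))
    return out


reference = ["asun", "talossani"]       # reference answer
relevant = ["asuin", "talossa"]         # related answer with different inflections
random_answer = ["pidän", "kahvista"]   # randomly sampled, unrelated answer

# On whole words both candidates share no token with the reference.
print(bleu1(relevant, reference))        # 0.0
print(bleu1(random_answer, reference))   # 0.0

# On morphemes the related answer scores clearly higher than the random one.
print(bleu1(segment(relevant), segment(reference)))       # 0.8
print(bleu1(segment(random_answer), segment(reference)))
```

On whole words the metric cannot separate the related answer from the random one, because every inflected form counts as a different token. After segmentation the shared stems and case endings are matched. The random answer still receives a small non-zero score from a shared suffix, which suggests that segmentation helps discrimination without removing all spurious overlap.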
Collections
  • Master's theses (Opinnäytteet - ylempi korkeakoulututkinto)
Kalevantie 5
PL 617
33014 Tampereen yliopisto
oa[@]tuni.fi | Data protection | Accessibility statement
 

 
