Phishing Without Borders : Utilization of Large Language Models for Phishing in Finnish
Päärni, Oskari (2024)
Master's Programme in Computer Science
Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2024-05-22
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202404305002
Abstract
Phishing is a persistent and growing cyber security threat to society, with organizations and individuals around the world falling victim to phishing scams. At the same time, technological advances in artificial intelligence fuel innovation in numerous fields, with the GPT models at the forefront of this development. These advances also give powerful tools to scammers and other cybercriminals.
The aim of this thesis was to evaluate the capabilities of OpenAI’s GPT models in generating convincing phishing messages in Finnish, and to explore their capabilities in recognizing and evaluating phishing messages. This was achieved by creating Python software tools for each application and by conducting a questionnaire to evaluate how convincing the generated messages were compared to real-world, malicious phishing messages and to legitimate, benign messages sent by trusted actors. The same messages were given to GPT-4, and the model was prompted to evaluate them in the same way as the questionnaire respondents, to gauge its capabilities in phishing message recognition and evaluation.
The results of the questionnaire showed that respondents considered all phishing messages, generated or not, significantly less trustworthy than legitimate messages, and that the messages generated for the questionnaire performed worse than actual phishing messages. As for message recognition, GPT-4 was capable of recognizing messages based on images and evaluating whether to trust them based on the content, even making the same observations as questionnaire respondents. On the other hand, the results were somewhat inconsistent over multiple iterations, and the model had trouble recognizing the context of some messages.