Recognition of phishing attacks utilizing anomalies in phishing websites
CHAUDHARY, SUNIL (2012)
Computer Science
School of Information Sciences
This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
Date of approval
2012-12-05
Permanent address of the publication:
https://urn.fi/urn:nbn:fi:uta-1-23156
Abstract
The fight against phishing has produced a number of phishing prevention techniques. However, they address the phishing problem only partially: every day, a large number of Internet users are still tricked into disclosing their personal information to fake websites. This may be because existing phishing prevention techniques are either not foolproof or unable to keep pace with emerging changes in phishing.
The main purpose of this thesis is to identify anomalies that can be found in the Uniform Resource Locators (URLs) and source codes of phishing websites, and to determine an efficient way to employ those anomalies for phishing detection. To that end, I performed a meta-analysis of several existing phishing prevention techniques, specifically heuristic methods. I then selected forty-one anomalies that can be found in the URLs and source codes of phishing websites and that are mentioned or utilized in past studies. This was followed by verification of those anomalies in an experiment conducted on twenty live phishing websites. The study revealed that some anomalies, once significant for phishing detection, are no longer present in present-day phishing websites, while several anomalies are also widely present in legitimate websites. Such ambiguous anomalies need further analysis to determine their significance for phishing detection. Moreover, it was found that several heuristic methods use an insufficient set of anomalies, which introduces inaccuracy into their results. Finally, to design an efficient heuristic method employing anomalies found in the URLs and source codes of phishing websites, it is suggested to give priority to anomalies that are: difficult for phishers to bypass, found only in phishing websites, seriously harmful, independent of other anomalies, and quick to evaluate.
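The kinds of URL anomalies surveyed in work like this can be sketched as simple heuristic checks. The sketch below is illustrative only: the specific anomalies and thresholds are common examples from the phishing-detection literature, not the forty-one anomalies or criteria examined in this thesis.

```python
import re
from urllib.parse import urlparse

def url_anomalies(url: str) -> list[str]:
    """Flag common URL anomalies used by heuristic phishing detectors.

    The checks and thresholds here are illustrative assumptions,
    not the anomaly set evaluated in the thesis.
    """
    found = []
    host = urlparse(url).hostname or ""
    # A raw IP address instead of a domain name hides the site's identity.
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host):
        found.append("ip-address host")
    # An '@' in the URL makes browsers discard everything before it
    # when resolving the host, so the visible prefix can be a decoy.
    if "@" in url:
        found.append("'@' in URL")
    # Many subdomains can embed a trusted brand, e.g. paypal.com.evil.org.
    if host.count(".") > 3:
        found.append("excessive subdomains")
    # Unusually long URLs often push the real domain out of view.
    if len(url) > 75:
        found.append("overly long URL")
    return found
```

A heuristic detector would typically combine such checks with source-code anomalies and weight them, since (as the abstract notes) individual anomalies can also occur in legitimate websites.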
Keywords: phishing, phishing prevention, URL, DOM objects, whitelist, blacklist, heuristics, meta-analysis, software quality