Techniques to Measure User Behavior on Tor Onion Services : Using ethical measurement methods with a honeypot case study
Abdullah, Waris (2025)
Master's Programme in Information Technology
Faculty of Information Technology and Communication Sciences
Date of approval
2025-12-13
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-2025120211187
Abstract
Tor onion services are anonymous and often host sensitive content, which makes them difficult to study. Consequently, researchers struggle to understand how users behave on onion services and how to measure that behavior safely. To survey the available measurement techniques, I review experimental studies from 2015 to 2025 and group their methods into five main categories: content- and discovery-based methods, user-reported methods, network-level methods, interaction-based methods, and incident- or leak-based methods. For each category, I describe the behaviors it measures, along with its strengths and limitations. The result is a simple framework for choosing and combining methods.
Additionally, I present a honeypot case study that I co-authored. We deployed fake onion services across different categories, languages, and discovery channels. These services looked like illicit forums but contained only legal decoy content and stored no personal data. We logged page visits, CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) solves, and login and registration attempts. The results show that users discover sites mainly through the Ahmia search engine, while paste services generate mostly bot traffic. Pages that appear to host Child Sexual Abuse Material (CSAM) attract the most interaction, and English pages receive significantly more engagement than pages in other languages. I conclude by offering practical guidelines for ethical measurement and showing that researchers must combine multiple methods to fully understand Tor user behavior.
