Analysing and Interpreting Web Traffic using HTTP Access Logs : Case Study of a Managed WordPress Hosting Environment
Järvilehto, Joel (2025)
Tietotekniikan DI-ohjelma - Master's Programme in Information Technology
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2025-07-31
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tuni-202507307925
Abstract
This thesis presents a security-focused analysis of HTTP access logs collected from WordPress-hosted websites across several server clusters and geographic locations. The study explores how user-agent strings can serve as a key signal for distinguishing between legitimate human traffic and automated or malicious activity. A wide spectrum of spoofed, outdated, or deceptive user-agents is identified, many of which are deliberately crafted to bypass filters or impersonate legitimate browsers and crawlers.
While user-agent analysis is central to the work, the thesis also incorporates the detection of suspicious request patterns indicative of common attack behaviors in WordPress environments. These include brute-force login attempts, XML-RPC exploitation, SQL injection, cross-site scripting, plugin and theme vulnerability scans, and obfuscated malware payloads. Notably, many attack requests combine multiple techniques, highlighting the modular and automated nature of modern web-based threats.
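As an illustration of the request-pattern detection described above, the following is a minimal sketch rather than the thesis's actual tooling. It assumes logs in the Apache/nginx Combined Log Format, and the names `LOG_RE`, `SIGNATURES`, and `classify` as well as the specific regular expressions are hypothetical, deliberately simplistic choices. A single request can receive several labels, which mirrors the observation that attack requests often combine techniques.

```python
import re
from urllib.parse import unquote_plus

# Assumed input: Combined Log Format
#   host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, intentionally simple signatures; a real rule set would be
# far larger and tuned against false positives.
SIGNATURES = {
    "wp_login": re.compile(r"/wp-login\.php", re.I),
    "xmlrpc": re.compile(r"/xmlrpc\.php", re.I),
    "sqli": re.compile(r"union\s+select|\bor\s+1\s*=\s*1", re.I),
    "xss": re.compile(r"<script|onerror\s*=", re.I),
    "plugin_theme_probe": re.compile(
        r"/wp-content/(plugins|themes)/[^/]+/readme\.txt", re.I
    ),
}


def classify(line):
    """Parse one log line and return its fields plus matching labels.

    Returns None when the line does not match the assumed log format.
    """
    m = LOG_RE.match(line)
    if m is None:
        return None
    # Decode percent-encoding so that encoded payloads are also matched.
    target = unquote_plus(m.group("path"))
    labels = sorted(name for name, rx in SIGNATURES.items() if rx.search(target))
    return {
        "ip": m.group("ip"),
        "method": m.group("method"),
        "path": target,
        "status": int(m.group("status")),
        "agent": m.group("agent"),
        "labels": labels,
    }
```

A single hit on `wp-login.php` is not a brute-force attempt by itself; detecting one would additionally require counting POST requests per client over a time window, which this sketch leaves out.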
The results confirm that automated threats are not only persistent but are also becoming more sophisticated in their evasion and targeting strategies. Simple heuristics are often insufficient to filter out malicious traffic, particularly when requests appear superficially benign. To address these challenges, the thesis proposes future directions for improving the detection and classification of HTTP traffic, including verification of user-agents, the use of machine learning for anomaly detection, and scalable log-processing pipelines capable of analyzing large volumes of access logs across longer timeframes. Additionally, reinforcing server-side configurations, such as disabling PHP execution in certain directories and restricting access to sensitive files, remains essential for mitigating common attack vectors and strengthening overall security.
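The abstract does not specify how user-agents would be verified. One widely used approach, shown here purely as an assumed example, is forward-confirmed reverse DNS: a client claiming to be a known crawler is accepted only if its IP address resolves to a hostname under the crawler operator's domain and that hostname resolves back to the same IP. The function `verify_crawler` and the `CRAWLER_DOMAINS` table are hypothetical, and the domain suffixes should be checked against each operator's current documentation.

```python
import socket

# Assumed mapping from a user-agent token to the DNS suffixes its operator
# uses; verify against the operators' published documentation before use.
CRAWLER_DOMAINS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def verify_crawler(ip, user_agent, reverse=None, forward=None):
    """Return True/False for a claimed crawler, or None if no claim is made.

    `reverse` maps an IP to a hostname and `forward` maps a hostname to a
    list of IPs. They default to the system resolver and can be replaced
    (for example in tests) to avoid network access.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    # gethostbyname_ex only returns IPv4 addresses; IPv6 is out of scope here.
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])

    ua = user_agent.lower()
    for token, suffixes in CRAWLER_DOMAINS.items():
        if token not in ua:
            continue
        try:
            host = reverse(ip).lower().rstrip(".")
            if not host.endswith(suffixes):
                return False  # reverse DNS points outside the operator's domain
            return ip in forward(host)  # forward lookup must confirm the IP
        except OSError:
            return False  # lookup failed, so the claim cannot be confirmed
    return None  # user-agent does not claim to be a listed crawler
```

Because each check can trigger two DNS lookups, a log-processing pipeline would likely cache results per IP address rather than resolving on every request.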
This research provides actionable insights into the evolving security landscape of WordPress hosting, and lays some groundwork for improved tools and defenses to protect against automated exploitation at scale.