User Behavior Analysis using Browser History and to Support Forensic Investigation

Authors: Neetha Natesh, Ajay D, Hruthik R, Pavan Kumar B M, Poorvika V

DOI Link: https://doi.org/10.22214/ijraset.2025.75781

Abstract

In the digital age, browser history data serves as a rich source of behavioral insights, reflecting user preferences, routines, and potential threats. This project, titled User Behavior Analysis using Browser History and to Support Forensic Investigation, introduces a scalable and privacy-conscious solution for analyzing and classifying user behavior from web browsing activity. The system uses machine learning algorithms such as Random Forest and XGBoost to categorize users into normal and abnormal classes, and further subclassifies normal users by interests such as education, shopping, and sports. Abnormal behaviors such as phishing, spam, or defacement are flagged using anomaly detection models. To enhance interpretability, the project includes an interactive visualization dashboard with heatmaps, bubble charts, and network graphs built with D3.js, enabling stakeholders to derive actionable insights with ease. The solution emphasizes data preprocessing and feature engineering to ensure model accuracy and robustness. With its dual focus on security and usability, the system has potential applications in cybersecurity, forensic investigations, and user analytics. This work highlights the importance of ethical data handling and sets a foundation for future research in user behavior modeling and threat detection.

Introduction

The project leverages browser history data to analyze user behavior and support forensic investigations. Browser histories provide insights into user interests, routines, and potential security threats, yet are often underutilized. The system collects, processes, and analyzes this data using machine learning (Random Forest, XGBoost) to classify users as Normal (e.g., education, shopping, entertainment) or Abnormal (e.g., phishing, spam, defacement). Normal users are further subclassified by interest categories.
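
As a rough illustration of the second stage, the sketch below assigns a Normal user the interest category that dominates their history. The keyword lists, the names `INTEREST_KEYWORDS`, `url_interest`, and `subclassify_user`, and the keyword-matching approach itself are assumptions for illustration only; the text does not specify how the subclassifier works, and a trained model could take the place of the lookup.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical keyword lists; the paper does not publish its category definitions.
INTEREST_KEYWORDS = {
    "education": ("edu", "course", "learn", "university", "wiki"),
    "shopping": ("shop", "cart", "store", "amazon", "deal"),
    "sports": ("sport", "football", "cricket", "score", "espn"),
    "entertainment": ("video", "movie", "music", "netflix", "youtube"),
}

def url_interest(url):
    """Return the first interest whose keyword appears in the host or path, else None."""
    parts = urlsplit(url.lower())
    text = parts.netloc + parts.path
    for interest, keywords in INTEREST_KEYWORDS.items():
        if any(k in text for k in keywords):
            return interest
    return None

def subclassify_user(urls):
    """Label a Normal user with the most frequent interest across their history."""
    counts = Counter(i for i in map(url_interest, urls) if i is not None)
    if not counts:
        return "uncategorized"
    return counts.most_common(1)[0][0]
```

Substring matching is deliberately crude (for example, `edu` also matches `schedule`), so this is only a placeholder for a learned subclassifier.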

An interactive web dashboard, built with D3.js, visualizes activity through heatmaps, bubble charts, and network graphs, aiding cybersecurity teams and forensic analysts. Ethical data handling and secure storage are emphasized.
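
A D3.js heatmap typically binds to a flat array of cells. The sketch below shows one way a backend could prepare that array by counting visits per weekday and hour. It assumes visit timestamps have already been converted to ISO 8601 strings; the function name `heatmap_cells` and the record fields `day`, `hour`, and `visits` are hypothetical.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # datetime.weekday(): Monday == 0

def heatmap_cells(timestamps):
    """Count visits per (weekday, hour) and return one record per cell, 7 x 24 in total."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)  # assumes ISO 8601 input, e.g. "2025-11-24T09:15:00"
        counts[(dt.weekday(), dt.hour)] += 1
    # Emit every cell, including empty ones, so the heatmap grid has no gaps.
    return [
        {"day": DAYS[d], "hour": h, "visits": counts[(d, h)]}
        for d in range(7)
        for h in range(24)
    ]
```

Serializing the result with `json.dumps` yields an array in which each object maps directly to one heatmap rectangle.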

Objectives:

Collect and process browser history for behavioral analysis.
Classify users as Normal or Abnormal.
Subclassify Normal users by interests.
Detect suspicious behavior using ML.
Provide interactive visual analytics.
Ensure scalability, security, and applicability in forensic and cybersecurity contexts.

Literature Survey Highlights:

Prior studies report that URL and browsing-behavior profiling can reach accuracies of up to 99% using machine learning and deep learning.
Ensemble models, transformers (RoBERTa), and hybrid neural architectures are effective for malicious URL detection.
Feature engineering (URL length, TLD, HTTPS presence, etc.) significantly improves detection accuracy.
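
Structural features of the kind listed above can be computed with the Python standard library alone. This is a minimal sketch, not the authors' feature set: the function name `url_features` and the particular features chosen (digit count, hyphen count, `@` symbol, path depth, and so on) are assumptions.

```python
import ipaddress
from urllib.parse import urlsplit

def url_features(url):
    """Structural features of one URL (an illustrative selection, not the paper's exact set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lower-cased, without port or credentials
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False
    labels = host.split(".") if host and not is_ip else []
    return {
        "url_length": len(url),
        "host_length": len(host),
        # Naive: multi-label suffixes such as co.uk are over-counted by one.
        "num_subdomains": max(len(labels) - 2, 0),
        "tld": labels[-1] if labels else "",
        "uses_https": int(parts.scheme == "https"),
        "has_ip_host": int(is_ip),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }
```

A production extractor would consult the Public Suffix List when computing the TLD and subdomain fields.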

System Architecture:

Presentation Layer: Web interface with D3.js visualizations.
Logic Layer: Preprocessing, feature extraction, classification (Random Forest, XGBoost), subclassification, anomaly detection.
Data Layer: Stores CSVs and analysis results securely, using SQLite or cloud storage.
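
One way to realize the SQLite option for the data layer is sketched below. The table layout, the CSV columns `user_id`, `url`, and `visit_time`, and the helper names `open_store`, `load_history_csv`, and `save_result` are hypothetical. A `UNIQUE` constraint combined with `INSERT OR IGNORE` also provides the duplicate removal listed under preprocessing. The sketch does not cover encryption at rest or access control, which secure storage would additionally require.

```python
import csv
import io
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS visits (
    user_id    TEXT NOT NULL,
    url        TEXT NOT NULL,
    visit_time TEXT NOT NULL,
    UNIQUE (user_id, url, visit_time)  -- exact duplicate rows are skipped on insert
);
CREATE TABLE IF NOT EXISTS results (
    user_id  TEXT PRIMARY KEY,
    label    TEXT NOT NULL,  -- e.g. 'Normal' or 'Abnormal'
    interest TEXT            -- subcategory for Normal users
);
"""

def open_store(path=":memory:"):
    """Open (or create) the database and ensure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def load_history_csv(conn, csv_text):
    """Insert rows from a CSV export; return how many new (non-duplicate) rows were stored."""
    reader = csv.DictReader(io.StringIO(csv_text))
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO visits (user_id, url, visit_time) VALUES (?, ?, ?)",
        ((r["user_id"], r["url"].strip(), r["visit_time"]) for r in reader),
    )
    conn.commit()
    return conn.total_changes - before  # ignored duplicates do not count as changes

def save_result(conn, user_id, label, interest=None):
    """Store (or overwrite) the latest classification for a user."""
    conn.execute(
        "INSERT OR REPLACE INTO results (user_id, label, interest) VALUES (?, ?, ?)",
        (user_id, label, interest),
    )
    conn.commit()
```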

Methodology:

Data Collection: CSV export from browsers.
Preprocessing: Duplicate removal, normalization, feature extraction.
Feature Engineering: URL/domain structural features.
Classification: Random Forest model categorizes users.
Subclassification: Normal users divided by interests.
Abnormal Activity Detection: Flags phishing, spam, malware, unusual URLs.
Visualization: Heatmaps, bubble charts, network graphs via D3.js.
UI: Web dashboard with upload, instant analysis, and downloadable reports.
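
The classification step can be illustrated with a deliberately simplified random forest: each tree is a single split (a decision stump) fitted on a bootstrap sample using a random subset of features, and the forest predicts by majority vote. The toy feature vectors, labels, and function names below are hypothetical. A practical system would use full-depth trees from an established library; this stdlib-only sketch only shows the bagging, random-feature, and voting structure.

```python
import math
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, features):
    """Best single split (row[f] <= t) over the given features, by weighted Gini."""
    majority = Counter(y).most_common(1)[0][0]
    best = (gini(y), 0, math.inf, majority, majority)  # fallback: no useful split
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]  # (feature, threshold, left_label, right_label)

def fit_forest(X, y, n_trees=25, max_features=None, seed=0):
    """Fit one stump per bootstrap sample, each on a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = min(d, max_features or max(1, int(math.sqrt(d))))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        feats = rng.sample(range(d), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote of all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical per-user feature vectors (two numeric features) and labels.
X = [[0, 1], [1, 0], [0, 2], [1, 1], [6, 9], [7, 8], [5, 9], [8, 7]]
y = ["Normal"] * 4 + ["Abnormal"] * 4
forest = fit_forest(X, y)
```

Calling `predict(forest, [0, 0])` on this toy data should return `"Normal"`, while a row with large values in both features is voted `"Abnormal"`.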

Results:

Random Forest classifier achieves >90% accuracy.
Misclassifications occur mainly with structurally similar URLs.
Real-time processing of up to 10,000 URL records is feasible.
Web UI is validated for reliable and interactive performance.
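
One plausible reading of the misclassification observation is that a lookalike domain can produce exactly the same structure-only feature vector as the site it imitates, leaving a purely structural classifier nothing to separate them on. The vector below is a hypothetical subset of structural features, and the `.example` hosts are placeholders.

```python
from urllib.parse import urlsplit

def structural_vector(url):
    """Structure-only features: lengths, separators, digits, scheme, path depth."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return (
        len(url),
        len(host),
        host.count("."),
        url.count("-"),
        sum(ch.isdigit() for ch in url),
        int(parts.scheme == "https"),
        len([seg for seg in parts.path.split("/") if seg]),
    )

genuine = "https://www.bank.example/login"
lookalike = "https://www.bnak.example/login"  # transposed letters, same structure
same_vector = structural_vector(genuine) == structural_vector(lookalike)
```

Here `same_vector` is `True`, so lexical signals (for example edit distance to known brands) or reputation data would be needed to tell such URLs apart.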

Future Scope:

Integration of deep learning models (transformers, GNNs).
Real-time threat intelligence and URL reputation checks.
Multi-user, enterprise-level support.
Privacy-preserving methods (federated learning, differential privacy).
Advanced visualizations, mobile compatibility, and cloud deployment.
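
As an example of the differential privacy direction listed above, the Laplace mechanism can release per-interest user counts with calibrated noise. This sketch assumes each user contributes to exactly one interest count, so the L1 sensitivity is 1 and the noise scale is 1/epsilon; the function names `laplace` and `dp_counts` are hypothetical.

```python
import random

def laplace(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_counts(counts, epsilon, seed=None):
    """Add Laplace noise with scale 1/epsilon to each count (assumes sensitivity 1)."""
    rng = random.Random(seed)
    scale = 1.0 / epsilon
    return {category: n + laplace(scale, rng) for category, n in counts.items()}
```

Smaller epsilon means more noise and stronger privacy. The noisy counts can be non-integer or negative and are typically rounded or clipped at zero afterwards, which does not weaken the guarantee because differential privacy is preserved under post-processing.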

Conclusion

The User Behavior Analysis using Browser History and to Support Forensic Investigation project presents a robust and innovative approach to understanding and profiling online user behavior through browser history data. By leveraging machine learning techniques such as Random Forest and XGBoost, the system can accurately classify users into normal and abnormal behavior patterns, while further identifying interest-based subcategories such as education, sports, or shopping. The intuitive visualization dashboard, powered by D3.js, transforms complex data into meaningful visuals, making it easier for forensic investigators, cybersecurity analysts, and even non-technical stakeholders to interpret user behavior. Visual tools such as heatmaps, bubble charts, and network graphs enhance situational awareness and support timely …

Copyright

Copyright © 2025 Neetha Natesh, Ajay D, Hruthik R, Pavan Kumar B M, Poorvika V. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET75781

Publish Date : 2025-11-24

ISSN : 2321-9653

Publisher Name : IJRASET
