Cybercrime has emerged as a serious threat in the modern digital ecosystem, creating an urgent need for intelligent and accessible reporting mechanisms. This paper presents CyberSafe AI, a multimodal cybercrime reporting and threat prediction system designed to assist both users and authorities. The proposed framework integrates machine learning techniques to analyze cybercrime complaints submitted through text and image inputs, along with real-time threat assessment of suspicious URLs and emails. Text features are extracted using TF-IDF vectorization, while Optical Character Recognition (OCR) is employed to process image-based reports. Lightweight Logistic Regression models are trained for efficient and fast prediction. The system is deployed using a Flask backend with a ReactJS-based frontend for improved usability. Experimental evaluation demonstrates that the proposed approach achieves reliable accuracy while maintaining low computational overhead, making it suitable for real-time cyber threat monitoring. The platform aims to enhance early detection, streamline reporting, and support proactive cybercrime prevention.
Introduction
The paper proposes CyberSafe AI, an AI-driven autonomous cybercrime reporting system designed to improve the efficiency and accuracy of cybercrime complaint handling. As cyber threats such as phishing, ransomware, identity theft, malware, and online fraud continue to increase, existing reporting platforms often rely on manual, text-based processes that delay investigations and risk the loss of digital evidence.
Literature Review
Previous cybercrime detection systems mainly used rule-based and signature-based methods, which were effective for known threats but struggled with new attacks. Recent research has applied machine learning and NLP techniques, such as TF-IDF, Logistic Regression, SVM, Random Forest, CNNs, RNNs, and transformers, to improve phishing detection and cybercrime classification. Although some studies use OCR for image analysis, most solutions focus on a single data type and lack an integrated multimodal reporting framework with real-time threat prediction.
Proposed Methodology
CyberSafe AI uses a supervised machine learning approach to process multiple forms of evidence, including:
Text complaints
URLs
Emails
Images (via OCR)
The system performs:
Data preprocessing (cleaning text, removing noise, OCR extraction).
Feature extraction using TF-IDF vectorization and character-level n-grams for URLs.
Classification and threat prediction using Logistic Regression models.
Performance evaluation with Accuracy, Precision, Recall, and F1-score.
Deployment through a Flask backend and ReactJS frontend for real-time reporting and analysis.
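The preprocessing, feature extraction, and classification steps above can be sketched in a few dozen lines. This is a minimal, standard-library illustration and not the authors' implementation: the function names, the trigram size, the smoothed IDF variant, the training hyperparameters, and the four toy URLs are all assumptions chosen for the example. A production system would more likely rely on an established machine learning library.

```python
import math
import re
from collections import Counter


def clean_text(text):
    """Lowercase a complaint, replace non-alphanumerics with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def char_ngrams(url, n=3):
    """Character-level n-grams of a URL (n=3 is an assumed setting)."""
    url = url.lower()
    return [url[i:i + n] for i in range(len(url) - n + 1)]


def fit_idf(token_lists):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(token_lists)
    df = Counter(tok for toks in token_lists for tok in set(toks))
    return {tok: math.log((1 + n_docs) / (1 + d)) + 1.0 for tok, d in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse, L2-normalised TF-IDF vector; tokens unseen during fitting are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(vec, weights, bias):
    """Probability that a feature vector belongs to the positive (malicious) class."""
    z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
    return sigmoid(z)


def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Binary logistic regression fitted with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = predict_proba(vec, weights, bias) - y
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias


# Toy, hand-written URLs for illustration only (1 = malicious, 0 = benign).
urls = [
    "http://secure-login.example-bank.verify-account.xyz/update",
    "http://paypa1-verify.xyz/login/confirm",
    "https://www.wikipedia.org/wiki/Phishing",
    "https://docs.python.org/3/library/re.html",
]
labels = [1, 1, 0, 0]

token_lists = [char_ngrams(u) for u in urls]
idf = fit_idf(token_lists)
vectors = [tfidf_vector(toks, idf) for toks in token_lists]
weights, bias = train_logreg(vectors, labels)
probs = [predict_proba(v, weights, bias) for v in vectors]
```

Text complaints would follow the same path, with `clean_text(complaint).split()` supplying word tokens in place of character n-grams. The probabilities in `probs` are computed on the training URLs themselves, so they only show that the sketch fits its toy data; they say nothing about the accuracy of the real system.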
Mathematical Model
The framework uses:
TF-IDF for converting textual information into numerical features.
Logistic Regression for predicting cybercrime categories and threat levels.
Accuracy, Precision, Recall, and F1-score to evaluate model performance.
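For reference, the standard textbook forms of these components are given here. The source does not state which exact variants were used (for example, whether the IDF term is smoothed or whether a multiclass extension of logistic regression is applied), so the notation should be read as an assumption: N is the number of documents, df(t) the number of documents containing term t, M the number of training examples, and TP, TN, FP, FN the confusion-matrix counts for a given class.

```latex
% TF-IDF weight of term t in document d
\[
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log\frac{N}{\mathrm{df}(t)}
\]

% Binary logistic regression on a feature vector x
\[
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^{\top}\mathbf{x} + b),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
\]

% Cross-entropy loss minimised during training, with \hat{p}_i = P(y_i = 1 \mid \mathbf{x}_i)
\[
\mathcal{L}(\mathbf{w}, b) = -\frac{1}{M} \sum_{i=1}^{M}
\left[ y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \right]
\]

% Evaluation metrics
\[
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},
\qquad
\mathrm{Precision} = \frac{TP}{TP + FP},
\qquad
\mathrm{Recall} = \frac{TP}{TP + FN}
\]
\[
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]
```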
System Architecture
The architecture consists of:
User Interface
Preprocessing Module (NLP, OCR, Speech-to-Text)
Feature Extraction
Machine Learning Prediction Engine
Metadata Verification
Report Generation
Notification Module
Database Storage
The system automatically classifies complaints, predicts threat levels (Safe, Suspicious, Malicious), verifies evidence, generates reports, and alerts law enforcement for high-risk cases.
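One plausible way to turn a classifier's malicious-class probability into the three threat levels and an alert decision is sketched here. The source does not say how the levels are derived, so the two thresholds, the `Report` fields, and the rule that only Malicious cases trigger an alert are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; the source does not specify any thresholds.
SUSPICIOUS_THRESHOLD = 0.4
MALICIOUS_THRESHOLD = 0.8


def threat_level(p_malicious):
    """Map a malicious-class probability to Safe, Suspicious, or Malicious."""
    if not 0.0 <= p_malicious <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_malicious >= MALICIOUS_THRESHOLD:
        return "Malicious"
    if p_malicious >= SUSPICIOUS_THRESHOLD:
        return "Suspicious"
    return "Safe"


@dataclass
class Report:
    """Illustrative report record; the field names are assumptions."""
    complaint_id: str
    category: str
    p_malicious: float
    level: str
    alert_authorities: bool


def build_report(complaint_id, category, p_malicious):
    """Create a report and flag it for law enforcement when the level is Malicious."""
    level = threat_level(p_malicious)
    return Report(
        complaint_id=complaint_id,
        category=category,
        p_malicious=p_malicious,
        level=level,
        alert_authorities=(level == "Malicious"),
    )


report = build_report("C-001", "Phishing", 0.93)
```

Keeping the thresholds as named constants makes the trade-off between missed threats and false alarms explicit, so they can be tuned against validation data instead of being buried in the control flow.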
Results
The proposed system achieved:
90.3% accuracy in cybercrime classification.
92% accuracy in malicious threat detection.
Performance was strongest for:
Phishing (F1-score: 0.92)
Malware (F1-score: 0.91)
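Per-class F1-scores such as these are computed from each class's precision and recall. The sketch below shows the calculation on six made-up labels; the data are purely illustrative and do not reproduce the paper's results, and the function names are assumptions.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Precision, recall, and F1 for every class seen in the labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Toy labels for illustration only.
y_true = ["phishing", "phishing", "phishing", "malware", "malware", "fraud"]
y_pred = ["phishing", "phishing", "malware", "malware", "malware", "fraud"]

acc = accuracy(y_true, y_pred)
scores = per_class_metrics(y_true, y_pred)
```

In this toy case one phishing complaint is misclassified as malware, which lowers recall for phishing and precision for malware. That is why per-class F1 can differ noticeably between categories even when overall accuracy looks high.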
Compared to existing systems, CyberSafe AI provides:
Multimodal input support
AI-based crime classification
URL and email threat prediction
Automated evidence verification
Real-time alerts
Better integration with law enforcement agencies
Conclusion
This paper introduces CyberSafe AI, an AI-powered system for reporting cybercrimes and predicting threats. It accepts multiple forms of evidence, uses machine learning to classify complaints, and provides real-time assessments of suspicious URLs and emails. In the reported experiments, the system achieved 90.3% accuracy in cybercrime classification and 92% accuracy in malicious threat detection. Structured report generation, evidence verification, and real-time alerts are intended to make reporting faster, shorten response times, and improve coordination between users and law enforcement. Overall, the system offers a scalable and efficient approach to cybercrime management. Future work includes developing a mobile app, using blockchain for evidence verification, and integrating with national cybercrime databases.