The application of machine learning techniques to crime prediction has emerged as a transformative approach to identifying criminal patterns and improving public safety. This study presents a practical framework that uses a Random Forest Classifier, trained on structured crime data from India, to forecast crime domains from variables such as city, type of offense, victim demographics, and year. A key contribution of this work is an interactive web-based system, built with Flask, that supports both real-time and year-wise crime prediction. With careful preprocessing and categorical encoding, and with evaluation using metrics such as accuracy and F1-score, the system demonstrates reliable predictive performance. The paper also reviews existing literature to contextualize current methodologies and to highlight areas where machine learning can strengthen crime prevention strategies. The result is a scalable tool with practical implications for law enforcement agencies and urban planners seeking to use data-driven insights for crime mitigation.
Introduction
This study develops a machine learning (ML)-based framework for predicting crime domains (e.g., violent, property-related, cybercrime) using structured Indian crime data. It uses a Random Forest Classifier due to its robustness, interpretability, and efficiency, and deploys the model via a Flask-based web application that enables both real-time and batch (year-wise) crime predictions.
Key Objectives and Features:
Leverages public crime data from the National Crime Records Bureau (NCRB) and Kaggle, with features such as city, crime type, victim age and gender, weapon used, and year.
Focuses on crime classification and future forecasting to aid law enforcement and urban planning.
Addresses limitations in prior research (mostly Western datasets, lack of deployment) by providing a real-time, India-focused application.
Methodology:
Data Collection: Structured records from open Indian crime datasets.
Preprocessing:
Removal of records with missing (null) target labels.
Mean imputation for missing numeric values.
Label encoding for categorical features.
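The three preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's code: the column names (`city`, `crime_type`, `victim_gender`, `weapon`, `victim_age`, `year`, `crime_domain`) and the function name `preprocess` are hypothetical, and filling missing categorical values with an `"Unknown"` placeholder is an added assumption, since the text only specifies mean imputation for missing values.

```python
from statistics import mean

# Hypothetical schema; the study's actual column names may differ.
CATEGORICAL = ["city", "crime_type", "victim_gender", "weapon"]
NUMERIC = ["victim_age", "year"]
LABEL = "crime_domain"


def preprocess(rows):
    """Drop unlabeled rows, mean-impute numeric gaps, label-encode categories.

    Returns the cleaned rows (new dicts; the input is not modified) and
    the per-column encoders as {category: integer_code} dictionaries.
    """
    # 1. Remove records whose target label is missing.
    rows = [dict(r) for r in rows if r.get(LABEL) is not None]

    # 2. Mean imputation: fill missing numeric values with the column mean.
    for col in NUMERIC:
        fill = mean(r[col] for r in rows if r.get(col) is not None)
        for r in rows:
            if r.get(col) is None:
                r[col] = fill

    # 3. Label encoding: map each distinct value to an integer code.
    #    Assumption: missing categorical values become "Unknown".
    encoders = {}
    for col in CATEGORICAL + [LABEL]:
        for r in rows:
            if r.get(col) is None:
                r[col] = "Unknown"
        classes = sorted({r[col] for r in rows})
        encoders[col] = {value: code for code, value in enumerate(classes)}
        for r in rows:
            r[col] = encoders[col][r[col]]

    return rows, encoders
```

Sorting the distinct values before assigning codes mirrors the convention of scikit-learn's `LabelEncoder`, so the integer codes are reproducible across runs. The returned encoders must be saved alongside the model so that new inputs are encoded identically at prediction time.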
Modeling: Random Forest selected for:
High accuracy with categorical data.
Reduced overfitting through bagging and random feature selection.
Ease of feature importance analysis.
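A minimal training sketch with scikit-learn's `RandomForestClassifier` is shown below, assuming the features have already been label-encoded into an integer matrix. The hyperparameters (`n_estimators=200`, an 80/20 stratified split, `random_state=42`) and the helper name `train_forest` are illustrative assumptions rather than settings reported in this study. The fitted model's `feature_importances_` attribute supplies the importance scores used for feature importance analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_forest(X, y, feature_names, seed=42):
    """Fit a Random Forest; return the model, test accuracy, ranked importances."""
    # Hold out 20% of the data, keeping class proportions (assumed split).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)  # mean accuracy on held-out rows

    # Impurity-based (mean decrease in impurity) importances, highest first.
    ranked = sorted(
        zip(feature_names, model.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return model, accuracy, ranked


if __name__ == "__main__":
    # Synthetic, label-encoded data: the target depends only on column 0.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 5, size=(300, 4))
    y = (X[:, 0] >= 2).astype(int)
    _, acc, ranked = train_forest(X, y, ["city", "crime_type", "victim_age", "year"])
    print(f"held-out accuracy: {acc:.2f}")
    for name, score in ranked:
        print(f"{name:12s} {score:.3f}")
```

Tree splits treat label-encoded codes as ordered numbers, which imposes an arbitrary order on categories; Random Forests usually tolerate this, but it is worth keeping in mind. Impurity-based importances can also favor high-cardinality features such as city, so `sklearn.inspection.permutation_importance` is a common cross-check.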
Web Deployment: Flask app offers:
Single case prediction.
Year-wise state-level forecasting.
Loading of pre-trained artifacts from serialized .pkl files for fast inference.
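The deployment pattern described above, a Flask app that loads serialized artifacts and serves single-case and year-wise predictions, might look roughly like the sketch below. The route names (`/predict`, `/predict_year`), the file names `model.pkl` and `encoders.pkl`, the JSON payload shapes, and the assumption that encoders are stored as plain dictionaries mapping each category to an integer code are all hypothetical. The year-wise route simply re-scores a batch of submitted records with the requested year substituted; this is one possible reading of year-wise forecasting, not necessarily the study's implementation.

```python
import pickle
from collections import Counter

from flask import Flask, jsonify, request

# Hypothetical feature order; it must match the order used during training.
FEATURES = ["city", "crime_type", "victim_age", "victim_gender", "weapon", "year"]
LABEL = "crime_domain"


def load_artifacts(model_path="model.pkl", encoders_path="encoders.pkl"):
    """Load the trained model and the label encoders from pickle files."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(encoders_path, "rb") as f:
        encoders = pickle.load(f)
    return model, encoders


def create_app(model, encoders):
    """Build the Flask app around an already-loaded model and encoders."""
    app = Flask(__name__)
    # Reverse map: integer class code -> crime-domain name.
    domain_names = {code: name for name, code in encoders[LABEL].items()}

    def encode(record):
        # Categorical fields use their encoder; numeric fields pass through.
        # A missing field or an unseen category raises KeyError.
        return [
            encoders[col][record[col]] if col in encoders else float(record[col])
            for col in FEATURES
        ]

    @app.route("/predict", methods=["POST"])
    def predict():
        # Single-case prediction for one JSON record.
        record = request.get_json()
        try:
            row = encode(record)
        except KeyError as exc:
            return jsonify(error=f"missing or unknown value: {exc}"), 400
        code = int(model.predict([row])[0])
        return jsonify(crime_domain=domain_names[code])

    @app.route("/predict_year", methods=["POST"])
    def predict_year():
        # Year-wise mode: re-score a batch of records with the requested year
        # and count how many fall into each predicted crime domain.
        payload = request.get_json()
        try:
            year = payload["year"]
            rows = [encode(dict(rec, year=year)) for rec in payload["records"]]
        except KeyError as exc:
            return jsonify(error=f"missing or unknown value: {exc}"), 400
        counts = Counter(domain_names[int(c)] for c in model.predict(rows))
        return jsonify(year=year, counts=dict(counts))

    return app


if __name__ == "__main__":
    model, encoders = load_artifacts()
    create_app(model, encoders).run(debug=False)
```

Because `create_app` receives the model as an argument, it can be exercised with Flask's `app.test_client()` and a stub model, without a trained `.pkl` file on disk. Only load `.pkl` files from trusted sources, since `pickle.load` can execute arbitrary code.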
Evaluation Metrics:
Accuracy, precision, recall, and F1-score used to assess performance.
Visualization of state-wise and year-wise trends helps in proactive crime management.
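The four evaluation metrics can be computed directly from true and predicted labels, as in the dependency-free sketch below. Macro averaging (an unweighted mean over crime domains) is an assumption here, since the text does not state which averaging was used; with imbalanced classes, weighted averaging can give noticeably different numbers. The function name `classification_metrics` is hypothetical; in practice, `sklearn.metrics.classification_report` reports the same quantities.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))

    # Accuracy: fraction of predictions that match the true label.
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 if undefined.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    # Macro average: unweighted mean over classes (assumed averaging scheme).
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```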
Contributions:
End-to-end ML pipeline tailored for Indian datasets.
Real-time web interface supporting prediction and analysis.
Comprehensive preprocessing and data handling strategy.
Scalable and interpretable model architecture.
Recommendations for future enhancements like:
LSTM for advanced forecasting
Geospatial integration
Ethical impact assessment
Literature Insights:
Prior studies (both Indian and international) confirm the effectiveness of ML in crime prediction.
Random Forest and other ensemble models generally outperform simpler algorithms.
Existing tools often lack deployable web interfaces or focus only on model accuracy, not usability.
Findings on Crime Distribution:
Cities such as Delhi, Chennai, and Mumbai show the highest crime volumes.
Mid-tier cities like Pune and Ahmedabad also show significant cases.
Smaller cities (Shillong, Panaji) report fewer crimes, possibly due to underreporting or stronger community systems.
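City-level totals and per-year trends of the kind summarized above can be obtained with simple counting. The sketch below is illustrative only: the record fields `city` and `year` and the function names are hypothetical, and it is not the procedure used to produce the reported findings. Raw counts are not normalized by population, so they rank crime volume rather than per-capita crime rates.

```python
from collections import Counter


def crime_volume_by_city(records):
    """Cities ranked by number of recorded incidents, highest first."""
    return Counter(r["city"] for r in records).most_common()


def yearly_trend(records, city):
    """(year, incident_count) pairs for one city, in chronological order."""
    per_year = Counter(r["year"] for r in records if r["city"] == city)
    return sorted(per_year.items())
```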
Conclusion
Crime continues to pose a complex and dynamic challenge to urban security and public policy. Traditional crime analytics based on manual records and tabular summaries are no longer sufficient to predict, prevent, or respond to criminal activity in real time. With the rapid growth of data collected from police departments, CCTV, social media, and public reporting systems, the need for automated, data-driven, and intelligent crime forecasting systems has become urgent. This study contributes toward that goal by developing a practical, interpretable, and deployable crime domain prediction model tailored to the Indian context.
The main goal of this research was to build a supervised learning model that uses structured Indian crime data to classify incidents into categories such as violent, property-related, or cybercrime. A Random Forest Classifier was selected for its robustness, interpretability, and ability to handle mixed data types. The model performed well, achieving 88% accuracy with balanced precision and recall, which indicates its suitability for real-world crime prediction.