Vehicle insurance fraud is a persistent problem for insurers, causing substantial financial losses and operational inefficiencies. This research presents a hybrid framework for detecting fraud in vehicle insurance claims that combines supervised machine learning models with anomaly detection. The preprocessing pipeline handles data cleaning, encoding, scaling, and feature selection, and uses Isolation Forest and Autoencoders to filter anomalous records before model training. Feature selection retains the top 20 predictive features to support both interpretability and performance. Multiple models, including Random Forest, XGBoost, and Logistic Regression, were trained and evaluated, and a meta-model ensemble produces the final prediction. The system is deployed as a secure web application with admin authentication, real-time predictions, and model explainability through SHAP and LIME visualizations. The results indicate that this hybrid approach improves fraud detection accuracy and makes individual decisions transparent. The solution could streamline claims processing and help insurers reduce fraud-related risk.
Introduction
Objective:
To design and implement a smart, scalable, and explainable system for detecting fraudulent vehicle insurance claims using machine learning (ML), anomaly detection, and explainable AI (XAI).
Key Components:
Problem Statement:
Insurance fraud is rising, and traditional rule-based systems are ineffective against sophisticated fraud.
Insurers need data-driven, real-time, and explainable fraud detection models.
Proposed System:
A web-based application integrating:
ML classifiers (Random Forest, XGBoost, Logistic Regression)
Engineered features like Claim Ratio, Vehicle Age, and Policy Tenure.
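The text names three engineered features without defining them. The sketch below shows one plausible way to derive them. The definitions (claim amount divided by annual premium, years since the model year, days since policy start) and all parameter names are assumptions made for illustration, not the authors' formulas.

```python
from datetime import date


def engineer_features(claim_amount, annual_premium, vehicle_year,
                      policy_start, claim_date):
    """Derive the three engineered features under assumed definitions."""
    # Assumed definition: claimed amount relative to the annual premium.
    # A zero premium falls back to 0.0 to avoid division by zero.
    claim_ratio = claim_amount / annual_premium if annual_premium > 0 else 0.0
    # Assumed definition: whole years from the model year to the claim year.
    vehicle_age = max(claim_date.year - vehicle_year, 0)
    # Assumed definition: days the policy had been active when the claim was made.
    policy_tenure_days = max((claim_date - policy_start).days, 0)
    return {
        "claim_ratio": claim_ratio,
        "vehicle_age": vehicle_age,
        "policy_tenure_days": policy_tenure_days,
    }


# Hypothetical claim filed 30 days after the policy started.
features = engineer_features(
    claim_amount=12000.0,
    annual_premium=1500.0,
    vehicle_year=2015,
    policy_start=date(2023, 1, 10),
    claim_date=date(2023, 2, 9),
)
```

Under these assumed definitions, a large claim filed shortly after policy inception shows up as a high claim ratio paired with a short tenure.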
Anomaly Detection:
Isolation Forest and Autoencoder flag outliers for removal.
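The Isolation Forest half of this step can be sketched with scikit-learn as follows. The contamination rate, the synthetic data, and the function name `filter_anomalies` are assumptions for illustration. The Autoencoder half is not shown.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def filter_anomalies(X, contamination=0.05, random_state=0):
    """Return (X_kept, keep_mask) after dropping rows Isolation Forest flags."""
    forest = IsolationForest(
        n_estimators=100,
        contamination=contamination,  # assumed share of outliers in the data
        random_state=random_state,
    )
    # fit_predict returns +1 for inliers and -1 for outliers.
    labels = forest.fit_predict(X)
    keep_mask = labels == 1
    return X[keep_mask], keep_mask


# Synthetic stand-in for scaled claim features (not the authors' data):
# 200 ordinary rows plus one planted extreme row at index 200.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 4)), [[25.0, 25.0, 25.0, 25.0]]])
X_kept, keep_mask = filter_anomalies(X)
```

The `contamination` value sets how many rows are flagged, so it would need tuning against the real claims data.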
Balancing:
SMOTE is used to correct the class imbalance between fraudulent and non-fraudulent claims.
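To show what SMOTE does, here is a minimal standard-library sketch of its core step: each synthetic row is a random point on the line segment between a minority sample and one of its nearest minority neighbors. The function name, the sample rows, and the parameter values are hypothetical. In practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority rows by neighbor interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base row, excluding the row itself.
        neighbors = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbor = rng.choice(neighbors)
        # Random point on the segment between the base row and the neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical two-feature fraud rows (the minority class).
fraud_rows = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_rows = smote(fraud_rows, n_new=6, k=2)
```

Oversampling is usually applied to the training split only, so that synthetic rows do not leak into the evaluation data.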
Model Training:
Base models trained: Random Forest, XGBoost, LightGBM, and CatBoost.
Meta-model combines their outputs for improved performance.
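A stacked ensemble of this kind can be sketched with scikit-learn's `StackingClassifier`. The text does not say which meta-model was used, so the logistic regression below is an assumption. `GradientBoostingClassifier` stands in for the boosted-tree libraries named above so that the sketch needs only scikit-learn. The data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for a claims dataset (not the authors' data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners; the gradient-boosting model is a stand-in (see lead-in).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# The meta-model is fit on out-of-fold base-model probabilities (cv=5),
# so it never sees predictions made on the base models' own training rows.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-model
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)
fraud_probability = stack.predict_proba(X_test)[:, 1]
```

Fitting the meta-model on out-of-fold predictions is the main design choice here, since fitting it on in-sample base predictions would tend to overfit.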
Explainability:
SHAP: Shows global and local feature impact.
LIME: Offers instance-specific explanations.
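To make "local feature impact" concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every feature coalition. This illustrates the quantity that SHAP estimates. It does not use the `shap` library, and its cost grows exponentially with the number of features. The scoring function, instance, and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance by enumerating coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical linear score standing in for a trained fraud model.
def toy_score(x):
    claim_ratio, vehicle_age, policy_tenure = x
    return 0.1 * claim_ratio + 0.02 * vehicle_age - 0.001 * policy_tenure


instance = [8.0, 8.0, 30.0]    # the claim being explained
baseline = [2.0, 5.0, 400.0]   # assumed "typical" claim
phi = shapley_values(toy_score, instance, baseline)
```

The attributions add up to the difference between the score for the instance and the score for the baseline. This additivity is what lets a per-claim explanation be read as a breakdown of the prediction.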
Experimental Results:
Feature Importance: Top 20 features selected using Random Forest.
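A top-20 selection driven by Random Forest importances might look like the sketch below. It assumes the ranking uses scikit-learn's impurity-based `feature_importances_`, because the text does not say which importance measure was used. The data and feature names are synthetic placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

TOP_K = 20  # the text reports keeping the top 20 features

# Synthetic stand-in with 40 candidate features (not the authors' data).
X, y = make_classification(
    n_samples=500, n_features=40, n_informative=10, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Rank by impurity-based importance, highest first, and keep the first 20.
order = np.argsort(forest.feature_importances_)[::-1][:TOP_K]
selected = [feature_names[i] for i in order]
X_selected = X[:, order]
```

Impurity-based importances can favor high-cardinality features, so permutation importance is a common cross-check.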
Model Performance:
Meta-model outperformed all individual classifiers.
Ensemble approach improved accuracy and robustness.
Web App Features:
Admin login, claim entry form, prediction results, and XAI visualizations.
Literature Review Highlights:
Previous works (using XGBoost, Random Forest, Autoencoders, LSTM) showed promise but had limitations: outdated datasets, poor scalability, lack of real-time prediction, and limited interpretability.
This study improves on those by offering:
A real-time, scalable system.
Strong model interpretability.
Practical web-based deployment.
Conclusion
The proposed vehicle insurance fraud detection system combines supervised machine learning models, anomaly detection techniques, and explainable AI tools to identify fraudulent claims. It addresses two challenges common to fraud prediction systems, accuracy and interpretability, so that performance stays reliable while predictions remain transparent to users. Anomaly detection with Isolation Forest and an Autoencoder allows the system to capture subtle or previously unseen fraud patterns, which improves the quality of the training data for the supervised models. Stacking improves predictive performance further by leveraging the strengths of multiple classifiers. SHAP and LIME provide explainability, enabling stakeholders to understand and trust the system's predictions. Deployment through a secure, minimal web interface makes the system easy to use and applicable in real time, and therefore suitable for operational environments.
Overall, the system meets current requirements for fraud detection in vehicle insurance and provides a foundation for future expansion and scaling. By balancing predictive power with transparency and usability, this work contributes to ongoing efforts to combat financial fraud and protect stakeholders across the insurance ecosystem.
References
[1] T. Machinya, G. Mbizo, and K. Zvarevashe, “Insurance Fraud Detection using Machine Learning,” in Proceedings of the 2022 1st Zimbabwe Conference of Information and Communication Technologies (ZCICT), Nov. 2022. DOI: https://doi.org/10.1109/ZCICT55726.2022.10046034
[2] A. I. Alrais, “Fraudulent Insurance Claims Detection Using Machine Learning,” M.S. thesis, Dept. of Graduate Programs & Research, Rochester Institute of Technology, Dubai, 2022. [Online]. Available: https://repository.rit.edu/cgi/viewcontent.cgi?article=12510&context=theses
[3] C. Gomes, Z. Jin, and H. Yang, “Insurance fraud detection with unsupervised deep learning,” Journal of Risk and Insurance, vol. 88, no. 3, pp. 591–624, Sept. 2021. DOI: https://doi.org/10.1111/jori.12359
[4] H. Cedervall and A. Hansson, “Insurance Fraud Detection using Unsupervised Sequential Anomaly Detection.” M.S. thesis, Dept. of Computer and Information Science, Linköping University, Linköping, Sweden, 2022. [Online]. Available: https://www.diva-portal.org/smash/get/diva2:1633422/FULLTEXT01.pdf