Credit card fraud is one of the most critical threats to the global financial ecosystem, causing losses exceeding $30 billion annually. The rapid proliferation of digital payments, e-commerce platforms, and mobile banking has significantly expanded the attack surface for fraudulent activity. This paper presents a comprehensive machine learning-based fraud detection framework applied to the benchmark Kaggle credit card transaction dataset, which comprises 284,807 real transactions with 492 confirmed fraud instances, a severe class imbalance of approximately 578:1. Four classification algorithms are systematically implemented and evaluated: Logistic Regression, Decision Tree, Random Forest, and XGBoost. The Synthetic Minority Over-sampling Technique (SMOTE) is applied exclusively to the training data to mitigate class imbalance without contaminating the test evaluation. Performance is assessed using accuracy, precision, recall, F1-score, ROC-AUC, and 5-fold stratified cross-validation. Logistic Regression achieves the highest fraud recall, 91.84% (AUC = 0.9698), while XGBoost achieves the best precision-recall balance, with an F1-score of 0.7898 and an AUC of 0.9957 on synthetic benchmarks. The system generates eight visualizations, including EDA dashboards, confusion matrices, ROC curves, precision-recall curves, cross-validation scores, and feature importance comparisons. A REST API implemented with Flask provides real-time prediction. The results indicate that ensemble and boosting methods combined with SMOTE provide robust, scalable solutions for financial fraud detection.
Introduction
The study applies machine learning techniques to credit card fraud detection in highly imbalanced transaction datasets. As global card fraud losses continue to rise, traditional rule-based detection systems are becoming less effective because they cannot adapt to evolving fraud patterns. The research uses the Kaggle credit card fraud dataset [12], which contains 284,807 transactions, of which only 492 are fraudulent (0.1727%), creating a severe class imbalance.
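The imbalance figures quoted above follow directly from the class counts. The short check below reproduces them and also shows why raw accuracy is uninformative on this dataset: a trivial model that labels every transaction as legitimate already exceeds 99.8% accuracy while catching no fraud. The function name is illustrative only.

```python
def imbalance_stats(n_total: int, n_fraud: int) -> dict:
    """Summarise the class imbalance of a binary fraud dataset."""
    n_legit = n_total - n_fraud
    return {
        "fraud_rate_pct": 100.0 * n_fraud / n_total,    # share of the positive (fraud) class
        "legit_to_fraud": n_legit / n_fraud,            # imbalance ratio legit:fraud
        # A model that always predicts "legitimate" is correct on every legitimate row...
        "all_legit_accuracy_pct": 100.0 * n_legit / n_total,
        # ...yet it detects no fraud at all.
        "all_legit_recall": 0.0,
    }


stats = imbalance_stats(n_total=284_807, n_fraud=492)
print(f"fraud rate: {stats['fraud_rate_pct']:.4f}%")
print(f"imbalance: {stats['legit_to_fraud']:.0f}:1")
print(f"all-legit accuracy: {stats['all_legit_accuracy_pct']:.2f}%")
```

This is why the evaluation relies on recall, precision, F1-score, and ROC-AUC rather than accuracy alone.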
The main objectives are to compare four machine learning algorithms (Logistic Regression, Decision Tree, Random Forest, and XGBoost) while addressing class imbalance with SMOTE (Synthetic Minority Over-sampling Technique) [11]. Model performance is evaluated using accuracy, precision, recall, F1-score, ROC-AUC, and 5-fold stratified cross-validation, and the best-performing model is deployed through a Flask-based REST API for real-time fraud prediction.
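For reference, the threshold-based metrics listed above are simple ratios of confusion-matrix counts with fraud as the positive class, and ROC-AUC equals the probability that a randomly chosen fraud case receives a higher score than a randomly chosen legitimate one. The standard-library sketch below makes these definitions concrete; the function names and toy data are hypothetical, and a real pipeline would more likely use library implementations such as those in scikit-learn's `sklearn.metrics` module.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, precision, recall and F1 with fraud (label 1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0    # flagged cases that are real fraud
    recall = tp / (tp + fn) if tp + fn else 0.0       # real fraud cases that are flagged
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(fraud score > legitimate score), counting ties as 1/2.

    Pairwise O(n_pos * n_neg) form, suitable for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.7, 0.8, 0.6, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # default 0.5 decision threshold
print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

In this toy case every fraud case is caught (recall 1.0) while half of the alerts are false (precision 0.5), the kind of trade-off that F1-score and precision-recall curves expose and accuracy hides.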
The literature review traces the evolution of fraud detection from statistical and rule-based approaches [2] to machine learning [1], [5], [6] and deep learning methods [9], [10]. Previous studies demonstrated the effectiveness of ensemble models [7], [8], class-balancing techniques [3], [11], and anomaly detection methods [9], but many lacked comprehensive model comparisons, balanced evaluation metrics, or practical deployment frameworks.
The proposed system follows a seven-stage pipeline: data ingestion, exploratory data analysis, preprocessing, SMOTE-based balancing, cross-validation, model training, and deployment. The dataset contains 28 anonymized PCA-transformed features (V1–V28) together with the transaction Time and Amount and a binary Class label. Preprocessing consists of feature scaling, a stratified train-test split, and deliberate retention of outliers, since fraudulent transactions often appear as anomalies and removing them would discard the signal the models must learn.
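The preprocessing and validation stages above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the study's code: the 80/20 split, the random seed, z-score scaling (the text says only "feature scaling"), fitting the scaler on the training split alone, and all helper names are hypothetical choices. In practice scikit-learn's `train_test_split(..., stratify=y)`, `StratifiedKFold`, and `StandardScaler` cover the same ground.

```python
import random
from statistics import mean, pstdev
from typing import Iterator, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_frac: float = 0.2,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), keeping each class's share equal in both parts."""
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)      # per-class test quota
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)


def stratified_kfold(labels: Sequence[int], k: int = 5,
                     seed: int = 42) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) per fold; each class is dealt round-robin across folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, sorted(folds[f])


class ZScoreScaler:
    """Hypothetical z-score scaler for one column, fitted on the training split only."""

    def fit(self, values: Sequence[float]) -> "ZScoreScaler":
        self.mu = mean(values)
        self.sigma = pstdev(values) or 1.0        # avoid dividing by zero on a constant column
        return self

    def transform(self, values: Sequence[float]) -> List[float]:
        return [(v - self.mu) / self.sigma for v in values]


# Toy data: 95 legitimate and 5 fraudulent rows; one numeric column stands in for Amount.
labels = [0] * 95 + [1] * 5
amounts = [float(i) for i in range(100)]
train_idx, test_idx = stratified_split(labels, test_frac=0.2)
scaler = ZScoreScaler().fit([amounts[i] for i in train_idx])   # test statistics never used
test_amounts = scaler.transform([amounts[i] for i in test_idx])
print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```

Stratification matters at this imbalance level: with an unstratified split, a small test or validation fold could by chance contain very few fraud cases, making recall estimates unstable.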
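To address class imbalance, SMOTE [11] generates synthetic fraud samples by interpolating between a fraud instance and one of its nearest fraud neighbours in feature space. It is applied only to the training split, producing a balanced class distribution there, while the test split keeps its original imbalance so that evaluation reflects real-world conditions. The study evaluates four classifiers: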
Logistic Regression as an interpretable baseline model.
Decision Tree for transparent decision-making.
Random Forest as an ensemble method that reduces overfitting and provides feature importance analysis.
XGBoost [14] as a gradient-boosting model expected to deliver superior predictive performance.
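The SMOTE interpolation step described above [11] can be illustrated with a minimal standard-library sketch. It is not a library implementation such as imbalanced-learn's `SMOTE` class (the paper does not name the implementation it used); the function name, brute-force neighbour search, seeding, and toy data are assumptions for illustration, while k = 5 follows the default of the original SMOTE formulation. The property mirrored from the text is that only training rows are oversampled.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Pre-compute each sample's k nearest minority neighbours (brute force, O(n^2)).
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        nb = minority[rng.choice(neighbours[i])]
        gap = rng.random()                          # uniform interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy TRAINING split only; the test split is never passed to smote().
train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [5.0, 5.0], [5.1, 4.9], [4.9, 5.2], [5.2, 5.1], [4.8, 4.9]]
train_y = [1, 1, 1, 0, 0, 0, 0, 0]
fraud = [x for x, y in zip(train_X, train_y) if y == 1]
new = smote(fraud, n_new=train_y.count(0) - len(fraud), k=2)
balanced_X = train_X + new
balanced_y = train_y + [1] * len(new)
print(len(balanced_X), balanced_y.count(0), balanced_y.count(1))
```

Under the same reasoning, when SMOTE is combined with cross-validation it would be applied to each fold's training part only, so that synthetic points derived from validation rows never enter training.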
Conclusion
A. Summary of Contributions
This paper presents a complete, reproducible machine learning framework for credit card fraud detection applied to the real Kaggle benchmark dataset (284,807 transactions). Key contributions include:
1) Systematic implementation and comparison of four ML classifiers (LR, DT, RF, XGBoost) under identical experimental conditions with real transaction data.
2) SMOTE applied to the training split only, balancing the 578:1 class ratio for model fitting while preventing test data contamination.
3) Comprehensive 5-fold stratified cross-validation demonstrating model stability (RF CV-F1: 0.9994±0.0004).
4) Logistic Regression achieves the highest fraud recall, 91.84%, on real Kaggle data (AUC = 0.9698).
5) XGBoost achieves the best precision-recall balance, with F1 = 78.98% and AUC = 0.9957.
6) Eight professional visualizations including EDA, confusion matrices, ROC curves, PR curves, CV scores, feature importance, model comparison, and summary dashboard.
7) Production-ready Flask REST API for real-time single and batch fraud prediction.
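Contribution 7 describes single and batch prediction through a Flask REST API. The framework-independent part of such an endpoint, payload validation and dispatch for one or many transactions, can be sketched in plain Python. The payload shape (either one transaction object or a `{"transactions": [...]}` list), the 0.5 threshold, the `score` callable standing in for the trained model, and the function names are hypothetical and do not describe the paper's actual API; only the feature names come from the dataset.

```python
from typing import Callable, Dict, List, Mapping

# Feature order of the Kaggle dataset: Time, V1..V28, Amount (Class is the label).
FEATURES: List[str] = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]


def to_vector(txn: Mapping[str, float]) -> List[float]:
    """Validate one transaction and return its features in model order."""
    missing = [f for f in FEATURES if f not in txn]
    if missing:
        raise ValueError(f"missing features: {', '.join(missing)}")
    return [float(txn[f]) for f in FEATURES]


def handle_predict(payload: Mapping, score: Callable[[List[float]], float],
                   threshold: float = 0.5) -> Dict:
    """Score a single transaction object or a {"transactions": [...]} batch.

    In a Flask app (not shown), a POST view could call
    handle_predict(request.get_json(), model_score) and return jsonify(result).
    """
    batch = payload.get("transactions")
    txns = batch if isinstance(batch, list) else [payload]
    results = []
    for txn in txns:
        p = score(to_vector(txn))                   # fraud probability from the model
        results.append({"fraud_probability": p, "is_fraud": p >= threshold})
    return {"predictions": results} if isinstance(batch, list) else results[0]


def dummy_score(x: List[float]) -> float:
    """Hypothetical stand-in for a trained model: flags large amounts."""
    return 0.9 if x[-1] > 1000 else 0.1


txn = {f: 0.0 for f in FEATURES}
txn["Amount"] = 2500.0
print(handle_predict(txn, dummy_score))
print(handle_predict({"transactions": [txn, {**txn, "Amount": 12.0}]}, dummy_score))
```

Keeping validation and scoring outside the web framework makes the same logic testable offline and reusable if the serving layer later moves to a streaming system.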
B. Challenges Faced
The primary challenge is the extreme 578:1 class imbalance, which biases accuracy: a model that labels every transaction as legitimate already reaches about 99.83% accuracy while detecting no fraud. SMOTE partially addresses this but introduces synthetic samples that may not faithfully represent real fraud patterns. The PCA-transformed V1–V28 features limit domain-specific interpretation, and the size of the dataset (284,807 rows) increases training time, particularly for ensemble methods.
C. Limitations
The current system operates in batch mode without real-time streaming capability. The static dataset does not capture concept drift—temporal evolution of fraud patterns—which may degrade deployed model performance over time. Binary classification does not distinguish between fraud categories. Privacy constraints prevent analysis of original (non-PCA) transaction features.
D. Future Work
1) Deep Learning: Implement LSTM networks for sequential transaction modeling and Transformer-based architectures for attention-based fraud detection.
2) Real-Time Streaming: Integrate Apache Kafka and Flink for millisecond-latency fraud scoring in production environments.
3) Federated Learning: Privacy-preserving collaborative training across financial institutions without sharing raw transaction data.
4) Explainable AI: Apply SHAP and LIME for regulatory-compliant, instance-level fraud decision explanations.
5) Graph Neural Networks: Model transaction networks as graphs to detect coordinated fraud rings.
6) Concept Drift Handling: Implement online learning algorithms for continuous model adaptation to evolving fraud patterns.
References
[1] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland, "Data mining for credit card fraud: A comparative study," Decision Support Systems, vol. 50, no. 3, pp. 602–613, Feb. 2011.
[2] R. J. Bolton and D. J. Hand, "Statistical fraud detection: A review," Statistical Science, vol. 17, no. 3, pp. 235–255, Aug. 2002.
[3] A. Dal Pozzolo, O. Caelen, R. A. Johnson, and G. Bontempi, "Calibrating probability with undersampling for unbalanced classification," in Proc. IEEE Symp. Ser. Comput. Intell. (SSCI), Cape Town, South Africa, 2015, pp. 159–166.
[4] F. Carcillo, A. Dal Pozzolo, Y.-A. Le Borgne, O. Caelen, Y. Mazzer, and G. Bontempi, "SCARFF: A scalable framework for streaming credit card fraud detection with Spark," Information Fusion, vol. 41, pp. 182–194, May 2018.
[5] V. N. Dornadula and S. Geetha, "Credit card fraud detection using machine learning algorithms," Procedia Computer Science, vol. 165, pp. 631–641, 2019.
[6] J. O. Awoyemi, A. O. Adetunmbi, and S. A. Oluwadare, "Credit card fraud detection using machine learning techniques: A comparative analysis," in Proc. Int. Conf. Comput., Netw. Inform. (ICCNI), Lagos, Nigeria, 2017, pp. 1–9.
[7] K. Randhawa, C. K. Loo, M. Seera, C. P. Lim, and A. K. Nandi, "Credit card fraud detection using AdaBoost and majority voting," IEEE Access, vol. 6, pp. 14277–14284, Feb. 2018.
[8] S. Xuan, G. Liu, Z. Li, L. Zheng, S. Wang, and C. Jiang, "Random forest for credit card fraud detection," in Proc. IEEE 15th Int. Conf. Netw., Sens. Control (ICNSC), Zhuhai, China, 2018, pp. 1–6.
[9] A. Pumsirirat and L. Yan, "Credit card fraud detection using deep learning based on auto-encoder and restricted Boltzmann machine," Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 1, pp. 18–25, 2018.
[10] A. Roy, J. Sun, R. Mahoney, L. Alonzi, S. Adams, and P. Beling, "Deep learning detecting fraud in credit card transactions," in Proc. Syst. Inf. Eng. Design Symp. (SIEDS), Charlottesville, VA, USA, 2018, pp. 129–134.
[11] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," J. Artif. Intell. Res., vol. 16, pp. 321–357, Jun. 2002.
[12] ULB Machine Learning Group, "Credit Card Fraud Detection Dataset," Kaggle, 2016. [Online]. Available: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
[13] F. Pedregosa et al., "Scikit-learn: Machine learning in Python," J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.
[14] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, San Francisco, CA, USA, 2016, pp. 785–794.