This study applies machine learning approaches to detect fraudulent activities in banking data, a major concern in the financial sector where preventing fraud is critical. To improve detection accuracy, the research introduces class weight tuning, a strategy that helps machine learning models distinguish between genuine and fraudulent transactions. The study uses three machine learning algorithms (CatBoost, LightGBM, and XGBoost), each known for its strengths in handling complex datasets. By combining these models, the system aims to improve overall performance in identifying fraud.
In addition, deep learning techniques are integrated to fine-tune the models' hyperparameters, increasing their adaptability to evolving fraud patterns. The models are evaluated on real-world banking data, and the results show that combining LightGBM and XGBoost outperforms traditional approaches across various performance metrics. To further improve accuracy, a stacking ensemble model is implemented: it combines predictions from RandomForest and LightGBM classifiers and uses a GradientBoostingClassifier as the final estimator. This ensemble approach draws on the complementary strengths of multiple models to make more accurate predictions.
Introduction
The rapid growth of financial transactions, especially online, has led to a surge in credit card fraud, posing ongoing challenges for detection systems as fraudsters continuously adapt their methods. Fraud involves intentional deception for financial gain, with credit card fraud specifically referring to unauthorized use of card details. Fraud prevention aims to stop fraud before it occurs, while fraud detection identifies fraudulent transactions during or after they happen, often framed as a binary classification problem (fraudulent vs. legitimate).
Because transaction data are large-scale and complex, manual fraud detection is inefficient, making machine learning (ML) and deep learning essential for uncovering hidden patterns and enabling real-time detection. Techniques like LightGBM, XGBoost, CatBoost, and Logistic Regression, along with ensemble methods (e.g., majority voting), have shown promise in improving detection accuracy. High precision and high recall are both critical: precision limits false positives, recall limits false negatives, and together they preserve trust and reduce losses.
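As an illustration of the majority-voting idea mentioned above, the following is a minimal sketch in plain Python. The function name, the toy votes, and the tie-breaking rule (ties go to the fraud label) are assumptions made for this example, not details taken from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most models for one transaction.

    `predictions` holds one 0/1 label per model (1 = fraudulent).
    Ties are resolved in favor of the fraud label; this is an
    illustrative choice, not a rule stated in the study.
    """
    counts = Counter(predictions)
    return 1 if counts[1] >= counts[0] else 0


# Hypothetical votes from three classifiers for three transactions.
votes_per_transaction = [
    [0, 0, 1],  # two of three models say legitimate
    [1, 1, 0],  # two of three models say fraudulent
    [0, 0, 0],  # unanimous: legitimate
]

ensemble_labels = [majority_vote(votes) for votes in votes_per_transaction]
print(ensemble_labels)  # [0, 1, 0]
```

Hard voting of this kind ignores how confident each model is; averaging predicted probabilities is a common alternative when the base models expose them.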
The literature review highlights the dynamic nature of fraud patterns and the need for innovative solutions such as Fraud Islands (link analysis) and multi-layer ML models that integrate various detection strategies. In healthcare and financial sectors, fraud detection faces challenges like class imbalance and complex data relationships. Hybrid and boosting models, combined with techniques like SMOTE for imbalance handling, have improved fraud detection effectiveness.
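Besides resampling methods such as SMOTE, class imbalance is often handled by weighting the minority class more heavily during training. The sketch below computes the common "balanced" heuristic, weight = n_samples / (n_classes * class_count). The function name and the toy label counts are assumptions for illustration and do not come from the study's data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in counts.items()
    }


# Hypothetical toy labels: 990 legitimate (0) and 10 fraudulent (1).
labels = [0] * 990 + [1] * 10
weights = balanced_class_weights(labels)

print(weights[1])               # 50.0
print(round(weights[0], 4))     # 0.5051
```

Weights like these can be passed to learners that accept per-class weights, for example through the `class_weight` parameter of scikit-learn classifiers. One plausible reading of class weight tuning is to treat the minority-class weight as a hyperparameter and search around this heuristic value rather than fixing it.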
The study proposes an advanced fraud detection system using optimized ML algorithms with hyperparameter tuning via Bayesian optimization and stacking ensembles (Random Forest, LightGBM, Gradient Boosting). The system is tested on publicly available datasets (e.g., Kaggle Bank Transactions) with anonymized features to ensure privacy.
The methodology involves preprocessing raw transaction data, selecting features to retain relevant attributes, and training models with cross-validation for robustness. Performance is evaluated through metrics such as accuracy, precision, recall, and F1-score. A user-friendly web application built on Flask with a SQLite backend supports practical deployment and user interaction.
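To make the evaluation metrics concrete, the sketch below computes accuracy, precision, recall, and F1-score from raw predictions and shows why accuracy alone can mislead on imbalanced data. The function name and the toy counts (990 legitimate and 10 fraudulent transactions) are hypothetical and are not results from the study.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: 990 legitimate transactions, then 10 fraudulent ones.
y_true = [0] * 990 + [1] * 10

# A "model" that never flags fraud: high accuracy, zero recall.
baseline = classification_metrics(y_true, [0] * 1000)

# A detector with 4 false alarms that catches 8 of the 10 frauds.
y_pred = [1] * 4 + [0] * 986 + [1] * 8 + [0] * 2
detector = classification_metrics(y_true, y_pred)

print(baseline)
print(detector)
```

In this toy case the do-nothing baseline reaches 0.99 accuracy while detecting no fraud at all, whereas the detector's recall of 0.8 and precision of about 0.67 describe its behavior on the fraud class directly.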
Overall, this approach leverages modern ML techniques and ensemble strategies to enhance fraud detection accuracy, adaptability, and usability in real-world banking environments.
Conclusion
The Stacking Classifier emerged as the top-performing model, delivering the highest accuracy among all evaluated algorithms and demonstrating its effectiveness in fraud detection. This project showed strong performance across various machine learning models, including LightGBM, XGBoost, CatBoost [29, 30, 31, 32], voting classifiers, and neural networks, underscoring the system’s adaptability and robustness. The incorporation of diverse data extraction and adjustment techniques was essential in enhancing detection accuracy, highlighting their significance in model optimization.
The use of the ensemble-based Stacking Classifier further boosted performance, establishing its value in tackling fraud-related challenges. Additionally, the development of a user-friendly Flask-based web interface simplified user testing and authentication processes, enhancing accessibility and real-world usability. Successful implementation and testing through Flask, with interactive inputs, confirmed the practicality and reliability of the system [1, 2, 3].
Overall, the project underscores the potential of advanced machine learning approaches in addressing complex fraud detection issues within the banking domain. It sets the stage for continued enhancements through further exploration of ensemble methods and hyperparameter optimization strategies. Ultimately, these advancements contribute to reducing financial losses, improving transaction security, and building trust within the financial ecosystem.
References
[1] J. Nanduri et al., “Ecommerce fraud detection through fraud islands and multi-layer machine learning model,” in Advances in Information and Communication, Springer, 2020, pp. 556–570.
[2] I. Matloob et al., “A sequence mining-based novel architecture for detecting fraudulent transactions in healthcare systems,” IEEE Access, vol. 10, pp. 48447–48463, 2022.
[3] H. Feng, “Ensemble-based methods in credit card fraud detection using boosting methods,” in Proc. 2nd Int. Conf. Comput. Data Sci. (CDS), 2021, pp. 7–11.
[4] M. S. Delgosha et al., “Elucidation of big data analytics in banking: A four-stage Delphi study,” J. Enterprise Inf. Manage., vol. 34, no. 6, pp. 1577–1596, Nov. 2021.
[5] M. Puh and L. Brkić, “Detecting credit card fraud using selected machine learning algorithms,” in Proc. 42nd MIPRO, 2019, pp. 1250–1255.
[6] K. Randhawa et al., “Detection of credit card fraud using AdaBoost and majority voting,” IEEE Access, vol. 6, pp. 14277–14284, 2018.
[7] N. Kumaraswamy et al., “Healthcare fraud data mining methods: A retrospective and future perspective,” Perspectives in Health Information Management, vol. 19, no. 1, p. 1, 2022.
[8] E. F. Malik et al., “A novel hybrid machine learning architecture in detecting credit card fraud,” Mathematics, vol. 10, no. 9, p. 1480, Apr. 2022.
[10] K. Gupta et al., “A review on machine learning-based credit card fraud detection,” in Proc. Int. Conf. on Advancements in Automation and Intelligent Computing (ICAAIC), 2022, pp. 362–368.
[12] R. Almutairi et al., “Credit card fraud detection using machine learning models: An analytical study,” in Proc. IEEE Int. Conf. on Internet of Everything, Microwave, and Electronics Engineering (IEMTRONICS), Jun. 2022, pp. 1–8.
[13] N. S. Halvaiee and M. K. Akbari, “A novel approach to credit card fraud detection using artificial immune systems,” Applied Soft Computing, vol. 24, pp. 40–49, Nov. 2014.
[14] A. C. Bahnsen, D. Aouada, A. Stojanovic, and B. Ottersten, “Feature engineering strategies for credit card fraud detection,” Expert Systems with Applications, vol. 51, pp. 134–142, Jun. 2016.
[15] U. Porwal and S. Mukund, “Outlier detection approach for credit card fraud detection in e-commerce,” arXiv preprint arXiv:1811.02196, 2018.
[16] H. Wang et al., “An ensemble learning framework for credit card fraud detection,” in Proc. IEEE SmartWorld Conf., Oct. 2018, pp. 94–98.
[17] F. Itoo et al., “Comparative study of logistic regression, Naïve Bayes, and KNN for credit card fraud detection,” International Journal of Information Technology, vol. 13, no. 4, pp. 1503–1511, 2021.
[19] T. A. Olowookere and O. S. Adewale, “Cost-sensitive meta-learning framework for credit card fraud detection,” Scientific African, vol. 8, art. no. e00464, Jul. 2020.
[20] A. A. Taha and S. J. Malebary, “Optimized LightGBM-based intelligent approach for credit card fraud detection,” IEEE Access, vol. 8, pp. 25579–25587, 2020.
[21] X. Kewei et al., “Hybrid deep learning model for online fraud detection,” in Proc. Int. Conf. on Computing, Electronics, and Communications Engineering (ICCECE), Jan. 2021, pp. 431–434.
[22] T. Vairam et al., “Evaluation of Naïve Bayes and voting classifier strategies for detecting credit card fraud,” in Proc. Int. Conf. on Advanced Computing and Communication Systems (ICACCS), Mar. 2022, pp. 602–608.
[23] P. Verma and P. Tyagi, “Analysis of supervised machine learning algorithms in fraud detection,” ECS Transactions, vol. 107, no. 1, p. 7189, 2022.