Credit Card Fraud Detection Using Ensemble (Stacking and Voting Classifiers) with Hybrid Techniques

Authors: P. Shyam, Dr. K. Santhi Shree

DOI Link: https://doi.org/10.22214/ijraset.2025.71710

Abstract

Credit card fraud remains a critical challenge in the financial industry because fraud detection datasets are highly imbalanced and fraudsters' tactics keep evolving. This study proposes a robust framework for credit card fraud detection that integrates hybrid resampling strategies with ensemble learning (stacking and voting classifiers) to improve the detection of minority fraud cases. We evaluated several machine learning models combined with hybrid oversampling and undersampling methods, including Synthetic Minority Oversampling Technique (SMOTE) with Tomek links (SMOTE-Tomek), SMOTE with Edited Nearest Neighbours (SMOTEENN), and Borderline-SMOTE (BSMOTE) with Tomek links. Traditional classifiers such as Random Forest (RF), Extreme Gradient Boosting (XGB), and Light Gradient Boosting Machine (LGBM) were benchmarked against ensemble approaches employing stacking and voting classifiers. Experimental results show that the Voting Classifier outperforms the individual models in fraud-class F1-score on both datasets, reaching an F1-score of 0.8634 (AUC 0.9763) on the CreditCard dataset and an F1-score of 0.8808 (AUC 0.9961) on the PaySim dataset. The Stacking Classifier also performs strongly, particularly in reducing false positives, as evidenced by its superior precision. These results confirm that ensemble classifiers, combined with appropriate hybrid resampling techniques, can significantly improve fraud detection by balancing sensitivity and specificity, making the proposed approach effective for real-world financial fraud prevention systems.
The proposed framework showcases the effectiveness of stacking and voting classifiers as part of a hybrid ensemble strategy, providing a reliable, scalable, and adaptable solution for real-world fraud detection systems where early and accurate identification of fraudulent transactions is paramount.

Introduction

1. Problem Overview

The rise in digital transactions has increased the risk of credit card fraud, making traditional detection techniques inadequate.
Fraudulent transactions are rare, producing a severe class imbalance in which standard classifiers favor the majority (legitimate) class.
Detecting fraud requires intelligent, scalable, and adaptive solutions that minimize false positives and missed frauds.
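To see why imbalance defeats accuracy-driven classifiers, consider a trivial model that labels every transaction in the CreditCard dataset as legitimate. The sketch below uses only the class counts given for that dataset (284,807 transactions, 492 frauds); the function name `majority_baseline` is illustrative, not part of the study.

```python
def majority_baseline(total: int, frauds: int) -> dict:
    """Score a classifier that predicts 'legitimate' for every transaction."""
    legit = total - frauds
    # Every legitimate transaction is a true negative; every fraud is a false negative.
    tp, fp, fn, tn = 0, 0, frauds, legit
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)  # share of frauds actually caught
    return {"accuracy": accuracy, "recall": recall, "missed_frauds": fn}


scores = majority_baseline(total=284_807, frauds=492)
print(f"accuracy = {scores['accuracy']:.4%}")   # about 99.83%
print(f"recall   = {scores['recall']:.1%}")     # 0.0%: no fraud is detected
print(f"missed   = {scores['missed_frauds']}")  # all 492 frauds
```

Accuracy looks excellent while recall is zero, which is why precision, recall, F1-score, and ROC-AUC for the fraud class matter more than raw accuracy here.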

2. Proposed Solution

This study introduces an integrated fraud detection framework combining:

Ensemble classifiers (Stacking, Voting)
Hybrid resampling techniques (SMOTE-Tomek, SMOTEENN, Borderline-SMOTE)

The goal is to improve accuracy, precision, recall, and F1-score in detecting minority (fraudulent) transactions.

3. Datasets Used

CreditCard Dataset:
- 284,807 transactions
- Only 492 are fraudulent (0.172%)
- 28 PCA-transformed features (V1 to V28) plus the untransformed 'Time' and 'Amount' features
PaySim Dataset:
- Simulated mobile money transactions based on real logs
- Features include transaction type, amount, origin/destination balances, and fraud indicators
- Highly imbalanced, with realistic financial behavior patterns

4. Methodology

A. Preprocessing

Clean missing values and outliers
Drop irrelevant features (e.g., user identifiers)
One-hot encoding for categorical variables (e.g., transaction types)
Stratified train-test split (80/20) to preserve the original fraud ratio in both splits
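The encoding and splitting steps can be sketched with the standard library alone. This is a minimal illustration, not the authors' code: it assumes PaySim-style column names (`type`, `nameOrig`, `nameDest`, `isFraud`), and the helpers `one_hot` and `stratified_split` are hypothetical. In practice, scikit-learn's `train_test_split(..., stratify=y)` performs the same stratified split.

```python
import random
from collections import defaultdict


def one_hot(records, field, drop=("nameOrig", "nameDest")):
    """Drop identifier columns and expand a categorical field into 0/1 indicator columns."""
    categories = sorted({r[field] for r in records})
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != field and k not in drop}
        for c in categories:
            row[f"{field}_{c}"] = int(r[field] == c)
        encoded.append(row)
    return encoded


def stratified_split(records, label, test_frac=0.2, seed=42):
    """Shuffle and split each class separately so both splits keep the original class ratio."""
    by_class = defaultdict(list)
    for r in records:
        by_class[r[label]].append(r)
    rng = random.Random(seed)
    train, test = [], []
    for rows in by_class.values():
        rng.shuffle(rows)
        n_test = round(len(rows) * test_frac)
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test


# Toy PaySim-like records: 1,000 transactions, 10 of them fraudulent (1%).
types = ["CASH_OUT", "PAYMENT", "TRANSFER"]
data = [
    {"nameOrig": f"C{i}", "nameDest": f"M{i}", "type": types[i % 3],
     "amount": float(i), "isFraud": int(i % 100 == 0)}
    for i in range(1000)
]
train, test = stratified_split(one_hot(data, "type"), "isFraud")
print(len(train), len(test))            # 800 200
print(sum(r["isFraud"] for r in test))  # 2, the same 1% fraud ratio as the full data
```

In a setup like this, resampling is typically applied to the training split only, so that the test split still reflects the real fraud ratio.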

B. Hybrid + Ensemble Architecture

A three-stage pipeline:

Hybrid Sampling:
- SMOTE-Tomek (oversample fraud + clean borderline legitimate cases)
- SMOTEENN (oversample + remove noisy examples)
- Borderline-SMOTE + Tomek (focused oversampling near decision boundary)
Base Classifiers:
- Random Forest (RF)
- XGBoost (XGB)
- LightGBM (LGBM)
Ensemble Techniques:
- Stacking: Combines RF, XGB, LGBM → Logistic Regression as meta-learner
- Voting: Soft voting among RF, XGB, LGBM
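The hybrid-sampling stage can be illustrated with a simplified, standard-library sketch of SMOTE-Tomek on 2-D points. SMOTE creates a synthetic fraud sample by interpolating between a minority point and one of its k nearest minority neighbours; Tomek-link cleaning then finds pairs of opposite-class points that are each other's nearest neighbour and, following the description above, removes the legitimate member of each pair. This illustrates the technique only and is not the authors' implementation; a library such as imbalanced-learn (`SMOTETomek`, `SMOTEENN`, `BorderlineSMOTE`) would normally be used, and ENN cleaning is not shown.

```python
import math
import random


def nearest(points, i, k):
    """Return the k points closest to points[i], excluding points[i] itself."""
    others = [q for j, q in enumerate(points) if j != i]
    return sorted(others, key=lambda q: math.dist(points[i], q))[:k]


def smote(minority, n_new, k=3, seed=0):
    """SMOTE step: interpolate between a minority point and a random one of its
    k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        p, q = minority[i], rng.choice(nearest(minority, i, k))
        gap = rng.random()  # uniform position on the segment from p to q
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


def remove_tomek_links(X, y, majority=0):
    """Tomek step: drop majority points that are mutual nearest neighbours
    of a point with the opposite label."""
    def nn_index(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nn_index(i) for i in range(len(X))]
    linked = {i for i in range(len(X))
              if y[i] == majority and y[nn[i]] != y[i] and nn[nn[i]] == i}
    kept = [i for i in range(len(X)) if i not in linked]
    return [X[i] for i in kept], [y[i] for i in kept]


frauds = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_frauds = smote(frauds, n_new=4, k=2)  # new points lie on the edges of the fraud triangle

X = [(0.0, 0.0), (1.0, 0.0), (1.4, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y = [1, 1, 0, 0, 0, 0]  # 1 = fraud, 0 = legitimate
X_clean, y_clean = remove_tomek_links(X, y)
print(X_clean)  # [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
```

In a SMOTE-Tomek pipeline the two steps are chained: synthetic frauds are added first, then Tomek-link cleaning runs on the combined set. Some implementations remove both points of each link rather than only the legitimate one. Borderline-SMOTE changes only which minority points seed the interpolation, favouring those whose neighbourhoods are dominated by the majority class.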

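The two ensemble rules can be sketched without an ML library, assuming each base model (RF, XGB, LGBM) has already produced a fraud probability per transaction. Soft voting averages those probabilities; stacking instead feeds them as features to a logistic-regression meta-learner, normally trained on out-of-fold predictions. In scikit-learn, `VotingClassifier(voting="soft")` and `StackingClassifier` (which fits its final estimator on cross-validated base-model predictions) cover both cases. The probabilities, learning rate, and epoch count below are made-up illustrative values, not results or settings from the study.

```python
import math


def soft_vote(probas, threshold=0.5):
    """Average the base models' fraud probabilities and apply a decision threshold."""
    avg = sum(probas) / len(probas)
    return avg, int(avg >= threshold)


def fit_meta_learner(rows, labels, lr=1.0, epochs=3000):
    """Logistic regression fitted by batch gradient descent on base-model probabilities."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, t in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1 / (1 + math.exp(-z)) - t  # sigmoid(z) minus the true label
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_meta(w, b, x, threshold=0.5):
    """Apply the fitted meta-learner to one row of base-model probabilities."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return int(1 / (1 + math.exp(-z)) >= threshold)


print(soft_vote((0.6, 0.4, 0.8)))  # average about 0.6, so the vote is 1 (fraud)
print(soft_vote((0.8, 0.3, 0.2)))  # one confident model is outvoted: about 0.43, vote 0

# Hypothetical out-of-fold probabilities from (RF, XGB, LGBM) with true labels.
oof = [(0.9, 0.8, 0.95), (0.7, 0.9, 0.6), (0.1, 0.2, 0.05), (0.3, 0.1, 0.2), (0.2, 0.4, 0.1)]
labels = [1, 1, 0, 0, 0]
w, b = fit_meta_learner(oof, labels)
print([predict_meta(w, b, x) for x in oof])  # [1, 1, 0, 0, 0] on the training rows
```

The design difference is that voting weights every base model equally, while stacking learns how much to trust each one from held-out predictions.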
C. Evaluation Metrics

Accuracy
Precision
Recall
F1-Score
ROC-AUC
Confusion Matrix
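For reference, the threshold-based metrics above follow directly from the four confusion-matrix counts (TP, FP, FN, TN), and ROC-AUC equals the probability that a randomly chosen fraud receives a higher score than a randomly chosen legitimate transaction, with ties counted as one half. The standard-library sketch below computes them; the counts and scores in the example are made up for illustration.

```python
from itertools import product


def threshold_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the fraud (positive) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(scores, labels):
    """ROC-AUC as the share of (fraud, legitimate) pairs ranked correctly (O(P*N) pairs)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Illustrative confusion matrix: 80 frauds caught, 20 missed, 10 false alarms.
metrics = threshold_metrics(tp=80, fp=10, fn=20, tn=9890)
print(metrics)
auc = roc_auc([0.9, 0.8, 0.35, 0.4, 0.1], [1, 1, 1, 0, 0])
print(auc)  # 5 of 6 pairs ordered correctly, about 0.833
```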

5. Experimental Results

Hybrid 1: RF + SMOTE-Tomek

CreditCard:
- F1-score: 0.8482
- AUC: 0.9782
PaySim:
- Recall: 0.8421
- AUC: 0.9875
Conclusion: Strong overall balance and reliability

Hybrid 2: XGB + SMOTEENN

CreditCard:
- High recall: 0.8571
- Lower precision: 0.6131
PaySim:
- Very high recall: 0.9035
- Poor precision: 0.2068
Conclusion: Best when prioritizing detection of all frauds (high recall), but causes more false alarms
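The size of this trade-off can be made concrete with the reported figures: F1 is the harmonic mean of precision and recall, and (1 - precision) / precision gives the number of false alarms raised per correctly flagged fraud. The calculation below uses only the precision and recall reported for Hybrid 2; the F1 values it prints are derived here, not reported in the study.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def false_alarms_per_catch(precision: float) -> float:
    """FP / TP, which follows from precision = TP / (TP + FP)."""
    return (1 - precision) / precision


hybrid2 = {"CreditCard": (0.6131, 0.8571), "PaySim": (0.2068, 0.9035)}  # (precision, recall)
for name, (p, r) in hybrid2.items():
    print(f"{name}: F1 = {f1(p, r):.4f}, "
          f"false alarms per detected fraud = {false_alarms_per_catch(p):.2f}")
# CreditCard: F1 = 0.7149, false alarms per detected fraud = 0.63
# PaySim: F1 = 0.3366, false alarms per detected fraud = 3.84
```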

Hybrid 3: XGB + Borderline-SMOTE + Tomek

CreditCard:
- F1-score: 0.8316
PaySim:
- Precision: 0.8873
- F1-score: 0.8571
Conclusion: Most balanced method across metrics

6. Literature Insights

Numerous prior studies highlight the effectiveness of SMOTE, ADASYN, Tomek Links, SMOTEENN, and Borderline-SMOTE in balancing fraud datasets.
Ensemble models such as Random Forest and XGBoost, as well as deep learning methods, consistently outperform individual classifiers.
Behavior clustering and noise removal further improve classification accuracy and generalization.

7. Conclusion

This research demonstrates that combining hybrid resampling with ensemble methods like stacking and voting:

Improves detection of rare fraud cases
Balances precision and recall
Reduces overfitting
Provides a scalable and high-performance fraud detection solution

It offers a robust framework for financial institutions to tackle fraud in highly imbalanced real-world datasets.

Conclusion

The proposed methodology demonstrates the effectiveness of combining ensemble learning with advanced resampling strategies to address the challenges posed by highly imbalanced fraud detection datasets. Two benchmark datasets, CreditCard and PaySim, were used to evaluate the performance of various hybrid models.

Across both datasets, the Stacking and Voting ensembles outperformed the individual hybrid approaches (e.g., Random Forest + SMOTE-Tomek, XGBoost + SMOTEENN) in fraud-class F1-score, that is, in balancing precision and recall for the minority class. Individual hybrids still led on single metrics: XGBoost + SMOTEENN reached higher recall (0.8571 on CreditCard and 0.9035 on PaySim) at the cost of much lower precision, and Random Forest + SMOTE-Tomek recorded a slightly higher CreditCard AUC (0.9782 versus 0.9763). On the CreditCard dataset, the Voting classifier achieved the highest fraud F1-score of 0.8634, with precision of 0.9294 and recall of 0.8061. On the PaySim dataset, the Voting classifier was again the top performer, with a fraud F1-score of 0.8808, precision of 0.9891, and recall of 0.7939, indicating a robust ability to identify fraudulent transactions while minimizing false positives.

Hybrid resampling techniques such as SMOTE-Tomek, SMOTEENN, and BSMOTE + Tomek contributed significantly to higher fraud detection rates by generating synthetic minority examples and cleaning noisy data, which helps classifiers learn more discriminative patterns. Furthermore, the ensemble frameworks leveraged the strengths of their base learners to build more generalized and accurate models. Overall, the results show that ensemble methods combined with hybrid sampling provide a powerful and reliable solution for credit card fraud detection, offering high predictive performance while addressing class imbalance. This approach not only enhances fraud detection but also reduces operational risk for financial institutions by enabling faster and more accurate identification of fraudulent activities.
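As a quick consistency check on the figures above, the reported Voting F1-scores can be recomputed from the reported precision and recall with F1 = 2PR / (P + R). This is a plain arithmetic sketch; the dictionary layout is illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the Voting classifier.
voting = {
    "CreditCard": (0.9294, 0.8061, 0.8634),
    "PaySim": (0.9891, 0.7939, 0.8808),
}
for dataset, (p, r, reported) in voting.items():
    recomputed = round(f1(p, r), 4)
    print(dataset, recomputed, "matches" if recomputed == reported else "differs")
# CreditCard 0.8634 matches
# PaySim 0.8808 matches
```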

Copyright

Copyright © 2025 P. Shyam, Dr. K. Santhi Shree. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET71710

Publish Date : 2025-05-27

ISSN : 2321-9653

Publisher Name : IJRASET