Credit card fraud detection using machine learning on imbalanced datasets has become an important area of study in recent years, driven by the growing volume of online transactions. This paper examines the detection of fraudulent credit card transactions with machine learning techniques and the handling of class imbalance in transaction data. Two classifiers were compared: Logistic Regression, used as a baseline, and Random Forest. Class imbalance was addressed with the Synthetic Minority Over-sampling Technique (SMOTE), which was applied when training the models. The models were evaluated using accuracy, precision, recall, F1-score, and the confusion matrix. The results show that SMOTE improved the detection of fraudulent transactions and that the Random Forest classifier detected fraud more accurately than the Logistic Regression classifier.
Introduction
The growth of digital payments, online banking, and e-commerce has significantly increased the use of credit cards worldwide. While this provides convenience for users, it has also led to a rise in credit card fraud, creating financial and reputational risks for financial institutions. Detecting fraud is challenging because fraudulent transactions make up only a very small fraction of all transactions. This class imbalance leads machine learning models to favor the majority class of legitimate transactions and to miss fraudulent ones.
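The effect of imbalance on accuracy can be seen with a small worked example. The counts below are hypothetical (100,000 transactions, 0.5% of them fraudulent) and are not taken from the dataset used in this study; the "model" simply labels every transaction as legitimate.

```python
# Hypothetical illustration: 100,000 transactions, of which 500 (0.5%) are fraudulent.
# These counts are assumptions for illustration, not figures from fraudTrain.csv.
n_total = 100_000
n_fraud = 500
n_legit = n_total - n_fraud

# A trivial model that predicts "legitimate" for every transaction.
true_positives = 0          # no fraud case is flagged
false_negatives = n_fraud   # every fraud case is missed
true_negatives = n_legit    # every legitimate case is (trivially) correct

accuracy = (true_positives + true_negatives) / n_total
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy = {accuracy:.3f}")  # accuracy = 0.995
print(f"recall   = {recall:.3f}")    # recall   = 0.000
```

Such a model scores 99.5% accuracy while detecting no fraud at all, which is why recall and F1-score are reported alongside accuracy.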
To address this issue, the study applies the Synthetic Minority Over-sampling Technique (SMOTE), which generates artificial samples of fraudulent transactions to balance the dataset. The research compares two machine learning models: Logistic Regression (LR) and Random Forest (RF). Logistic Regression is used as a baseline model, while Random Forest is selected for its ability to capture complex, nonlinear relationships in transaction data.
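The core of SMOTE is interpolation between a minority-class sample and one of its nearest minority-class neighbours. The following is a minimal, dependency-free sketch of that idea, not the implementation used in this study (which is not specified here; libraries such as imbalanced-learn provide a maintained `SMOTE` class). The function name `smote`, the toy data, and the defaults (`k=5`, a fixed `seed`) are illustrative assumptions, and this version picks base samples at random rather than cycling through every minority sample.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic points from minority-class vectors.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute each sample's k nearest minority neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))          # random base sample
        x = minority[i]
        nb = minority[rng.choice(neighbours[i])]  # random near neighbour
        gap = rng.random()                        # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Toy "fraud" feature vectors (made up for illustration).
fraud = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(fraud, n_synthetic=3, k=2)
print(len(new_points))  # 3
```

Because each synthetic point lies on a segment between two real fraud cases, the enlarged minority class stays within the region the real fraud cases occupy rather than merely duplicating existing records.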
The dataset used (fraudTrain.csv from Kaggle) contains over 1.29 million records. Data preprocessing included removing personal identifiers, encoding categorical variables, normalizing numerical features, and splitting the data into 70% training and 30% testing sets. Model performance was evaluated using accuracy, precision, recall, and F1-score.
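The preprocessing steps listed above can be sketched in plain Python as follows. This is an illustrative reconstruction under stated assumptions, not the study's code: the column names (`cc_num`, `first`, `last`, `category`, `amt`, `city_pop`, `is_fraud`) are assumed to resemble those in fraudTrain.csv and should be checked against the actual file, z-score standardization stands in for the unspecified normalization method, and the 70/30 split is stratified by class (a choice made here, not stated in the study). Encoding and scaling statistics are learned from the training rows only and then applied to both splits.

```python
import random
import statistics

# Hypothetical column names, assumed to resemble fraudTrain.csv; verify against the real file.
CATEGORICAL = ["category"]
NUMERIC = ["amt", "city_pop"]
LABEL = "is_fraud"


def stratified_split(rows, train_frac=0.7, seed=42):
    """Split rows into train/test so both keep roughly the original fraud ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        group = [r for r in rows if r[LABEL] == cls]
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train += group[:cut]
        test += group[cut:]
    return train, test


def fit_preprocessor(train):
    """Learn one-hot vocabularies and z-score statistics from training rows only."""
    vocab = {c: sorted({r[c] for r in train}) for c in CATEGORICAL}
    stats = {}
    for c in NUMERIC:
        values = [float(r[c]) for r in train]
        # Fall back to 1.0 so a constant column does not cause division by zero.
        stats[c] = (statistics.mean(values), statistics.pstdev(values) or 1.0)
    return vocab, stats


def transform(rows, vocab, stats):
    """Build feature vectors: identifier columns are never selected,
    categoricals are one-hot encoded, numeric columns are standardized."""
    X, y = [], []
    for r in rows:
        features = []
        for c in CATEGORICAL:
            features += [1.0 if r[c] == v else 0.0 for v in vocab[c]]
        for c in NUMERIC:
            mean, std = stats[c]
            features.append((float(r[c]) - mean) / std)
        X.append(features)
        y.append(int(r[LABEL]))
    return X, y


# Toy data standing in for the real CSV: 200 rows, 10 of them fraudulent.
rng = random.Random(0)
rows = [
    {"cc_num": 4000 + i, "first": "A", "last": "B",
     "category": rng.choice(["grocery_pos", "shopping_net", "gas_transport"]),
     "amt": rng.uniform(1, 500), "city_pop": rng.randint(100, 100_000),
     "is_fraud": 1 if i % 20 == 0 else 0}
    for i in range(200)
]
train, test = stratified_split(rows)
vocab, stats = fit_preprocessor(train)
X_train, y_train = transform(train, vocab, stats)
X_test, y_test = transform(test, vocab, stats)
print(len(X_train), len(X_test))  # 140 60
```

Identifier columns such as card numbers and names are dropped simply by never being selected as features. In practice, scikit-learn's `OneHotEncoder`, `StandardScaler`, and `train_test_split` (with its `stratify` argument) cover the same steps.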
Experimental results showed that without SMOTE, Random Forest achieved high accuracy (~99%) but relatively low recall (~61%), meaning that many fraud cases were missed. After SMOTE was applied, recall improved substantially (to ~75%) and the F1-score also rose, indicating better fraud detection. Although precision decreased slightly because of additional false positives, the overall balance between detecting fraud and maintaining accuracy improved.
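The trade-off described above follows directly from how the metrics are computed from the confusion matrix, with fraud as the positive class. The sketch below uses hypothetical counts chosen only to illustrate the direction of the change (they are not the results reported in this study): oversampling lets the model flag more fraud, raising true positives, while also producing some extra false positives.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts (fraud = positive)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical test-set counts (NOT the paper's results): 1,000 fraud cases
# among 100,000 transactions, before and after oversampling the training data.
before = metrics(tp=500, fp=80, fn=500, tn=98_920)
after = metrics(tp=700, fp=200, fn=300, tn=98_800)

for name, m in (("before", before), ("after", after)):
    # Order: accuracy, precision, recall, F1
    print(name, " ".join(f"{v:.3f}" for v in m))
# before 0.994 0.862 0.500 0.633
# after 0.995 0.778 0.700 0.737
```

Recall and F1-score rise while precision falls and accuracy barely moves, which matches the pattern described above and shows why accuracy alone is a poor guide on imbalanced data.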
When both models were trained with SMOTE, Random Forest outperformed Logistic Regression, achieving about 99.4% accuracy compared with about 94% for Logistic Regression. This suggests that, on this dataset, combining an ensemble method with data balancing is more effective for fraud detection.
The study concludes that handling class imbalance is crucial for accurate fraud detection and that Random Forest with SMOTE provides the most reliable results of the configurations tested. Future work may include larger real-world datasets, gradient-boosting models such as XGBoost or LightGBM, deep learning techniques such as LSTM, real-time fraud detection systems, and explainable AI for improved transparency.
Conclusion
This paper presented an approach to detecting credit card fraud using two machine learning techniques, Logistic Regression and Random Forest. In particular, the approach addressed the class imbalance problem using SMOTE. The experimental results show that a model trained on the imbalanced dataset is biased toward predicting legitimate transactions, whereas applying SMOTE yields more balanced predictions for the minority (fraud) class. In addition, Random Forest, an ensemble of relatively simple decision trees, detected fraudulent transactions more accurately than Logistic Regression. The proposed system leaves room for further exploration and has not yet been applied in real-time scenarios. Nevertheless, this work contributes toward an efficient framework for credit card fraud detection.