Fraud Detection in UPI Payments Using Tabular Machine Learning Models

Authors: Renu Chaudhary, Sakshi Singh, Riddhima Singh, Husain Zaidi, Kanishka Jain

DOI Link: https://doi.org/10.22214/ijraset.2025.74922

Abstract

With the rapid growth of digital payment systems in India, the Unified Payments Interface (UPI) has become one of the main platforms for instant money transfers. As transaction volumes have grown, so has the incidence of fraud. This study presents a machine learning-based system that detects fraudulent UPI transactions using the CatBoost algorithm. The model uses features describing user behavior, transaction details, and device information to classify each transaction as genuine or fraudulent. CatBoost was chosen because it handles categorical data natively, yields interpretable results, and performs strongly on tabular datasets. In experiments, the model achieves a high AUC (area under the ROC curve), indicating a strong ability to separate fraudulent from genuine transactions. The trained model is also deployed in a Streamlit web application that lets users check fraud risk in real time through a simple interface. The system thus connects machine learning research with practical use, providing a reliable and scalable approach to improving the safety of UPI payments.

Introduction

India’s digital finance ecosystem, powered by the Unified Payments Interface (UPI), has revolutionized payments but faces growing cybersecurity threats, including phishing, spoofing, and fraudulent links. Traditional rule-based fraud detection systems are inadequate against evolving attacks. Machine learning (ML), particularly ensemble methods such as XGBoost, has improved fraud detection by learning patterns in transaction data. Challenges remain, however: UPI datasets are highly imbalanced and dominated by categorical fields, which complicates modeling and can reduce model interpretability.

CatBoost, a gradient boosting algorithm, offers distinct advantages for UPI fraud detection: it handles categorical variables natively, supports class weighting to counter class imbalance, reduces manual preprocessing, and enhances interpretability. Despite this, its application to UPI fraud detection remains underexplored. This study proposes an end-to-end fraud detection framework built on CatBoost, compares it with XGBoost using ROC-AUC, precision, recall, and F1-score, and deploys a real-time Streamlit-based fraud detection app to demonstrate operational feasibility.
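
Of the comparison metrics named above, ROC-AUC is the least obvious to read. One common interpretation is the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen genuine one. The following is a minimal sketch of that definition, not the evaluation code used in the study; the function name and the toy labels and scores are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the chance that a fraud case outscores a genuine one (ties count 0.5)."""
    fraud_scores = [s for y, s in zip(labels, scores) if y == 1]
    genuine_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not fraud_scores or not genuine_scores:
        raise ValueError("both classes must be present")
    wins = 0.0
    # Compare every fraud score against every genuine score.
    for f in fraud_scores:
        for g in genuine_scores:
            if f > g:
                wins += 1.0
            elif f == g:
                wins += 0.5
    return wins / (len(fraud_scores) * len(genuine_scores))

# Toy example (made-up values): 1 = fraud, 0 = genuine.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

The pairwise loop is quadratic and only suitable for small illustrations; library implementations typically use a sort-based computation instead.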

Literature Review & Challenges:

Early fraud detection relied on rule-based or logistic regression methods, which struggled with non-linear patterns and high-dimensional data.
Ensemble methods (Random Forest, XGBoost, LightGBM) outperform classical models but often ignore real-world deployment issues such as latency, scalability, and adaptability.
Persistent gaps include limited focus on UPI-specific behavior, underutilization of tabular ML models like CatBoost, and lack of deployable real-time systems.

Methodology:

A synthetic UPI dataset of 100,000 transactions was generated, reflecting realistic class imbalance (fraud rate ~0.628%).
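
The paper does not describe how the synthetic data was generated. As a rough sketch of what the stated class imbalance looks like, the snippet below marks exactly 628 of 100,000 toy transactions as fraudulent (0.628%). The field names, device categories, and amount distributions are hypothetical and chosen only for illustration.

```python
import random

def generate_transactions(n=100_000, fraud_rate=0.00628, seed=42):
    """Build n toy transactions with a fixed share of fraudulent rows."""
    rng = random.Random(seed)
    n_fraud = round(n * fraud_rate)
    # Pick which transaction ids are fraudulent, without replacement.
    fraud_ids = set(rng.sample(range(n), n_fraud))
    devices = ["android", "ios", "web"]  # hypothetical categories
    rows = []
    for txn_id in range(n):
        is_fraud = 1 if txn_id in fraud_ids else 0
        # Assumption: fraudulent amounts tend to be somewhat larger.
        amount = rng.lognormvariate(6.5 if is_fraud else 6.0, 1.0)
        rows.append({
            "txn_id": txn_id,
            "amount": round(amount, 2),
            "device_type": rng.choice(devices),
            "is_fraud": is_fraud,
        })
    return rows

transactions = generate_transactions()
fraud_count = sum(row["is_fraud"] for row in transactions)
print(f"{fraud_count} fraudulent rows out of {len(transactions)}")
```
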
Data preprocessing included handling missing values, outliers (via IQR), and engineering behavioral features (e.g., number of transactions in 24 hours, average transaction amount over 7 days, device change flags, account age).
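
Two of these preprocessing steps can be sketched with the standard library. The source says only that outliers were handled "via IQR", so clipping to the 1.5 x IQR fences (rather than dropping rows) is an assumption here, as are the function names and the sample values.

```python
import statistics
from collections import deque

def iqr_clip(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

def txns_in_last_24h(timestamps, window_seconds=24 * 3600):
    """For each timestamp (sorted, in seconds), count the user's earlier
    transactions that fall inside the trailing window."""
    window = deque()
    counts = []
    for ts in timestamps:
        # Drop transactions that are older than the window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        counts.append(len(window))
        window.append(ts)
    return counts

# Made-up amounts with one extreme value.
amounts = [100, 120, 130, 110, 125, 115, 50000]
clipped = iqr_clip(amounts)

# Made-up transaction times for one user, in hours converted to seconds.
times = [h * 3600 for h in (0, 1, 2, 30, 31)]
recent_counts = txns_in_last_24h(times)
print(clipped, recent_counts)
```

Counting only earlier transactions keeps the feature free of information from the transaction being scored, which matters when the feature is computed at prediction time.
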
CatBoost’s native support for categorical features eliminated the need for one-hot or label encoding. The dataset was split 80:20 for training and testing.
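
CatBoost can skip one-hot and label encoding because it converts categories to numbers internally using ordered target statistics: each row is encoded from the labels of rows that precede it in a random permutation, which avoids leaking the row's own label. The sketch below is a simplified single-permutation illustration of that idea, not CatBoost's actual implementation; the function name, the smoothing weight `a`, and the sample data are assumptions.

```python
import random
from collections import defaultdict

def ordered_target_statistic(categories, targets, prior, a=1.0, seed=0):
    """Encode each row's category from the targets of rows seen earlier
    in a random ordering, smoothed toward the prior."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)
    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # running number of rows per category
    encoded = [0.0] * len(categories)
    for idx in order:
        cat = categories[idx]
        # Use only history; the current row's own target is excluded.
        encoded[idx] = (sums[cat] + a * prior) / (counts[cat] + a)
        sums[cat] += targets[idx]
        counts[cat] += 1
    return encoded

payment_apps = ["app_a", "app_b", "app_a", "app_b", "app_a"]  # hypothetical values
is_fraud = [0, 0, 1, 0, 1]
prior = sum(is_fraud) / len(is_fraud)
encoded = ordered_target_statistic(payment_apps, is_fraud, prior)
print(encoded)
```

The first row seen for each category has no history, so it falls back to the prior.
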
The resulting model was deployed in a real-time, interpretable Streamlit app (“Quicki”), bridging academic research and operational fraud detection.

Contribution:

The study demonstrates CatBoost’s effectiveness for UPI fraud detection, addresses gaps in handling categorical and imbalanced data, and provides a deployable, interpretable system for real-world application, enhancing trust and resilience in India’s digital payment ecosystem.

Conclusion

This study presents a CatBoost-based fraud detection framework for Unified Payments Interface (UPI) transactions, designed to identify anomalous and high-risk activities in real time. Using a synthetically generated dataset of 100,000 transactions with realistic behavioral and contextual features, the model demonstrated strong predictive capability. The CatBoost classifier achieved a ROC-AUC score of 0.8696, a precision of 0.49, and a recall of 0.60 for the minority (fraudulent) class, reflecting a trade-off between fraud detection sensitivity and false-positive control. Compared with XGBoost, CatBoost handled categorical features and imbalanced data better, achieving higher recall and better interpretability. Engineered features such as Device_Change_Flag, Num_Txns_Last_24H, and IP_Risk_Score significantly improved fraud detection accuracy by capturing behavioral anomalies.

In future work, the model can be enhanced with more advanced class-imbalance techniques, such as the Synthetic Minority Over-sampling Technique (SMOTE), focal loss optimization, or dynamic threshold adjustment, to improve recall for rare fraud events. Integrating temporal and graph-based network features could enable detection of coordinated fraud rings, and deep learning architectures could be used to model transaction sequences. Real-time deployment through scalable APIs on UPI platforms would further strengthen proactive fraud prevention in digital payment ecosystems.
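
To put the reported precision and recall in context, the sketch below computes the corresponding F1-score and a rough alert breakdown. It assumes the 20% test split contains 20,000 transactions at the overall fraud rate of 0.628%; the paper does not report these counts, so the derived numbers are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

precision, recall = 0.49, 0.60  # reported for the fraudulent class
f1 = f1_score(precision, recall)

# Assumption: 20,000 test rows with the same 0.628% fraud rate as the full dataset.
test_frauds = round(20_000 * 0.00628)         # fraud cases in the test split
true_positives = round(recall * test_frauds)  # fraud cases the model catches
flagged = round(true_positives / precision)   # total alerts raised
false_positives = flagged - true_positives    # genuine transactions flagged

print(f"F1 = {f1:.3f}")
print(f"{true_positives} of {test_frauds} frauds caught, "
      f"{false_positives} false alerts out of {flagged} flagged")
```

Under these assumptions the F1-score is roughly 0.54, and about half of all alerts would be false positives, which is the cost that threshold adjustment would trade against recall.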


Copyright

Copyright © 2025 Renu Chaudhary, Sakshi Singh, Riddhima Singh, Husain Zaidi, Kanishka Jain. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET74922

Publish Date : 2025-10-31

ISSN : 2321-9653

Publisher Name : IJRASET
