Authors: Ekta Mangal, Divya, Shubham, Radhika Gussain
Web browsing and many other online services have increased the use of digital payment modes, which in turn has increased the risk of fraud during transactions. Fraudulent transactions must be monitored so that customers do not pay for purchases they did not make. Such problems can be tackled with data mining through machine learning. This work aims to demonstrate the modeling of a data set using machine learning for Credit Card Fraud Detection (CCFD). The CCFD problem involves modeling previous credit card transactions together with the data of those that turned out to be unauthorized. The resulting models are then used to decide whether a new transaction is authorized or not. In this project, we concentrate on analyzing and preprocessing the data sets and on deploying several anomaly detection methods, namely Logistic Regression, Random Forest, Decision Tree, and XGBoost, on credit card transaction data.
Credit card fraud during a transaction is unauthorized and unwanted access to a person's bank account by someone other than the authorized holder, without the holder's knowledge. Preventive measures should be taken to stop this unwanted access, and the behavior associated with fraudulent activity can be studied to reduce it and to protect against similar events in the future. A credit card is a card entrusted to a cardholder (account holder) that allows him or her to buy products and services within a borrowing limit or to withdraw cash. It gives the authorized user flexibility in both time and place: the client can pay back later within an agreed period, and payment can be made from anywhere without carrying cash. This makes the credit card an easy target. Fraudsters always try to make every unauthorized transaction look legitimate, which makes fraud difficult to detect. Machine learning models and methods are used to examine all transactions, both authorized and unauthorized.
II. LITERATURE REVIEW
Multiple supervised and semi-supervised machine learning algorithms are now applied to fraud detection. Although these techniques are accurate in certain respects, they have not offered a lasting and reliable solution. Strong class imbalance, the mix of labeled and unlabeled samples, and the need to classify transactions with high accuracy are the three key obstacles we must face. Supervised machine learning techniques for detecting credit card fraud include Decision Trees, Logistic Regression, Random Forest, etc. These algorithms are trained and tested on the behavioral characteristics of typical and unusual transactions, and they are applied to highly skewed credit card fraud data.
Data may be balanced by using both under-sampling and over-sampling strategies. The methods operate in different ways, and we must choose the most effective one. The under-sampling model should not be taken into account, because under-sampling discards information. Logistic regression is the most straightforward model, with ROC values of 0.99 on the training set and 0.97 on the test set. Every approach has some potential for failure.
A. Logistic Regression
An effective machine learning approach for credit card fraud detection is logistic regression. It is a binary classification method that calculates the likelihood of a transaction being fraudulent or not depending on the features that are provided as input.
Overall, because it is straightforward to construct, simple to read, and capable of handling both numerical and categorical data, the logistic regression method can be a valuable tool in credit card fraud detection systems. It might not always be the most accurate algorithm for this task, and more complicated fraud detection scenarios can call for other methods like random forest or deep learning.
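To make the idea concrete, the following is a minimal sketch of logistic regression in plain Python. It is an illustration under stated assumptions, not the implementation used in this study: the function names, the toy data, and the learning rate are all hypothetical.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1), numerically stable."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, features):
    """Estimated probability that a transaction is fraudulent."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the log loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y  # gradient of log loss w.r.t. score
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical toy data: one scaled feature per transaction, label 1 = fraud.
rows = [[0.1], [0.2], [0.3], [0.8], [0.9], [1.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic(rows, labels)
```

A transaction is then flagged when `predict_proba` exceeds a chosen threshold. The threshold of 0.5 is only a default; on imbalanced fraud data a different cut-off may be preferable.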
B. Decision Tree
A decision tree algorithm can be an effective tool in systems for detecting credit card fraud. The algorithm divides the data recursively into smaller groups based on the most important attributes until it can determine whether a transaction is fraudulent or not.
An overview of the decision tree algorithm's potential applications in a system for detecting credit card fraud is given below:
Overall, because it is simple to understand, can handle both numerical and categorical data, and can spot intricate patterns in the data, the decision tree algorithm can be a beneficial tool in credit card fraud detection systems. It might not always be the most accurate algorithm for this task, and more complicated fraud detection scenarios can call for other methods like neural networks or support vector machines.
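The recursive division rests on one repeated step: choosing the threshold that best separates the classes. The sketch below shows that single step using Gini impurity. The impurity measure, the function names, and the toy amounts are assumptions made for illustration, since the text does not state which splitting criterion was used.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p_fraud = sum(labels) / len(labels)
    return 1.0 - p_fraud ** 2 - (1.0 - p_fraud) ** 2

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'."""
    best_threshold, best_score = None, float("inf")
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical toy data: transaction amounts and fraud labels.
amounts = [12.0, 25.0, 40.0, 900.0, 1500.0]
is_fraud = [0, 0, 0, 1, 1]
threshold, impurity = best_split(amounts, is_fraud)
```

A full tree would apply `best_split` again to each resulting group, over every feature, until the groups are pure or a depth limit is reached.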
C. Random Forest
A robust machine learning approach called random forest may be applied to systems that find credit card fraud. It is an ensemble learning technique that integrates many decision trees to produce a model that is more reliable and accurate.
A high-level description of how the random forest algorithm can be applied in a system to identify credit card fraud is given below:
Overall, because it can handle both numerical and categorical data, can spot intricate patterns in the data, and is less prone to overfitting than single decision trees, the random forest algorithm can be a beneficial tool in credit card fraud detection systems. It might not always be the most precise algorithm for this task, though, and there are other methods such as deep learning or anomaly detection that may be needed for more complex fraud detection scenarios.
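The ensemble idea can be sketched with very small trees. The example below is a simplified illustration, not a full random forest: each "tree" is a single-split stump trained on a bootstrap sample and a random feature subset, and the forest predicts by majority vote. All names, the toy data, and the parameter values are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels, fallback):
    """Majority class of labels; use the fallback pool if labels is empty."""
    pool = labels if labels else fallback
    return 1 if 2 * sum(pool) >= len(pool) else 0

def fit_stump(rows, labels, feature_ids):
    """Best single split over the given features: (feature, threshold, left, right)."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left, labels), majority(right, labels))
    return best[1:]

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each seeing a random feature subset."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    k = max(1, int(n_features ** 0.5))  # features considered per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(n_features), k)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all trees in the forest."""
    votes = sum(left if row[f] <= t else right for f, t, left, right in forest)
    return 1 if 2 * votes > len(forest) else 0

# Hypothetical toy data: two features per transaction, label 1 = fraud.
rows = [[1, 10], [2, 12], [3, 11], [8, 50], [9, 55], [10, 52]]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
```

Because every tree sees a different sample and feature subset, individual errors tend to cancel in the vote, which is the reason the ensemble is less prone to overfitting than a single tree.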
D. XGBoost
XGBoost is a distributed gradient boosting library optimized for speed, flexibility, and portability. It implements machine learning algorithms under the gradient boosting framework. Its parallel tree boosting (also known as GBDT or GBM) solves many data science problems quickly and accurately. The same code runs on major distributed environments, including Hadoop, SGE, and MPI.
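The principle behind gradient boosted trees can be illustrated without the library. The sketch below is not XGBoost's implementation, which additionally uses regularization, second-order gradient information, and parallel tree construction. It only shows the core loop: each new stump is fitted to the residuals (the negative gradient of the log loss) of the current ensemble. All names and data are hypothetical, and the sketch assumes a single feature with at least two distinct values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_regression_stump(values, targets):
    """Least-squares stump on one feature: (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(values))[:-1]:  # keep both sides non-empty
        left = [r for v, r in zip(values, targets) if v <= t]
        right = [r for v, r in zip(values, targets) if v > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def fit_boosted(values, labels, n_rounds=20, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the current residuals."""
    scores = [0.0] * len(values)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the raw score.
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        t, lm, rm = fit_regression_stump(values, residuals)
        stump = (t, learning_rate * lm, learning_rate * rm)  # shrink each step
        stumps.append(stump)
        scores = [s + (stump[1] if v <= t else stump[2]) for v, s in zip(values, scores)]
    return stumps

def predict_proba(stumps, value):
    """Sum the contributions of all stumps and squash to a probability."""
    return sigmoid(sum(left if value <= t else right for t, left, right in stumps))

# Hypothetical toy data: transaction amounts and fraud labels.
amounts = [12.0, 25.0, 40.0, 900.0, 1500.0]
is_fraud = [0, 0, 0, 1, 1]
stumps = fit_boosted(amounts, is_fraud)
```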
A. Task division
Task division is a technique used to break down a complex task into subtasks that are easier to manage. In the context of credit card fraud detection (CCFD), task division can help us to identify the individual steps involved in the fraud detection process and improve the efficiency of the overall system.
B. Data Acquisition
Collect transaction data from various sources such as credit card issuers, payment gateways, and other processors.
C. Data Preprocessing
Clean and prepare the transaction data, including removing duplicates, filling missing values, and transforming variables as necessary. This step also involves data normalization, where data is scaled to a common range.
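These cleaning steps can be sketched as follows. This is a minimal illustration under assumptions the text does not specify: missing values are represented as `None` and filled with the column median, and normalization is min-max scaling to the range [0, 1]. The function name and the sample rows are hypothetical.

```python
from statistics import median

def preprocess(rows):
    """Deduplicate, fill missing values, and min-max scale a list of tuples."""
    # 1. Remove exact duplicates while keeping the original order.
    unique = list(dict.fromkeys(rows))
    n_cols = len(unique[0])
    # 2. Fill missing entries (None) with the median of the observed values.
    medians = [median(r[c] for r in unique if r[c] is not None) for c in range(n_cols)]
    filled = [[medians[c] if r[c] is None else r[c] for c in range(n_cols)] for r in unique]
    # 3. Scale every column to the common range [0, 1].
    lows = [min(r[c] for r in filled) for c in range(n_cols)]
    highs = [max(r[c] for r in filled) for c in range(n_cols)]
    return [
        [
            (r[c] - lows[c]) / (highs[c] - lows[c]) if highs[c] > lows[c] else 0.0
            for c in range(n_cols)
        ]
        for r in filled
    ]

# Hypothetical raw rows: one duplicate and one missing value.
raw = [(10.0, 1.0), (10.0, 1.0), (20.0, None), (30.0, 3.0)]
clean = preprocess(raw)
```

In practice the medians and the minimum and maximum values should be computed on the training split only and then reused for the test split, so that no information leaks from test data into training.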
D. Model Training
In order to forecast the likelihood of a fraudulent transaction or unauthorized access, the models used for credit card fraud detection are trained on transaction data using machine learning techniques.
E. Evaluation of Models
After training and testing, the models were compared on the basis of their performance, which helps in identifying the most accurate and effective algorithm.
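One way to compare models is by the area under the ROC curve, computed from each model's fraud scores on held-out data. The sketch below assumes this is the kind of "ROC value" intended and uses the pairwise definition of the area. The function names, the model names, and the scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Probability that a random fraud case scores above a random legitimate one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count a win for each correctly ordered pair and half a win for each tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_model(labels, model_scores):
    """Return the name of the model with the highest AUC, plus all AUCs."""
    aucs = {name: roc_auc(labels, scores) for name, scores in model_scores.items()}
    return max(aucs, key=aucs.get), aucs

# Hypothetical test labels and fraud scores from two models.
y_test = [0, 0, 1, 1]
model_scores = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.1, 0.2, 0.7, 0.9],
}
winner, aucs = best_model(y_test, model_scores)
```

The pairwise computation is quadratic in the number of samples, so it suits small illustrations. For a dataset of several hundred thousand transactions a rank-based computation would be more practical.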
The dataset contains credit card transactions made by European cardholders over two days in September 2013. Of the 284,807 transactions in total, 0.172% were fraudulent. All attributes in the dataset are numeric. There are 30 features: V1 to V28, Time, and Amount. The last column is the class (type of transaction): one denotes fraud and zero denotes all other transactions. The features V1 to V28 are not named, for reasons of data security and integrity.
Because the dataset is highly imbalanced, models trained on it directly perform poorly at detecting fraud. The Synthetic Minority Oversampling Technique (SMOTE) is used in the data preprocessing phase to resolve the class imbalance. It selects minority-class samples that are close to each other, draws a line between the data points, and creates a new minority-class instance at a point along that line.
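The interpolation step described above can be sketched in a few lines. This is a simplified illustration of the SMOTE idea, not a reference implementation: the function name, the number of neighbours `k`, and the sample points are hypothetical, and at least two minority samples are assumed.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Find the k nearest other minority samples (squared Euclidean distance).
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the line segment from base to neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical fraud samples with two features each.
fraud_samples = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
new_samples = smote(fraud_samples, n_new=6)
```

Oversampling of this kind should be applied to the training split only, so that the test set keeps the original class distribution.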
VI. FEATURE SELECTION
When using a machine learning approach, feature selection (FS) is an important stage. The huge feature space involved in the training and testing procedures can have a detrimental effect on how well the models perform overall. The type of problem an investigator is attempting to solve determines the specific FS approach that should be employed. An overview of situations when applying an FS approach enhanced the performance of ML models is given in the paragraph that follows.
VII. FORMULA TEXT
Accuracy and precision alone are not always suitable criteria for evaluating a model, particularly on imbalanced data, so our suggested methodology also employs the following formulas. Accuracy and precision are nevertheless regarded as fundamental factors when evaluating any model.
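The formulas themselves do not appear in this text. As an assumption, the standard confusion-matrix definitions of the metrics named in this paper are given here, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, with fraud as the positive class:

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{MCC} &= \frac{TP \cdot TN - FP \cdot FN}
                   {\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\end{align*}
```

As an illustration of why accuracy can mislead, a model that labels every transaction as legitimate would reach an accuracy of about 99.83% on a dataset with 0.172% fraud, while its recall would be zero.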
In this study, we developed a novel technique for credit card fraud detection (CCFD), which groups cardholders based on their transactions while extracting behavioral patterns to create an individual profile for each cardholder. After several classifiers are applied to three separate groups, rating scores are produced for each type of classifier. This dynamic change in parameters lets the system adapt promptly to new cardholders' transaction behaviors, and it is followed by a feedback process to address the concept drift issue. We found that the Matthews Correlation Coefficient (MCC) was the better metric when dealing with imbalanced datasets, although MCC was not the only option. We also experimented with balancing the dataset and found that the classifiers performed better than before. The use of one-class classifiers, such as one-class SVM, is an alternative method for handling imbalanced datasets. Finally, we found that logistic regression gave the most accurate results when compared with decision tree and random forest.
Copyright © 2023 Ekta Mangal, Divya, Shubham, Radhika Gussain. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.