Authors: Soumyajit Nandi, Soumyanil Dey, Soham Bhattacherjee, Shakir Ahmed, Aratrika Ghosh, Moloy Dhar, Nirupam Saha, Biswajit Chaki Choudhury
Certificate: View Certificate
In recent years credit card became one of the essential parts of the people. Sudden increase in E-commerce, customer started using credit card for online purchasing therefore risk of fraud also increases. Instead of carrying a huge amount in hand it is easier to keep credit cards. But nowadays that too becomes unsafe. Nowadays we are facing a big problem on credit card fraud which is increasing in a good percentage. The main purpose is the survey on the various methods applied to detect credit card frauds. From the abnormalities, in the transaction, the fraudulent one is identified. We address this issue in order to implement some machine learning algorithm like random forest, logistic regression in order to detect this kind of fraud. In this paper we increase the efficiency in finding the fraud. However, we discussed and evaluated employee criteria. Currently, the issues of credit card fraud detection have become a big problem for new researchers. We implement an intelligent algorithm which will detect all kind of fraud in a credit card transaction. We handled the problem by finding a pattern of each customer in between fraud and legal transaction. Isolation Forest Algorithm and Local Outlier Factor are used to predict the pattern of transaction for each customer and a decision is made according to them. In order to prevent data from mismatching, all attribute are marked equally.
Nowadays as we can see that there is a huge increase online payment and the payment is mostly done with the help of credit cards. It becomes a big problem for marketing company to overcome with the credit card fraudulent activities. Fraudulent can be done in many ways such as tax return in any other account, taking loans with wrong information etc. Therefore, we need an efficient fraudulent detection model to minimize fraudulent activity and to minimize their losses. There are a huge number of new techniques which provide different algorithms which help in detecting number of credit card fraudulent activity. Basic understanding of these algorithms will help us in making a significant credit card fraudulent detection model. This paper helps us in finding doubtful credit card transaction by proposing a machine learning algorithm.
This type of fraud is happening from past, and till now not much research has done here in this particular area. The types of credit fraud in transactions are bankruptcy fraud, behavioral fraud, counterfeit fraud, application fraud . There are experiments done before on credit card fraudulent activity on basis of meta-learning. There is certain limit of meta-learning. There are two features which is introduced here in our report is True Positive and False alarm. Both these features play an important role in catching fraudulent because the rate of determining fraudulent behavior is quick. For the better performance of model, we need a better classifier. Different classifier can be combined together with help of meta-learning.
A. Problem Statement
Credit card fraud has emerged as a significant challenge in today's digital age. With the increasing reliance on electronic transactions and online shopping, fraudulent activities targeting credit cards have become more sophisticated and prevalent. This problem statement focuses on the detrimental effects of credit card fraud and its impact on individuals, businesses, and the overall economy. Credit card fraud refers to unauthorized and fraudulent use of someone else's credit card information to make purchases or gain access to funds. Fraudsters employ various techniques such as skimming, phishing, identity theft, and card-not-present transactions to obtain sensitive card details and exploit them for financial gain.
The consequences of credit card fraud are far-reaching. For individuals, it can result in financial loss, damaged credit scores, and psychological distress due to the invasion of privacy. Businesses also suffer substantial losses due to charge backs, reputation damage, and increased security measures. Moreover, the economy suffers as a whole, as the costs of fraud are ultimately borne by consumers in the form of higher fees and interest rates.
Addressing the problem of credit card fraud requires a multi-faceted approach involving collaboration between financial institutions, law enforcement agencies, and individuals. Enhanced security measures, robust fraud detection systems, and consumer education are vital to minimizing the risk of credit card fraud and protecting the interests of individuals and businesses alike.
In this paper, we focused on one main algorithms for recommendations: RANDOM FOREST ALGORITHM
A. Random Forest Algorithm
Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both Classification and Regression problems in ML. It is based on the concept of ensemble learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and based on the majority votes of predictions, and it predicts the final output.
The greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting.
The below diagram explains the working of the Random Forest algorithm:
B. Data Source
The dataset was retrieved from an open-source website, Kaggle.com. it contains data of transactions that were made in 2013 by credit card users in Europe, in two days only. The dataset consists of 31 attributes, 284,808 rows. 28 attributes are numeric variables that due to confidentiality and privacy of the customers have been transformed using PCA transformation, the three remaining attributes are “Time” which contains the elapsed seconds between the first and other transactions of each attribute, “Amount” is the amount of each transaction, and the final attribute “Class” which contains binary variables where “1” is a case of fraudulent transaction, and “0” is not as case of fraudulent transaction.
Dataset Link: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
III. VARIOUS LIBRARIES USED
A. Benefits of Random Forest
Here are several key points highlighting the use of Random Forest Classification for this purpose:
IV. WORK FLOW
The workflow of credit card fraud detection using Random Forest can be summarized in the following steps:
By following this workflow, credit card issuers and financial institutions can leverage the power of Random Forest to detect and mitigate credit card fraud effectively.
F. Data splitting , Selecting Best Features and Model Train
Feature selection is an important step in machine learning that involves selecting a subset of relevant features from the original set of features. It helps to improve model performance by reducing overfitting, increasing interpretability, and decreasing training time.
SelectKBest is a feature selection technique in scikit-learn that selects the top K features based on univariate statistical tests. It operates on the principle of evaluating the relationship between each feature and the target variable independently and selecting the K features with the highest scores.
Train-Test Split is a common technique used in machine learning to evaluate the performance of a model on unseen data. It involves splitting the available dataset into two separate sets: a training set and a testing set.
The training set is used to train the model, while the testing set is used to evaluate the model's performance on unseen data. By evaluating the model on data that it has not seen during training, we can get an estimate of how well the model generalizes to new, unseen examples.
Random Forest Classifier is an ensemble learning method that combines multiple decision trees to make predictions. Each tree is trained on a random subset of the data and uses a subset of features to make decisions. The final prediction is made by aggregating the predictions of all the individual trees.
In the code below:
In conclusion, credit card fraud detection is a critical area that requires continuous improvement and innovation to combat evolving fraud techniques. Machine learning algorithms, such as Random Forest and deep learning, have shown promise in detecting fraudulent activities by analyzing transaction data and identifying patterns and anomalies. The future of credit card fraud detection holds great potential with advancements in real-time monitoring, unsupervised learning, behavioral biometrics, and collaborative networks. By leveraging these technologies and techniques, financial institutions can enhance their fraud prevention strategies, protect customers from financial losses, and maintain the integrity of the credit card ecosystem. However, it is important to remain vigilant, adapt to new fraud trends, and collaborate across industries to stay one step ahead of fraudsters. The future scope and enhancement for credit card fraud detection include leveraging advanced techniques such as deep learning, unsupervised learning, and behavioral biometrics. Real-time monitoring, enhanced feature engineering, collaborative fraud detection networks, and explainable AI are also areas of focus. These advancements can lead to improved accuracy, faster detection, better understanding of fraud patterns, and increased collaboration among financial institutions, ultimately strengthening credit card fraud prevention and protection.
 \"Credit Card Fraud Detection Using Enhanced Random Forest Classifier for Imbalanced Data\" by AlsharifHasan Mohamad Aburbeian, Huthaifa I Ashqar (2023)  \"GAN-based Data Augmentation for Credit Card Fraud Detection\" by EmilijaStrelcenia, SimantPrakoonwit (2022)  \"Credit card fraud detection using supervised learning approach\" by Rashmi More, Chetan Awati, Suresh Shirgave, Rashmi Deshmukh, Sonam Patil (2021)  \"An efficient study of fraud detection system using Ml techniques\" by S Josephine Isabella, Sujatha Srinivasan, G Suseendran (2020)  \"A cost-sensitive weighted random forest technique for credit card fraud detection\" by Debashree Devi, Saroj K Biswas, Biswajit Purkayastha (2019)  “Research on Credit Card Fraud Detection Model Based on Distance Sum” by Wen-Fang Yu, Na Wang  2009 International Joint Conference on Artificial Intelligence  “Credit Card Fraud Detection Web Application using Streamlit and Machine Learning” by Vipul Jain;H Kavitha, S Mohana Kumar, 2022 IEEE International Conference on Data Science and Information System (ICDSIS)  “Design and Implementation of Different Machine Learning Algorithms for Credit Card Fraud Detection” by Aditi Singh, Anoushka Singh, Anshul Aggarwal, Anamika Chauhan  2022 International Conference on Electrical,Computer, Communications and Mechatronics Engineering (ICECCME)
Copyright © 2023 Soumyajit Nandi, Soumyanil Dey, Soham Bhattacherjee, Shakir Ahmed, Aratrika Ghosh, Moloy Dhar, Nirupam Saha, Biswajit Chaki Choudhury. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.