Leveraging Machine Learning Techniques and NLP for Identifying Fake Accounts in Social Networks

Authors: V. Siddhartha, K. Balakrishna Maruthiram

DOI Link: https://doi.org/10.22214/ijraset.2025.71827

Abstract

The rise of malicious entities on social networking platforms has led to a significant increase in fake accounts, which are exploited for misinformation, cybercrime, and social manipulation. To address this challenge, this project presents a machine learning-based approach for detecting fake accounts using both structured metadata and natural language processing techniques. Features are extracted from username patterns, profile statistics, and textual metadata, then preprocessed and normalized. Four classifiers are compared on detection accuracy: Random Forest, Support Vector Machine (SVM), Logistic Regression, and LightGBM. The models were trained on a labeled dataset and assessed on unseen test data using accuracy, precision, recall, and F1-score. Results indicate that the ensemble-based approaches outperform traditional models, with Random Forest and LightGBM achieving the highest accuracy. The solution is integrated into a Streamlit application that allows real-time prediction and visual performance comparison of all models, making it suitable for non-technical users and potential deployment by platform administrators.

Introduction

The rapid rise of social networking platforms has transformed communication but has also led to the widespread creation of fake accounts used for scams, misinformation, and manipulation. Traditional detection methods such as manual reporting are insufficient given the scale of these platforms and the evolving tactics of the people behind fake accounts. This project proposes an automated fake account detection system that combines supervised machine learning models (Random Forest, SVM, Logistic Regression, LightGBM) with NLP-based feature extraction from profile data (usernames, follower ratios, descriptions, activity).
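As an illustration of the kind of profile-level feature extraction described above, the following is a minimal sketch rather than the authors' implementation. The `Profile` fields, the feature names, and the specific features chosen are assumptions made for this example; the paper does not list its exact feature set.

```python
import re
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical profile record; field names are illustrative assumptions.
    username: str
    followers: int
    following: int
    posts: int
    description: str


def extract_features(p: Profile) -> dict:
    """Turn one profile into a flat dict of numeric features."""
    name = p.username
    digit_count = sum(ch.isdigit() for ch in name)
    # Drop URLs before counting words so links do not inflate the count.
    text_without_urls = re.sub(r"https?://\S+", " ", p.description)
    words = re.findall(r"[A-Za-z']+", text_without_urls)
    return {
        # Username patterns: length and share of digits.
        "username_length": len(name),
        "username_digit_ratio": digit_count / len(name) if name else 0.0,
        # Profile statistics: +1 in the denominator avoids division by zero.
        "follower_following_ratio": p.followers / (p.following + 1),
        "post_count": p.posts,
        # Textual metadata: simple statistics over the description.
        "description_word_count": len(words),
        "description_has_url": int(bool(re.search(r"https?://", p.description))),
    }


example = Profile("user84213907", followers=3, following=1999, posts=0,
                  description="Win prizes now http://example.com")
features = extract_features(example)
print(features)
```

Each profile becomes one row of numeric values, which is the form the classifiers named above expect as input.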

Key contributions include a robust pipeline integrating numerical and textual features, comparative analysis of models, and deployment of a scalable, interpretable detection framework. The dataset was sourced from a public Kaggle repository, and the system processes data through cleaning, feature engineering, model training, and evaluation. Performance metrics show LightGBM as the best-performing model with 94.2% accuracy and a strong balance of precision and recall.
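The train-and-compare step of such a pipeline can be sketched as follows. This is an assumption-laden illustration, not the paper's code: it uses scikit-learn with a synthetic dataset from `make_classification` in place of the Kaggle data, default hyperparameters, and an arbitrary 75/25 split. LightGBM is added only if the `lightgbm` package is installed. The scores it prints say nothing about the results reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for labeled profile features (1 = fake, 0 = genuine).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale-sensitive models get a StandardScaler; tree ensembles do not need one.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

try:
    # Optional dependency: skipped if lightgbm is not available.
    from lightgbm import LGBMClassifier
    models["LightGBM"] = LGBMClassifier(random_state=0, verbose=-1)
except (ImportError, OSError):
    pass

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)  # evaluate on held-out data only
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Keeping every model behind the same `fit`/`predict` interface and the same held-out split is what makes the comparison like-for-like.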

The system is implemented in a user-friendly application for real-time fake account prediction, aiming to improve social media trustworthiness and reduce cyber threats.

Related work covers various ML approaches to fake profile detection on platforms such as LinkedIn, Twitter, and Facebook, including graph clustering and hybrid content-behavior analysis methods.

Conclusion

In this study, a comparative analysis of multiple supervised machine learning algorithms was conducted to detect fake accounts on social networking platforms. The models evaluated (Logistic Regression, Support Vector Machine (SVM), Random Forest, and LightGBM) were assessed on a real-world dataset using standard performance metrics: accuracy, precision, recall, and F1-score. Among these, LightGBM emerged as the most effective model, achieving the highest scores across all evaluation metrics. Its ability to capture complex patterns, handle large-scale feature interactions, and minimize false classifications makes it well suited to real-time social media applications. Ensemble-based models such as LightGBM and Random Forest performed significantly better than traditional models, which supports their robustness on noisy and imbalanced data. The results indicate that machine learning, particularly boosting-based ensemble techniques, provides a powerful approach for identifying fake profiles and so contributes to the security and trustworthiness of online social ecosystems. Future work will explore deeper integration of Natural Language Processing (NLP) for analyzing textual content such as bios and posts, as well as deep learning architectures to further improve detection performance.
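Because the conclusion refers to imbalanced data, it helps to see how the four reported metrics follow from the confusion-matrix counts and why accuracy alone can be misleading. The snippet below is a small worked example using the standard definitions; the label vectors are invented for illustration and are not data from the study.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = fake)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fakes caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # genuine flagged as fake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fakes missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # genuine passed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented example: 3 fake accounts among 10, one missed and one false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is 0.8 while precision, recall, and F1 are each 2/3, which is why the comparison reports all four metrics rather than accuracy alone.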

Copyright

Copyright © 2025 V. Siddhartha, K. Balakrishna Maruthiram. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET71827

Publish Date : 2025-05-29

ISSN : 2321-9653

Publisher Name : IJRASET
