Abstract

Online Social Networks (OSNs) have evolved from simple communication platforms into essential tools for public discourse, information sharing, and digital engagement. With their rapid expansion, these networks have become breeding grounds for malicious actors who create fake accounts and spam campaigns to spread misinformation, manipulate public opinion, and disrupt authentic communication. Traditional detection methods such as manual moderation and rule-based filters have proven ineffective against sophisticated, automated spam behavior.
This research proposes a machine learning-based framework for detecting spammers and fake users by analyzing behavioral features such as tweet frequency, follower-following ratios, hashtag density, and temporal activity patterns. Data is collected via APIs and web scraping tools, preprocessed, and used to train classification models including Logistic Regression, SVM, and k-NN. Principal Component Analysis (PCA) is applied for dimensionality reduction, and model performance is evaluated using accuracy, precision, recall, F1-score, and ROC-AUC metrics.
The experimental results demonstrate high classification accuracy and robustness, validating the system's potential for real-time integration into social media monitoring tools. This framework offers a scalable, automated, and intelligent solution for enhancing trust and authenticity in online social environments.
I. Introduction & Background
Online Social Networks (OSNs) like Twitter, Facebook, Instagram, and LinkedIn have become major platforms for communication, news, politics, and brand engagement.
The widespread use of OSNs has led to a rise in fake accounts and spammers, who:
Spread misinformation
Conduct phishing attacks
Promote divisive content
Manipulate trends using bots and coordinated campaigns
Traditional detection methods (manual moderation, keyword filters) are ineffective due to evolving tactics using AI-generated content and identity masking.
Hence, there is an urgent need for automated, intelligent, and scalable detection solutions.
Label Propagation is used when labeled data is scarce, spreading the few known spammer/legitimate labels to unlabeled accounts through a similarity graph built over the behavioral features.
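This semi-supervised step can be sketched with scikit-learn's LabelPropagation. The two behavioral features and the toy accounts below are illustrative assumptions, not the paper's dataset:

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Toy behavioral feature matrix: [tweets_per_day, follower_following_ratio]
X = np.array([
    [120.0, 0.05],  # bursty posting, almost no followers -> labeled spammer
    [110.0, 0.08],  # labeled spammer
    [5.0, 1.2],     # moderate activity, balanced graph -> labeled legitimate
    [3.0, 0.9],     # labeled legitimate
    [115.0, 0.06],  # unlabeled account
    [4.0, 1.1],     # unlabeled account
])
# 1 = spammer, 0 = legitimate, -1 = unlabeled
y = np.array([1, 1, 0, 0, -1, -1])

# Propagate the known labels through a k-nearest-neighbor similarity graph
model = LabelPropagation(kernel="knn", n_neighbors=2)
model.fit(X, y)
print(model.transduction_)  # inferred labels for every row, including unlabeled ones
```

After fitting, `transduction_` assigns the unlabeled bursty account the spammer label and the low-activity account the legitimate label, since each sits closest to labeled neighbors of that class.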
User Interface (Optional)
Built with Streamlit
Allows real-time predictions and visual output
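The overall training flow described in the abstract (scaling, PCA for dimensionality reduction, then an SVM classifier), which the optional UI would call for predictions, can be sketched as a scikit-learn Pipeline. The synthetic behavioral features below are illustrative stand-ins for data collected via the API stage:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic features: [tweets/day, follower-following ratio,
# hashtag density, share of night-time activity] -- assumed values for illustration.
spammers   = rng.normal([60.0, 0.1, 0.7, 0.8], [5.0, 0.05, 0.10, 0.10], size=(100, 4))
legitimate = rng.normal([ 6.0, 1.2, 0.1, 0.2], [1.0, 0.30, 0.05, 0.05], size=(100, 4))
X = np.vstack([spammers, legitimate])
y = np.array([1] * 100 + [0] * 100)  # 1 = spammer, 0 = legitimate

pipe = Pipeline([
    ("scale", StandardScaler()),      # normalize heterogeneous feature ranges
    ("pca",   PCA(n_components=2)),   # dimensionality reduction step
    ("svm",   SVC(kernel="rbf")),     # best-performing classifier in the evaluation
])
pipe.fit(X, y)
print(pipe.score(X, y))  # training accuracy on the synthetic data
```

A UI layer would then pass a single account's feature vector to `pipe.predict(...)` to flag it in real time.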
V. Evaluation & Results
Model Performance
Support Vector Machine (SVM):
Accuracy: 91%
Precision: 89%
Recall: 92%
F1-Score: 0.905
ROC-AUC: 0.93
Logistic Regression and k-NN performed well but fell short of the SVM.
Confusion matrix and ROC curve show SVM’s high reliability and low error rates.
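All five reported metrics can be reproduced for any fitted classifier with scikit-learn's metric functions. The labels and scores below are a small illustrative sample, not the paper's results:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Illustrative ground truth, hard predictions, and decision scores
# (1 = spammer, 0 = legitimate)
y_true  = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred  = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1, 0.6, 0.3, 0.85, 0.05]

print("Accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / total
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1-score :", f1_score(y_true, y_pred))         # harmonic mean of P and R
print("ROC-AUC  :", roc_auc_score(y_true, y_score))   # ranking quality of scores
```

Note that ROC-AUC is computed from the continuous decision scores rather than the thresholded predictions, which is why it can exceed the other metrics, as in the SVM results above.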
Conclusion
The proposed framework for spammer detection and fake user identification in social networks presents a structured and effective solution to one of the most pressing challenges in digital communication. By leveraging behavioral analytics and machine learning, the system successfully addresses the limitations of traditional spam detection methods, which often rely on static rules or manual reporting. The end-to-end pipeline, from data collection through preprocessing, feature engineering, dimensionality reduction, model training, and evaluation, ensures a comprehensive and modular approach capable of adapting to evolving spam tactics. The use of publicly accessible social media APIs and explainable classification models makes the system not only accurate but also reproducible and easy to deploy in real-time environments.
The results obtained through extensive experimentation validate the system's ability to accurately distinguish between legitimate and malicious accounts. Models such as Support Vector Machines have demonstrated high precision, recall, and AUC values, confirming the robustness of the selected features and the effectiveness of the training process. Visual evaluation using confusion matrices and ROC curves further reinforces the framework's suitability for large-scale, automated deployment. The system achieves a well-balanced trade-off between accuracy and interpretability, making it valuable not only for researchers but also for cybersecurity professionals and social media platforms seeking reliable detection mechanisms.
Looking ahead, the framework can be further strengthened through several enhancements. Integrating deep learning models such as Recurrent Neural Networks (RNNs) could improve performance on more complex text and temporal patterns. Real-time detection capability can be introduced using streaming data pipelines, enabling instant flagging of suspicious activity. Incorporating Natural Language Processing (NLP) for semantic analysis of user-generated content would add another layer of insight, especially in identifying harmful or misleading information. Finally, cross-platform integration and support for multilingual analysis would expand the system's applicability in diverse global contexts, making it a holistic solution for combating digital misinformation and abuse.