Abstract
Cyberbullying on social media platforms has emerged as a critical societal challenge, causing significant psychological and emotional harm to individuals, particularly among vulnerable populations. Early detection and prevention are crucial to mitigating its impact. This paper presents a Natural Language Processing (NLP)-based approach for the automated detection of cyberbullying in user-generated content on social media. We propose a system that preprocesses textual data through techniques such as tokenization, stemming, and stopword removal, and subsequently transforms it using feature extraction methods like TF-IDF and word embeddings. A supervised machine learning model, trained on annotated datasets, classifies online comments into cyberbullying and non-cyberbullying categories. Our experimental results demonstrate that NLP techniques, when combined with appropriate machine learning algorithms such as Logistic Regression and Support Vector Machines (SVM), achieve high accuracy in identifying harmful content. The proposed framework offers a scalable solution to assist social media platforms in monitoring user behavior, ensuring safer online environments, and fostering positive digital interactions. Future work includes enhancing detection capabilities through deep learning methods and expanding the system to multilingual contexts.
Introduction
1. Overview:
The widespread use of social media has led to the rise of cyberbullying, which can have serious psychological and emotional effects—especially on youth. Due to the limitations of rule-based systems in detecting online abuse, there is increasing interest in machine learning (ML) and natural language processing (NLP) methods for effective and scalable cyberbullying detection.
2. Objective:
This research proposes a Logistic Regression-based system for classifying social media comments as bullying or non-bullying, using TF-IDF features and NLP preprocessing. It aims for high accuracy, interpretability, and efficiency, making it suitable for integration into real-time content moderation tools.
3. Methodology:
Dataset:
Sourced from Twitter, YouTube, and Formspring.
Manually labeled as bullying or non-bullying.
Class imbalance addressed with stratified sampling and class weighting (see the split sketch below).
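As a hedged illustration of this step, the sketch below shows one way the stratified split and class weighting could be set up with scikit-learn. The file name, the column names (text, label), and the 80/20 split ratio are assumptions for illustration, not details taken from the paper.

```python
# Sketch: stratified split plus class-weight computation (assumed schema).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical combined dump of the labeled Twitter/YouTube/Formspring comments.
df = pd.read_csv("comments_labeled.csv")  # columns: text, label (0 = non-bullying, 1 = bullying)

# Stratified split keeps the bullying / non-bullying ratio identical
# in the train and test partitions.
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2,
    stratify=df["label"], random_state=42,
)

# Weights inversely proportional to class frequency, mirroring the
# class_weight="balanced" option used later in the classifier.
weights = compute_class_weight(
    class_weight="balanced", classes=np.unique(y_train), y=y_train
)
print(dict(zip(np.unique(y_train), weights)))
```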
Preprocessing:
Steps: lowercasing, tokenization, stopword removal, stemming/lemmatization, and removal of special characters and noise (sketched after this list).
Tools: Python, NLTK.
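A minimal sketch of this preprocessing pipeline using the tools named above; the exact regular expressions and the choice of lemmatization over stemming are assumptions consistent with the listed steps, not the authors' exact code.

```python
# Sketch: NLTK-based cleaning (lowercase, strip noise, tokenize,
# drop stopwords, lemmatize). Stemming with PorterStemmer would be a
# drop-in alternative to the lemmatizer used here.
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOPWORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(text: str) -> str:
    text = text.lower()
    text = re.sub(r"http\S+|@\w+", " ", text)   # URLs and user mentions
    text = re.sub(r"[^a-z\s]", " ", text)       # special characters and digits
    tokens = word_tokenize(text)
    tokens = [LEMMATIZER.lemmatize(t) for t in tokens
              if t not in STOPWORDS and len(t) > 1]
    return " ".join(tokens)

print(preprocess("You're SO dumb!!! Nobody likes u, @user"))
```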
Feature Extraction:
Used TF-IDF to convert text into numerical feature vectors.
Included word n-grams to capture abusive phrase patterns (illustrated below).
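One plausible configuration of this step is sketched below; the ngram_range, max_features, and sublinear_tf settings are illustrative choices, not values reported in the paper.

```python
# Sketch: TF-IDF over unigrams and bigrams with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(
    ngram_range=(1, 2),   # unigrams plus bigrams to catch phrase patterns
    max_features=50_000,  # cap the vocabulary to keep the matrix manageable
    sublinear_tf=True,    # dampen raw counts with 1 + log(tf)
)

# X_train / X_test and preprocess() come from the earlier sketches.
X_train_tfidf = vectorizer.fit_transform(X_train.map(preprocess))
X_test_tfidf = vectorizer.transform(X_test.map(preprocess))  # reuse the fitted vocabulary
```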
Model Selection:
Logistic Regression chosen for its simplicity, strong binary classification performance, and ability to handle sparse data.
Regularized with L2 and balanced with class weights (configuration sketched below).
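A minimal sketch of that configuration in scikit-learn; the regularization strength C is left at its default, which is an assumption rather than a reported setting.

```python
# Sketch: L2-regularized Logistic Regression with balanced class weights.
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(
    penalty="l2",             # L2 regularization, as stated above
    class_weight="balanced",  # reweight classes inversely to their frequency
    C=1.0,                    # inverse regularization strength (default, assumed)
    max_iter=1000,            # headroom for convergence on sparse TF-IDF input
)
clf.fit(X_train_tfidf, y_train)  # features and labels from the earlier sketches
```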
Training & Evaluation:
5-fold stratified cross-validation for robustness (see the sketch below).
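A sketch of the evaluation loop, wrapping the vectorizer and classifier in a single pipeline so the vocabulary is refit inside each fold and cannot leak across folds; the F1 scoring choice assumes binary 0/1 labels with 1 = bullying.

```python
# Sketch: 5-fold stratified cross-validation over the full pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(penalty="l2", class_weight="balanced", max_iter=1000),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# df and preprocess() come from the earlier sketches.
scores = cross_val_score(pipeline, df["text"].map(preprocess), df["label"],
                         cv=cv, scoring="f1")
print(f"F1 per fold: {scores.round(3)}, mean: {scores.mean():.3f}")
```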
Development Environment: Jupyter Notebook / VS Code
Future Scope: Flask-based web app for deployment (a minimal endpoint sketch follows).
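Since the Flask deployment is only future scope, the endpoint below is a hypothetical sketch; the route name, payload shape, and 0.5 decision threshold are all assumptions.

```python
# Sketch: minimal Flask endpoint wrapping the trained model.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    comment = payload.get("text", "")
    # preprocess(), vectorizer, and clf come from the earlier sketches.
    features = vectorizer.transform([preprocess(comment)])
    proba = clf.predict_proba(features)[0, 1]  # P(bullying), assuming label 1 = bullying
    return jsonify({"bullying": bool(proba >= 0.5), "score": round(float(proba), 3)})

if __name__ == "__main__":
    app.run(debug=True)
```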
4. Related Work Summary:
Earlier methods used keyword-based or rule-based systems with limited context handling.
Recent research uses ML models like SVM, Naïve Bayes, CNNs, RNNs, and BERT for improved performance.
This work balances accuracy and efficiency using a lightweight, interpretable model; the sketch below shows one way that interpretability can be exercised.
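As a brief illustration of what that interpretability buys (a sketch, not the authors' analysis): with a linear model the learned weights map directly back to TF-IDF features, so the n-grams that most strongly signal bullying can be listed outright.

```python
# Sketch: inspect the highest-weighted features of the fitted classifier.
import numpy as np

# vectorizer and clf come from the earlier sketches.
feature_names = np.array(vectorizer.get_feature_names_out())
top = np.argsort(clf.coef_[0])[::-1][:10]  # 10 features pushing hardest toward "bullying"
for name, weight in zip(feature_names[top], clf.coef_[0][top]):
    print(f"{name:20s} {weight:+.3f}")
```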
Conclusion
The experimental results demonstrate that the proposed cyberbullying detection system is both effective and efficient in identifying harmful textual content on social media platforms. Leveraging classical natural language processing techniques and a Logistic Regression model, the system achieved an accuracy of 92.5%, with an F1-score of 85.3%, highlighting a strong balance between precision and recall.
The system performed particularly well in distinguishing non-bullying comments from bullying ones, with a low false positive rate. This is crucial for ensuring that innocent users are not wrongly flagged, which could otherwise result in unnecessary censorship or reputation damage. The relatively high precision (89.3%) ensures that most of the flagged content is genuinely offensive, thereby increasing trust in the system's outputs. However, the recall value of 81.6% indicates that some bullying content still goes undetected, often due to implicit or context-dependent language that traditional models may struggle to interpret.
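As a quick consistency check, the reported F1-score does follow from the stated precision and recall:

$F_1 = \frac{2PR}{P + R} = \frac{2(0.893)(0.816)}{0.893 + 0.816} \approx \frac{1.457}{1.709} \approx 0.853$,

matching the reported 85.3%.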
Compared to other machine learning models tested, such as Naïve Bayes, Support Vector Machines (SVM), and Random Forests, the Logistic Regression model offered the best trade-off between performance and interpretability. Furthermore, it required less computational overhead, making it a suitable candidate for real-time deployment in low-resource environments such as browser extensions or mobile apps.

Despite its effectiveness, the model does have limitations. First, it relies purely on textual data and lacks the ability to interpret sarcasm, coded language, or content influenced by images or emojis. Second, the system's performance is dependent on the quality and diversity of the training dataset. A lack of regional or multilingual data may reduce its effectiveness across different communities and cultures.