The rapid growth of online social platforms has expanded digital communication, but it has also fueled a rise in cyberbullying and hate speech. These harmful interactions can damage individuals' well-being and undermine the safety of online environments. Because of the sheer volume of user-generated content, manual monitoring is infeasible, making automated detection systems essential. This paper presents a system for detecting cyberbullying and hate speech using Natural Language Processing (NLP) and the DistilBERT model. The proposed approach classifies text into four categories: non-cyberbullying, religion-based hate, gender-based hate, and racism. A dataset of approximately 100,000 samples collected from Kaggle is used for training and evaluation. Input text is preprocessed with standard NLP techniques, including text cleaning and tokenization, before being passed to DistilBERT for classification. The system is integrated with a Flask-based web application, allowing users to test the model in real time and receive immediate feedback. Experimental results show that the model achieves an accuracy of 99.6% along with high precision, recall, and F1-score values. By capturing the contextual meaning of text, the system outperforms traditional machine learning approaches. Additionally, an automatic restriction mechanism helps prevent repeated posting of harmful content. Overall, the proposed system provides an efficient and practical solution for automated content moderation and contributes to safer online communication platforms.
Introduction
This paper presents a cyberbullying and hate speech detection system that leverages DistilBERT, a lightweight transformer-based model, to classify social media text into four categories: non-cyberbullying, religion-based hate, gender-based hate, and racism.
Problem Context:
Social media facilitates communication but also spreads cyberbullying and hate speech, which can have severe psychological and social effects.
Existing systems often detect content only after it appears, lack multi-class classification, and provide no preventive mechanisms.
Key Contributions of the Proposed System:
Multi-Class Classification: Efficiently distinguishes between non-cyberbullying and three specific hate categories.
Real-Time Detection: Uses a Flask web application to allow instant analysis and feedback.
Modular Architecture: Includes data collection, preprocessing, model training, prediction, and user interaction layers, ensuring scalability and flexibility.
High Accuracy: Achieves near-perfect precision, recall, and F1-score in evaluations.
Methodology:
Data Preprocessing: Text cleaning, tokenization, stopword removal, and normalization to enhance model performance.
Model Training: Fine-tuned DistilBERT captures contextual word relationships for accurate classification.
User Interface: Flask-based web app allows users to input text and receive immediate classification results.
Restriction Module: Alerts users and temporarily blocks harmful behavior to prevent further misuse.
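As a concrete illustration, the text-cleaning step described above can be sketched as follows. The cleaning rules, function name, and stopword list are illustrative assumptions rather than details taken from the paper; in the actual pipeline, subword tokenization would be handled by DistilBERT's own tokenizer.

```python
import re
import string

# A small set of common English stopwords; a real pipeline would use a
# standard list (e.g. from NLTK). This subset is illustrative only.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of"}

def clean_text(text: str) -> str:
    """Normalize raw social media text before tokenization."""
    text = text.lower()                          # case normalization
    text = re.sub(r"https?://\S+", " ", text)    # strip URLs
    text = re.sub(r"@\w+", " ", text)            # strip user mentions
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()                        # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(tokens)

print(clean_text("The troll posted THIS at https://example.com @user!!"))
# → "troll posted this at"
```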
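For the model-training step, classification with a fine-tuned DistilBERT checkpoint typically uses the Hugging Face transformers library; a minimal inference sketch might look like the following. The checkpoint directory, label order, and function name are assumptions for illustration, not details given in the paper.

```python
# Labels in an assumed index order; the paper does not specify the mapping.
LABELS = ["non-cyberbullying", "religion-based hate",
          "gender-based hate", "racism"]

def classify(text: str, model_dir: str = "model/") -> str:
    """Return the predicted label for a single piece of text."""
    # Heavy dependencies are imported lazily so the label map above can be
    # inspected without torch/transformers installed.
    import torch
    from transformers import (DistilBertTokenizerFast,
                              DistilBertForSequenceClassification)

    tokenizer = DistilBertTokenizerFast.from_pretrained(model_dir)
    model = DistilBertForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]
```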
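The restriction module's temporary-block behavior can be sketched with a simple in-memory strike counter. The strike threshold and block duration below are illustrative assumptions; the paper does not specify these values.

```python
import time

class RestrictionModule:
    """Temporarily block users who repeatedly post harmful content."""

    def __init__(self, max_strikes: int = 3, block_seconds: float = 600.0):
        self.max_strikes = max_strikes
        self.block_seconds = block_seconds
        self._strikes = {}        # user id -> number of harmful posts
        self._blocked_until = {}  # user id -> unblock timestamp

    def record_harmful_post(self, user_id: str) -> None:
        """Register a harmful post; block the user once the limit is hit."""
        self._strikes[user_id] = self._strikes.get(user_id, 0) + 1
        if self._strikes[user_id] >= self.max_strikes:
            self._blocked_until[user_id] = time.time() + self.block_seconds
            self._strikes[user_id] = 0  # reset after issuing a block

    def is_blocked(self, user_id: str) -> bool:
        """Check whether the user is currently under a temporary block."""
        return time.time() < self._blocked_until.get(user_id, 0.0)

mod = RestrictionModule(max_strikes=2, block_seconds=60.0)
mod.record_harmful_post("u1")
print(mod.is_blocked("u1"))  # False: only one strike so far
mod.record_harmful_post("u1")
print(mod.is_blocked("u1"))  # True: second strike triggers a block
```

A production deployment would persist this state (e.g. in a database or cache) rather than keeping it in process memory.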
Impact:
The system provides an efficient, scalable, and practical solution for detecting and mitigating cyberbullying and hate speech, enhancing online safety and supporting responsible social media usage.
Conclusion
In this paper, a cyberbullying and hate speech detection system based on the DistilBERT model is proposed and implemented. The system is designed to classify user-generated text into multiple categories, including non-cyberbullying, religion-based hate, gender-based hate, and racism.
By leveraging the capabilities of transformer-based natural language processing, the model effectively captures contextual relationships in textual data, leading to highly accurate classification results.
The experimental evaluation demonstrates that the proposed model achieves high performance across all evaluation metrics, including accuracy, precision, recall, and F1-score. The confusion matrix and training results indicate that the model generalizes well to unseen data and maintains consistent performance across different categories.
A key contribution of this work is the integration of the trained model with a Flask-based web application, enabling real-time user interaction and prediction. Furthermore, the system incorporates an automatic restriction mechanism that temporarily blocks users who attempt to post harmful or offensive content. This feature plays a significant role in reducing the spread of cyberbullying and promoting responsible communication.
In real-world scenarios, the proposed system can be effectively deployed in social media platforms, online communities, and communication systems to automatically monitor and control harmful content. Overall, the system provides an efficient, scalable, and practical solution for ensuring safer online environments using advanced NLP techniques.