Cyberbullying Detection and Prevention Using Machine Learning

Authors: Mrs. Elakia K, Mr. Dinesh Kumar H, Mr. Daniyalraj K, Mr. Yogesh P, Mr. Vishva J

DOI Link: https://doi.org/10.22214/ijraset.2025.71011

Abstract

Cyberbullying detection combines machine learning techniques such as TF-IDF vectorization, logistic regression, multilayer perceptrons, CNNs, and LSTM networks to build a robust detection model. By employing the BERT model, the system can achieve higher accuracy and better performance in identifying offensive content on social media platforms. The existing system detects cyberbullying on social media in the Indian language Bengali. It applies text preprocessing, TF-IDF, and Instance Hardness Threshold (IHT) resampling, and uses multiple machine learning algorithms to detect online harassment. However, the existing system does not address practical challenges such as real-time detection, and its resampling technique balances the dataset by reducing its size, which leads to a lower accuracy rate. To overcome these limitations, the proposed system uses the BERT model, known for its advanced contextual understanding and bidirectional processing capabilities, to enhance prediction accuracy.
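To make the TF-IDF step concrete, the following is a minimal, standard-library-only sketch of how comments can be turned into weighted term vectors before a classifier such as logistic regression is applied. The tokenizer, the IDF variant (ln(N/df) + 1), the function names, and the toy comments are all assumptions chosen for illustration; they are not taken from the paper's implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a real system would also strip punctuation)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (sorted vocabulary, one {term: weight} dict per document).

    tf  = term count / document length
    idf = ln(N / df) + 1   (one common variant, assumed here for illustration)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return sorted(idf), vectors


# Toy comments, invented for this example.
comments = [
    "you are so stupid",
    "you are a great friend",
    "nobody likes you stupid loser",
]
vocab, vectors = tfidf_vectors(comments)
```

In this toy corpus, "you" appears in every comment and so receives the minimum IDF, while "stupid" appears in only two of the three and is weighted higher in the first comment's vector. This is the property that lets a downstream classifier focus on discriminative terms.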

Introduction

Cyberbullying is a growing concern in the digital age, intensified by social media’s widespread use. Unlike traditional bullying, cyberbullying often involves anonymous, persistent online harassment that can cause severe emotional harm, including anxiety, depression, and even suicide. Detecting cyberbullying is challenging due to the complex, context-dependent nature of online language, which frequently includes slang, sarcasm, and cultural nuances.

Traditional detection methods relying on human moderators or basic machine learning models struggle with the volume and subtlety of content. Recent advances in Natural Language Processing (NLP), particularly deep learning models like BERT (Bidirectional Encoder Representations from Transformers), have improved detection by capturing context bidirectionally and understanding nuanced language. The research proposes using BERT trained on bilingual datasets (English and Tamil) to create a culturally adaptive, multilingual cyberbullying detection system.

Key points include:

Challenges: Sarcasm, slang, cultural variation, class imbalance in data, and real-time processing difficulties complicate cyberbullying detection.
Machine Learning vs. Deep Learning: Traditional ML models (e.g., SVM, Logistic Regression) show moderate success, while deep learning models (CNNs, RNNs, BERT) automatically extract complex features and better handle subtle language.
Multilingual Detection: Expanding detection beyond English to languages like Tamil helps address diverse linguistic and cultural bullying expressions.
Hybrid Models: Combining traditional ML with deep learning and advanced feature extraction improves accuracy and handles imbalanced data better.
Sentiment Analysis: Integrating sentiment and psycholinguistic analysis enhances detection of subtle, context-dependent bullying.
Challenges & Future Directions: False positives/negatives, computational demands, privacy concerns, dataset limitations, and evolving language require ongoing research. Ethical considerations around censorship and user privacy remain critical.
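The class-imbalance concern can be illustrated with a small sketch that contrasts two ways of balancing a skewed dataset. Note that this uses plain random under-sampling as a stand-in, not the Instance Hardness Threshold method named in the abstract, and the "balanced" class-weight heuristic n_samples / (n_classes * class_count) is shown only as one possible alternative that keeps every sample. The label names, counts, and function names are hypothetical.

```python
import random
from collections import Counter


def undersample(samples, labels, seed=0):
    """Random under-sampling: keep only as many items per class as the rarest class has."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = min(len(items) for items in by_class.values())
    kept = []
    for label, items in by_class.items():
        for sample in rng.sample(items, target):
            kept.append((sample, label))
    return kept


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count); no data is discarded."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {label: n_samples / (len(counts) * c) for label, c in counts.items()}


# Hypothetical skewed dataset: 90 neutral comments, 10 bullying comments.
labels = ["neutral"] * 90 + ["bullying"] * 10
samples = [f"comment {i}" for i in range(len(labels))]

reduced = undersample(samples, labels)     # balanced, but much smaller
weights = balanced_class_weights(labels)   # full dataset, rare class up-weighted
```

Under-sampling leaves 20 of the original 100 examples, which shows how balancing by reduction shrinks the training data. The weighting approach instead keeps all 100 and tells the classifier to penalize mistakes on the rare class more heavily.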

Conclusion

The current system effectively detects cyberbullying terms in Bengali using a deep learning model. We aim to implement the BERT model for better analysis outcomes, to add real-time detection during chat by hosting a custom, social-media-style online chat website, and to implement multi-language cyberbullying detection to maximize the effectiveness of this project.
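As a rough sketch of how real-time detection during chat might be structured, the snippet below scores each message before delivery and blocks it when the score crosses a threshold. Everything here is hypothetical: the `moderate` function, the `ModerationResult` type, the threshold value, and the keyword-based `keyword_score` placeholder, which merely stands in for a trained model such as BERT so that the example runs without external dependencies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModerationResult:
    text: str
    score: float
    blocked: bool


def keyword_score(text: str) -> float:
    """Placeholder scorer (hypothetical): fraction of tokens found in a tiny word list."""
    flagged = {"stupid", "loser", "idiot"}
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in flagged for token in tokens) / len(tokens)


def moderate(
    text: str,
    scorer: Callable[[str], float] = keyword_score,
    threshold: float = 0.25,
) -> ModerationResult:
    """Score one incoming message and decide whether to block it before delivery."""
    score = scorer(text)
    return ModerationResult(text=text, score=score, blocked=score >= threshold)


# Each chat message would pass through moderate() before being broadcast.
results = [moderate(m) for m in ["see you at practice", "you are a stupid loser"]]
```

Passing the scorer in as a parameter means the keyword placeholder can later be swapped for a model-backed function without changing the chat-side logic.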

Copyright

Copyright © 2025 Mrs. Elakia K, Mr. Dinesh Kumar H, Mr. Daniyalraj K, Mr. Yogesh P, Mr. Vishva J. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET71011

Publish Date : 2025-05-14

ISSN : 2321-9653

Publisher Name : IJRASET
