ToxiSafe: Hate Speech Detection System

Authors: Vijayshekhar Aratagi, Vedavyas N, Srinandu K, Venu Y, Hosamani Manikeshwari

DOI Link: https://doi.org/10.22214/ijraset.2025.67184

Abstract

In today's interconnected digital world, the internet serves as a vital platform for communication and information sharing. However, it has also become a source of harmful content, including hate speech: expressions intended to demean or discriminate based on identity. This project addresses the growing challenge of hate speech in online spaces by developing a detection system. Using the BERT model, the application identifies hate speech in textual content. It also incorporates multimedia analysis, using tools such as Pydub, MoviePy, and SpeechRecognition to transcribe audio and video content for processing, which enables detection across diverse formats. A standout feature of the system is its integration of LIME (Local Interpretable Model-agnostic Explanations), which enhances transparency by highlighting the specific words or phrases that contribute to flagged content. Built on Flask, the system delivers results through a user-friendly, accessible interface. This project has wide applications, from moderating social media platforms to aiding researchers and educators. It represents a meaningful step toward fostering respectful digital interactions, combining technological innovation with ethical responsibility.

Introduction

In the digital age, the internet has become a vital communication tool, but it also fosters the spread of hate speech—language intended to demean or discriminate. This project presents a user-friendly, AI-driven system for detecting hate speech across multiple media types, including text, audio, and video.

At the heart of the system is BERT (Bidirectional Encoder Representations from Transformers), a powerful NLP model capable of understanding context and meaning in language. The system also uses LIME to explain its decisions, enhancing transparency and trust.

The tool processes:

Text directly
Audio and video via transcription using tools like Pydub, MoviePy, and SpeechRecognition

The application is built using Flask, making it accessible and easy to integrate for real-time content moderation.
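The routing described above (text passed directly, audio and video transcribed first) can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the `Pipeline` class, `fake_transcribe`, and `fake_classify` are hypothetical names that do not come from the paper, and the stand-in functions replace the real transcription tools and the BERT classifier so that the sketch runs without media files or model weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical type aliases: a transcriber maps a file path to text,
# a classifier maps text to a label string.
Transcriber = Callable[[str], str]
Classifier = Callable[[str], str]


@dataclass
class Pipeline:
    classify: Classifier
    transcribers: Dict[str, Transcriber]  # keyed by media type, e.g. "audio"

    def run(self, media_type: str, payload: str) -> dict:
        """Return the analysed text and its label for one input."""
        if media_type == "text":
            text = payload  # text goes to the classifier directly
        elif media_type in self.transcribers:
            # For audio and video, the payload is treated as a file path.
            text = self.transcribers[media_type](payload)
        else:
            raise ValueError(f"unsupported media type: {media_type}")
        return {"text": text, "label": self.classify(text)}


def fake_transcribe(path: str) -> str:
    # Stand-in for a real speech-to-text step (assumption, not the paper's code).
    return f"transcript of {path}"


def fake_classify(text: str) -> str:
    # Stand-in for the BERT classifier; a keyword check is NOT a real detector.
    return "Hate Speech" if "hate" in text.lower() else "Not Hate Speech"


pipeline = Pipeline(
    classify=fake_classify,
    transcribers={"audio": fake_transcribe, "video": fake_transcribe},
)
result = pipeline.run("audio", "clip.wav")
```

Keeping the transcribers and the classifier as injected callables means each media type can be swapped or tested on its own, which may suit a web application that receives all three input types through one endpoint.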

2. Literature Review Highlights

Multimodal Hate Speech Detection: Uses deep learning models (BERT, CNN, MLP) to classify hate speech across formats like text, voice, and video, particularly in multilingual contexts.
OCR for Text Extraction: Describes how Optical Character Recognition automates text extraction from images, enhancing data usability in various industries.
Text-Based Hate Speech Detection: Reviews NLP methods (e.g., TF-IDF, CNNs, RNNs, BERT), dataset challenges, and the complexity of detecting hate speech due to contextual and cultural factors.
ML Algorithms for Hate Speech: Analyzes methods like SVM, Naïve Bayes, CNN, and RNN, their preprocessing steps, and legal implications for automated moderation.
Video-Based Detection: Focuses on converting speech in videos to text and using classifiers like Naïve Bayes and Random Forests to detect hate speech, noting the need for high-quality datasets.

3. Problem Definition

Manual moderation of hate speech on digital platforms cannot keep pace with the volume of content and is prone to inconsistent, subjective judgments. This project proposes an automated, scalable solution, based on machine learning and NLP techniques, that aims to reduce such bias, improve online safety, and limit societal harm.

4. Methodology

System Workflow:

A. Data Collection: Gathers diverse content (text, audio, video) from social platforms and forums.
B. Data Preprocessing:
- Text cleaning: remove symbols and URLs, convert to lowercase, apply stemming
- Multimedia: Transcribe audio/video to text
C. Model Selection: Uses BERT to classify content as "Hate Speech" or "Not Hate Speech."
D. Hate Speech Detection: Identifies hateful content based on context and meaning.
E. Sentiment Analysis: Evaluates emotional tone for deeper insights.
F. Evaluation: Uses metrics like accuracy, precision, recall, and F1-score to assess performance.
G. Deployment: Built as a web application/API using Flask for easy integration into platforms.
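The text-cleaning part of step B can be illustrated with a short sketch. It assumes English-only input and uses a toy suffix stripper in place of a real stemmer; the names `clean_text` and `naive_stem` and the suffix list are hypothetical and are not taken from the paper.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # assumes lowercased English text
# Toy suffix list; an assumption standing in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least four characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def clean_text(raw: str) -> str:
    """Remove URLs and symbols, lowercase, then stem each token."""
    text = URL_RE.sub(" ", raw)         # drop URLs before punctuation is removed
    text = text.lower()
    text = NON_ALPHA_RE.sub(" ", text)  # drop digits, punctuation, and symbols
    return " ".join(naive_stem(token) for token in text.split())


cleaned = clean_text("Users are POSTING links!!! https://example.com/a?b=1 #now")
```

URLs are removed first because stripping punctuation beforehand would break them into fragments that no longer match the URL pattern.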

5. Results and Evaluation

Figure 2 (Home Page): User-friendly interface supporting text, audio, and video inputs.
Figure 3 (Text Input): Users input text to be classified using BERT as either hateful or not.
Figure 4 (Voice Input): Accepts audio files, transcribes them, and detects hate speech in spoken words.
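The evaluation step of the methodology names accuracy, precision, recall, and F1-score. The sketch below shows how these metrics can be computed for a binary classifier, assuming label 1 stands for "Hate Speech" and 0 for "Not Hate Speech". The function name `binary_metrics` is hypothetical, and the label lists are made-up illustrative values, not results from this project.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Compute accuracy, precision, recall, and F1 with 1 as the positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hate speech, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # hate speech, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, passed

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

In this toy case the accuracy (0.7) looks better than the recall (0.5), which is why reporting precision, recall, and F1 alongside accuracy matters when hateful content is the minority class.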

Key Strengths of the System

Multimodal support: Analyzes text, speech, and video.
Advanced NLP: Leverages BERT for high accuracy.
Explainable AI: Uses LIME for transparency in decisions.
Scalable and real-time: Suitable for integration into live platforms.
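The idea behind word-level explanations can be illustrated with a much simpler technique than LIME: leave-one-word-out occlusion, which scores each word by how much the predicted probability drops when that word is removed. This is a simplified stand-in and not LIME itself, which samples many random perturbations of the input and fits a local linear surrogate model. The names `word_importance`, `toy_proba`, and `TOY_WEIGHTS` are hypothetical, and the toy scorer replaces the BERT model so the sketch runs on its own.

```python
from typing import Callable, List, Tuple


def word_importance(
    text: str, predict_proba: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Rank words by the drop in hate-speech probability when each is removed."""
    words = text.split()
    baseline = predict_proba(text)
    scores = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])  # text without word i
        scores.append((word, baseline - predict_proba(reduced)))
    # Largest drop first: these words contributed most to the prediction.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy scorer standing in for a trained model (assumption, illustrative weights).
TOY_WEIGHTS = {"idiots": 0.5, "stupid": 0.25}


def toy_proba(text: str) -> float:
    score = sum(TOY_WEIGHTS.get(word.lower(), 0.0) for word in text.split())
    return min(score, 1.0)


ranking = word_importance("those stupid idiots again", toy_proba)
```

The top-ranked entries in `ranking` are the words an interface could highlight for the user, which is the same kind of output the system presents through LIME.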

Conclusion

This project demonstrates the potential of machine learning in addressing the critical issue of hate speech in online content. Our model effectively identifies harmful language, contributing to the creation of safer and more respectful digital spaces. By leveraging advanced natural language processing techniques and diverse data, the project underscores the capability of automated solutions to support content moderation and foster healthier online interactions. The model employs sophisticated algorithms trained on diverse and representative datasets, ensuring its ability to detect nuanced forms of hate speech across different contexts, cultures, and languages. This approach not only enhances its accuracy but also highlights its adaptability to evolving patterns of online discourse. By proactively identifying problematic content, the system empowers platforms to intervene before harm escalates, safeguarding users from toxic interactions.


Copyright

Copyright © 2025 Vijayshekhar Aratagi, Vedavyas N, Srinandu K, Venu Y, Hosamani Manikeshwari. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET67184

Publish Date : 2025-02-28

ISSN : 2321-9653

Publisher Name : IJRASET
