The rapid growth of online communication platforms has significantly increased the spread of extremist and harmful content. Traditional moderation techniques based on keyword filtering and manual review are insufficient for large-scale monitoring and lack contextual understanding. This paper proposes a hybrid extremism detection system that integrates machine learning, deep learning, and rule-based approaches to identify extremist text with improved accuracy and interpretability.
The proposed framework combines TF-IDF–based statistical learning, a Bidirectional Long Short-Term Memory (BiLSTM) network for sequential modeling, and a DistilBERT transformer for contextual understanding. A priority override mechanism ensures that high-confidence contextual predictions are not diluted by averaging model outputs. Additionally, an explainable dashboard interface provides real-time risk scores and highlights influential keywords to support human moderators.
Experimental evaluation demonstrates that the hybrid approach improves precision, recall, and F1-score compared to standalone models. The system also supports real-time deployment through a web-based interface, making it suitable for practical content moderation applications. The results indicate that integrating statistical, sequential, and contextual modeling provides a scalable and interpretable solution for detecting extremist content in online environments.
Introduction
This paper presents an AI-based hybrid system for detecting extremist content on social media platforms. Social media is increasingly used to spread radical ideologies, recruitment messages, and harmful communication, often through coded or indirect language that bypasses traditional keyword-based moderation. Moderation approaches that rely solely on keyword filtering are ineffective because extremist users can easily evade detection by rephrasing or using symbolic language. There is therefore a need for intelligent systems that understand contextual meaning rather than performing simple word matching.
The proposed framework combines machine learning, deep learning, and rule-based methods to improve detection accuracy and interpretability. The system uses TF-IDF feature extraction with Logistic Regression and Random Forest classifiers for statistical keyword analysis, a BiLSTM network to capture sequential language patterns, and DistilBERT for deep semantic understanding of context. A priority-override mechanism ensures that high-confidence contextual threats are flagged immediately, even when the other models produce weaker predictions.
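To make the statistical branch concrete, the following minimal Python sketch computes sparse TF-IDF vectors using only the standard library. The tokenizer, the smoothed IDF formula idf(t) = ln((1 + N) / (1 + df(t))) + 1, and the function names fit_idf and tfidf_vector are illustrative assumptions rather than the paper's implementation; the resulting vectors are the kind of features the Logistic Regression and Random Forest classifiers would be trained on.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer: lowercase, then split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed inverse document frequency: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term at most once per document
    return {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector (raw term count x IDF), L2-normalized; unseen terms are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}
```

In practice, scikit-learn's TfidfVectorizer applies the same smoothed IDF and L2 normalization by default and can feed LogisticRegression and RandomForestClassifier directly; the hand-written version only shows what those features contain. Rarer terms receive larger IDF weights, which is why a term such as "fight" would outweigh common words like "the" in a small corpus.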
The dataset consisted of over 723,000 labeled extremist and non-extremist text samples collected from sources such as Twitter, online forums, and news comment sections. Preprocessing included removing URLs, emojis, and special characters, followed by normalization and tokenization. The system assigns each input a high, medium, or low risk level and highlights influential keywords to improve explainability for moderators.
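The preprocessing steps can be sketched as a single function. This is a hedged illustration: the regular expressions, the choice of Unicode NFKC normalization, and the decision to keep only ASCII letters, digits, and apostrophes (which discards emojis and, as a side effect, accented characters) are assumptions, since the paper does not specify its exact rules.

```python
import re
import unicodedata

# Assumed patterns; the paper does not give its exact cleaning rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
NON_TEXT_RE = re.compile(r"[^a-z0-9'\s]")  # anything except ASCII letters, digits, apostrophes, whitespace
WHITESPACE_RE = re.compile(r"\s+")


def preprocess(text: str) -> list[str]:
    text = URL_RE.sub(" ", text)                          # 1. remove URLs
    text = unicodedata.normalize("NFKC", text).lower()    # 2. fold compatibility forms, lowercase
    text = NON_TEXT_RE.sub(" ", text)                     # 3. drop emojis, hashtags, punctuation
    text = WHITESPACE_RE.sub(" ", text).strip()           # 4. collapse whitespace
    return text.split(" ") if text else []                # 5. whitespace tokenization
```

NFKC folds compatibility characters such as full-width letters into their ASCII forms, a cheap defense against one simple kind of obfuscation. A multilingual system would need a less aggressive character filter than the ASCII-only one assumed here.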
The architecture includes:
Data preprocessing,
Hybrid detection engine,
Explainable risk scoring,
Real-time web dashboard using Flask and Streamlit.
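A minimal way to see how these four components fit together is a pipeline function with pluggable stages. Everything here is hypothetical: the Detection record, the run_pipeline signature, and the assumption that each model maps preprocessed tokens to a probability of extremist content. A Flask endpoint or Streamlit page would call such a function and render the returned record; the web-framework code is omitted.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: each model maps preprocessed tokens to P(extremist) in [0, 1].
Model = Callable[[list[str]], float]


@dataclass
class Detection:
    """Record a dashboard could render for one input (illustrative fields only)."""
    text: str
    scores: dict[str, float]  # per-model probabilities
    fused: float              # combined risk score
    risk: str                 # "high", "medium", or "low"


def run_pipeline(
    text: str,
    preprocess: Callable[[str], list[str]],
    models: dict[str, Model],
    fuse: Callable[[dict[str, float]], float],
    band: Callable[[float], str],
) -> Detection:
    tokens = preprocess(text)                                          # 1. data preprocessing
    scores = {name: model(tokens) for name, model in models.items()}  # 2. hybrid detection engine
    fused = fuse(scores)                                               # 3. risk scoring
    return Detection(text, scores, fused, band(fused))                 # 4. payload for the dashboard
```

Keeping each stage as a plain callable lets the statistical, sequential, and contextual models be swapped or retrained independently, which matches the modular architecture described above.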
The system combines the weighted outputs of the Random Forest, BiLSTM, and DistilBERT models through a fusion mechanism, while the priority override prevents high-confidence contextual threats from being diluted by the averaging step. Experimental results showed that the hybrid system outperformed standalone machine learning models, improving accuracy from 91.2% to 94.5% and F1-score from 89.1% to 92.9%.
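The paper does not give the fusion formula, so the sketch below assumes a common form: a weighted average s = w_RF·p_RF + w_BiLSTM·p_BiLSTM + w_DB·p_DB with weights summing to one, where each p is a model's probability that the text is extremist. The priority override is modeled as replacing s with max(s, p_DB) whenever the DistilBERT probability p_DB reaches a threshold τ, so a confident contextual prediction cannot be averaged down. The weights, τ = 0.90, and the risk-band cut-offs (0.70 and 0.40) are hypothetical placeholders, not values from the experiments.

```python
from dataclasses import dataclass

# Hypothetical fusion weights and thresholds: the paper does not report its values.
WEIGHTS = {"random_forest": 0.25, "bilstm": 0.30, "distilbert": 0.45}  # sum to 1
OVERRIDE_THRESHOLD = 0.90   # tau: DistilBERT confidence that triggers the override
HIGH_RISK, MEDIUM_RISK = 0.70, 0.40


@dataclass(frozen=True)
class FusionResult:
    score: float
    risk: str
    overridden: bool


def fuse(probs: dict[str, float]) -> FusionResult:
    """Weighted average of per-model P(extremist), with a DistilBERT priority override."""
    weighted = sum(WEIGHTS[name] * probs[name] for name in WEIGHTS)
    contextual = probs["distilbert"]
    overridden = contextual >= OVERRIDE_THRESHOLD
    # Override: a confident contextual prediction is never pulled down by the other models.
    score = max(weighted, contextual) if overridden else weighted
    if score >= HIGH_RISK:
        risk = "high"
    elif score >= MEDIUM_RISK:
        risk = "medium"
    else:
        risk = "low"
    return FusionResult(score, risk, overridden)
```

With these placeholder values, probabilities of 0.10, 0.20, and 0.95 average to 0.5125 (medium risk), but the override raises the final score to 0.95 (high risk). That is exactly the dilution the mechanism is meant to prevent.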
The system also achieved response times under one second, making it suitable for real-time use, and showed improved robustness against indirect or coded extremist language. The explainable dashboard enhances transparency by displaying confidence scores and highlighting the keywords that influenced each prediction. However, challenges remain in handling sarcasm and highly coded language and in reducing the computational cost of transformer-based models.
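One simple, model-agnostic way to produce highlighted keywords is occlusion: remove each token in turn and measure how much the risk score drops. The paper does not state which attribution method its dashboard uses, so occlusion is a stand-in here, and the toy logistic scorer with hand-picked weights (TOY_WEIGHTS, TOY_BIAS) is purely hypothetical; any trained model could be passed as the score argument instead.

```python
import math
from typing import Callable

Scorer = Callable[[list[str]], float]


def occlusion_importance(tokens: list[str], score: Scorer, top_k: int = 3) -> list[tuple[str, float]]:
    """Rank tokens by how much removing each one lowers the model's risk score."""
    base = score(tokens)
    deltas = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]   # the input with token i left out
        deltas.append((tok, base - score(reduced)))
    deltas.sort(key=lambda pair: pair[1], reverse=True)
    return deltas[:top_k]


# Toy stand-in for a trained model: logistic function over hypothetical per-token weights.
TOY_WEIGHTS = {"attack": 2.0, "join": 1.0, "tonight": 0.3}
TOY_BIAS = -2.0


def toy_score(tokens: list[str]) -> float:
    z = TOY_BIAS + sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))
```

For the tokens join, the, attack, tonight, removing attack lowers the toy score the most, followed by join. Occlusion costs one extra model call per token, which matters for transformer models under a sub-second latency budget; gradient- or attention-based attributions may be cheaper alternatives.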
Conclusion
This paper presented a hybrid extremism detection system that integrates machine learning, deep learning, and rule-based approaches to identify harmful online content. By combining statistical feature extraction, sequential modeling, and contextual transformer-based analysis, the proposed system achieves improved accuracy and interpretability compared to standalone models.
The priority override mechanism ensures that high-confidence contextual threats are detected without being diluted by averaging. The explainable dashboard enables real-time monitoring and provides transparency for human moderators. Experimental results demonstrate that the hybrid approach improves detection accuracy while remaining practical to deploy.
Future research will focus on expanding multilingual datasets, incorporating multimodal detection techniques, and improving detection of implicit extremist language. The proposed system provides a scalable and interpretable solution for enhancing online safety and supporting automated content moderation.
The experimental findings highlight the importance of combining multiple modeling approaches for reliable extremist content detection. The hybrid framework effectively balances statistical keyword detection with contextual understanding, enabling the system to identify both explicit and implicit threats. The integration of explainable outputs further improves usability by providing transparency in model decisions.
The proposed system demonstrates strong potential for real-world deployment in content moderation pipelines. Its modular architecture, real-time processing capability, and explainable outputs make it suitable for integration into social media monitoring tools, educational platforms, and enterprise moderation systems. Continued refinement of the contextual models and expansion of the training datasets should further improve system performance and adaptability.