Living in the era of social media exposes people to vast amounts of information, much of which is unreliable. The widespread circulation of fake news poses serious risks, particularly when it mimics credible reporting. While fake news detection has been extensively studied in high-resource languages like English, low-resource languages such as Bangla remain underexplored due to limited linguistic tools and datasets. This study focuses on detecting fake news in Bengali by developing a DistilBERT-based model, achieving an accuracy of 86% on an online Bangla Fake News dataset. Model performance was evaluated using accuracy, precision, recall, and F1-score, highlighting the importance of dataset balance for effective classification. The proposed approach aims to rapidly identify fake news using natural language processing, reducing users’ exposure to misinformation. A comparative analysis with an LSTM model further demonstrates the effectiveness of DistilBERT, underscoring the promise of advanced NLP techniques in combating misinformation within Bengali-speaking digital communities.
Introduction
Fake news is deliberately created misinformation intended to mislead people for political, social, or financial purposes. It often combines truth with falsehoods and spreads rapidly through social media and digital platforms. In Bangladesh, fake news and rumors have caused serious social unrest, violence, communal attacks, economic panic, and even deaths. Examples include incidents related to the Padma Bridge rumors, attacks on minority communities, misinformation during religious events, and false health information during the COVID-19 pandemic. These events highlight the urgent need for reliable fake news detection systems, especially in the Bengali language.
The study focuses on detecting Bangla fake news using advanced Natural Language Processing (NLP) techniques, particularly the DistilBERT model. DistilBERT is a lightweight and faster version of BERT that can effectively understand context and semantic relationships in text while requiring lower computational resources. The proposed approach aims to improve fake news identification in Bengali, a relatively underdeveloped research area compared to English fake news detection.
The related work section reviews several machine learning and deep learning approaches used for Bangla fake news detection. Researchers have applied techniques such as:
Support Vector Machine (SVM)
Naive Bayes
Convolutional Neural Networks (CNN)
Long Short-Term Memory (LSTM)
Bi-LSTM
Transformer models like BERT and BanglaBERT
Many studies reported high accuracy, in some cases between 95% and 99%, using large annotated Bangla datasets. Researchers also addressed challenges such as class imbalance, feature extraction, low-resource language limitations, and dataset creation.
The proposed methodology combines DistilBERT with a GRU layer, dense layers, and dropout layers to classify Bangla news articles as real or fake. The process includes:
Preprocessing Bangla text data.
Tokenizing articles into sub-word units using the DistilBERT tokenizer.
Converting text into numerical representations such as token IDs and attention masks.
Training the model to identify fake and genuine news articles.
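The tokenization and encoding steps above can be illustrated with a small sketch. It is not the DistilBERT tokenizer itself: the vocabulary, the function names wordpiece and encode, and the max_len value are hypothetical and chosen only for illustration. It shows how greedy longest-match sub-word splitting could turn a Bangla sentence into fixed-length token IDs plus an attention mask.

```python
from typing import Dict, List, Tuple

# Hypothetical toy vocabulary (assumption). A real DistilBERT tokenizer
# ships with its own, much larger, sub-word vocabulary.
VOCAB: Dict[str, int] = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "খবর": 4, "মিথ্যা": 5, "##টি": 6, "সত্য": 7,
}


def wordpiece(word: str, vocab: Dict[str, int]) -> List[str]:
    """Split one word into sub-word pieces by greedy longest match."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word continuation
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def encode(text: str, vocab: Dict[str, int] = VOCAB,
           max_len: int = 8) -> Tuple[List[int], List[int]]:
    """Return (token IDs, attention mask), both of length max_len."""
    tokens = ["[CLS]"]
    for word in text.split():
        tokens.extend(wordpiece(word, vocab))
    # Truncate so that the closing [SEP] always fits.
    tokens = tokens[: max_len - 1] + ["[SEP]"]
    ids = [vocab[t] for t in tokens]
    mask = [1] * len(ids)          # 1 = real token
    pad = max_len - len(ids)
    return ids + [vocab["[PAD]"]] * pad, mask + [0] * pad  # 0 = padding


if __name__ == "__main__":
    input_ids, attention_mask = encode("খবরটি মিথ্যা")
    print(input_ids)
    print(attention_mask)
```

The attention mask lets the model ignore padding positions, so articles of different lengths can be batched together. In practice a pretrained tokenizer would replace both functions.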
The model performance is evaluated using metrics such as:
Accuracy
Precision
Recall
F1-score
ROC-AUC score
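As a minimal sketch of how these metrics relate to one another, the following computes them from scratch for a binary classifier. The label convention (1 = fake, 0 = real), the function names, and the sample data are assumptions made for illustration and are not results from this study.

```python
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = fake)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """ROC-AUC as the probability that a random positive example
    receives a higher score than a random negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative labels and scores only, not data from the study.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Accuracy alone can be misleading when one class dominates the dataset, which is why precision, recall, and F1 are reported alongside it. ROC-AUC is computed from the predicted scores rather than the thresholded labels.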
Conclusion
DistilBERT, like other language models of a similar nature, shows considerable potential for identifying fake news in Bangla. Its capacity for analyzing and comprehending text can help curb the spread of false information in regions where Bangla is the primary language. Several obstacles and limitations remain, however, including adversarial attacks, linguistic complexity, cultural context, and dataset availability. Further research and development are needed to address these obstacles and improve the accuracy of Bangla fake news identification with DistilBERT. Future work should concentrate on contextual understanding, adversarial robustness, multimodal techniques, cross-lingual transfer learning, user-centric solutions, real-time detection systems, dataset expansion, and Bangla-specific fine-tuning. If successfully deployed, DistilBERT-based fake news detection systems could benefit society substantially: they can help people make informed choices about the information they consume, lessen the negative effects of false information, and preserve the integrity of public discourse.