Identifying biases, sentiments, and relevance for efficient governance has become difficult due to the rapid expansion of digital news across numerous platforms. Conventional approaches don’t have automated systems to group news by government agencies or to quickly draw attention to important issues. Additionally, the variety of regional languages makes timely decision-making and extensive monitoring more difficult. An automated framework for digital news crawling, classification, and sentiment analysis with an integrated feedback system is presented in this work. The framework gathers videos and articles from various national and regional media sources, uses machine learning models to categorize them into their respective ministries based on the content, and uses natural language processing for sentiment analysis. Real-time notifications to the relevant departments are triggered by negative news items, allowing for prompt intervention. Direct links to original sources, department-wise filters, sentiment visualization, and multilingual support are all features of an intuitive interface. Future developments will involve implementing the system as a mobile application, adding more regional languages, and enhancing model accuracy with larger datasets. This strategy helps government agencies make timely and well-informed policy decisions while raising public awareness, which promotes better governance and social cohesion.
Introduction
The rapid expansion of online news platforms has created an overwhelming flow of digital information, making manual monitoring slow, inconsistent, and inefficient. To address this challenge, GovPulse is introduced as an AI-powered system that automatically collects, classifies, and analyzes governance-related news in real time. It fetches articles using Scrapy, categorizes them under the correct government ministries using transformer-based models like DistilBERT, performs sentiment analysis, and sends alerts to concerned departments whenever negative or sensitive news is detected. A web dashboard provides users with easy filtering, visualization, and interactive exploration of categorized news.
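The alerting step described above can be illustrated with a minimal sketch. It assumes each processed item already carries a ministry label, a sentiment label, and a classifier confidence; the `NewsItem` fields, the `MINISTRY_CONTACTS` routing table, the addresses, and the 0.7 confidence cut-off are all hypothetical and are not taken from the GovPulse implementation.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Optional

# Hypothetical routing table: ministry label -> department mailbox.
MINISTRY_CONTACTS = {
    "Health": "alerts-health@example.org",
    "Education": "alerts-education@example.org",
}


@dataclass
class NewsItem:
    title: str
    url: str
    ministry: str      # label assigned by the ministry classifier
    sentiment: str     # "positive", "neutral", or "negative"
    confidence: float  # sentiment confidence in [0, 1]


def build_alert(item: NewsItem, min_confidence: float = 0.7) -> Optional[EmailMessage]:
    """Return an email alert for a confident negative item, otherwise None."""
    if item.sentiment != "negative" or item.confidence < min_confidence:
        return None
    recipient = MINISTRY_CONTACTS.get(item.ministry)
    if recipient is None:
        return None  # no mailbox configured for this ministry
    msg = EmailMessage()
    msg["Subject"] = f"[GovPulse] Negative news flagged for {item.ministry}"
    msg["From"] = "govpulse@example.org"  # placeholder sender address
    msg["To"] = recipient
    msg.set_content(
        f"{item.title}\nSource: {item.url}\nConfidence: {item.confidence:.2f}"
    )
    return msg
```

The function only builds the message. In a deployment it could be handed to `smtplib.SMTP.send_message`, with the mail server settings supplied by configuration.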
The system’s key contributions include large-scale automated news extraction, ministry-wise classification with machine learning and clustering methods, DistilBERT-based sentiment detection, real-time email alerts, and an interactive analytics dashboard. It addresses the rising need for automated monitoring, reduces subjective interpretation, and converts raw news into structured, actionable insights for better transparency and responsiveness in governance.
A review of previous research shows progress in news classification, sentiment analysis, and machine learning methods but highlights major gaps in multilingual processing, real-time alerting, and governance-specific applications. Existing works focus on high-level news categories, English-only datasets, or limited scalability. GovPulse bridges these gaps through multilingual support, transformer models, and integration with government workflows.
The proposed system pipeline consists of several stages: automated data acquisition from static/dynamic pages and videos (Scrapy, Selenium), preprocessing and translation, embedding and clustering (UMAP, K-means), supervised classification (DistilBERT, XLM-R, SVM), sentiment analysis (RoBERTa, DistilBERT), real-time alert generation, and visualization through a React + Tailwind dashboard connected to Django REST APIs. MongoDB stores raw and processed data, media assets, and model outputs.
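The staged structure of this pipeline can be sketched as a chain of functions, each enriching a shared record. The sketch below is illustrative only: the `Article` fields are hypothetical, and the keyword rules are placeholders standing in for the transformer-based ministry classifier and sentiment model, which are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Article:
    url: str
    text: str
    ministry: str = ""
    sentiment: str = ""
    needs_alert: bool = False
    trace: List[str] = field(default_factory=list)  # stages applied so far


Stage = Callable[[Article], Article]


def preprocess(article: Article) -> Article:
    # Lowercase and collapse whitespace; a real stage would also translate,
    # tokenize, lemmatize, and remove stopwords.
    article.text = " ".join(article.text.lower().split())
    article.trace.append("preprocess")
    return article


def classify_ministry(article: Article) -> Article:
    # Placeholder keyword rule standing in for the supervised classifier.
    article.ministry = "Health" if "hospital" in article.text else "Unassigned"
    article.trace.append("classify_ministry")
    return article


def score_sentiment(article: Article) -> Article:
    # Placeholder keyword rule standing in for the sentiment model.
    article.sentiment = "negative" if "shortage" in article.text else "neutral"
    article.trace.append("score_sentiment")
    return article


def flag_alert(article: Article) -> Article:
    # Negative items are marked for the alerting stage.
    article.needs_alert = article.sentiment == "negative"
    article.trace.append("flag_alert")
    return article


STAGES: List[Stage] = [preprocess, classify_ministry, score_sentiment, flag_alert]


def run_pipeline(article: Article, stages: List[Stage] = STAGES) -> Article:
    """Apply each stage in order and return the enriched record."""
    for stage in stages:
        article = stage(article)
    return article
```

Keeping every stage behind the same signature means a placeholder can be replaced by a model-backed function without changing the rest of the chain.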
Implementation includes large-scale scraping, multilingual preprocessing (tokenization, lemmatization, stopword removal), semantic clustering with optimized k-values, and classification using both baseline and transformer models. Linear SVM performed best among baseline models with 88.9% accuracy, while transformer architectures offered superior contextual understanding. Sentiment analysis uses improved polarity thresholds and transformer models to better detect subtle emotions.
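Threshold-based polarity labeling can be sketched as follows. The sketch assumes the sentiment model emits a single polarity score in [-1, 1]; the function name and the default cut-offs of -0.05 and 0.05 are illustrative assumptions, not the thresholds used in GovPulse.

```python
def label_polarity(
    score: float,
    neg_threshold: float = -0.05,  # illustrative default, not from the paper
    pos_threshold: float = 0.05,   # illustrative default, not from the paper
) -> str:
    """Map a polarity score in [-1, 1] to a three-way sentiment label."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if neg_threshold > pos_threshold:
        raise ValueError("neg_threshold must not exceed pos_threshold")
    if score <= neg_threshold:
        return "negative"
    if score >= pos_threshold:
        return "positive"
    return "neutral"
```

Widening the band between the two thresholds sends more borderline items to the neutral class, which reduces false alerts at the cost of missing mildly negative coverage.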
Results show high accuracy in ministry classification, effective sentiment tagging, and positive user feedback for dashboard usability. Limitations include varying performance in low-resource languages and challenges in real-time scaling. Future improvements aim to incorporate stronger multilingual models, multimodal features, and predictive analytics.
Overall, GovPulse delivers a comprehensive, scalable solution for automated governance news intelligence, enabling timely decision-making, improved transparency, and efficient monitoring across ministries.
Conclusion
This work presented GovPulse, an AI-driven system for large-scale monitoring, classification, and sentiment analysis of governance-related news. By combining automated web scraping, multilingual preprocessing, and transformer-based NLP models, the system effectively processes high-volume digital news and organizes it into ministry-specific categories.
Experimental results demonstrate that transformer models outperform the classical baselines, of which the Linear SVM was the strongest, achieving an accuracy of 88.9% and an F1-score of 0.88. The integrated alert mechanism and the interactive dashboard further enhance the system’s practical utility by enabling timely detection of negative news and providing transparent, actionable insights to stakeholders.
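For reference, the two reported metrics can be computed from predicted and true ministry labels as in the sketch below. The text does not state which F1 averaging was used, so macro-averaging is an assumption here, and the function name is hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def accuracy_and_macro_f1(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Tuple[float, float]:
    """Return (accuracy, macro-averaged F1) for single-label predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    accuracy = sum(tp.values()) / len(y_true)

    f1_scores = []
    for label in set(y_true) | set(y_pred):
        # F1 = 2TP / (2TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)

    return accuracy, sum(f1_scores) / len(f1_scores)
```

Macro-averaging weights every ministry equally regardless of how many articles it receives, so it is sensitive to weak performance on rarely occurring ministries.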
Although the system performs robustly across major sources, challenges remain in handling code-mixed and low-resource regional languages, processing noisy multimedia content, and scaling real-time pipelines at a national level. Addressing these limitations is essential for improving the model’s coverage and deployment readiness. Future enhancements will focus on incorporating stronger multilingual transformer models such as mBERT and XLM-R, extending the pipeline to multimodal analysis using images and videos, and integrating predictive analytics to forecast emerging public issues. Scaling GovPulse through cloud-based distributed pipelines will further improve real-time performance and ensure long-term operational sustainability.
Overall, GovPulse establishes a strong foundation for AI-enabled news intelligence and offers meaningful potential for strengthening transparent governance, proactive policymaking, and informed citizen engagement.