AI-Driven Detection of Unauthorized Database Queries in Government Citizen Information Systems

Authors: Mohamed Saad, Dr. Adnan Al-Helali

DOI Link: https://doi.org/10.22214/ijraset.2026.83038

Abstract

The rapid digital transformation of government services means that sensitive citizen data increasingly resides in centralized database systems. These platforms are now the focus of attacks from both external threats and insiders, including privileged users who run unauthorized queries. Traditional rule-based access control (AC) mechanisms fail to catch such advanced, context-aware unauthorized access behaviors. This paper describes an AI-based framework for detecting, in real time, unauthorized access to the database of a government citizen information system. Our approach combines machine learning (ML) techniques, such as anomaly detection, natural language processing (NLP) on SQL queries, and behavior analytics, to detect queries that deviate from established query patterns. The model is built to be adaptive and is trained on query logs, user profiles, and access context to separate benign queries from malicious or unauthorized ones. Results on a simulated government database environment show that our approach can reach high detection rates with low false-positive rates and near real-time performance, which are essential for deployment in critical public infrastructure. These results highlight the significance of AI-based monitoring as an additional layer of protection for citizen privacy and data sovereignty in future cybersecurity.

Introduction

The paper presents an AI-based database query monitoring system designed to detect unauthorized access in government citizen information systems. As governments increasingly adopt e-governance, sensitive databases containing citizen records, tax information, healthcare data, and identity records have become major targets for cyberattacks, particularly insider threats and compromised accounts. Traditional security mechanisms such as Role-Based Access Control (RBAC), Mandatory Access Control (MAC), and firewalls cannot detect authorized users who misuse their privileges. To address this challenge, the proposed framework employs Artificial Intelligence (AI) and Machine Learning (ML) to analyze database query behavior and identify suspicious activities in real time.

The proposed system combines multiple AI techniques, including unsupervised anomaly detection, Natural Language Processing (NLP) for SQL query analysis, graph-based behavioral modeling, and multi-level anomaly detection. These methods create behavioral profiles of users and roles, analyze the semantic meaning of SQL queries, monitor access relationships between users and database objects, and generate real-time alerts for suspicious activities while integrating seamlessly with existing Database Management Systems (DBMSs). The framework is designed to satisfy government-specific requirements such as privacy preservation, regulatory compliance, high availability, and compatibility with legacy systems.
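The idea of a per-user behavioral profile built from SQL query structure can be illustrated with a minimal sketch. It is not the paper's NLP model: it assumes a simple regex-based normalization that collapses literals into placeholders, and the names `normalize_sql` and `UserProfile` are hypothetical, introduced only for illustration.

```python
import re
from collections import Counter

# Assumed normalization rules (not from the paper): string and numeric
# literals are replaced by "?" so queries with the same shape share a template.
_STRING = re.compile(r"'(?:[^']|'')*'")
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_SPACE = re.compile(r"\s+")


def normalize_sql(query: str) -> str:
    """Reduce a SQL statement to an upper-cased template without literals."""
    q = _STRING.sub("?", query)
    q = _NUMBER.sub("?", q)
    q = _SPACE.sub(" ", q).strip().rstrip(";").strip()
    return q.upper()


class UserProfile:
    """Hypothetical profile: frequency of query templates seen for one user."""

    def __init__(self) -> None:
        self.templates = Counter()
        self.total = 0

    def observe(self, query: str) -> None:
        """Record one historical query for this user."""
        self.templates[normalize_sql(query)] += 1
        self.total += 1

    def novelty(self, query: str) -> float:
        """Return 1.0 for a never-seen template, lower values for common ones."""
        if self.total == 0:
            return 1.0
        seen = self.templates[normalize_sql(query)]
        return 1.0 - seen / self.total


profile = UserProfile()
for citizen_id in (101, 102, 103):
    profile.observe(f"select name from citizens where id = {citizen_id};")
profile.observe("select count(*) from tax_records")

familiar = profile.novelty("SELECT name FROM citizens WHERE id = 7")
unseen = profile.novelty("select * from health_records")
```

In this sketch a query whose template the user has never issued receives the maximum novelty score, while a routine lookup scores low. A deployed system would presumably combine such a signal with the other detectors described above rather than alert on it alone.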

The literature review highlights that existing database security research has focused on access control, database activity monitoring, anomaly detection, insider threat detection, SQL query analysis, and User and Entity Behavior Analytics (UEBA). Previous studies demonstrated the effectiveness of statistical anomaly detection, graph-based analysis, sequence modeling using LSTM networks, transformer-based models, and privacy-preserving techniques such as federated learning. However, most existing solutions either analyze SQL semantics or user behavior independently and are not specifically designed for government citizen information systems. The proposed framework addresses these gaps by integrating semantic SQL analysis, behavioral sequence modeling, and session-level anomaly detection within a privacy-aware architecture.
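Behavioral sequence modeling and session-level scoring can be sketched with a much simpler stand-in than the LSTM and transformer models mentioned above: a first-order Markov model over statement types with additive smoothing. The class `TransitionModel`, its smoothing value, and the example sessions are assumptions made for illustration and do not come from the paper.

```python
import math
from collections import defaultdict


class TransitionModel:
    """Hypothetical first-order Markov model over SQL statement types."""

    def __init__(self, smoothing: float = 1.0) -> None:
        self.smoothing = smoothing
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def fit(self, sessions) -> None:
        """Count transitions between consecutive statement types."""
        for session in sessions:
            self.vocab.update(session)
            for prev, cur in zip(session, session[1:]):
                self.counts[prev][cur] += 1

    def transition_prob(self, prev: str, cur: str) -> float:
        """Smoothed probability of seeing `cur` right after `prev`."""
        row = self.counts.get(prev, {})
        total = sum(row.values())
        slots = len(self.vocab) + 1  # one extra slot for unseen statement types
        return (row.get(cur, 0) + self.smoothing) / (total + self.smoothing * slots)

    def session_score(self, session) -> float:
        """Mean negative log-likelihood per transition; higher is more unusual."""
        pairs = list(zip(session, session[1:]))
        if not pairs:
            return 0.0
        nll = -sum(math.log(self.transition_prob(p, c)) for p, c in pairs)
        return nll / len(pairs)


model = TransitionModel()
model.fit([
    ["SELECT", "SELECT", "UPDATE"],
    ["SELECT", "UPDATE", "SELECT"],
])

routine_score = model.session_score(["SELECT", "UPDATE"])
unusual_score = model.session_score(["SELECT", "DROP", "GRANT"])
```

The session containing statement types never observed in training receives a higher score than the routine one. A neural sequence model plays the same role with far more context; the Markov version only shows how per-query likelihoods aggregate into a session-level anomaly score.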

The system was evaluated using a realistic government-like environment consisting of Oracle Database 19c, AI frameworks including TensorFlow and Scikit-learn, and a dataset containing 2.4 million anonymized SQL query logs collected over twelve months. The study also utilized a manually labeled dataset of 45,000 queries, synthetic attack datasets containing SQL injection, privilege escalation, and data exfiltration scenarios, and benchmark intrusion detection datasets. Four AI models were developed and compared: Random Forest, Isolation Forest, LSTM, and a Transformer-based BERT-SQL model. Feature engineering included temporal, behavioral, structural, and role-based characteristics extracted from database audit logs.
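The four feature families named above (temporal, behavioral, structural, and role-based) can be sketched as follows. The record layout `AuditRecord`, the `ROLE_TABLES` mapping, the off-hours window, and the individual feature names are all hypothetical; the paper does not specify its audit-log schema or its exact feature set.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AuditRecord:
    """Hypothetical shape of one database audit-log entry."""
    user: str
    role: str
    timestamp: datetime
    sql_text: str
    rows_returned: int


# Assumed mapping from roles to the tables they routinely access.
ROLE_TABLES = {
    "tax_clerk": {"tax_records"},
    "health_officer": {"health_records"},
}

# Rough table-name extraction; a real system would use a SQL parser.
_TABLE = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE
)


def extract_features(rec: AuditRecord, user_mean_rows: float) -> dict:
    """Turn one audit record into a flat feature dictionary."""
    tables = {t.lower() for t in _TABLE.findall(rec.sql_text)}
    allowed = ROLE_TABLES.get(rec.role, set())
    upper = rec.sql_text.upper()
    hour = rec.timestamp.hour
    return {
        # Temporal features
        "hour": hour,
        "is_weekend": int(rec.timestamp.weekday() >= 5),
        "off_hours": int(hour < 7 or hour >= 19),  # assumed working hours
        # Behavioral feature: result size relative to the user's history
        "rows_ratio": rec.rows_returned / max(user_mean_rows, 1.0),
        # Structural features
        "n_tables": len(tables),
        "n_joins": len(re.findall(r"\bJOIN\b", upper)),
        "has_wildcard": int(bool(re.search(r"SELECT\s+\*", upper))),
        "query_length": len(rec.sql_text),
        # Role-based feature
        "tables_outside_role": len(tables - allowed),
    }


record = AuditRecord(
    user="u1",
    role="tax_clerk",
    timestamp=datetime(2025, 3, 8, 23, 15),
    sql_text="SELECT * FROM health_records h JOIN tax_records t ON h.cid = t.cid",
    rows_returned=5000,
)
features = extract_features(record, user_mean_rows=50.0)
```

For this illustrative record, a tax clerk reads a health table late on a weekend and retrieves far more rows than usual, so several features move away from their typical values at once. Feature vectors of this kind are what tree-based models such as Random Forest and Isolation Forest would consume.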

Experimental results demonstrate that the proposed AI-based detection framework significantly outperforms conventional rule-based systems. The model achieved 97.6% accuracy, 96.8% precision, 95.3% recall, 96.0% F1-score, an AUC-ROC of 0.991, and a low 1.9% false positive rate, compared to 82.4% accuracy for traditional signature-based methods. Under simulated operational loads of up to 50,000 queries per hour, the system maintained an average detection latency below 41 milliseconds while sustaining detection rates above 96%, making it suitable for real-time deployment. Compared with static rule-based filtering, anomaly threshold methods, and manual auditing, the proposed framework demonstrated superior accuracy, lower false alarms, and dramatically faster detection, reducing response time from over 24 hours for manual audits to under 50 milliseconds.
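For readers who want to relate the reported metrics to one another, the sketch below shows how they are derived from a confusion matrix. The confusion-matrix counts are illustrative only and are not taken from the paper; the function names are hypothetical. The final lines check that the reported F1-score follows from the reported precision and recall, and convert the stated load into queries per second.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (fp + tn),
    }


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (not from the paper's evaluation).
example = classification_metrics(tp=90, fp=10, tn=880, fn=20)

# Consistency check on the reported figures: precision 96.8%, recall 95.3%.
reported_f1 = f1_score(0.968, 0.953)

# The stated peak load of 50,000 queries per hour, expressed per second.
queries_per_second = 50_000 / 3600
```

With precision 0.968 and recall 0.953, the harmonic mean is approximately 0.960, which agrees with the reported 96.0% F1-score. The stated load of 50,000 queries per hour corresponds to roughly 14 queries per second.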

The findings indicate that AI-based behavioral analytics can effectively distinguish legitimate database activity from unauthorized access, even when attackers use valid credentials. The framework provides practical benefits for government agencies by improving cybersecurity, reducing insider threats, minimizing data exfiltration risks, and supporting compliance with privacy regulations such as GDPR. Beyond government applications, the proposed approach is also applicable to sectors such as healthcare, finance, and critical infrastructure.

Despite its strong performance, the study acknowledges limitations, including dependence on historical behavioral data, potential vulnerability to adversarial behavior changes, challenges associated with class imbalance, evaluation within a simulated environment, and privacy concerns related to continuous behavioral monitoring. Future work will focus on cross-agency validation, adversarial machine learning defenses, Explainable AI (XAI), federated learning for privacy-preserving collaboration, and long-term evaluation to enhance robustness, transparency, and scalability in real-world government deployments.

Conclusion

A. Restatement of the Research Problem: The purpose of this paper is to address an important cybersecurity challenge that governments face, namely detecting unauthorized queries against databases in citizen information systems. As digital transformation accelerates in public sector organizations, the exposure of sensitive citizen data to insider threats, privilege abuse, and highly advanced external attacks has become a critical issue that legacy rule-based security systems can no longer manage effectively.

B. Summary of Main Findings: The results show that AI-based detection models, especially those based on machine learning and anomaly detection, greatly outperform traditional signature-based methods in detecting unauthorized access to the database. The proposed system distinguishes between normal and malicious administrative queries with high accuracy, even when threats are subtle, slow, or previously unknown.

C. Implications: Together, these findings have important implications for federal cybersecurity policy and practice. Implementing AI-driven query auditing mechanisms can significantly reduce the exposure of citizen records to unauthorized parties, enhancing public confidence in government digital infrastructure. In addition, the proposed framework can be seen as a general pattern that can be applied to governmental database environments of any size or sector.

D. Acknowledgment of Limitations: We also need to acknowledge that the study was conducted in a simulated government database environment, which may not be as complex and diverse as real systems. In addition, the performance of the AI models is inherently tied to the quality and representativeness of the training data, and model drift is a practical concern that requires continual retraining and monitoring.

E. Suggestions for Future Research: In future work, we will investigate incorporating federated learning methods so that cross-agency threat detection can be performed while maintaining data sovereignty. More studies are required to assess the system’s effectiveness against advanced persistent threats (APTs) and to investigate the ethical and legal aspects of AI-driven surveillance in the context of government data centers.

F. Closing Statement: Now that citizens’ data is likely the most sensitive asset a government holds, the deployment of smart, adaptive detection mechanisms is not just a technical upgrade; it is a core commitment to the security of the nation and the public’s trust. This work represents a significant step in that direction and provides a foundation upon which more robust, privacy-preserving government information systems can be designed.

Copyright

Copyright © 2026 Mohamed Saad, Dr. Adnan Al-Helali. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET83038

Publish Date : 2026-05-24

ISSN : 2321-9653

Publisher Name : IJRASET
