A Multi-Modal Autonomous Framework for Cyber Threat Intelligence across Heterogeneous Web Environments

Authors: Subramani S, Jai Sudhan K, Arul Kumar J, Rakeshraj R

DOI Link: https://doi.org/10.22214/ijraset.2026.82308

Abstract

MANTIS-CTI (Multi-modal Autonomous Network Threat Intelligence System) is a comprehensive cyber threat intelligence framework designed to collect, normalize, analyze, persist, and present threat data originating from heterogeneous web environments such as the surface web, dark-web onion services, manual analyst submissions, and dynamically changing online resources. The project addresses a common operational gap in student-scale intelligence prototypes, where crawling, extraction, ATT&CK mapping, model-assisted enrichment, storage, and analyst-facing visibility are often implemented as separate scripts instead of a coherent platform. The proposed framework integrates a staged collector, an evidence-aware normalization pipeline, IOC extraction, triage scoring, MITRE ATT&CK mapping, model registry management, live diagnostics, and a single-page analyst dashboard into one unified system. The backend is implemented using Python 3.13, FastAPI, and PostgreSQL 17, with a staged collection architecture that records crawl jobs, source runs, step events, documents, findings, connectors, audit trails, and model health across 67 REST endpoints and 23 database tables. The frontend is implemented as a single-page application (SPA) that provides 20 analyst-facing views. Optional NLP and vision modules are integrated through a registry-first loading mechanism. The completed system demonstrates that a student-scale project can move beyond a static CTI dashboard and provide a realistic operational workflow suitable for live demonstration, incremental research extension, and future conversion into a production-oriented CTI platform.

Introduction

This paper presents MANTIS-CTI, an integrated cyber threat intelligence (CTI) platform designed to convert raw cyber threat data into structured, actionable intelligence. CTI workflows are often fragmented: data collection, analysis, and reporting are handled by separate tools, which leads to poor traceability, weak observability, and inefficient workflows. MANTIS-CTI addresses these issues by unifying crawling, artifact extraction, AI-based analysis, ATT&CK mapping, and reporting into a single staged pipeline with a centralized API and analyst dashboard.

The system is built to handle multimodal intelligence sources such as web pages, dark web content, images, and obfuscated data. It supports controlled crawling, structured storage, model-assisted analysis (e.g., BERT for entity extraction, BART for summarization, ResNet for image analysis), and standardized CTI mapping using MITRE ATT&CK, STIX 2.1, and TAXII 2.1. It also introduces strong observability features, ensuring every step of the intelligence pipeline is traceable and reviewable.
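The optional, registry-controlled handling of models such as BERT, BART, and ResNet can be illustrated with a minimal sketch. The class names, status values, and stub loaders below are assumptions made for illustration and are not taken from the MANTIS-CTI codebase. The idea shown is only that a model is requested through a registry, and that a missing or failing model degrades to `None` instead of crashing the pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ModelEntry:
    """One registry record. All field names here are hypothetical."""
    name: str
    task: str
    loader: Callable[[], object]
    enabled: bool = True
    status: str = "unloaded"  # unloaded | ready | failed | disabled
    error: Optional[str] = None
    instance: Optional[object] = field(default=None, repr=False)


class ModelRegistry:
    """Loads models lazily and records their health instead of raising."""

    def __init__(self) -> None:
        self._entries: Dict[str, ModelEntry] = {}

    def register(self, entry: ModelEntry) -> None:
        self._entries[entry.name] = entry

    def get(self, name: str) -> Optional[object]:
        """Return a loaded model, or None if it is unknown or unavailable."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        if not entry.enabled:
            entry.status = "disabled"
            return None
        if entry.status == "ready":
            return entry.instance
        if entry.status == "failed":
            return None  # do not retry a known-bad model on every call
        try:
            entry.instance = entry.loader()
            entry.status = "ready"
        except Exception as exc:  # any loader failure is recorded, not raised
            entry.status = "failed"
            entry.error = str(exc)
            return None
        return entry.instance

    def health(self) -> Dict[str, str]:
        """Status per model, suitable for a diagnostics view."""
        return {name: entry.status for name, entry in self._entries.items()}


def _broken_loader() -> object:
    raise RuntimeError("weights not found")


# Stub loaders stand in for real NLP and vision models.
registry = ModelRegistry()
registry.register(ModelEntry("ner", "entity-extraction", loader=lambda: "stub-ner-model"))
registry.register(ModelEntry("vision", "image-analysis", loader=_broken_loader))

ner_model = registry.get("ner")
vision_model = registry.get("vision")
```

A caller that receives `None` can skip the enrichment step and continue, while `registry.health()` exposes the failure to a diagnostics page.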

The problem addressed is the lack of unified, analyst-driven CTI systems that provide end-to-end visibility, governance, and scalability. Existing approaches rely on disconnected scripts and tools, leading to poor evidence traceability and limited real-time analysis. MANTIS-CTI solves this by implementing staged execution, structured event logging, registry-controlled AI models, and a persistent database-backed workflow.
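Staged execution with structured event logging can be sketched as follows. This is an illustrative outline under stated assumptions: the stage names, the event fields, and the `run_stages` helper are hypothetical, and the events are kept in a list here, whereas the described system persists them in a database.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Tuple

# A stage is a (name, function) pair; the function maps context to context.
Stage = Tuple[str, Callable[[Dict[str, Any]], Dict[str, Any]]]


def run_stages(job_id: str, url: str, stages: List[Stage]) -> List[Dict[str, Any]]:
    """Run stages in order and emit one structured event per stage."""
    events: List[Dict[str, Any]] = []
    context: Dict[str, Any] = {"url": url}
    for name, func in stages:
        event: Dict[str, Any] = {
            "job_id": job_id,
            "url": url,
            "stage": name,
            "started_at": datetime.now(timezone.utc).isoformat(),
        }
        try:
            context = func(context)
            event["status"] = "ok"
        except Exception as exc:
            event["status"] = "error"
            event["detail"] = f"{type(exc).__name__}: {exc}"
        events.append(event)
        if event["status"] == "error":
            break  # later stages are skipped, but the failure stays visible
    return events


# Hypothetical stages; real ones would fetch, parse, and store content.
def fetch(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {**ctx, "raw": "  Contact  198.51.100.7  "}


def normalize(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {**ctx, "text": " ".join(ctx["raw"].split())}


def enrich(ctx: Dict[str, Any]) -> Dict[str, Any]:
    raise RuntimeError("model unavailable")


def store(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return ctx


events = run_stages(
    "job-1",
    "https://example.org/post",
    [("fetch", fetch), ("normalize", normalize), ("enrich", enrich), ("store", store)],
)
log_lines = [json.dumps(e) for e in events]  # one JSON line per step event
```

Because every stage produces an event whether it succeeds or fails, a log view can show exactly where a URL stopped in the pipeline.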

The system’s objectives include secure and controlled intelligence collection, normalization and extraction of cyber artifacts, AI-driven classification and enrichment, ATT&CK-based mapping, and export-ready structured intelligence generation. It also ensures full process transparency and supports both surface web and dark web data sources.
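As a rough illustration of normalization followed by artifact extraction, the sketch below first "refangs" common defanged notations (such as `hxxp` and `[.]`) and then applies regular expressions. The pattern set and function names are assumptions chosen for brevity. They are not the extraction rules used by MANTIS-CTI, and a real extractor would need many more artifact types and validation steps.

```python
import re
from typing import Dict, List


def refang(text: str) -> str:
    """Undo common defanging so that patterns can match normally."""
    text = re.sub(r"hxxp", "http", text, flags=re.IGNORECASE)
    text = text.replace("[.]", ".").replace("(.)", ".")
    return text.replace("[:]", ":")


_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"

# Illustrative patterns only; the keys are hypothetical artifact labels.
IOC_PATTERNS = {
    "ipv4": re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
}


def extract_iocs(text: str) -> Dict[str, List[str]]:
    """Return sorted, de-duplicated matches per artifact type."""
    clean = refang(text)
    found: Dict[str, List[str]] = {}
    for kind, pattern in IOC_PATTERNS.items():
        matches = sorted({m.group(0) for m in pattern.finditer(clean)})
        if matches:
            found[kind] = matches
    return found


# The CVE identifier below is a placeholder, not a real advisory.
sample = (
    "C2 at 198[.]51[.]100[.]7, payload hxxps://malware[.]example/drop.bin "
    "exploits CVE-2099-0001."
)
iocs = extract_iocs(sample)
```

Keeping the original text alongside the refanged matches is one way to preserve the link between raw evidence and the structured finding.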

The literature review shows that while existing frameworks like MITRE ATT&CK, STIX/TAXII, transformer models, and graph-based CTI systems contribute valuable components, none provide a complete end-to-end operational platform. MANTIS-CTI fills this gap by integrating these capabilities into a single unified, observable, and extensible CTI system.
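To make the STIX 2.1 component concrete, the following sketch builds a single indicator and wraps it in a bundle using only the standard library. It is a simplified illustration, not the export code of MANTIS-CTI: the helper names are hypothetical, and attaching the ATT&CK technique as an external reference is a shortcut. A fuller export would more likely create an `attack-pattern` object and link it to the indicator with an `indicates` relationship.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List


def _now() -> str:
    """UTC timestamp with millisecond precision and a trailing 'Z'."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")
    return stamp[:-3] + "Z"


def make_indicator(ipv4: str, technique_id: str) -> Dict[str, Any]:
    """Build a STIX 2.1-style indicator for one IPv4 address."""
    ts = _now()
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        "name": f"Observed IPv4 {ipv4}",
        "pattern": f"[ipv4-addr:value = '{ipv4}']",
        "pattern_type": "stix",
        "valid_from": ts,
        # Simplification: technique carried as an external reference.
        "external_references": [
            {"source_name": "mitre-attack", "external_id": technique_id}
        ],
    }


def make_bundle(objects: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap STIX objects in a bundle container."""
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


# T1071 (Application Layer Protocol) is used purely as an example mapping.
bundle = make_bundle([make_indicator("198.51.100.7", "T1071")])
serialized = json.dumps(bundle, indent=2)
```

The serialized bundle is plain JSON, so it can be validated separately or handed to a TAXII 2.1 client for publication.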

Conclusion

The MANTIS-CTI project demonstrates that a final-year engineering project can be built around a genuinely integrated cyber threat intelligence framework rather than a narrow script or one-page dashboard. The completed work covers source onboarding, staged collection, evidence-aware normalization, IOC extraction, ATT&CK mapping, optional multimodal enrichment, persistent storage, observability, analyst workflow pages, connectors, and standards-aligned export foundations. Most importantly, the project transforms these functions into a platform that can be explained convincingly during review and live demonstration.

A major strength of the project is its emphasis on transparency. Crawl Jobs shows discovered URLs and scope decisions. Job Process shows per-URL progress and analysis outputs. Live Logs shows structured event flow. Diagnostics shows component health. Model Management shows local runtime safety controls. Together, these capabilities ensure that the system is not perceived as a static UI backed by hidden scripts, but as a coherent CTI platform with visible runtime behavior.

The framework successfully addresses the five core problems identified in the problem statement: uncontrolled crawl recursion is solved by scope-aware frontier classification; poor visibility into job state is solved by durable step events and live logs; weak coupling between raw content and structured findings is solved by evidence-aware normalization and unified finding assembly; fragile handling of optional AI models is solved by the registry-first model control plane; and limited support for standards-aligned CTI export is solved by STIX-style bundle preparation and a manageable connector architecture.
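Scope-aware frontier classification, named above as the remedy for uncontrolled crawl recursion, could look roughly like the sketch below. The policy fields, decision labels, and `classify_url` function are assumptions made for illustration; the actual rules used by MANTIS-CTI are not described in this text.

```python
from dataclasses import dataclass
from typing import FrozenSet
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ScopePolicy:
    """Hypothetical crawl policy for one job."""
    allowed_hosts: FrozenSet[str]
    max_depth: int = 2
    allow_onion: bool = False


def classify_url(url: str, depth: int, policy: ScopePolicy) -> str:
    """Label a discovered URL before it may enter the crawl frontier."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https") or not host:
        return "rejected:unsupported-url"
    if depth > policy.max_depth:
        return "rejected:max-depth"
    if host.endswith(".onion") and not policy.allow_onion:
        return "rejected:onion-disabled"
    # Accept an allowed host itself or any of its subdomains.
    in_scope = host in policy.allowed_hosts or any(
        host.endswith("." + allowed) for allowed in policy.allowed_hosts
    )
    return "in-scope" if in_scope else "out-of-scope"


policy = ScopePolicy(allowed_hosts=frozenset({"example.org"}), max_depth=2)
decisions = {
    "subdomain": classify_url("https://blog.example.org/a", 1, policy),
    "too_deep": classify_url("https://example.org/a", 3, policy),
    "foreign": classify_url("https://other.test/", 0, policy),
    "mailto": classify_url("mailto:a@example.org", 0, policy),
    "onion": classify_url("http://abcdefexample.onion/", 0, policy),
}
```

Only URLs labeled `in-scope` would be queued for fetching, while the other labels can still be stored so that a crawl job view can show why each discovered URL was skipped.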

References

[1] MITRE ATT&CK®. Enterprise Matrix and ATT&CK Knowledge Base. MITRE Corporation. [Online]. Available: https://attack.mitre.org/
[2] OASIS Open. STIX Version 2.1. OASIS Standard, 2021. [Online]. Available: https://docs.oasis-open.org/cti/stix/v2.1/os/stix-v2.1-os.html
[3] OASIS Open. TAXII Version 2.1. OASIS Standard, 2021. [Online]. Available: https://docs.oasis-open.org/cti/taxii/v2.1/os/taxii-v2.1-os.html
[4] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proc. NAACL-HLT, 2019.
[5] M. Lewis et al., "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension," in Proc. ACL, 2020.
[6] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proc. IEEE CVPR, 2016.
[7] TINKER: A Framework for Open Source Cyberthreat Intelligence. arXiv:2102.05571, 2021.
[8] FastAPI Documentation. [Online]. Available: https://fastapi.tiangolo.com/
[9] PostgreSQL Documentation. [Online]. Available: https://www.postgresql.org/docs/
[10] Tor Project Documentation. [Online]. Available: https://community.torproject.org/
[11] Pydantic Documentation. [Online]. Available: https://docs.pydantic.dev/
[12] OWASP Foundation. Logging Cheat Sheet. [Online]. Available: https://cheatsheetseries.owasp.org/
[13] Playwright Documentation. [Online]. Available: https://playwright.dev/python/
[14] Uvicorn Documentation. [Online]. Available: https://www.uvicorn.org/
[15] OASIS CTI TC. STIX/TAXII Interoperability Guidance. [Online]. Available: https://www.oasis-open.org/committees/cti/

Copyright

Copyright © 2026 Subramani S, Jai Sudhan K, Arul Kumar J, Rakeshraj R. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET82308

Publish Date : 2026-05-11

ISSN : 2321-9653

Publisher Name : IJRASET
