Event security teams face an under-addressed problem. Two distinct failure modes, crowd density exceeding safe limits and a missing person appearing somewhere in surveillance footage, share the same video data but are almost always handled by separate tools. We built Crowdnexis AI to close that gap. The system runs a single pipeline: a React and TypeScript frontend sends video to a Node.js and Express backend, which spawns Python subprocess scripts for all AI inference. YOLOv8 detects persons frame by frame, and the resulting counts feed a four-tier risk classifier: SAFE, MODERATE, RISKY, CRITICAL. In parallel, faces detected in the same sampled frames are compared against enrolled missing-person embeddings using 512-dimensional FaceNet vectors and cosine similarity with a 0.60 threshold. FAISS acceleration is available for larger person databases. Every result, alert, and match outcome is written to PostgreSQL. Operators work through a dashboard, a filterable alerts panel, and a chatbot that accepts queries in English, Hindi, and Marathi. We define an evaluation framework covering crowd-count error, risk classification accuracy, face-match precision and recall, and end-to-end alert latency. This paper covers the architecture, key design decisions, implementation, and evaluation plan.
Introduction
Crowdnexis AI is an AI-based surveillance system designed to improve crowd safety by combining two traditionally separate tasks: crowd density monitoring and face-based missing-person identification. Conventional CCTV monitoring depends heavily on human operators, who struggle to count crowds, track density changes, and identify specific individuals in large gatherings all at once. The proposed system integrates both functions into a single AI pipeline to reduce operator workload and raise alerts automatically.
The system uses YOLOv8 for person detection, MTCNN [9] for face detection and alignment, and FaceNet [6] for face recognition. It processes uploaded video to estimate crowd size, classify risk levels, and search for registered missing persons. The platform also includes a database-backed operator dashboard and a multilingual chatbot that supports English, Hindi, and Marathi for easier field operation.
Prior work shows a clear progression. Early crowd monitoring relied on traditional machine learning methods, which struggled with dense crowds and occlusion. Deep learning approaches, including CNN-based counting [1], [5], LSTM models, and density estimation networks such as CSRNet [4], improved performance [3], while Transformer and attention-based methods such as TransCrowd [7] improved accuracy further. Face recognition progressed in the same way, from traditional methods to embedding-based models such as FaceNet [6] and ArcFace, which identify people by comparing embeddings and therefore let new identities be enrolled without retraining.
The Crowdnexis AI architecture has five layers:
Frontend: React and TypeScript dashboard for operators
Backend: Node.js and Express for API management
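AI Engine: Python subprocess scripts running YOLOv8 (person detection), MTCNN (face detection and alignment), and FaceNet (face embedding), with optional FAISS acceleration for similarity search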
Database: PostgreSQL for storing jobs, alerts, persons, and detection results
Infrastructure: Redis for the job queue and Socket.IO for real-time communication with the frontend
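The first section states that the Express backend spawns Python subprocess scripts for all AI inference, but the exact contract between the two processes is not specified. The following is a minimal sketch under assumed conventions: the job arrives as JSON on standard input, the script writes exactly one JSON line to standard output, and the exit code signals success or failure. `handle_job`, `placeholder_infer`, `main`, and the `job_id`, `video_path`, `status`, and `result` fields are hypothetical names, not the system's actual code.

```python
import json
import sys
from typing import Any, Callable, Dict, Tuple

# Hypothetical signature for the inference stage (YOLOv8, MTCNN, FaceNet).
Infer = Callable[[Dict[str, Any]], Dict[str, Any]]


def handle_job(raw_job: str, infer: Infer) -> Tuple[str, int]:
    """Parse one job, run inference, and return (single JSON line, exit code)."""
    try:
        job = json.loads(raw_job)
    except json.JSONDecodeError as exc:
        return json.dumps({"status": "error", "error": f"invalid job JSON: {exc}"}), 2
    if not isinstance(job, dict):
        return json.dumps({"status": "error", "error": "job must be a JSON object"}), 2
    try:
        result = infer(job)
    except Exception as exc:  # report any inference failure back to the parent
        return json.dumps({"job_id": job.get("job_id"), "status": "error", "error": str(exc)}), 1
    return json.dumps({"job_id": job.get("job_id"), "status": "done", "result": result}), 0


def placeholder_infer(job: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in: a real worker would run detection and matching here."""
    return {"video_path": job["video_path"], "frame_counts": []}


def main() -> int:
    """Entry point: job JSON on stdin, one result line on stdout, status as exit code."""
    line, code = handle_job(sys.stdin.read(), placeholder_infer)
    sys.stdout.write(line + "\n")
    sys.stdout.flush()
    return code
```

A script built this way would end with `sys.exit(main())`. On the Node.js side, the parent would write the job to the child's stdin, close it so `sys.stdin.read()` returns, and parse the single stdout line with `JSON.parse`. Keeping stdout to one JSON line, with diagnostics on stderr, keeps that parsing simple.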
The workflow includes:
User authentication through the web interface.
Registration of missing persons with face embedding generation.
Video upload and background processing.
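YOLOv8 detects people in sampled frames, and the detection count gives the crowd-size estimate.
MTCNN detects and aligns faces, and FaceNet converts each face into a 512-dimensional embedding that is compared with stored identities by cosine similarity.
Matches that clear the 0.60 similarity threshold generate alerts and update the person's status.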
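The matching in the last two workflow steps compares 512-dimensional FaceNet embeddings by cosine similarity against a 0.60 threshold. The sketch below assumes embeddings are already extracted as NumPy arrays, keeps only each face's best enrolled candidate, and treats a score exactly equal to the threshold as a match (an assumed convention). `match_faces` and its arguments are hypothetical names, not the system's actual code.

```python
from typing import List, Sequence, Tuple

import numpy as np

MATCH_THRESHOLD = 0.60  # cosine-similarity threshold stated for the system


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def match_faces(
    face_embs: np.ndarray,
    person_embs: np.ndarray,
    person_ids: Sequence[str],
    threshold: float = MATCH_THRESHOLD,
) -> List[Tuple[int, str, float]]:
    """Return (face_index, person_id, similarity) for every face whose best
    enrolled candidate scores at or above the threshold."""
    if len(person_ids) == 0:
        return []  # nobody enrolled, so nothing can match
    faces = l2_normalize(np.atleast_2d(np.asarray(face_embs, dtype=np.float32)))
    people = l2_normalize(np.atleast_2d(np.asarray(person_embs, dtype=np.float32)))
    sims = faces @ people.T        # shape: (num_faces, num_persons)
    best = sims.argmax(axis=1)     # best enrolled candidate for each face
    matches = []
    for i, j in enumerate(best):
        score = float(sims[i, j])
        if score >= threshold:
            matches.append((i, person_ids[int(j)], score))
    return matches
```

Normalizing first turns cosine similarity into a plain dot product. That is also what a FAISS `IndexFlatIP` computes over L2-normalized vectors, which is why the same comparison can move to FAISS when the enrolled database grows.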
Crowd risk is classified into four levels by detected person count:
SAFE: Fewer than 10 people
MODERATE: 10–24 people
RISKY: 25–49 people
CRITICAL: 50 or more people
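The tiers above reduce to a small threshold function. The sketch also counts persons from detector output, which the system feeds into the classifier. It assumes detections arrive as `(class_id, confidence)` pairs, that class 0 is person (as in COCO-pretrained YOLOv8 models), a hypothetical 0.5 confidence cutoff, and that a video's risk is taken from its busiest sampled frame. None of these choices, nor the names `count_persons`, `classify_risk`, and `classify_video`, comes from the system description.

```python
from enum import Enum
from typing import Iterable, Sequence, Tuple

PERSON_CLASS_ID = 0   # "person" in COCO-pretrained YOLOv8 models
MIN_CONFIDENCE = 0.5  # hypothetical detection cutoff


class RiskLevel(str, Enum):
    SAFE = "SAFE"
    MODERATE = "MODERATE"
    RISKY = "RISKY"
    CRITICAL = "CRITICAL"


# Exclusive upper bound of each tier; counts of 50 or more fall through to CRITICAL.
RISK_TIERS = ((10, RiskLevel.SAFE), (25, RiskLevel.MODERATE), (50, RiskLevel.RISKY))


def count_persons(detections: Iterable[Tuple[int, float]], min_conf: float = MIN_CONFIDENCE) -> int:
    """Count detections labeled as person with confidence at or above min_conf."""
    return sum(1 for cls, conf in detections if cls == PERSON_CLASS_ID and conf >= min_conf)


def classify_risk(person_count: int) -> RiskLevel:
    """Map a person count to SAFE (<10), MODERATE (10-24), RISKY (25-49), or CRITICAL (>=50)."""
    if person_count < 0:
        raise ValueError("person_count must be non-negative")
    for upper, level in RISK_TIERS:
        if person_count < upper:
            return level
    return RiskLevel.CRITICAL


def classify_video(frames: Sequence[Sequence[Tuple[int, float]]]) -> RiskLevel:
    """Assumption: a video's risk is the tier of its busiest sampled frame."""
    peak = max((count_persons(frame) for frame in frames), default=0)
    return classify_risk(peak)
```

Keeping the cut points in one table, rather than as literals scattered through the code, makes the real-world recalibration noted under the limitations a one-line change.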
The proposed evaluation framework measures:
Crowd estimation accuracy using mean absolute error (MAE) and root mean square error (RMSE)
Risk classification performance using precision, recall, and F1-score
Face recognition performance using false acceptance rate (FAR), false rejection rate (FRR), ROC curves, and retrieval accuracy
System efficiency using end-to-end alert generation latency
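The metrics listed above have standard definitions. The sketch below computes them in plain Python: MAE and RMSE over predicted versus ground-truth counts, one-vs-rest precision, recall, and F1 for each risk tier, and FAR and FRR at a given similarity threshold. The function names are hypothetical, and a score equal to the threshold is counted as an acceptance (an assumption).

```python
import math
from typing import Dict, Hashable, Sequence, Tuple


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root mean square error; penalizes large count errors more than MAE."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def per_class_prf(
    pred_labels: Sequence[Hashable],
    true_labels: Sequence[Hashable],
    classes: Sequence[Hashable],
) -> Dict[Hashable, Tuple[float, float, float]]:
    """One-vs-rest (precision, recall, F1) for each class; 0.0 where undefined."""
    out = {}
    pairs = list(zip(pred_labels, true_labels))
    for c in classes:
        tp = sum(1 for p, t in pairs if p == c and t == c)
        fp = sum(1 for p, t in pairs if p == c and t != c)
        fn = sum(1 for p, t in pairs if p != c and t == c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out[c] = (precision, recall, f1)
    return out


def far_frr(
    genuine_scores: Sequence[float],
    impostor_scores: Sequence[float],
    threshold: float = 0.60,
) -> Tuple[float, float]:
    """FAR: share of impostor pairs accepted; FRR: share of genuine pairs rejected."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr
```

Sweeping the threshold over a range of values and plotting each (FAR, 1 − FRR) pair traces the ROC curve, since FAR is the false positive rate and 1 − FRR is the true positive rate.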
Major strengths of Crowdnexis AI include combining crowd analysis and face recognition into one system, maintaining searchable records, providing automatic alerts, and supporting multilingual interaction. The system improves operational awareness by linking face matches with crowd risk information.
However, some limitations remain:
High computational requirements limit scalability to multiple concurrent video streams.
Crowd risk thresholds are manually defined and require real-world calibration.
Current analysis focuses on total crowd count rather than spatial crowd distribution.
Face recognition accuracy decreases with poor-quality images, occlusion, and appearance changes.
Fairness evaluation across different demographic groups has not been performed.
Conclusion
Crowdnexis AI shows that crowd risk assessment and missing-person detection can run in a single operator platform without exotic infrastructure. The components have solid empirical backing: YOLOv8 for person detection, FaceNet-style embeddings for identity matching, FAISS for scalable similarity search. The system packages them into a working workflow with persistent audit trails, a dashboard, and a multilingual chatbot. The architecture reflects real engineering trade-offs, not theoretical ideals, and the limitations are listed honestly. The evaluation framework defined here gives a concrete path to measuring how well the system performs once annotated test data is available.
References
[1] L. Guo and J. Li, "Crowd counting method based on deep neural network," in Proc. 2022 Int. Conf. on Machine Learning, Control, and Robotics (MLCR), pp. 78–84, IEEE, 2022.
[2] G. C. Lee, Y. C. Lee, and C. C. Chiang, "Low-resolution face recognition in multi-person indoor environments using convolutional neural networks," in Proc. 2021 Int. Conf. on Computational Science and Computational Intelligence (CSCI), pp. 1629–1633, IEEE, 2021.
[3] L. Deng, Q. Zhou, S. Wang, J. M. Gorriz, and Y. Zhang, "Deep learning in crowd counting: A survey," CAAI Trans. on Intelligence Technology, vol. 2, no. 4, pp. 1043–1077, 2023.
[4] Y. Li, X. Zhang, and D. Chen, "CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes," in Proc. IEEE CVPR, pp. 1091–1100, 2018.
[5] C. Zhang et al., "Cross-scene crowd counting via deep convolutional neural networks," in Proc. IEEE CVPR, pp. 833–841, 2015.
[6] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proc. IEEE CVPR, pp. 815–823, 2015.
[7] D. Liang et al., "TransCrowd: Weakly-supervised crowd counting with transformers," Science China Information Sciences, vol. 65, no. 6, 2022.
[8] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv:1804.02767, 2018.
[9] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016.
[10] P. Dandekar, M. Narwaria, and G. Bhatnagar, "Low-resolution face recognition: Review, challenges and research directions," Computers & Electrical Engineering, vol. 98, 2024.
[11] Tommy, R. Siregar, and E. R. Syahputra, "Low-resolution face image reconstruction using multi-stage FSRCNN," Int. Journal on Informatics Visualization, vol. 9, no. 3, pp. 1022–1032, 2025.
[12] M. M. Abid, R. B. Mustafa, and I. Ahmad, "Computationally intelligent real-time security surveillance using RetinaFace and FaceNet," Multimedia Tools and Applications, vol. 83, 2024.
[13] M. A. Khan, H. Sardar, and S. Zaidi, "A lightweight real-time CCTV surveillance framework for educational institutions using FaceNet," J. Advances in Information Technology, vol. 16, no. 8, 2025.
[14] V. Lempitsky and A. Zisserman, "Learning to count objects in images," in Advances in NIPS, vol. 23, pp. 1324–1332, 2010.
[15] J. Johnson, M. Douze, and H. Jegou, "Billion-scale similarity search with GPUs," IEEE Trans. on Big Data, vol. 7, no. 3, pp. 535–547, 2021.