With the rapid rise in population, public areas such as malls, supermarkets, and transport hubs are becoming increasingly crowded. Businesses depending on customer footfall patterns require accurate data to optimize operations. To address this, we developed a people counting and tracking system that detects, tracks, and identifies individuals in real time. The system uses Faster R-CNN for robust people detection, offering high accuracy even in dense environments. To ensure consistent monitoring, DeepSORT assigns unique IDs to each individual and tracks them across frames. Additionally, DeepFace is integrated for face recognition, enabling the system to match detected faces with previously registered identities. A face registration module (register_faces.py) allows webcam-based registration, making it user-friendly. The evaluation module (evaluate.py) computes key performance metrics such as Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). The model was tested on a dataset comprising 2416 positive and 1218 negative image samples. It achieved a True Positive Rate (TPR) of 95.03%, a False Positive Rate (FPR) of 0.08%, and an overall accuracy of 97.08%. While the model performs well, challenges such as overlapping subjects, varying clothing, and lighting conditions may occasionally affect results.
This system provides a reliable and scalable solution for people counting, face tracking, and identity verification.
Introduction
Project Overview
The People Counting System is a real-time surveillance solution designed for environments such as retail stores, malls, transport hubs, and public venues. It uses advanced deep learning and computer vision techniques to:
Detect individuals via overhead cameras
Track their movements
Recognize faces
Trigger alerts when occupancy thresholds are exceeded
Unlike earlier generations of people counters, which relied on infrared or thermal sensors, this system integrates image processing and deep learning models to improve counting accuracy.
Key Technologies Used
Faster R-CNN – For accurate and efficient person detection using region proposal networks.
DeepSORT – For real-time multi-object tracking, assigning unique IDs and handling occlusions or reappearances.
DeepFace – For face recognition, matching individuals to previously registered profiles.
OpenCV + Webcam – For face registration and live camera feed handling.
MAE & RMSE – Used to evaluate the accuracy of people counting.
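As a minimal sketch of the two counting metrics, the snippet below computes MAE and RMSE over per-frame counts. The function names and the sample counts are illustrative assumptions; the source does not show how evaluate.py implements these metrics.

```python
import math


def mae(true_counts, predicted_counts):
    """Mean Absolute Error between ground-truth and predicted per-frame counts."""
    if len(true_counts) != len(predicted_counts) or not true_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(t - p) for t, p in zip(true_counts, predicted_counts)]
    return sum(errors) / len(errors)


def rmse(true_counts, predicted_counts):
    """Root Mean Square Error; penalizes large per-frame miscounts more than MAE."""
    if len(true_counts) != len(predicted_counts) or not true_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(true_counts, predicted_counts)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical per-frame counts, for illustration only (not results from the paper).
true_counts = [5, 7, 6, 9]
predicted_counts = [5, 6, 8, 9]

print(mae(true_counts, predicted_counts))   # 0.75
print(rmse(true_counts, predicted_counts))  # about 1.118
```

Because RMSE squares each error before averaging, a single frame that is off by two people raises RMSE more than two frames that are each off by one, while MAE treats both cases the same.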
System Phases
1. Setup Phase
Initializes Faster R-CNN for detection
Integrates DeepSORT for tracking
Registers known faces via webcam using DeepFace
Prepares models trained on datasets (e.g., CUHK Mall Dataset)
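The registration step can be pictured as storing one reference embedding per known person and later comparing new faces against that store. The sketch below assumes embeddings are plain lists of floats and uses cosine distance with a placeholder threshold of 0.4. The class FaceRegistry, its methods, and the threshold are hypothetical names and values for illustration, not the DeepFace API or the contents of register_faces.py.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


class FaceRegistry:
    """Hypothetical in-memory store of registered face embeddings."""

    def __init__(self, threshold=0.4):
        self.threshold = threshold  # assumed value; tune for the embedding model used
        self.known = {}             # name -> reference embedding

    def register(self, name, embedding):
        self.known[name] = list(embedding)

    def identify(self, embedding):
        """Return the closest registered name, or None if nothing is close enough."""
        best_name, best_dist = None, float("inf")
        for name, reference in self.known.items():
            dist = cosine_distance(embedding, reference)
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name if best_dist <= self.threshold else None


# Toy 3-dimensional embeddings; real face embeddings have far more dimensions.
registry = FaceRegistry()
registry.register("alice", [1.0, 0.0, 0.0])
registry.register("bob", [0.0, 1.0, 0.0])

match = registry.identify([0.9, 0.1, 0.0])    # close to alice's reference
no_match = registry.identify([0.0, 0.0, 1.0])  # far from every reference
print(match, no_match)
```

Returning None for distant embeddings matters here because most people passing a camera are unregistered, and they should be counted without being assigned a name.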
2. Detection Phase
Live video input is processed frame-by-frame
Detected individuals are assigned consistent IDs using DeepSORT
Faces are recognized (if registered) using DeepFace
Real-time bounding boxes and IDs are drawn on screen
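To illustrate what "consistent IDs across frames" means, the sketch below uses a deliberately simplified greedy tracker that matches each new detection to the previous-frame box with the highest overlap (IoU). It is a stand-in for illustration, not DeepSORT: DeepSORT additionally uses Kalman-filter motion prediction and appearance embeddings. The class name, the threshold of 0.3, and the box coordinates are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Simplified tracker: reuse an ID when a box overlaps a previous-frame box."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}   # track ID -> box from the previous frame
        self.next_id = 1

    def update(self, detections):
        """Assign an ID to every detection and return {track_id: box}."""
        assigned = {}
        unmatched = dict(self.tracks)
        for box in detections:
            best_id, best_score = None, self.iou_threshold
            for track_id, previous in unmatched.items():
                score = iou(box, previous)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:
                # No sufficiently overlapping track: treat as a new person.
                best_id = self.next_id
                self.next_id += 1
            else:
                # Each existing track can be matched at most once per frame.
                del unmatched[best_id]
            assigned[best_id] = box
        # Tracks without a detection are dropped immediately in this sketch.
        self.tracks = assigned
        return assigned


tracker = GreedyIouTracker()
first = tracker.update([(10, 10, 50, 100), (200, 20, 240, 110)])
# Same two people, slightly moved and listed in a different order.
second = tracker.update([(205, 22, 245, 112), (12, 11, 52, 101)])
print(first)
print(second)
```

In the second frame the detections arrive in reverse order, yet each box keeps the ID it received in the first frame, which is the property the on-screen ID labels depend on. Unlike this sketch, a production tracker keeps lost tracks alive for several frames so that a briefly occluded person does not receive a new ID.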
3. Output Phase
Maintains:
current_count: Number of people in frame
total_unique: Unique persons detected in the session
Triggers alerts if person count exceeds a set threshold
Optionally logs face recognition events (e.g., names, time, duration)
Alerts can be sent via sound, messages, APIs, or MQTT
Performance metrics like accuracy (97.08%), MAE, and RMSE are calculated
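The bookkeeping in the output phase can be sketched as follows, assuming the tracker supplies a list of track IDs for each frame. The names current_count and total_unique come from the text; the class OccupancyMonitor, the on_alert callback, and the threshold value are hypothetical. A real deployment would replace the callback with a sound, message, API call, or MQTT publish.

```python
class OccupancyMonitor:
    """Hypothetical counter that tracks occupancy and raises threshold alerts."""

    def __init__(self, threshold, on_alert=print):
        self.threshold = threshold
        self.on_alert = on_alert   # callback invoked with an alert message
        self.current_count = 0     # number of people in the latest frame
        self.seen_ids = set()      # every track ID observed this session

    @property
    def total_unique(self):
        """Unique persons detected in the session, based on track IDs."""
        return len(self.seen_ids)

    def update(self, track_ids):
        """Record one frame of track IDs; return True if an alert was raised."""
        ids = set(track_ids)
        self.current_count = len(ids)
        self.seen_ids |= ids
        if self.current_count > self.threshold:
            self.on_alert(
                f"Occupancy alert: {self.current_count} people "
                f"(limit {self.threshold})"
            )
            return True
        return False


alerts = []
monitor = OccupancyMonitor(threshold=2, on_alert=alerts.append)

raised_first = monitor.update([1, 2])      # 2 people, at the limit: no alert
raised_second = monitor.update([2, 3, 4])  # 3 people, over the limit: alert

print(monitor.current_count, monitor.total_unique, alerts)
```

Since total_unique is derived from track IDs, any ID switch in the tracker inflates it, so the quality of this count depends directly on tracking stability.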
Literature Context
Earlier methods:
1st-gen: Infrared beam sensors
2nd-gen: Thermal sensors
3rd-gen: Vision-based algorithms like Haar Cascades, YOLO, and SSD
Modern approaches like YOLO offer speed but struggle to detect closely spaced people
RGB-D and head-shoulder detection methods enhance overhead tracking
Experimental Results
Achieved 97.08% accuracy in real-time people detection and face recognition
Capable of handling crowded, dynamic, and low-light scenarios
Maintains accurate tracking, reducing ID switches or false counts
Efficient at detecting, tracking, and recognizing individuals simultaneously
Suitable for applications in crowd analytics, public safety, security, and retail intelligence
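Accuracy, together with the true and false positive rates reported in the abstract, is a standard ratio over confusion-matrix counts. The sketch below shows the definitions. The function name and the counts are illustrative assumptions, not the confusion matrix behind the reported figures, which the source does not give.

```python
def classification_rates(tp, fn, fp, tn):
    """Compute TPR, FPR, and accuracy from confusion-matrix counts."""
    positives = tp + fn   # samples that actually contain a person
    negatives = fp + tn   # samples that do not
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "tpr": tp / positives,                           # detected / all positives
        "fpr": fp / negatives,                           # false alarms / all negatives
        "accuracy": (tp + tn) / (positives + negatives)  # correct / all samples
    }


# Hypothetical counts for illustration only.
rates = classification_rates(tp=90, fn=10, fp=5, tn=95)
print(rates)  # tpr 0.9, fpr 0.05, accuracy 0.925
```

Accuracy depends on the balance between positive and negative samples, so TPR and FPR are usually reported alongside it when the two classes differ in size.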
Conclusion
In conclusion, integrating Faster R-CNN, DeepSORT, and DeepFace has significantly improved the performance of this people detection, tracking, and recognition system. By combining these techniques, the system achieves high accuracy and reliability, making it well suited to real-time applications that require both object detection and facial recognition. Its ability to detect, track, and recognize individuals with high precision, even in challenging environments, demonstrates its robustness and versatility. With an accuracy of 97.08%, the system performs well in a variety of scenarios, including crowded scenes, occlusions, and changing lighting conditions. This makes it a practical solution for a wide range of real-world use cases, from surveillance and security to crowd management and personalized experiences.