Real-Time Facial Recognition Using YOLOv5Face, ArcFace, and ByteTrack for Efficient Identification

Authors: Shubeka Iram M, UshaShree K S, Gousiya Banu, Raashi K, Dr. Mallikarjuna A

DOI Link: https://doi.org/10.22214/ijraset.2025.71142

Abstract

Facial recognition plays a crucial role in security, surveillance, and access control systems. However, existing methods often struggle with accuracy, real-time performance, and efficient tracking in dynamic environments. Methods Used: This paper presents a real-time facial recognition system integrating YOLOv5-Face for robust face detection, ArcFace for high-accuracy feature extraction, and ByteTrack for effective multi-face tracking. The combination of these models ensures precise detection, distinct feature embedding, and efficient tracking under challenging conditions such as occlusions and varying illumination. Results Achieved: Experiments on benchmark datasets demonstrate superior recognition accuracy and computational efficiency compared to traditional methods. The system achieves high precision and recall while maintaining real-time performance. Concluding Remarks: The findings emphasize the model’s effectiveness in real-world applications, including security and surveillance. Future improvements will focus on scalability, privacy considerations, and adaptive tracking under dynamic conditions.

Introduction

Facial recognition has become vital for security, access control, and biometric identity. Traditional methods like PCA and LBP lack robustness against lighting, pose, and occlusion. Modern deep learning models like FaceNet and DeepFace greatly improved accuracy but are computationally expensive for real-time use. Current challenges include real-time detection in poor lighting and occlusion, identity tracking across video frames, and ensuring performance in dynamic environments.

Literature Review

Key studies that influenced the proposed work:

YOLO-Based Face Detection: YOLOv5-Face shows superior speed and accuracy for real-time detection, especially after applying optimization techniques like pruning and quantization.
ArcFace: Uses Additive Angular Margin Loss to create highly discriminative facial embeddings, outperforming older methods on large datasets.
ByteTrack: Enhances object tracking by maintaining low-confidence detections, enabling reliable multi-frame tracking even in occluded or busy scenes.
Face Transformer (FTrans): Replaces CNNs with self-attention mechanisms for better generalization, especially in low-resolution and occluded conditions.
Federated Learning for Edge Devices: Demonstrates real-time accuracy with reduced computational load using model compression techniques like quantization and pruning.
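
For reference, the Additive Angular Margin Loss mentioned for ArcFace is commonly written as shown below. The notation follows the original ArcFace formulation rather than anything stated in this article: N is the batch size, n the number of identities, y_i the true identity of sample i, θ_j the angle between the normalized embedding of sample i and the normalized weight vector of class j, s a scale factor, and m the additive angular margin. The values of s and m used by the authors are not given here, so none are assumed.

```latex
L = -\frac{1}{N} \sum_{i=1}^{N}
    \log \frac{e^{\,s \cos(\theta_{y_i} + m)}}
              {e^{\,s \cos(\theta_{y_i} + m)} + \sum_{j=1,\, j \neq y_i}^{n} e^{\,s \cos \theta_j}}
```

Adding m to the angle of the true class only makes the target logit smaller, so the network has to pull embeddings of the same identity closer together on the hypersphere to reach the same loss, which is what makes the embeddings discriminative.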

Phishing Detection (Non-core Section)

Several studies used BERT and hybrid deep learning models (like BERT + LSTM, XLNet + GRU) to detect phishing URLs with high accuracy. Though insightful for cybersecurity, these are only tangentially related to the facial recognition topic.

Proposed Method

The system integrates:

YOLOv5-Face for fast and accurate face detection,
ArcFace for robust feature embedding and identity verification,
ByteTrack for effective multi-frame face tracking.

Pipeline Steps:

Dataset Preparation: Datasets like WIDERFACE, VGGFace2, LFW, and AFLW were selected to represent real-world challenges (e.g., occlusion, pose, lighting).
Face Detection: YOLOv5-Face detects faces at >30 FPS with bounding boxes and face alignment using 68-point landmarks.
Feature Extraction: ArcFace converts each face into a high-dimensional embedding; embeddings are compared using cosine similarity for identity verification.
Tracking: ByteTrack efficiently tracks faces across video frames, outperforming DeepSORT in accuracy and producing fewer identity switches.
Decision & Output: The system identifies individuals or triggers alerts for unknown faces, integrating with security or attendance systems.
Evaluation: Compared against models like FaceNet + MTCNN + DeepSORT and DeepFace + Haar Cascades, the proposed method outperformed others in both detection and recognition accuracy (98.2%).
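
The feature extraction and decision steps above can be illustrated with a minimal sketch of cosine-similarity matching against a gallery of enrolled embeddings. This is not the authors' implementation: the function names, the toy 4-dimensional vectors, and the 0.5 threshold are all hypothetical, and a real system would obtain the embeddings from an ArcFace model and tune the threshold on validation data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate embedding: treat as no similarity
    return dot / (norm_a * norm_b)


def identify(query, gallery, threshold=0.5):
    """Return (name, score) for the best gallery match.

    The name is "unknown" when the best score falls below the threshold.
    The threshold value is a hypothetical placeholder, not from the paper.
    """
    best_name, best_score = "unknown", -1.0
    for name, reference in gallery.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return "unknown", best_score
    return best_name, best_score


# Toy embeddings for illustration only (hypothetical enrolled identities).
gallery = {
    "alice": [0.9, 0.1, 0.0, 0.1],
    "bob": [0.0, 0.8, 0.6, 0.0],
}

print(identify([0.8, 0.2, 0.1, 0.1], gallery))   # query close to "alice"
print(identify([0.0, 0.0, 0.1, -0.9], gallery))  # query far from everyone
```

A result of "unknown" is the point at which a system like the one described would raise an alert instead of logging a recognized identity.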

Results and Discussion

The system runs in real time (30–35 FPS) and handles lighting changes, occlusions, and background noise well.
ArcFace maintained high accuracy (90–95%) across varied facial expressions and angles.
ByteTrack ensured continuous identity tracking across video frames.
Cosine Similarity enabled quick and accurate identity verification.
The system demonstrates scalability and robustness for real-world applications like surveillance, attendance, and authentication.
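
The continuous identity tracking attributed to ByteTrack rests on its two-stage association: tracks are first matched to high-confidence detections, and tracks left unmatched are then matched to low-confidence detections instead of being dropped. The sketch below illustrates only that idea. It is a simplification under stated assumptions: it uses greedy IoU matching where ByteTrack uses Hungarian assignment, it omits the Kalman-filter motion prediction, and the thresholds and function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, track_ids, detections, min_iou):
    """Pair track ids with detection indices, taking the best IoU first."""
    pairs = sorted(
        (
            (iou(tracks[t], det[0]), t, i)
            for t in track_ids
            for i, det in enumerate(detections)
        ),
        reverse=True,
    )
    matched, used = {}, set()
    for overlap, t, i in pairs:
        if overlap < min_iou:
            break  # remaining pairs overlap even less
        if t not in matched and i not in used:
            matched[t] = i
            used.add(i)
    return matched


def associate(tracks, detections, high=0.6, low=0.1, min_iou=0.3):
    """tracks: {track_id: box}; detections: list of (box, score).

    Returns (updated_tracks, lost_track_ids). Thresholds are hypothetical.
    """
    strong = [d for d in detections if d[1] >= high]
    weak = [d for d in detections if low <= d[1] < high]
    updated = {}

    # Stage 1: match every track against high-confidence detections.
    first = greedy_match(tracks, list(tracks), strong, min_iou)
    for t, i in first.items():
        updated[t] = strong[i][0]

    # Stage 2: give still-unmatched tracks a chance on low-confidence boxes.
    remaining = [t for t in tracks if t not in first]
    second = greedy_match(tracks, remaining, weak, min_iou)
    for t, i in second.items():
        updated[t] = weak[i][0]

    # Unmatched high-confidence detections start new tracks.
    next_id = max(tracks, default=0) + 1
    for i, det in enumerate(strong):
        if i not in first.values():
            updated[next_id] = det[0]
            next_id += 1

    lost = [t for t in remaining if t not in second]
    return updated, lost


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [
    ((1, 1, 11, 11), 0.9),    # clear view of face 1
    ((21, 21, 31, 31), 0.3),  # face 2 partly occluded, low score
    ((50, 50, 60, 60), 0.8),  # a new face entering the scene
]
updated, lost = associate(tracks, detections)
print(updated, lost)
```

In this toy frame, track 2 survives only because of the second stage: a tracker that discarded every detection scoring below 0.6 would have lost it and possibly assigned a new identity when the face reappeared.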

Conclusion

A real-time face recognition system using YOLOv5-Face for face detection, ArcFace for feature extraction, and ByteTrack for tracking has been implemented. The proposed method runs at a real-time processing speed of 30 frames per second and maintains an accuracy rate of 98.2%, which is higher than that of conventional face recognition frameworks. Comprehensive trials were performed to test the system in a range of conditions, such as low light, occlusions, and diverse head positions.

1) Research Limitations

Despite its advantages, this work is affected by several problems:

Occlusion Robustness: The model works quite well under moderate occlusion, but recognition accuracy can drop significantly in extreme cases (such as a full-face mask).
GPU Dependence: Real-time processing requires a GPU, which limits effective deployment on battery-operated devices.
Adversarial Attacks: Robustness against adversarial attacks, such as deepfake spoofs and adversarial perturbations, is poor, so the detection stage needs an additional defensive layer. More aggressive defences are becoming necessary, even in other domains.
Ethics and Privacy: Facial recognition continues to face ethical challenges related to bias, privacy, and biometric data.

2) Future Scope

Future studies will concentrate on the following areas to lessen these constraints and improve the system further:

From Occlusion to Robust Recognition: Accurately recognizing faces under strong occlusion by using attention-based models and GAN-based occlusion recovery approaches.
Deploying Lightweight Models: Applying model quantization and pruning to optimize the architecture for deployment on low-power edge devices.
Countering Spoofing Attacks: Introducing adversarial training and liveness detection algorithms to counter spoofing attacks.
Privacy-Preserved Methods: Developing federated learning, which uses decentralized training without transmitting data to central nodes, thus protecting user privacy.
Multimodal Biometric Fusion: Combining multiple biometric modalities, such as voice and iris recognition, to improve security and dependability.

Copyright

Copyright © 2025 Shubeka Iram M, UshaShree K S, Gousiya Banu, Raashi K, Dr. Mallikarjuna A. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET71142

Publish Date : 2025-05-16

ISSN : 2321-9653

Publisher Name : IJRASET
