Many developments have taken place in the fields of face recognition and liveness analysis to improve device security and attendance verification systems. Many approaches incorporate 3D analysis of the face to predict whether the person in front of the camera is live. Our method addresses this problem without advanced 3D imaging techniques or hardware, which makes the solution both more economical and easier to deploy. It consists of two parts: the first performs face verification and the second checks the liveness of the face. In the first part, we use a model based on Google's FaceNet model, which learns a mapping from face images to a compact Euclidean space in which distances directly correspond to a measure of face similarity. Once this space has been produced, face verification can be implemented with standard techniques using the embeddings as feature vectors. For the second part, we employ a cascaded multi-task framework that extracts facial features (landmarks) from the image. Liveness is then checked by tracking the relative displacements of these features while the person performs a set of tasks in random order, such as head and facial movements.
Introduction
This work presents a facial recognition and liveness detection system intended as a simpler and more practical alternative to expensive 3D face-analysis technologies. The approach does not rely on specialized hardware or complex 3D imaging. Instead, it combines the Google FaceNet Inception model with a query-based face motion recognizer to achieve accurate face verification and liveness detection.
Face Recognition
The system uses the FaceNet Inception Model with an input size of 96 × 96 × 3 pixels for faster performance on devices with limited computational power.
FaceNet converts facial images into 128-dimensional feature embeddings.
Recognition is based on one-shot learning, where a single reference image is compared with captured images using Euclidean distance between embeddings.
The model is trained using triplet loss. This loss minimizes the distance between images of the same person (anchor-positive pair) and requires images of different people (anchor-negative pair) to lie farther from the anchor by at least a fixed margin.
Hard triplets are selected during training to improve convergence speed and accuracy.
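The verification and training steps above can be sketched in NumPy. This is a minimal illustration, not the authors' code. It assumes embeddings have already been produced by the network and L2-normalizes them, as FaceNet does. The acceptance threshold of 1.0 on squared distance and the margin `alpha=0.2` are hypothetical values that would need tuning on real data. `hardest_triplets` follows the "hard" selection rule within a batch: for each anchor, it picks the farthest same-identity image and the closest different-identity image. Always taking the hardest negatives can destabilize early training, and the FaceNet paper itself relied mostly on "semi-hard" negatives, so this is the simplest variant rather than a tuned one.

```python
import numpy as np


def l2_normalize(x, eps=1e-10):
    """Scale embeddings to unit length (FaceNet constrains ||f(x)||_2 = 1)."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def verify(reference, candidate, threshold=1.0):
    """One-shot verification of a captured embedding against one reference.

    threshold=1.0 is a hypothetical value; in practice it is tuned on
    labelled pairs. Returns (accepted, squared_distance).
    """
    d = float(np.sum((l2_normalize(reference) - l2_normalize(candidate)) ** 2))
    return d < threshold, d


def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Mean of max(0, ||a - p||^2 - ||a - n||^2 + alpha) over a batch.

    Inputs are assumed to be already-normalized network outputs.
    """
    pos = np.sum((np.asarray(anchor) - np.asarray(positive)) ** 2, axis=-1)
    neg = np.sum((np.asarray(anchor) - np.asarray(negative)) ** 2, axis=-1)
    return float(np.mean(np.maximum(pos - neg + alpha, 0.0)))


def hardest_triplets(embeddings, labels):
    """For each anchor, pick the hardest positive (farthest same identity)
    and the hardest negative (closest other identity) within the batch."""
    emb = l2_normalize(embeddings)
    labels = np.asarray(labels)
    # Pairwise squared distances, shape (batch, batch).
    dist = np.sum((emb[:, None, :] - emb[None, :, :]) ** 2, axis=-1)
    same = labels[:, None] == labels[None, :]
    triplets = []
    for i in range(len(emb)):
        pos_mask = same[i].copy()
        pos_mask[i] = False            # an anchor is not its own positive
        neg_mask = ~same[i]
        if not pos_mask.any() or not neg_mask.any():
            continue                   # no valid triplet for this anchor
        p = int(np.argmax(np.where(pos_mask, dist[i], -np.inf)))
        n = int(np.argmin(np.where(neg_mask, dist[i], np.inf)))
        triplets.append((i, p, n))
    return triplets
```

A triplet whose negative is already farther away than the positive by more than the margin contributes zero loss. Mining hard triplets therefore concentrates the gradient on the pairs the model still confuses.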
Feature Extraction and Liveness Detection
The system employs Multi-task Cascaded Convolutional Networks (MTCNN) to detect faces and extract facial landmarks.
MTCNN operates in three stages:
P-Net identifies candidate face regions.
R-Net refines and filters face detections.
O-Net extracts facial landmarks such as eyes, nose, and mouth corners.
These landmarks are tracked in real time to monitor facial movements and verify liveness.
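Tracking is easier to reason about once each frame's landmarks are expressed in a face-centred coordinate frame. Moving toward or away from the camera then does not register as facial motion. The sketch below is illustrative, not the paper's implementation. It takes the five O-Net landmarks as a dictionary of (x, y) pixel coordinates with hypothetical key names (`left_eye`, `right_eye`, `nose`, `mouth_left`, `mouth_right`). It uses the eye midpoint as the origin and the inter-ocular distance as the unit length. In-plane head rotation (roll) is deliberately not compensated.

```python
import numpy as np

# Hypothetical landmark keys, in a fixed order (the five O-Net landmarks).
KEYS = ("left_eye", "right_eye", "nose", "mouth_left", "mouth_right")


def to_face_frame(landmarks):
    """Return a (5, 2) array of landmarks relative to the eye midpoint,
    measured in units of the inter-ocular distance."""
    pts = np.array([landmarks[k] for k in KEYS], dtype=np.float64)
    left_eye, right_eye = pts[0], pts[1]
    origin = (left_eye + right_eye) / 2.0
    scale = np.linalg.norm(right_eye - left_eye)
    if scale == 0:
        raise ValueError("eye landmarks coincide; cannot normalize")
    return (pts - origin) / scale


def landmark_displacement(baseline, current):
    """Per-landmark (dx, dy) motion of `current` relative to `baseline`,
    with translation and scale of the whole face removed."""
    return to_face_frame(current) - to_face_frame(baseline)
```

Consider a face that is shifted and doubled in size between frames: it yields zero displacement. A nose that moves 8 px sideways on a face with 40 px between the eyes yields an x-displacement of 0.2.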
Liveness Verification
The method analyzes relative movements of facial features, such as:
Head turning left or right.
Smiling.
Pouting.
Changes in landmark positions (e.g., nose location, eye alignment, lip distances) are used to confirm that a real person is present rather than a photograph.
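The landmark changes described above can be turned into discrete actions with a few geometric ratios. This is a sketch under stated assumptions rather than the paper's rule set. The horizontal offset of the nose from the eye midpoint stands in for head yaw. Mouth-corner width relative to eye distance stands in for smiling (wider) or pouting (narrower). The landmark dictionary format, the action labels, and both thresholds are hypothetical and would need calibration. Left and right are given in image coordinates; whether they match the user's own left and right depends on whether the camera frame is mirrored.

```python
import numpy as np


def _geometry(lm):
    """Yaw proxy and relative mouth width from a five-point landmark dict."""
    le = np.asarray(lm["left_eye"], dtype=float)
    re = np.asarray(lm["right_eye"], dtype=float)
    nose = np.asarray(lm["nose"], dtype=float)
    ml = np.asarray(lm["mouth_left"], dtype=float)
    mr = np.asarray(lm["mouth_right"], dtype=float)
    eye_dist = np.linalg.norm(re - le)
    eye_mid = (le + re) / 2.0
    yaw = (nose[0] - eye_mid[0]) / eye_dist           # nose offset, eye-distance units
    mouth_ratio = np.linalg.norm(mr - ml) / eye_dist  # mouth width vs eye distance
    return yaw, mouth_ratio


def classify_action(baseline, current, yaw_thr=0.15, mouth_thr=0.10):
    """Compare a frame with a neutral baseline frame and name the action.

    Returns 'turn_left', 'turn_right', 'smile', 'pout' or 'neutral'.
    Thresholds are hypothetical placeholders.
    """
    yaw0, mouth0 = _geometry(baseline)
    yaw1, mouth1 = _geometry(current)
    d_yaw = yaw1 - yaw0
    if d_yaw <= -yaw_thr:
        return "turn_left"     # nose moved toward smaller x (image left)
    if d_yaw >= yaw_thr:
        return "turn_right"    # nose moved toward larger x (image right)
    d_mouth = (mouth1 - mouth0) / mouth0
    if d_mouth >= mouth_thr:
        return "smile"         # mouth corners spread apart
    if d_mouth <= -mouth_thr:
        return "pout"          # mouth corners pulled together
    return "neutral"
```

Both features are ratios, so a flat photograph that is moved or scaled in front of the camera leaves them essentially unchanged. Only a genuine change in facial geometry crosses the thresholds.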
Users may be asked to perform random facial actions, and any incorrect response immediately terminates the authentication process, increasing security.
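The challenge-response step can be sketched as follows. Two ideas come from the text: tasks are drawn in random order, and the first wrong response ends authentication. Everything else is an assumption: the action names, the number of tasks, and `observe`, a hypothetical callback that would show the prompt, capture frames, and return whatever action the landmark analysis detected. `random.SystemRandom` draws from the operating system's entropy source. This makes the task order hard to anticipate for someone replaying a recorded video.

```python
import random

# Hypothetical action vocabulary; blinking, nodding, etc. could be added.
ACTIONS = ("turn_left", "turn_right", "smile", "pout")


def run_liveness_challenge(observe, n_tasks=3, rng=None):
    """Ask for `n_tasks` distinct actions in random order.

    `observe(task)` is a hypothetical callback that prompts the user and
    returns the action detected from the camera frames. Authentication
    stops at the first response that does not match the requested task.
    """
    rng = rng if rng is not None else random.SystemRandom()
    tasks = rng.sample(ACTIONS, n_tasks)
    for task in tasks:
        if observe(task) != task:
            return False   # incorrect response: terminate immediately
    return True
```

For example, `run_liveness_challenge(lambda task: task)` returns `True` for a user who always complies. Passing `rng=random.Random(seed)` makes the order reproducible for testing.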
Conclusion
Although this architecture works fairly well in real time, there is still scope for optimizations that would improve its performance. We use an input size of 96 × 96 × 3 pixels for the Inception model. It is speculated that an input size of 266 × 226 × 3 could raise accuracy to about 89 percent, compared with 83 percent for our current model, at the cost of additional computational power.
Deep networks of this kind also generally benefit from larger datasets, so there is scope for improvement here as well. New activities (tasks) such as blinking, looking up and down, opening the mouth, and raising the eyebrows could be added to the existing list of movements to improve accuracy and security. Even with limited resources, comparatively little computation, and modest funding, this setup can serve real-time security and digital device unlocking. It offers an alternative to more advanced techniques such as infrared and three-dimensional imaging.
An interesting application of this technology would be to replace biometric verification systems that require physical contact with a centrally located device (fingerprint or retinal scanners). Instead, an employee's physical presence could be confirmed by combining the GPS location of their smartphone with liveness detection performed through its front camera. This would reduce crowding around a central device (aiding social distancing) and make attendance contact-free.
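The location half of this proposal can be sketched as a great-circle distance check. This is purely illustrative. The geofence is assumed to be a circle of hypothetical radius around a single site. Distances use the haversine formula on a spherical Earth of mean radius 6,371 km. The face-verification and liveness outcomes are taken as booleans produced by the earlier stages.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mark_attendance(phone_fix, site, radius_m, face_verified, is_live):
    """Hypothetical rule: record attendance only if the phone's GPS fix lies
    inside the site's geofence and both face checks passed."""
    inside = haversine_m(*phone_fix, *site) <= radius_m
    return inside and face_verified and is_live
```

One degree of longitude at the equator comes out at roughly 111.2 km. A fix 0.01° of latitude away from the site (about 1.1 km) therefore falls outside a 100 m geofence.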