Gait Recognition Using GaitFormer on the CASIA-B Dataset

Authors: K. Mokshagna Anurag, M. P. S. S. S. Hari Chandra Hlada, N. Sai Kishore, P. Sai Lalith, Dr. P. Suryaprasad

DOI Link: https://doi.org/10.22214/ijraset.2026.83339

Abstract

Gait recognition is a biometric identification technique that identifies individuals based on their walking patterns. Unlike fingerprint or facial recognition, gait can be observed at a distance without requiring the subject’s cooperation, making it useful for surveillance and security. This paper presents a gait recognition system based on the GaitFormer architecture, which uses convolutional neural network (CNN) layers for spatial feature extraction and Transformer-based attention for temporal modelling. The system is trained and evaluated on the CASIA-B gait dataset, which contains multi-view silhouette sequences under different walking conditions. The input pipeline preprocesses silhouette sequences and computes Gait Energy Images (GEIs) as compact gait cycle representations. The model learns spatial features through convolutional layers and captures temporal dependencies using self-attention. The system was implemented in Python using TensorFlow/Keras on Google Colab. Evaluation results show a test accuracy of 88.89%, Rank-1 accuracy of 88.89%, and Rank-5 accuracy of 100.00%.

Introduction

Gait recognition is a biometric identification technique that uses a person's walking pattern to distinguish individuals. Unlike fingerprints, iris scans, or facial recognition, gait recognition can be performed at a distance using standard video cameras without requiring physical contact or the subject’s awareness. This makes it particularly useful for surveillance, forensic investigations, and access control systems. However, gait recognition is challenging because factors such as viewing angle, clothing, and carried objects can significantly affect a person's appearance and walking pattern.

Recent advances in deep learning, especially Convolutional Neural Networks (CNNs) and Transformers, have improved gait recognition performance. CNNs are effective at extracting spatial features from individual frames, while Transformers use self-attention mechanisms to learn temporal relationships across an entire walking sequence. This paper proposes a gait recognition system based on the GaitFormer architecture, which combines CNN-based spatial feature extraction with Transformer-based temporal modeling.

The primary objective of the study is to develop and evaluate a gait recognition system using the CASIA-B dataset, enabling accurate identification of individuals under different walking conditions. The system aims to preprocess silhouette sequences, generate Gait Energy Images (GEIs), implement the GaitFormer model, evaluate its performance using classification and retrieval metrics, compare it with a CNN + BiLSTM baseline, and analyze its strengths and limitations.
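The retrieval metrics named above (Rank-1 and Rank-5 accuracy) can be illustrated with a minimal sketch. It assumes the model produces one score per enrolled identity for each probe sequence; the function name `rank_k_accuracy` and the score values are hypothetical and are not taken from the paper's implementation.

```python
import numpy as np

def rank_k_accuracy(scores, true_labels, k):
    """Fraction of probes whose true identity is among the k highest-scoring classes."""
    scores = np.asarray(scores, dtype=float)
    true_labels = np.asarray(true_labels)
    # Column indices of the k largest scores in each row.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    # A probe counts as a hit if its true label appears anywhere in its top k.
    hits = (top_k == true_labels[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical scores for 3 probes over 4 identities (illustrative values only).
scores = [
    [0.70, 0.15, 0.10, 0.05],  # true identity 0 is ranked 1st
    [0.05, 0.50, 0.30, 0.15],  # true identity 2 is ranked 2nd
    [0.40, 0.30, 0.20, 0.10],  # true identity 3 is ranked 4th
]
labels = [0, 2, 3]

rank1 = rank_k_accuracy(scores, labels, k=1)
rank2 = rank_k_accuracy(scores, labels, k=2)
```

With k = 1 the metric coincides with ordinary classification accuracy, which is consistent with the identical test accuracy and Rank-1 figures reported in this paper.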

The literature review discusses two major categories of gait recognition methods: model-based approaches, which use body joint positions and structural information, and appearance-based approaches, which rely on visual representations such as silhouettes. The proposed work follows the appearance-based approach. The review also highlights the importance of Gait Energy Images (GEIs), which summarize a full gait cycle into a single image containing both static body shape and dynamic motion information.
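For reference, the GEI is commonly defined as the pixel-wise average of the aligned binary silhouettes in one gait cycle. The symbols below are notation introduced here for illustration: B_t(x, y) is the silhouette value at pixel (x, y) in frame t, and N is the number of frames in the cycle.

```latex
G(x, y) = \frac{1}{N} \sum_{t=1}^{N} B_t(x, y)
```

Pixels that are foreground in every frame (for example the torso) take values near 1 and encode static body shape, while pixels with intermediate values (for example around the legs and arms) encode motion.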

Previous studies have shown that CNN-based methods outperform traditional handcrafted feature approaches by effectively learning body shape and movement patterns. Recurrent models such as LSTMs and BiLSTMs have been used for temporal analysis but often struggle with long-range dependencies. Transformer-based architectures address this limitation through self-attention mechanisms that capture relationships across all frames simultaneously, leading to improved recognition accuracy.
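The way self-attention relates all frames at once can be sketched with single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. This is a minimal NumPy illustration under assumed shapes (6 frames, 8 features per frame) with random projection matrices; it is not the paper's implementation, and the names `self_attention`, `w_q`, `w_k`, and `w_v` are hypothetical.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (frames, features) array."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Similarity of every frame with every other frame, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output frame is a weighted mix of the value vectors of all frames.
    return weights @ v, weights

rng = np.random.default_rng(0)
num_frames, d_model = 6, 8  # assumed sizes for illustration
x = rng.normal(size=(num_frames, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
```

Row i of `weights` shows how much frame i draws on every other frame, regardless of how far apart they are in the sequence. A recurrent model instead has to propagate that information step by step.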

The system is trained and evaluated using the CASIA-B gait dataset, one of the most widely used datasets in gait recognition research. The dataset contains gait sequences of 124 subjects recorded under three conditions: normal walking, carrying a bag, and wearing a coat. Each subject is captured from 11 different camera angles ranging from 0° to 180°, providing a diverse set of walking scenarios.
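The dataset layout described above can be enumerated with a short sketch. Two details are assumptions drawn from the public CASIA-B description rather than from this paper: the 11 views are evenly spaced at 18° intervals, and each subject has six normal (nm), two bag (bg), and two coat (cl) sequences per view. The function `enumerate_sequences` is a hypothetical helper, and the count it yields is nominal, since the released data may have missing sequences.

```python
from itertools import product

NUM_SUBJECTS = 124
VIEW_ANGLES = list(range(0, 181, 18))  # 0, 18, ..., 180 degrees (assumed even spacing)
SEQUENCES_PER_CONDITION = {"nm": 6, "bg": 2, "cl": 2}  # assumed, per view

def enumerate_sequences():
    """Yield (subject, sequence, view) identifiers for the nominal dataset layout."""
    subjects = range(1, NUM_SUBJECTS + 1)
    for subject, (cond, count), view in product(
        subjects, SEQUENCES_PER_CONDITION.items(), VIEW_ANGLES
    ):
        for idx in range(1, count + 1):
            yield (f"{subject:03d}", f"{cond}-{idx:02d}", f"{view:03d}")

all_sequences = list(enumerate_sequences())
print(len(VIEW_ANGLES), len(all_sequences))
```

Under these assumptions the nominal total is 124 × 10 × 11 = 13,640 silhouette sequences.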

The proposed methodology consists of four main stages:

Data Preprocessing – Silhouette images are normalized, resized, and centered to ensure consistency.
Gait Energy Image Generation – Multiple silhouette frames from a gait cycle are averaged to create compact and informative GEIs.
Spatial Feature Extraction – A CNN extracts important visual features such as body shape, edges, and limb positions from the input images.
Temporal Modeling and Classification – A Transformer encoder analyzes the sequence of extracted features to learn temporal walking patterns and perform person identification.
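The first two stages can be sketched in NumPy as follows. This is a minimal illustration, not the paper's code: it assumes binary silhouettes, a 64 × 64 output size, bounding-box cropping with horizontal centering, and nearest-neighbour resizing by index sampling. The function names `normalize_silhouette` and `gait_energy_image` are hypothetical.

```python
import numpy as np

def normalize_silhouette(frame, out_h=64, out_w=64):
    """Crop a silhouette to its bounding box, scale it to out_h rows, and center it."""
    mask = np.asarray(frame) > 0
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:  # empty frame: nothing to align
        return np.zeros((out_h, out_w), dtype=np.float32)
    crop = mask[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    h, w = crop.shape
    # Keep the aspect ratio, clipping the width to the output canvas.
    new_w = max(1, min(out_w, round(w * out_h / h)))
    # Nearest-neighbour resize by sampling source row and column indices.
    row_idx = np.arange(out_h) * h // out_h
    col_idx = np.arange(new_w) * w // new_w
    resized = crop[row_idx][:, col_idx]
    # Paste the resized figure into the horizontal center of the canvas.
    canvas = np.zeros((out_h, out_w), dtype=np.float32)
    left = (out_w - new_w) // 2
    canvas[:, left:left + new_w] = resized
    return canvas

def gait_energy_image(frames, out_h=64, out_w=64):
    """Average the aligned silhouettes of one gait cycle into a single GEI."""
    aligned = np.stack([normalize_silhouette(f, out_h, out_w) for f in frames])
    return aligned.mean(axis=0)

# Synthetic stand-in for a gait cycle: a rectangle that shifts and widens.
frames = []
for step in range(4):
    frame = np.zeros((40, 60), dtype=np.uint8)
    frame[5:35, 10 + 5 * step:20 + 7 * step] = 255
    frames.append(frame)

gei = gait_energy_image(frames)
```

A production pipeline would more likely use an image library for resizing and might center on the silhouette centroid rather than the bounding box. Those choices are left open here.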

The GaitFormer architecture combines the strengths of CNNs and Transformers, allowing the system to capture both the spatial and the temporal characteristics of human gait. This hybrid approach is intended to improve recognition performance across different viewpoints and walking conditions relative to traditional handcrafted-feature methods.
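A CNN plus Transformer-encoder hybrid of this kind could be sketched in Keras as below. Every hyperparameter here (frame count, image size, filter counts, number of heads, 124 output classes) is an assumption chosen for illustration, and the function `build_cnn_transformer` is hypothetical. The sketch is not the authors' model and omits positional encoding for brevity.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn_transformer(num_frames=16, height=64, width=64,
                          num_classes=124, d_model=64, num_heads=4):
    """Per-frame CNN features followed by one Transformer encoder block."""
    inputs = keras.Input(shape=(num_frames, height, width, 1))

    # Spatial stage: the same small CNN is applied to every frame.
    x = layers.TimeDistributed(
        layers.Conv2D(32, 3, padding="same", activation="relu"))(inputs)
    x = layers.TimeDistributed(layers.MaxPooling2D())(x)
    x = layers.TimeDistributed(
        layers.Conv2D(d_model, 3, padding="same", activation="relu"))(x)
    x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)  # (batch, frames, d_model)

    # Temporal stage: self-attention across frames with a residual connection.
    attn = layers.MultiHeadAttention(
        num_heads=num_heads, key_dim=d_model // num_heads)(x, x)
    x = layers.LayerNormalization()(layers.Add()([x, attn]))

    # Position-wise feed-forward sublayer, also with a residual connection.
    ff = layers.Dense(d_model * 2, activation="relu")(x)
    ff = layers.Dense(d_model)(ff)
    x = layers.LayerNormalization()(layers.Add()([x, ff]))

    # Pool over time and classify into subject identities.
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)

model = build_cnn_transformer(num_frames=4, height=32, width=32, num_classes=10)
```

Without positional information, the attention block followed by average pooling is insensitive to frame order. A fuller implementation would add a positional embedding before the encoder block.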

Conclusion

This paper presented a gait recognition system built on the GaitFormer architecture and evaluated on the CASIA-B dataset. The model uses CNN layers to extract spatial features from silhouette frames and a Transformer encoder to learn temporal patterns in the walking sequence. The data was preprocessed by normalising and aligning the silhouettes and computing GEIs. The implementation was done in Python with TensorFlow/Keras and run on Google Colab. On the test set, the model achieved 88.89% accuracy, 88.89% Rank-1 accuracy, and 100.00% Rank-5 accuracy. These results suggest that combining CNNs with Transformer attention is effective for gait-based identification on this dataset.

Copyright

Copyright © 2026 K. Mokshagna Anurag, M. P. S. S. S. Hari Chandra Hlada, N. Sai Kishore, P. Sai Lalith, Dr. P. Suryaprasad. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET83339

Publish Date : 2026-05-31

ISSN : 2321-9653

Publisher Name : IJRASET
