IJRASET Journal for Research in Applied Science and Engineering Technology
Authors: Yashraj Mishra, Ankita Jaiswal, Anany Shukla, Abhishek Verma, Humesh Verma, Dr. Goldi Soni
DOI Link: https://doi.org/10.22214/ijraset.2025.70544
The increasing demand for accurate and real-time human activity and fitness tracking has led to the development of diverse computer vision and deep learning models. Among these, OpenPose has emerged as a powerful tool for multi-person 2D pose estimation, enabling precise body posture tracking from RGB images. This review paper presents a comprehensive analysis of state-of-the-art approaches for human activity recognition and posture detection, with a primary focus on comparing OpenPose with other convolutional neural network (CNN)-based architectures currently used in academic research and commercial applications. We investigate the strengths and limitations of different models in terms of detection accuracy, computational efficiency, robustness in dynamic environments, and application in fitness and healthcare systems. The paper consolidates findings from 30 IEEE research publications, highlighting how various approaches have evolved and been implemented for body posture recognition, real-time fitness feedback, and rehabilitation monitoring. Additionally, we discuss the integration of these models with wearable sensors and mobile applications, their performance in real-world scenarios, and future research directions aiming to improve usability, personalization, and energy efficiency. This review provides valuable insights for researchers and developers seeking to advance human activity tracking through deep learning-based posture estimation techniques.
Recent advancements in AI and computer vision have transformed human activity recognition (HAR) and fitness tracking, with significant applications in healthcare, sports, rehabilitation, and human-computer interaction. Central to these developments is accurate human posture estimation, particularly through frameworks like OpenPose and CNN-based models.
A. OpenPose
A real-time 2D multi-person pose estimation framework.
Uses Part Affinity Fields (PAFs) and confidence maps to locate body keypoints from RGB images.
Pros:
Supports multiple people.
High spatial accuracy.
Cons:
High GPU demand; not ideal for edge deployment.
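The confidence-map step above can be sketched in a few lines: each joint's heatmap is reduced to its peak location. This is an illustrative single-person sketch in NumPy (the function name and threshold are our own); the real framework finds multiple local maxima per heatmap and uses PAFs to associate joints across people.

```python
import numpy as np

def decode_keypoints(confidence_maps, threshold=0.1):
    """Pick the peak of each per-joint confidence map.

    confidence_maps: array of shape (num_joints, H, W), one heatmap per joint.
    Returns a list of (x, y, score) tuples; joints scoring below
    `threshold` come back as None.
    """
    keypoints = []
    for heatmap in confidence_maps:
        # Flat argmax -> (row, col) = (y, x) of the strongest response.
        y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
        score = float(heatmap[y, x])
        keypoints.append((int(x), int(y), score) if score >= threshold else None)
    return keypoints

# Toy example: two 5x5 heatmaps with known peaks.
maps = np.zeros((2, 5, 5))
maps[0, 1, 3] = 0.9   # joint 0 peaks at (x=3, y=1)
maps[1, 4, 2] = 0.8   # joint 1 peaks at (x=2, y=4)
print(decode_keypoints(maps))  # [(3, 1, 0.9), (2, 4, 0.8)]
```

The per-pixel argmax is also why OpenPose's accuracy is tied to heatmap resolution, and why it is GPU-hungry: every joint requires a full-resolution map per frame.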
B. CNN-Based HAR
Uses convolutional neural networks to classify human activities based on video or sensor data (e.g., accelerometers, gyroscopes).
Often combined with temporal models (e.g., LSTMs, GRUs).
Pros:
Automatic feature extraction.
High accuracy with structured data.
Cons:
Complex models are often too heavy for lightweight or real-time systems.
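Before any CNN sees the sensor stream, these pipelines segment it into fixed-length windows that the network classifies one at a time (with an LSTM/GRU optionally modeling the window sequence). A plain-Python sketch of that step; the 128-sample window and 50% overlap below follow a common convention for 50 Hz accelerometer data and are illustrative, not prescriptive:

```python
def sliding_windows(samples, window_size=128, overlap=0.5):
    """Segment a stream of sensor samples into fixed-length, overlapping windows.

    Each window (e.g. 2.56 s of 50 Hz accelerometer data = 128 samples)
    becomes one training/inference example for the CNN.
    """
    step = max(1, int(window_size * (1 - overlap)))
    windows = []
    for start in range(0, len(samples) - window_size + 1, step):
        windows.append(samples[start:start + window_size])
    return windows

# 512 fake accelerometer readings -> seven 50%-overlapping 128-sample windows.
stream = list(range(512))
wins = sliding_windows(stream)
print(len(wins), wins[1][0])  # 7 64
```

The overlap trades compute for label density: more overlap yields more training examples and smoother predictions, at the cost of redundant inference on resource-constrained devices.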
C. Lightweight Models for Edge Devices
Optimized for mobile and embedded deployment.
Uses lightweight CNN architectures like MobileNet, SqueezeNet, and EfficientNet-lite.
Techniques: quantization, pruning, knowledge distillation.
Pros:
Energy-efficient, real-time, privacy-preserving.
Works offline.
Cons:
Lower accuracy compared to full models.
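Of the techniques listed, quantization is the simplest to illustrate. The sketch below shows affine (scale + zero-point) int8 quantization of a weight tensor, the general scheme behind toolchains such as TensorFlow Lite; the helper names are our own, and real toolchains add per-channel scales, activation calibration, and operator fusion on top of this.

```python
def quantize_int8(weights):
    """Asymmetric post-training quantization of float weights to int8.

    Maps the tensor's [min, max] range onto the integers [-128, 127]
    via a scale and a zero-point, shrinking storage 4x vs float32.
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the int8 codes."""
    return [(qi - zero_point) * scale for qi in q]

w = [-0.51, 0.0, 0.37, 1.02]
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
# Round-trip error is bounded by half a quantization step.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(w, w_hat))
```

The bounded round-trip error is exactly the accuracy/efficiency trade-off the section describes: coarser steps (wider weight ranges, fewer bits) mean smaller, faster models but larger approximation error.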
| Method | Input Type | Platform | Strengths | Limitations |
|---|---|---|---|---|
| OpenPose | RGB images | GPU-based PC | High accuracy, multi-person support | Heavy computation, high latency |
| CNN-HAR | Time-series sensor data | Mobile/cloud | Sensor fusion, automatic feature learning | Less intuitive; lacks visual context |
| MediaPipe | RGB images | Smartphones | Real-time, lightweight | Reduced accuracy in poor lighting |
| Lightweight CNNs | Sensors/images | Edge devices | Power-efficient, deployable on MCUs | Lower accuracy than full models |
OpenPose excels in spatial posture estimation but is resource-heavy.
CNN-based models are highly effective in recognizing activities from temporal data but may lack real-time edge compatibility.
Sensor-based HAR (e.g., DeepSense, HAR-Net) offers privacy-aware and energy-efficient solutions, especially suitable for wearables.
Edge AI (e.g., TinyPose, MobileNet) is gaining traction for offline, low-latency, and private inference.
Major challenges remain:
Occlusion, variable lighting, and clutter.
Domain generalization and personalization.
Ethical concerns: data privacy, secure handling, and user consent.
Multimodal Sensor Fusion: Combining video, motion sensors, EEG, ECG, and GPS for more accurate and context-aware recognition.
Edge Optimization: Continue developing real-time, low-power models (quantization, pruning, etc.).
Personalized Models: Employ federated learning and continual learning for adaptive, privacy-respecting HAR systems.
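The federated-learning idea behind personalized, privacy-respecting HAR reduces, at its core, to aggregating locally trained weights instead of raw sensor data. A minimal FedAvg-style sketch (function and variable names are our own; production systems add secure aggregation, client sampling, and differential privacy):

```python
def federated_average(client_weights, client_sizes):
    """FedAvg: average client model parameters, weighted by local data size.

    client_weights: list of per-client parameter vectors (lists of floats).
    client_sizes:   number of local training samples per client.
    Returns the size-weighted average, i.e. the new global model, computed
    without any client's raw (private) sensor data leaving its device.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients: the one with 3x more local data pulls the global model toward it.
global_w = federated_average([[1.0, 2.0], [5.0, 6.0]], [30, 10])
print(global_w)  # [2.0, 3.0]
```

Weighting by dataset size keeps the update equivalent to training on the pooled data (for one averaging round), which is why the scheme preserves accuracy while keeping wearable sensor streams on-device.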
Human activity and posture recognition has emerged as a critical area in computer vision and AI, offering wide-ranging applications in healthcare, fitness tracking, surveillance, assistive technology, and human-computer interaction. This review explored a broad spectrum of approaches in this domain, with a specific focus on comparing traditional Convolutional Neural Network (CNN)-based methods with cutting-edge frameworks such as OpenPose and MediaPipe. Through a critical analysis of 30 IEEE research papers, we evaluated the methodological advances, performance benchmarks, and trade-offs across various models.
The integration of MediaPipe in our own AI-powered human activity tracker demonstrates the capability of lightweight, real-time frameworks to solve real-world problems, particularly in mobile and embedded environments. These innovations have significantly reduced computational complexity while maintaining high accuracy in pose detection and activity classification. However, challenges remain, including generalization to unconstrained environments, privacy preservation, and the integration of multiple data modalities for deeper contextual understanding. There is also a growing need for personalized and explainable AI systems that adapt to individual users while remaining transparent in their decision-making.
As research progresses, the field is poised for impactful advancements, especially in combining 3D pose estimation, sensor fusion, and real-time analytics on edge devices. The future of human activity tracking lies in scalable, efficient, and human-centric solutions that integrate seamlessly into daily life, empowering users with actionable insights and fostering better health, safety, and productivity outcomes.
From this comparative study, it is evident that: 1) OpenPose offers high accuracy in multi-person setups but is not practical for real-time mobile use. 2) CNN-based HAR works well with sensor data and is suitable for structured datasets. 3) MediaPipe and BlazePose provide a balanced trade-off between speed and accuracy on smartphones. 4) Lightweight models shine in low-resource environments and embedded systems but may need ensembling for accuracy boosts.
[1] A. Bevilacqua et al., “Human Activity Recognition with Convolutional Neural Networks,” arXiv preprint arXiv:1906.01935, 2019. [2] M. Zeng et al., “Semi-Supervised Convolutional Neural Networks for Human Activity Recognition,” arXiv preprint arXiv:1801.07827, 2018. [3] Y. Tang et al., “Layer-wise Training Convolutional Neural Networks with Smaller Filters for Human Activity Recognition Using Wearable Sensors,” arXiv preprint arXiv:2005.03948, 2020. [4] N. Rashid et al., “AHAR: Adaptive CNN for Energy-efficient Human Activity Recognition in Low-power Edge Devices,” arXiv preprint arXiv:2102.01875, 2021. [5] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh, “OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 1, pp. 172–186, Jan. 2021. [6] G. Papandreou, T. Zhu, N. Kanazawa, et al., “Towards Accurate Multi-Person Pose Estimation in the Wild,” in Proc. CVPR, 2017, pp. 3711–3719. [7] A. Toshev and C. Szegedy, “DeepPose: Human Pose Estimation via Deep Neural Networks,” in Proc. CVPR, 2014, pp. 1653–1660. [8] S. Li, Z. Liu, and S. Lian, “Convolutional Neural Network-Based Human Pose Estimation: A Review,” IEEE Access, vol. 6, pp. 59024–59036, 2018. [9] D. Mehta et al., “VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera,” ACM Trans. Graph., vol. 36, no. 4, 2017. [10] S. Moon, H. Chang, and Y. Choi, “PoseFix: Model-agnostic General Human Pose Refinement Network,” in Proc. CVPR, 2019, pp. 7773–7781. [11] T.-Y. Lin et al., “Focal Loss for Dense Object Detection,” in Proc. ICCV, 2017, pp. 2980–2988. [12] X. Zhang et al., “ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices,” in Proc. CVPR, 2018, pp. 6848–6856. [13] M. Zeng, L. T. Nguyen, B. Yu, O. J. Mengshoel, J. Zhu, and P. Wu, “Convolutional Neural Networks for Human Activity Recognition Using Mobile Sensors,” in Proc. MobiCASE, 2014, pp. 197–205. [14] L. Wang, Y. Qiao, and X.
Tang, “Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors,” in Proc. CVPR, 2015, pp. 4305–4314. [15] Y. Chen and Y. Xue, “A Deep Learning Approach to Human Activity Recognition Based on Single Accelerometer,” in Proc. IEEE SMC, 2015, pp. 1488–1492. [16] H. Ordonez and D. Roggen, “Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition,” Sensors, vol. 16, no. 1, p. 115, 2016. [17] R. Okita and M. Shibata, “Real-time human activity recognition using convolutional neural network and wearable motion sensors,” in Proc. IEEE GCCE, 2018, pp. 117–120. [18] S. Ha and S. Choi, “Convolutional Neural Networks for Human Activity Recognition using Multiple Accelerometer and Gyroscope Sensors,” in Proc. IEEE IEMCON, 2016, pp. 1–4. [19] B. Hammerla, S. Halloran, and T. Plotz, “Deep, Convolutional, and Recurrent Models for Human Activity Recognition Using Wearables,” arXiv preprint arXiv:1604.08880, 2016. [20] Y. Chen, Y. Shen, and K. Zhang, “DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing,” in Proc. WWW, 2017, pp. 351–360. [21] F. Jiang and C. Yin, “HAR-Net: A Hierarchical Attention Representation Network for Human Activity Recognition,” IEEE Access, vol. 8, pp. 10876–10885, 2020. [22] A. Haque, A. Alahi, and L. Fei-Fei, “Recurrent Attention Models for Depth-Based Person Identification,” in Proc. CVPR, 2016, pp. 1229–1238. [23] A. G. Howard et al., “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,” arXiv preprint arXiv:1704.04861, 2017. [24] F. Iandola et al., “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size,” arXiv preprint arXiv:1602.07360, 2016. [25] H. Jin, J. Yu, and L. Wang, “TinyPose: Real-time and Accurate Human Pose Estimation on Mobile Devices,” in Proc. ECCV, 2020. [26] N. 
Rashid et al., “AHAR: Adaptive CNN for Energy-efficient Human Activity Recognition in Low-power Edge Devices,” IEEE Internet Things J., vol. 9, no. 3, pp. 1902–1914, Feb. 2022. [27] S. Yi, C. Li, and Q. Zhang, “A Survey of Edge Computing-Based Human Activity Recognition,” IEEE Internet Things J., vol. 8, no. 12, pp. 9789–9800, Jun. 2021. [28] J. Yoon, S. Kim, and J. Kim, “Robust Human Activity Recognition against Sensor Displacement Using Convolutional Neural Networks,” Sensors, vol. 19, no. 20, p. 4560, 2019. [29] S. Wang, L. Zhou, and Z. Wang, “Robust Pose Estimation in Unconstrained Environments,” in Proc. ICPR, 2016, pp. 191–196. [30] A. Stisen et al., “Smart Devices are Different: Assessing and Mitigating Mobile Sensing Heterogeneities for Activity Recognition,” in Proc. ACM SenSys, 2015, pp. 127–140. [31] M. Zanella et al., “Multimodal Human Activity Recognition Using CNN-LSTM Networks,” in Proc. IEEE ISPA, 2020, pp. 1–7. [32] H. Guan and A. P. Liu, “Multi-modal Sensor Fusion for Human Activity Detection in Smart Homes,” Sensors, vol. 20, no. 16, p. 4466, 2020. [33] R. Ghosh et al., “Self-Supervised Learning for HAR with Wearables,” IEEE Sensors J., vol. 22, no. 3, pp. 2191–2200, Feb. 2022. [34] C. Xu et al., “Privacy-preserving Deep Learning for Sensor-based HAR,” IEEE Trans. Mob. Comput., vol. 21, no. 6, pp. 2113–2125, Jun. 2022. [35] C. A. Ronao and S.-B. Cho, “Human activity recognition with smartphone sensors using deep learning neural networks,” Expert Syst. Appl., vol. 59, pp. 235–244, 2016, doi: 10.1016/j.eswa.2016.04.032.
Copyright © 2025 Yashraj Mishra, Ankita Jaiswal, Anany Shukla, Abhishek Verma, Humesh Verma, Dr. Goldi Soni. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET70544
Publish Date : 2025-05-07
ISSN : 2321-9653
Publisher Name : IJRASET