This paper focuses on structured, landmark-based feature extraction using MediaPipe, an open-source framework for building computer-vision inference pipelines over arbitrary sensory data such as video or audio. Hand gesture and facial expression recognition play a significant role in domains such as human-computer interaction, assistive technology, and emotion analysis. Traditional datasets rely primarily on raw images, which pose challenges in computational complexity and privacy. This paper presents an alternative approach to dataset creation: extracting structured, landmark-based representations of hand gestures and facial expressions using MediaPipe.
Introduction
Human communication includes significant non-verbal elements like gestures and facial expressions. Recognizing these cues accurately is essential for AI/ML applications such as sign language interpretation, virtual reality, and mental health monitoring. Traditional image-based datasets pose privacy and computational challenges. To address this, the paper presents a dataset generation method based on landmark extraction using MediaPipe.
Methodology:
Data Collection: Uses MediaPipe to extract 3D hand (21 points per hand) and face (468 points) landmarks.
Preprocessing: Normalization, error filtering, and confidence-based data cleaning ensure dataset quality.
Annotation: Each frame is labeled according to gesture or facial expression for supervised learning.
Storage: Extracted data is stored efficiently in CSV format, reducing data volume while preserving relevant information.
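The steps above can be sketched as follows. This is a minimal, illustrative pipeline, not the paper's exact implementation: it assumes the MediaPipe Hands layout (21 landmarks per hand, each with x, y, z), and since no camera frame is available here, synthetic coordinates stand in for the output of a live MediaPipe call. The normalization choice (wrist-relative translation plus unit scaling) and the CSV column names are assumptions for illustration.

```python
import csv
import io
import math
import random

NUM_HAND_LANDMARKS = 21  # MediaPipe Hands returns 21 (x, y, z) points per hand

# In the real pipeline these coordinates come from MediaPipe, e.g.:
#   hands = mp.solutions.hands.Hands(min_detection_confidence=0.5)
#   results = hands.process(rgb_frame)
#   landmarks = [(p.x, p.y, p.z) for p in results.multi_hand_landmarks[0].landmark]
# Synthetic values are used here so the sketch is self-contained.
random.seed(0)
landmarks = [(random.random(), random.random(), random.random() * 0.1)
             for _ in range(NUM_HAND_LANDMARKS)]

def normalize(points):
    """Translate so the wrist (landmark 0) is the origin, then scale to unit size."""
    wx, wy, wz = points[0]
    shifted = [(x - wx, y - wy, z - wz) for x, y, z in points]
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in shifted) or 1.0
    return [(x / scale, y / scale, z / scale) for x, y, z in shifted]

def to_csv_row(label, points):
    """Flatten one annotated frame into [label, x0, y0, z0, ..., x20, y20, z20]."""
    row = [label]
    for x, y, z in points:
        row.extend([round(x, 5), round(y, 5), round(z, 5)])
    return row

# Annotate and store one frame: a header plus one labeled row per frame.
norm = normalize(landmarks)
buf = io.StringIO()
writer = csv.writer(buf)
header = ["label"] + [f"{axis}{i}"
                      for i in range(NUM_HAND_LANDMARKS)
                      for axis in ("x", "y", "z")]
writer.writerow(header)
writer.writerow(to_csv_row("thumbs_up", norm))  # "thumbs_up" is a placeholder label
print(len(header))  # 1 label column + 21 * 3 coordinates = 64
```

Storing only 63 floats per hand per frame, rather than the frame's pixels, is what yields the reduced data volume and privacy benefit noted above.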
Applications:
These landmark-based datasets are ideal for:
Gesture recognition (e.g., sign language)
Facial expression analysis (e.g., emotion detection in healthcare)
Touchless control systems
Privacy-preserving machine learning
Multimodal interaction models combining hand and facial gestures
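To illustrate the first of these applications, the following sketch consumes a landmark CSV for gesture recognition. The tiny inline dataset (shortened to two features per row) and the nearest-centroid rule are illustrative stand-ins, not the paper's method; any supervised model could be trained on the same rows.

```python
import csv
import io
import math
from collections import defaultdict

# Hypothetical mini-dataset in the CSV layout described under Storage:
# a label column followed by flattened landmark coordinates.
CSV_TEXT = """label,x0,y0
open_palm,0.9,0.1
open_palm,0.8,0.2
fist,0.1,0.9
fist,0.2,0.8
"""

# Load rows and group feature vectors by gesture label.
by_label = defaultdict(list)
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    label = row.pop("label")
    by_label[label].append([float(v) for v in row.values()])

# Nearest-centroid rule: average each class, assign a query to the closest mean.
centroids = {lbl: [sum(col) / len(col) for col in zip(*vecs)]
             for lbl, vecs in by_label.items()}

def classify(query):
    return min(centroids, key=lambda lbl: math.dist(query, centroids[lbl]))

print(classify([0.85, 0.15]))  # → open_palm
```

Because the features are already normalized landmark coordinates rather than pixels, even such a simple geometric rule can separate coarse gestures; stronger models only need the same CSV interface.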
Conclusion
This paper introduces a structured, landmark-based dataset for hand gesture and facial expression recognition, offering a computationally efficient and privacy-focused alternative to traditional image-based datasets. By using the MediaPipe framework for three-dimensional landmark extraction, we enable the development of robust AI models for a wide range of applications such as sign language recognition, human-computer interaction, and emotion detection. Future work includes expanding dataset diversity, incorporating motion dynamics, and benchmarking deep learning models on the dataset.