The safety of construction workers is a critical aspect of the industry, as a lack of compliance with protective measures often leads to serious accidents. This research presents a computer vision-based solution that employs deep learning to automatically recognize personal protective equipment (PPE), such as helmets, gloves, safety jackets, goggles, and protective footwear. The system is implemented using the YOLOv7 object detection framework, trained on a carefully prepared custom dataset. Each image in the dataset was annotated with bounding boxes to indicate the position and category of safety gear. After multiple training cycles, the model demonstrated strong recognition ability across different PPE types. Evaluation metrics, including precision, recall, F1-score, and mean Average Precision (mAP@0.5), confirm the effectiveness of the approach, with the best-performing model achieving an mAP@0.5 of 87.7%. These outcomes highlight the potential of the proposed system to support real-time monitoring of safety compliance on construction sites.
1. Introduction
Construction sites are highly hazardous workplaces, with many injuries and fatalities caused by the absence or misuse of Personal Protective Equipment (PPE) such as helmets, goggles, jackets, gloves, and boots.
Recent advances in artificial intelligence (AI), particularly deep learning and computer vision, now allow automated monitoring of safety compliance in real time. This study develops a YOLOv7-based detection system that accurately and efficiently identifies whether workers are wearing full PPE, even in complex and noisy construction environments.
2. Literature Survey
Previous studies have applied various YOLO versions and other object detection models for PPE detection:
YOLOv3/YOLOv5/YOLOv7 variants showed good performance in detecting helmets and other PPE.
Chen et al. achieved an mAP of 95.56% with Tiny YOLOv3.
Jye-Hwang Lo found YOLOv7 to outperform YOLOv3/4 with an F1-score of 95.31%.
Other models like Faster R-CNN and YOLOv5x were tested but had lower precision or speed.
These studies establish YOLOv7 as a strong candidate due to its real-time performance, high accuracy, and flexibility across domains.
3. Methodology
A. Objective
To build an automated system that uses YOLOv7 to detect whether workers on construction sites are wearing complete PPE.
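The "complete PPE" check can be illustrated with a small sketch. It assumes that the detector's outputs have already been grouped per worker into a list of class labels; the label names below are hypothetical, since the source lists the PPE categories but not the identifiers used in the dataset.

```python
# Hypothetical class names: the source lists helmets, gloves, safety jackets,
# goggles, and protective footwear, but does not give the dataset's labels.
REQUIRED_PPE = frozenset({"helmet", "gloves", "jacket", "goggles", "boots"})


def missing_ppe(detected_labels, required=REQUIRED_PPE):
    """Return the required PPE items that were not detected, sorted by name."""
    return sorted(required - set(detected_labels))


def is_fully_equipped(detected_labels, required=REQUIRED_PPE):
    """Return True only when every required item appears in the detections."""
    return not missing_ppe(detected_labels, required)


# Example: a worker detected with only a helmet, jacket, and boots.
worker_labels = ["helmet", "jacket", "boots"]
print(missing_ppe(worker_labels))
print(is_fully_equipped(worker_labels))
```

Associating each detected item with a specific worker is a separate step that this sketch does not cover.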
B. Data Collection
~1,000 images collected from YouTube, Google Images, and construction sites.
Images were manually annotated with bounding boxes using the Labeling Annotator tool.
Data split: 70% training, 15% validation, 15% testing.
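A 70/15/15 split of this kind could be produced as in the following sketch. The file names, the fixed seed, and the function name `split_dataset` are illustrative assumptions; the source does not describe how the split was actually performed.

```python
import random


def split_dataset(items, train=0.70, val=0.15, seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists.

    The test set receives whatever remains after the train and val portions.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# Hypothetical file names standing in for the roughly 1,000 collected images.
image_names = [f"img_{i:04d}.jpg" for i in range(1000)]
train_set, val_set, test_set = split_dataset(image_names)
print(len(train_set), len(val_set), len(test_set))
```

Fixing the seed keeps the three subsets stable across runs, so that results from different models are measured on the same test images.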
C. Training Process
Models were trained using Python 3.10 and PyTorch on Google Colab with a Tesla V100 GPU.
YOLOv7 compared with YOLOv7-X, YOLOv5s, and YOLOv5m.
Evaluated using Precision, Recall, F1-score, and mAP.
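For reference, the sketch below shows how these metrics are commonly computed. In mAP@0.5, a detection counts as a true positive when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5. The function names are illustrative, and the full average-precision computation (ranking detections by confidence and integrating the precision-recall curve) is omitted.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Two 10x10 boxes overlapping by half: IoU is 50 / 150, below the 0.5 threshold.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
# Illustrative counts only, not taken from the study.
print(precision_recall_f1(tp=80, fp=20, fn=20))
```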
D. YOLOv7 Architecture
Includes Backbone (Darknet-53), Neck, Head, and Input Module.
Processes images at 608×608 resolution.
Incorporates anchor boxes, focal loss, and feature pyramids for better detection, especially for small/obscured objects.
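The source does not say how images are brought to 608×608. One common approach in YOLO-style pipelines is "letterboxing": the image is scaled to fit while preserving its aspect ratio, and the remaining space is padded. The sketch below computes only the resize and padding geometry under that assumption; the function name `letterbox_params` is hypothetical.

```python
def letterbox_params(width, height, target=608):
    """Compute the resize and padding needed to fit an image into a square.

    Returns (new_width, new_height, pad_left, pad_top), where the padding
    centers the resized image inside a target x target canvas.
    """
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


# A 1920x1080 frame is scaled to 608x342 and padded by 133 pixels on top.
print(letterbox_params(1920, 1080))
```

Preserving the aspect ratio avoids distorting small items such as gloves or goggles, at the cost of some padded area carrying no information.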
4. Results & Discussion
A. Performance Metrics
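| Model    | Precision | Recall | F1-Score | mAP@0.5 |
|----------|-----------|--------|----------|---------|
| YOLOv7   | 84.1%     | 87.1%  | 85.0%    | 87.7%   |
| YOLOv7-X | 87.3%     | 86.1%  | 86.7%    | 86.0%   |
| YOLOv5s  | 79.2%     | 77.0%  | 78.0%    | 81.1%   |
| YOLOv5m  | 72.5%     | 77.5%  | 74.9%    | 75.5%   |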
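YOLOv7 achieved the highest recall (87.1%) and mAP@0.5 (87.7%) of the four models, while YOLOv7-X was slightly ahead on precision (87.3%) and F1-score (86.7%). Overall, YOLOv7 offered the best balance of accuracy and speed.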
The system could reliably detect various PPE items, even under real-world challenges like poor lighting, worker posture variation, and partial occlusion.
5. Conclusion
This project presented a YOLOv7-based model designed to detect essential safety equipment. The system achieved an mAP@0.5 of 0.877 (87.7%), demonstrating its effectiveness in identifying and classifying different types of protective gear. These results highlight the potential of computer vision in promoting safety compliance on construction sites. By providing a reliable and efficient method for monitoring personal protective equipment, the system can help reduce the risk of accidents, support safety enforcement, and contribute to safer working conditions in the construction industry.