Worker’s Helmet Recognition and Identity Recognition Based on Deep Learning


Safety has been a concern in the construction industry for decades. Helmet detection has attracted attention in machine learning research, but previous helmet detection studies have ignored identity recognition, which hinders the subsequent safety education of workers. Conversely, many scholars have studied person re-identification but have neglected safety detection. This paper proposes a deep learning, computer vision-based method that, unlike previous studies of helmet detection or human identity recognition alone, performs both helmet detection and identity recognition for construction workers. We collected 3000 real-name channel images and constructed a neural network based on the You Only Look Once (YOLO) v3 model to extract the features of the construction worker’s face and helmet, respectively. Experiments show that the method recognizes workers and detects helmets accurately and quickly, and that it addresses the problem of poor supervision of real-name channels.

Share and Cite:

Wang, J., Zhu, G., Wu, S. and Luo, C. (2021) Worker’s Helmet Recognition and Identity Recognition Based on Deep Learning. Open Journal of Modelling and Simulation, 9, 135-145. doi: 10.4236/ojmsi.2021.92009.

1. Introduction

Safety at the workplace has become the focal point of many organizations owing to the consequences of an unsafe environment for the productivity and health of the workforce [1]. In the construction industry, workers’ behavior is one of the major causes of workplace accidents and injuries [2]. About 80% - 90% of accidents are strongly related to the unsafe acts and behavior of workers [3] [4] [5]. Therefore, there is a critical demand for on-site safety supervision to enhance construction site safety. Behavior-based safety (BBS) is an effective approach that can be used to observe and identify people’s unsafe actions [6]. Computer vision has been identified as an effective technology for automatically recognizing people’s unsafe behavior [7]. Current on-site safety supervision methods fall into two groups: those based on computer vision and those based on sensors. Of the two, vision-based techniques occupy a dominant position compared with the high-cost sensor-based solutions [8]. The vision-based approach is applied to activity detection and tracking of construction workers, such as detecting near-miss incidents and unsafe worker motions and assigning specific tasks to workers [9]. Helmet detection and human identity recognition are important applications of computer vision on construction sites.

In practice, existing safety inspection relies predominantly on inspectors’ manual monitoring and reporting [8]. Manual monitoring of construction operations can be time-consuming, error-prone, and costly, and is impractical on larger job sites where several operations are under way simultaneously [10] [11]. To facilitate the work of construction site safety inspectors, a considerable number of studies have been published on automatic helmet wearing detection [8] [12] - [23] and human identity recognition [7] [24] - [29]. Computer vision can be used to integrate helmet wearing detection and individual identification, which existing research treats separately. In other words, existing methods usually cannot identify individuals while checking for helmets, and vice versa.

To solve the above problems, we propose a method based on computer vision to automatically identify workers’ helmet wearing and identity. First, our method integrates two applications: helmet wearing detection and identification. Second, to assess the applicability of the algorithm in a real construction site environment, we tested its precision and recall under the various visual conditions found on construction sites. The contributions of our research are two-fold: 1) identifying the individuals who are not wearing helmets with computer vision; and 2) identifying the individuals who commit unsafe acts with computer vision.

2. Deep Learning Based Object Detection

At present, the most advanced target detection algorithms are the two-stage, region-based R-CNN series [30] [31], which divide target detection into two phases: the network first generates candidate regions and then detects and classifies these regions. Common one-stage algorithms are Single Shot MultiBox Detector (SSD) [32] and You Only Look Once (YOLO) [33], which directly generate the class probabilities and position coordinates of the object. The advantage of one-stage algorithms is that detection is fast enough for real-time monitoring, while two-stage algorithms have higher detection accuracy. Region-based detection methods include region proposal generation components and multiple feature layers, so their real-time performance cannot be guaranteed. Although the accuracy of one-stage algorithms is slightly lower, experiments show that the recognition accuracy of existing one-stage algorithms can meet the requirements of this study.

Deep learning is widely used in object detection. In construction engineering, deep learning is mainly applied on construction sites, including construction safety and personnel monitoring [34] [35] [36] [37], resource tracking and activity monitoring [38], measurement and modeling [39] [40], and inspection and condition monitoring [41] [42] [43]. Applying computer vision to the detection of personal protective equipment (PPE) can improve the intelligence level of the construction site and improve detection efficiency. Compared with detecting personal protective equipment using wireless RF technology and sensors, it does not require the active cooperation of construction workers and offers considerable cost savings.

3. Method

YOLO v3 is a network structure that transforms object detection into a regression problem. For a given image, it directly regresses the bounding boxes of targets and their class categories at multiple image locations. As a result, YOLO v3 performs well in real-time monitoring.

3.1. Data Processing

In this paper, YOLO v3 was trained on the public Safety Helmet Wearing Dataset (SHWD), and the 3000 collected real-name channel images were used as the test set to evaluate the algorithm’s performance. SHWD contains 7581 images with 9044 people wearing helmets (positive) and 111,514 people not wearing helmets (negative). The face images contained in SHWD also help to improve the accuracy of face recognition. Supervised training of YOLO v3 requires samples labeled with both class and bounding-box location. We used a labeling tool to tag the entire face and helmet in the SHWD training images and saved these tags as XML files in Pascal VOC format for Python to read, as shown in Figure 1.
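As an illustration of how annotations in Pascal VOC format might be read in Python, the following minimal sketch parses one annotation with the standard library. The sample XML, the class names "helmet" and "face", and the function name parse_voc are assumptions made for this example and are not taken from the SHWD files or from the authors’ code.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout; the file name, class names
# and coordinates are illustrative only, not taken from SHWD.
SAMPLE_XML = """
<annotation>
  <filename>000001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>helmet</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>180</xmax><ymax>120</ymax></bndbox>
  </object>
  <object>
    <name>face</name>
    <bndbox><xmin>110</xmin><ymin>120</ymin><xmax>170</xmax><ymax>190</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates are stored as text; convert them to integers.
        coords = tuple(
            int(float(bndbox.findtext(key)))
            for key in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return filename, boxes


filename, boxes = parse_voc(SAMPLE_XML)
for label, xmin, ymin, xmax, ymax in boxes:
    print(filename, label, xmin, ymin, xmax, ymax)
```

Each tuple holds the class label followed by the corner coordinates of the box, which is the information a detector needs for supervised training.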

Figure 1. Framework of proposed method.

3.2. YOLO v3

YOLO v3 unifies the various parts of target detection into a single neural network. Its working principle is to divide the input image into an S × S grid; each grid cell predicts bounding boxes, each described by (x, y, w, h) and a confidence C(Object). The coordinates (x, y) represent the position of the center of the bounding box relative to the grid cell, and (w, h) are the width and height of the bounding box. If the center of an object falls in a grid cell, that grid cell is responsible for detecting the object. Confidence reflects how accurately the bounding box contains the object [44]. It is calculated as follows:

$$C(\text{Object}) = \Pr(\text{Object}) \times \text{IOU}(\text{Pred}, \text{Truth}) \tag{3-1}$$

where Pr(Object) indicates whether an object is contained in the grid cell: Pr(Object) = 1 if the cell contains an object, and Pr(Object) = 0 otherwise. IOU (intersection over union) measures how accurately the bounding box localizes the object, that is, the overlap between the predicted bounding box and the ground truth box, computed as the ratio of their intersection to their union.

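$$\text{IOU}(\text{Pred}, \text{Truth}) = \frac{\text{area}(\text{box}_{\text{Pred}} \cap \text{box}_{\text{Truth}})}{\text{area}(\text{box}_{\text{Pred}} \cup \text{box}_{\text{Truth}})} \tag{3-2}$$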
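The intersection-over-union ratio can be sketched in a few lines of Python. The snippet below assumes axis-aligned boxes given as (xmin, ymin, xmax, ymax) corner coordinates; this box representation and the function name iou are choices made for the example, not details given in the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2 x 2 boxes overlapping in a 1 x 1 square: IOU = 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

The value ranges from 0 for disjoint boxes to 1 for identical boxes.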

The final confidence level is calculated as follows:

$$c = \Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \text{IOU}(\text{Pred}, \text{Truth}) = \Pr(\text{Class}_i) \times \text{IOU}(\text{Pred}, \text{Truth}) \tag{3-3}$$

After obtaining the confidence of each prediction box, low-scoring prediction boxes are removed by applying a threshold, and non-maximum suppression is then applied to the remaining bounding boxes.
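The thresholding and non-maximum suppression step can be sketched as follows. This is a simplified greedy version that ignores class labels (practical detectors usually apply suppression per class), and the threshold values 0.5 and 0.45 as well as the function names are assumptions chosen for illustration; the paper does not state its settings.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(detections, score_thresh=0.5, iou_thresh=0.45):
    """Drop low-confidence boxes, then greedily suppress overlapping ones.

    detections: list of (box, score) pairs.
    The default thresholds are illustrative assumptions.
    """
    # Step 1: remove boxes whose confidence is below the threshold.
    candidates = [d for d in detections if d[1] >= score_thresh]
    # Step 2: visit boxes from highest to lowest confidence.
    candidates.sort(key=lambda d: d[1], reverse=True)

    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(box, kept_box) <= iou_thresh for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),    # strongest box
    ((1, 1, 11, 11), 0.8),    # heavy overlap with the first, suppressed
    ((50, 50, 60, 60), 0.7),  # separate object, kept
    ((0, 0, 5, 5), 0.2),      # below the confidence threshold, removed
]
print(filter_and_nms(detections))
```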

YOLO v3 uses the Darknet53 network as the backbone, as shown in Figure 2.

This network is built by stacking residual units, which helps the model converge. The residual units also allow the network to be made deeper, which improves feature extraction. The 1 × 1 convolution kernel in the residual module reduces the number of channels in the convolution operation. This reduces the number of parameters in the network, making the whole model lighter and reducing the amount of computation. Unlike the previous version, YOLO v3 makes predictions from feature maps at three scales, which greatly improves the detection rate of small targets [46].

Figure 2. The structure of YOLO v3 [45].
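The parameter saving from the 1 × 1 convolution can be checked with simple arithmetic. The sketch below counts convolution weights in a residual unit that first halves the channel count with a 1 × 1 convolution and then restores it with a 3 × 3 convolution, and compares it with a unit built from two full-width 3 × 3 convolutions. The comparison unit is an assumption introduced here for contrast, and biases and batch normalization parameters are ignored.

```python
def conv_weights(c_in, c_out, kernel):
    """Number of weights in a kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def bottleneck_unit(channels):
    """1 x 1 convolution halving the channels, then 3 x 3 restoring them."""
    half = channels // 2
    return conv_weights(channels, half, 1) + conv_weights(half, channels, 3)


def plain_unit(channels):
    """Hypothetical comparison: two 3 x 3 convolutions at full width."""
    return 2 * conv_weights(channels, channels, 3)


channels = 256
print(bottleneck_unit(channels))  # 327680
print(plain_unit(channels))       # 1179648
```

For 256 channels the unit with the 1 × 1 convolution needs 5C² weights against 18C² for the comparison unit, a reduction by a factor of 3.6.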

3.3. Performance Test

3.3.1. Precision and Recall

In evaluating a target recognition algorithm, two indexes, precision and recall, are usually used to measure its performance. To define them, this article first defines TP (true positives), FP (false positives) and FN (false negatives). TP is the number of construction workers without helmets that the algorithm correctly detects. FP is the number of detections reported as workers without helmets that are incorrect, for example, when a worker is wearing a helmet but the model recognizes the worker as not wearing one, or when another object is taken to be a worker who is not wearing a helmet. FN is the number of construction workers without helmets that the algorithm fails to detect. Precision (P) is the proportion of true positives TP among all detections (TP + FP) and measures the reliability of the recognition results. Recall (R) is the proportion of true positives TP among all actual positive samples (TP + FN). The two are commonly used evaluation indicators for target recognition [21]. They are calculated as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{Missrate} = 1 - \text{Recall} = \frac{FN}{TP + FN}$$
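A minimal Python sketch of these three metrics is given below. The counts in the usage example are hypothetical and are not results from this study; returning 0.0 when a denominator is zero is a convention chosen for the example.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, miss_rate) from detection counts."""
    detected = tp + fp   # everything the model reported as "no helmet"
    positives = tp + fn  # every worker who was actually without a helmet
    precision = tp / detected if detected else 0.0
    recall = tp / positives if positives else 0.0
    miss_rate = fn / positives if positives else 0.0
    return precision, recall, miss_rate


# Hypothetical counts for illustration only.
precision, recall, miss_rate = detection_metrics(tp=90, fp=10, fn=30)
print(precision, recall, miss_rate)
```

With these counts the miss rate equals 1 minus the recall, as in the formula above.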

3.3.2. Robustness

Robustness means that the algorithm maintains high recognition accuracy under varying conditions. Construction sites are usually outdoors, where changes in weather and illumination affect the quality of surveillance video and occlusion of the face hinders the extraction of facial features. The precision and recall of the model under these different conditions reflect its robustness.

3.3.3. Speed

The speed of YOLO v3 is the time it takes to detect the helmet and face in an image. YOLO v3 can use the computer’s GPU for image processing, which greatly increases processing speed. Because the purpose here is to apply YOLO v3 to the monitoring of real-name channels, it is necessary to judge whether YOLO v3 meets the real-time requirements.
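One way such a speed check might be carried out is to time the detector over a batch of frames and convert the result to frames per second. In the sketch below, dummy_detect is a stand-in for a real YOLO v3 inference call, and the 25 frames-per-second threshold is an assumed real-time criterion; neither comes from the paper.

```python
import time


def measure_fps(detect, frames, warmup=2):
    """Average frames per second of detect() over the given frames."""
    # Warm-up runs are excluded so one-off start-up costs do not skew the result.
    for frame in frames[:warmup]:
        detect(frame)

    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def dummy_detect(frame):
    """Stand-in for a real detector call; sleeps to simulate inference time."""
    time.sleep(0.001)
    return []


frames = [None] * 20  # placeholder frames; a real test would load images
fps = measure_fps(dummy_detect, frames)

REAL_TIME_FPS = 25  # assumed threshold for real-time monitoring
is_real_time = fps >= REAL_TIME_FPS
print(round(fps, 1), is_real_time)
```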

4. Discussion

In this part, we analyze the causes of experimental errors and the knowledge contribution of this study.

4.1. Experimental Error Analysis

The method proposed in this paper for identifying construction workers and detecting safety helmets is based on computer vision technology. This section analyzes the causes of error and the existing problems in the method.

Firstly, the accuracy of face recognition is easily affected by image sharpness. If the facial features in the image are indistinct, computer vision-based technology cannot analyze them accurately. At the current resolution, objects that the human eye can recognize can be detected in all kinds of daytime situations; where sharpness is insufficient, a higher-resolution camera can solve the problem. Under poor night lighting, vision-based performance is poor, and manual inspection is a good supplement.

Secondly, the accuracy of face recognition is easily affected by occlusion. If the worker’s helmet or face is covered, recognition performance suffers. Occlusion is a common problem in computer vision applications because the camera position and angle are fixed. Workers walking in groups, tools carried on the shoulder, raincoats, and contamination on helmets (e.g., mud) can all hide workers’ helmets and faces. Increasing the sampling rate of the live video and installing additional cameras can mitigate the problem and improve recognition performance.

4.2. Knowledge Contribution

The main contribution of this research is a computer vision-based method for detecting construction workers’ helmets and verifying their identity. At present, the identification of construction workers mainly relies on sensors and computer vision. The sensor-based approach detects the location and safety behavior of construction workers by placing sensors on workers or equipment; RFID tags are also commonly used for worker identification and PPE checking. Computer vision is often used to inspect workers’ protective gear. For example, Zhao et al. [46] identified safety officials on construction sites by extracting features of safety helmets and colored vests. These methods have limitations: sensor loss, high investment cost, and workers’ resistance to wearing sensors limit the development of sensor-based methods, while existing computer vision techniques can distinguish different construction workers and construction behaviors and track pedestrian trajectories but cannot determine the identity of workers. Face recognition is therefore a necessary means of identifying workers. This study presents a new method to verify the identity and detect the safety helmets of workers entering the construction site. The precision and recall of the algorithm were tested under different visual conditions and met the requirements of the real-name system and PPE inspection. These results have practical application value in engineering. This method can prevent accidents at the source and improve the safety performance of construction sites [47].

5. Conclusions

Academics have worked for years to reduce the accident rate in the construction industry, but it remains one of the most dangerous industries. There are many dangerous areas on a construction site, and the construction environment is noisy and complex. Even construction workers with professional safety training cannot fully guarantee their own safety. Non-professional personnel are therefore even more prone to accidents when they enter the construction site, so identification is one of the necessary measures to ensure site safety. Helmets can protect workers’ heads from penetrating or direct impact, but helmets have not yet achieved 100% protection against head injuries in field accidents. The main reason is lax behavior by site staff, who often enter the construction site without wearing safety helmets.

The limitation of this paper is that it is theoretical: it proposes a research framework for construction workers’ helmet and identity recognition based on computer vision. The framework offers a new approach to on-site real-time monitoring and to improving the safety management of construction workers.

In future research, this framework will be applied to an actual project to realize identity recognition and helmet detection of construction personnel on site, and this information will be integrated into a real-time safety management system to improve the level of on-site safety management.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.


[1] Singh, A. and Misra, S.C. (2021) Safety Performance & Evaluation Framework in Indian Construction Industry. Safety Science, 134, 105023.
[2] Han, S.U. and Lee, S.H. (2013) A Vision-Based Motion Capture and Recognition Framework for Behavior-Based Safety Management. Automation in Construction, 35, 131-141.
[3] Heinrich, H.W. and Stone, R.W. (1931) Industrial Accident Prevention. Social Service Review.
[4] Salminen, S. and Tallberg, T. (1996) Human Errors in Fatal and Serious Occupational Accidents in Finland. Ergonomics, 39, 980-988.
[5] Lingard, H. and Rowlinson, S. (2005) Occupational Health and Safety in Construction Project Management. Ringgold, Inc., Portland, USA.
[6] Wirth, O. and Sigurdsson, S.O. (2008) When Workplace Safety Depends on Behavior Change: Topics for Behavioral Safety Research. Journal of Safety Research, 39, 589-598.
[7] Wei, R., Love, P.E.D., Fang, W., et al. (2019) Recognizing People’s Identity in Construction Sites with Computer Vision: A Spatial and Temporal Attention Pooling Network. Advanced Engineering Informatics, 42, 100981.
[8] Wu, J., Cai, N., Chen, W., et al. (2019) Automatic Detection of Hardhats Worn by Construction Personnel: A Deep Learning Approach and Benchmark Dataset. Automation in Construction, 106, 102894.
[9] Sherafat, B., Ahn, C.R., Akhavian, R., et al. (2020) Automated Methods for Activity Recognition of Construction Workers and Equipment: State-of-the-Art Review. Journal of Construction Engineering and Management, 146, 3120002.
[10] Akhavian, R. and Behzadan, A.H. (2015) Construction Equipment Activity Recognition for Simulation Input Modeling Using Mobile Sensors and Machine Learning Classifiers. Advanced Engineering Informatics, 29, 867-877.
[11] Akhavian, R. and Behzadan, A.H. (2016) Smartphone-Based Construction Workers’ Activity Recognition and Classification. Automation in Construction, 71, 198-209.
[12] Rubaiyat, A.H.M., Toma, T.T., Kalantari-Khandani, M., et al. (2016) Automatic Detection of Helmet Uses for Construction Safety. 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops (WIW), Omaha, 13-16 October 2016, 135-142.
[13] Shrestha, K., Shrestha, P.P., Bajracharya, D. and Yfantis, E.A. (2015) Hard-Hat Detection for Construction Safety Visualization. Journal of Construction Engineering, 2015, Article ID: 721380.
[14] Li, K., Zhao, X., Bian, J. and Tan, M. (2017) Automatic Safety Helmet Wearing Detection. 2017 IEEE 7th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Honolulu, 31 July-4 August 2017, 617-622.
[15] Zhang, H., Yan, X., Li, H. and Jin, R. (2019) Real-Time Alarming, Monitoring, and Locating for Non-Hard-Hat Use in Construction. Journal of Construction Engineering and Management, 145, 4019006.
[16] Mneymneh, B.E., Abbas, M. and Khoury, H. (2019) Vision-Based Framework for Intelligent Monitoring of Hardhat Wearing on Construction Sites. Journal of Computing in Civil Engineering, 33, 4018066.
[17] Zhu, Z., Park, M.-W. and Elsafty, N. (2015) Automated Monitoring of Hardhats Wearing for Onsite Safety Enhancement. 5th International/11th Construction Specialty Conference, Vancouver, 8-10 June 2015, 138.
[18] Du, S., Shehata, M. and Badawy, W. (2011) Hard Hat Detection in Video Sequences Based on Face Features, Motion and Color Information. 2011 3rd International Conference on Computer Research and Development, Shanghai, 11-13 March 2011, 25-29.
[19] Park, M.-W., Elsafty, N. and Zhu, Z. (2015) Hardhat-Wearing Detection for Enhancing On-Site Safety of Construction Workers. Journal of Construction Engineering and Management, 141, 4015024.
[20] Mneymneh, B.E., Abbas, M. and Khoury, H. (2018) Evaluation of Computer Vision Techniques for Automated Hardhat Detection in Indoor Construction Safety Applications. Frontiers of Engineering Management, 5, 227-239.
[21] Fang, Q., Li, H., Luo, X., et al. (2018) Detecting Non-Hardhat-Use by a Deep Learning Method from Far-Field Surveillance Videos. Automation in Construction, 85, 1-9.
[22] Li, J., Liu, T., Wang, T., et al. (2017) Safety Helmet Wearing Detection Based on Image Processing and Machine Learning. 2017 9th International Conference on Advanced Computational Intelligence (ICACI), Doha, 4-6 February 2017, 201-205.
[23] Wu, H. and Zhao, J. (2018) An Intelligent Vision-Based Approach for Helmet Identification for Work Safety. Computers in Industry, 100, 267-277.
[24] Ma, X., Zhu, X., Gong, S., et al. (2017) Person Re-Identification by Unsupervised Video Matching. Pattern Recognition, 65, 197-210.
[25] McLaughlin, N., del Rincon, J.M. and Miller, P. (2016) Recurrent Convolutional Network for Video-Based Person Re-Identification. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 27-30 June 2016, 1325-1334.
[26] McLaughlin, N., del Rincon, J.M. and Miller, P.C. (2017) Person Reidentification Using Deep Convnets with Multitask Learning. IEEE Transactions on Circuits and Systems for Video Technology, 27, 525-539.
[27] Wang, T., Gong, S., Zhu, X. and Wang, S. (2014) Person Re-Identification by Video Ranking. In: Fleet, D., Pajdla, T., Schiele, B. and Tuytelaars, T., Eds., European Conference on Computer Vision, Springer, Cham, 688-703.
[28] You, J., Wu, A., Li, X. and Zheng, W.S. (2016) Top-Push Video-Based Person Re-Identification. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 27-30 June 2016, 1345-1353.
[29] Zheng, Z., Zheng, L. and Yang, Y. (2019) Pedestrian Alignment Network for Large-Scale Person Re-Identification. IEEE Transactions on Circuits and Systems for Video Technology, 29, 3037-3045.
[30] Girshick, R. (2015) Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 7-13 December 2015, 1440-1448.
[31] Ren, S., He, K., Girshick, R. and Sun, J. (2017) Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 1137-1149.
[32] Liu, W., Anguelov, D., Erhan, D., et al. (2016) SSD: Single Shot MultiBox Detector. In: Computer Vision—ECCV 2016, Springer International Publishing, Cham, 21-37.
[33] Redmon, J., Divvala, S., Girshick, R. and Farhadi, A. (2016) You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 27-30 June 2016, 779-788.
[34] Brilakis, I., Park, M.-W. and Jog, G. (2011) Automated Vision Tracking of Project Related Entities. Advanced Engineering Informatics, 25, 713-724.
[35] Luo, X., Li, H., Cao, D., et al. (2018) Towards Efficient and Objective Work Sampling: Recognizing Workers’ Activities in Site Surveillance Videos with Two-Stream Convolutional Networks. Automation in Construction, 94, 360-370.
[36] Ding, L., Fang, W., Luo, H., et al. (2018) A Deep Hybrid Learning Model to Detect Unsafe Behavior: Integrating Convolution Neural Networks and Long Short-Term Memory. Automation in Construction, 86, 118-124.
[37] Son, H., Seong, H., Choi, H. and Kim, C. (2019) Real-Time Vision-Based Warning System for Prevention of Collisions between Workers and Heavy Equipment. Journal of Computing in Civil Engineering, 33, 04019029.
[38] Memarzadeh, M., Golparvar-Fard, M. and Niebles, J.C. (2013) Automated 2D Detection of Construction Equipment and Workers from Site Video Streams Using Histograms of Oriented Gradients and Colors. Automation in Construction, 32, 24-37.
[39] Morgenthal, G., Hallermann, N., Kersten, J., et al. (2019) Framework for Automated UAS-Based Structural Condition Assessment of Bridges. Automation in Construction, 97, 77-95.
[40] Kang, S., Park, M.-W. and Suh, W. (2019) Feasibility Study of the Unmanned-Aerial-Vehicle Radio-Frequency Identification System for Localizing Construction Materials on Large-Scale Open Sites. Sensors and Materials, 31, 1449-1465.
[41] Abdel-Qader, I., Abudayyeh, O. and Kelly, M.E. (2003) Analysis of Edge-Detection Techniques for Crack Identification in Bridges. Journal of Computing in Civil Engineering, 17, 255-263.
[42] Jafari, B., Khaloo, A. and Lattanzi, D. (2017) Deformation Tracking in 3D Point Clouds via Statistical Sampling of Direct Cloud-to-Cloud Distances. Journal of Nondestructive Evaluation, 36, Article No. 65.
[43] Chen, J., Fang, Y. and Cho, Y.K. (2018) Performance Evaluation of 3D Descriptors for Object Recognition in Construction Applications. Automation in Construction, 86, 44-52.
[44] Wang, Y. and Zheng, J. (2018) Real-Time Face Detection Based on YOLO. 1st IEEE International Conference on Knowledge Innovation and Invention (ICKII), Jeju, 23-27 July 2018, 221-224.
[45] Redmon, J. and Farhadi, A. (2018) YOLOv3: An Incremental Improvement. ArXiv. abs/1804.02767.
[46] Zhao, Y., Chen, Q., Cao, W., et al. (2019) Deep Learning for Risk Detection and Trajectory Tracking at Construction Sites. IEEE Access, 7, 30905-30912.
[47] Hinze, J., Thurman, S. and Wehle, A. (2013) Leading Indicators of Construction Safety Performance. Safety Science, 51, 23-28.

Copyright © 2021 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.