Worker’s Helmet Recognition and Identity Recognition Based on Deep Learning
1. Introduction
Workplace safety has become a focal point for many organizations because an unsafe environment harms both the productivity and the health of the workforce [1]. In the construction industry, workers’ behavior is one of the major causes of workplace accidents and injuries [2]. About 80% - 90% of accidents are strongly related to the unsafe acts and behavior of workers [3] [4] [5]. There is therefore a critical demand for on-site safety supervision to enhance construction site safety. Behavior-based safety (BBS) is an effective approach for observing and identifying people’s unsafe actions [6]. Technological developments, aided by computer vision, have been identified as an effective way to recognize people’s unsafe behavior automatically [7]. Current on-site safety supervision methods fall into two groups: those based on computer vision and those based on sensors. Of the two, vision-based techniques occupy a dominant position because sensor-based solutions are costly [8]. The vision-based approach is applied to activity detection and tracking of construction workers, for example detecting near-miss incidents and unsafe worker motions and assigning specific tasks to workers [9]. Helmet detection and human identity recognition are important applications of computer vision on construction sites.
In practice, existing safety inspection relies predominantly on inspectors’ manual monitoring and reporting [8]. Manual monitoring of construction operations can be time-consuming, error-prone, and costly, and it is not practical for large job sites where several operations run simultaneously [10] [11]. To ease the monitoring work of construction site safety inspectors, a considerable number of studies have been published on automatic helmet-wearing detection [8] [12] - [23] and human identity recognition [7] [24] - [29]. Computer vision can integrate helmet-wearing detection with the identification of individuals, yet existing research treats the two tasks separately. In other words, a system that checks helmet use usually cannot identify the individual, and vice versa.
To solve this problem, we propose a computer-vision-based method that automatically recognizes both workers’ helmet wearing and their identity. First, our method integrates two applications: helmet-wearing detection and identification. Second, to assess the applicability of the algorithm in a real construction site environment, we tested its precision and recall under the various visual conditions found on site. The contributions of our research are two-fold: 1) being able to identify, with computer vision, the individuals who are not wearing helmets; and 2) being able to identify, with computer vision, the individuals who commit unsafe acts.
2. Deep Learning Based Object Detection
At present, the most advanced target detection algorithms include the two-stage, region-based R-CNN series [30] [31], which divides detection into two phases: the network first generates candidate regions and then detects and classifies these regions. Common one-stage algorithms are the Single Shot MultiBox Detector (SSD) [32] and You Only Look Once (YOLO) [33], which directly output the class probabilities and position coordinates of objects. The advantage of one-stage algorithms is that detection is fast enough for real-time monitoring, whereas two-stage algorithms achieve higher detection accuracy. Region-based methods involve separate region generation components and multiple feature layers, so their real-time performance cannot be guaranteed. Although one-stage algorithms are slightly less accurate, experiments show that their recognition accuracy meets the requirements of this study.
Deep learning is widely used in object detection. In construction engineering, it is applied mainly on construction sites, including construction safety and personnel monitoring [34] [35] [36] [37], resource tracking and activity monitoring [38], measurement and modeling [39] [40], and inspection and condition monitoring [41] [42] [43]. Applying computer vision to the detection of personal protective equipment (PPE) can make construction sites more intelligent and detection more efficient. Compared with PPE detection based on wireless RF technology and sensors, it does not require the active cooperation of construction workers and offers considerable cost savings.
3. Method
YOLO v3 is a network architecture that treats object detection as a regression problem. For a given image, it directly regresses the bounding boxes of targets and their classification categories at multiple image locations. YOLO v3 therefore performs well in real-time monitoring.
3.1. Data Processing
In this paper, YOLO v3 was trained on the public Safety Helmet Wearing Dataset (SHWD), and 3000 images captured from real-name channels were used as the test set to evaluate the algorithm. SHWD contains 7581 images with 9044 people wearing helmets (positive) and 111,514 people not wearing helmets (negative). The face images contained in SHWD also help to improve the accuracy of face recognition. Supervised training of YOLO v3 requires samples labeled with both a class and a bounding box location. We used a labeling tool to annotate the entire face and helmet regions in the SHWD training set and saved the annotations as XML files in Pascal VOC format for Python to read, as shown in Figure 1.
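As an illustration of how such Pascal VOC annotations might be read in Python, the following is a minimal sketch using only the standard library. The sample XML, the file name, the class names `helmet` and `face`, and the helper names `parse_voc` and `to_yolo` are assumptions made for this example and are not taken from the SHWD files or from the authors' code.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout; the file name, class names,
# and coordinates are invented for illustration only.
SAMPLE_XML = """
<annotation>
  <filename>000001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>helmet</name>
    <bndbox><xmin>120</xmin><ymin>40</ymin><xmax>200</xmax><ymax>110</ymax></bndbox>
  </object>
  <object>
    <name>face</name>
    <bndbox><xmin>130</xmin><ymin>100</ymin><xmax>190</xmax><ymax>170</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return (filename, (width, height), [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(bndbox.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return filename, (width, height), boxes


def to_yolo(box, img_w, img_h):
    """Convert corner coordinates to normalized (x_center, y_center, w, h)."""
    _, xmin, ymin, xmax, ymax = box
    return (
        (xmin + xmax) / 2 / img_w,
        (ymin + ymax) / 2 / img_h,
        (xmax - xmin) / img_w,
        (ymax - ymin) / img_h,
    )


filename, size, boxes = parse_voc(SAMPLE_XML)
for box in boxes:
    print(filename, box[0], to_yolo(box, *size))
```

The conversion to normalized center coordinates is included because YOLO-style training pipelines commonly expect that layout; whether the authors performed this step is not stated in the paper.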
3.2. YOLO v3
YOLO v3 unifies the separate components of target detection into a single neural network. It divides the input image into an S × S grid, and each grid cell predicts bounding boxes, each described by (x, y, w, h) and a confidence C(Object). The coordinates (x, y) give the position of the center of the bounding box relative to the grid cell, and (w, h) are the width and height of the bounding box. If the center of an object falls in a grid cell, that grid cell is responsible for detecting the object. The confidence of a box reflects how likely the box is to contain an object and how accurately it fits the object [44]. It is calculated as follows:
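$$C(\mathrm{Object}) = \Pr(\mathrm{Object}) \times \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}} \tag{3-1}$$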
where Pr(Object) indicates whether the grid cell contains an object: Pr(Object) = 1 if it does, and Pr(Object) = 0 if it does not. IOU (intersection over union) measures how accurately the bounding box localizes the object. It is the overlap rate between the predicted box and the ground truth box, that is, the ratio of the area of their intersection to the area of their union.
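$$\mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}} = \frac{\mathrm{area}(B_{\mathrm{pred}} \cap B_{\mathrm{truth}})}{\mathrm{area}(B_{\mathrm{pred}} \cup B_{\mathrm{truth}})} \tag{3-2}$$
Here B_pred denotes the predicted box and B_truth the ground truth box.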
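The intersection-over-union ratio described above can be sketched in a few lines of Python. The box layout (xmin, ymin, xmax, ymax) and the function names `iou` and `box_confidence` are assumptions chosen for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def box_confidence(pr_object, predicted_box, truth_box):
    """Confidence as the product of Pr(Object) and the IOU with the ground truth."""
    return pr_object * iou(predicted_box, truth_box)


# Two 2 x 2 boxes overlapping in a 1 x 1 square: 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```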
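The final, class-specific confidence level is calculated as follows:
$$\Pr(\mathrm{Class}_i \mid \mathrm{Object}) \times \Pr(\mathrm{Object}) \times \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}} = \Pr(\mathrm{Class}_i) \times \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}} \tag{3-3}$$
where Pr(Class_i | Object) is the conditional probability that a detected object belongs to class i.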
After the confidence of each predicted box is obtained, boxes with low scores are removed by applying a threshold, and non-maximum suppression (NMS) is then applied to the remaining boxes.
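The thresholding and suppression step can be illustrated with a greedy, class-agnostic sketch. The threshold values 0.5 and 0.45, the detection layout, and the function names are assumptions for this example; the paper does not state the values it used, and practical implementations usually run the suppression separately for each class.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def filter_and_nms(detections, score_thresh=0.5, iou_thresh=0.45):
    """Drop low-score boxes, then keep only the best box of each overlapping group.

    detections: list of (score, (xmin, ymin, xmax, ymax)).
    Both thresholds are illustrative assumptions.
    """
    # Step 1: remove boxes whose confidence is below the score threshold.
    candidates = [d for d in detections if d[0] >= score_thresh]
    # Step 2: visit boxes from highest to lowest score.
    candidates.sort(key=lambda d: d[0], reverse=True)
    kept = []
    for score, box in candidates:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(box, kept_box) <= iou_thresh for _, kept_box in kept):
            kept.append((score, box))
    return kept


detections = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),    # overlaps heavily with the first box
    (0.7, (20, 20, 30, 30)),  # separate object
    (0.2, (40, 40, 50, 50)),  # below the score threshold
]
print(filter_and_nms(detections))
```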
YOLO v3 uses the Darknet53 network as its backbone, as shown in Figure 2.
This network is built by stacking residual units, which helps the model converge. The residual units also allow the network to be made deeper, which improves feature extraction. The 1 × 1 convolution kernel in the residual module reduces the number of channels in the convolution operation. This reduces the number of parameters in the network, making the model lighter and lowering the computational cost. Unlike the previous version, YOLO v3 makes predictions on feature maps at three scales, which greatly improves the detection rate for small targets [46].
Figure 2. The structure of YOLO v3 [45].
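The effect of the 1 × 1 convolution on the parameter count can be checked with simple arithmetic. The sketch below assumes a residual unit in which a 1 × 1 convolution halves the channel count and a 3 × 3 convolution restores it, with bias and batch-normalization parameters ignored. The channel width of 256 and the baseline of two full-width 3 × 3 convolutions are hypothetical choices made only for comparison.

```python
def conv_params(c_in, c_out, kernel):
    """Number of weights in a kernel x kernel convolution (bias ignored)."""
    return c_in * c_out * kernel * kernel


def bottleneck_unit_params(channels):
    """1 x 1 convolution to channels // 2, then 3 x 3 convolution back to channels."""
    reduce = conv_params(channels, channels // 2, 1)
    expand = conv_params(channels // 2, channels, 3)
    return reduce + expand


def plain_unit_params(channels):
    """Hypothetical baseline: two 3 x 3 convolutions at full channel width."""
    return 2 * conv_params(channels, channels, 3)


channels = 256  # illustrative width, not taken from the paper
with_bottleneck = bottleneck_unit_params(channels)
without_bottleneck = plain_unit_params(channels)
print(with_bottleneck, without_bottleneck, without_bottleneck / with_bottleneck)
```

Under these assumptions the unit with the 1 × 1 reduction needs 327,680 weights, against 1,179,648 for the baseline, a factor of 3.6.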
3.3. Performance Test
3.3.1. Precision and Recall
Target recognition algorithms are usually evaluated with two indexes, precision and recall. To define them, this article first defines TP (true positives), FP (false positives) and FN (false negatives). TP is the number of construction workers without helmets that the algorithm detects correctly. FP is the number of detections reported as workers without helmets that are incorrect, for example when a worker wearing a helmet is recognized as not wearing one, or when another object is taken to be a worker without a helmet. FN is the number of construction workers without helmets that the algorithm misses, either because they are judged to be wearing helmets or because they are not detected at all. Precision (P) is the proportion of true positives TP among all positive detections (TP + FP) and measures the reliability of the recognition results. Recall (R) is the proportion of true positives TP among all actual positive samples (TP + FN). Both are commonly used evaluation indicators for target recognition [21]. They are calculated as follows:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$
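Using the standard definitions P = TP / (TP + FP) and R = TP / (TP + FN), the two indexes can be computed as in the following sketch. The function name and the counts in the usage line are hypothetical and are not results from this study.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true positive, false positive, and false negative counts.

    A ratio with an empty denominator is reported as 0.0 to avoid division by zero.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical counts: 90 correct no-helmet detections, 10 false alarms,
# and 30 workers without helmets that were missed.
print(precision_recall(tp=90, fp=10, fn=30))
```

With these made-up counts the precision is 0.9 and the recall is 0.75, which shows how a detector can be reliable in what it reports while still missing a quarter of the violations.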
3.3.2. Robustness
Robustness means that the algorithm maintains high recognition accuracy under varying conditions. Construction sites are usually outdoors, in the open air. Changes in weather and illumination affect the quality of surveillance video, and occlusion of the face hinders the extraction of facial features. The precision and recall of the model under these different conditions therefore reflect its robustness well.
3.3.3. Speed
The speed of YOLO v3 is the time it takes to detect the helmets and faces in an image. YOLO v3 can use the computer’s GPU for image processing, which greatly increases processing speed. Because the aim here is to apply YOLO v3 to the monitoring of real-name channels, it is necessary to determine whether YOLO v3 meets the real-time requirements.
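One way to judge the real-time requirement is to time the detector over a batch of frames and compare the resulting frames per second with the frame rate of the camera. The sketch below is only a timing harness: `dummy_detect` is a stand-in for a YOLO v3 forward pass, and the required rate of 25 frames per second is an assumption, not a figure from the paper.

```python
import time


def measure_fps(detect, frames, warmup=2):
    """Average frames per second of `detect` over `frames`, after a short warm-up."""
    # Warm-up calls are excluded from timing, since first calls are often slower.
    for frame in frames[:warmup]:
        detect(frame)
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def meets_real_time(fps, required_fps=25.0):
    """True if the measured rate reaches the assumed camera frame rate."""
    return fps >= required_fps


def dummy_detect(frame):
    """Placeholder for a detector; it only sums the pixel values."""
    return sum(frame)


frames = [[1, 2, 3]] * 50  # stand-in for decoded video frames
fps = measure_fps(dummy_detect, frames)
print(f"{fps:.1f} frames per second, real-time: {meets_real_time(fps)}")
```

When the detector runs on a GPU, the framework may return before the computation has finished, so a real measurement would need to wait for the device to complete each frame before reading the clock.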
4. Discussion
In this part, we analyze the causes of experimental errors and the knowledge contribution of this study.
4.1. Experimental Error Analysis
The method proposed in this paper for identifying construction workers and detecting safety helmets is based on computer vision technology. This section analyzes the causes of error and the remaining problems of the method.
First, the accuracy of face recognition is easily affected by image sharpness. If the facial features in the image are not distinct, computer-vision-based techniques cannot analyze them accurately. At the current resolution, objects that the human eye can recognize are detected in all kinds of daytime situations, and a higher-resolution camera can address the remaining cases. At night, limited lighting makes computer-vision-based performance poor, and manual inspection is a good supplement.
Second, the accuracy of face recognition is easily affected by occlusion. If a worker’s helmet or face is covered, recognition performance suffers. Occlusion is a common problem in computer vision applications because the camera position and angle are fixed. Workers walking in groups, tools carried on the shoulder, raincoats, and contaminated helmets (e.g., with mud) can all hide workers’ helmets and faces. Increasing the sampling level of the live video and installing additional cameras can mitigate the problem and improve recognition performance.
4.2. Knowledge Contribution
The main contribution of this research is a computer-vision-based method for detecting construction workers’ helmets and verifying their identity. At present, the identification of construction workers relies mainly on sensors and computer vision. The sensor-based approach detects the location and safety behavior of construction workers by placing sensors on workers or equipment. RFID tags are also commonly used for worker identification and PPE testing. Computer vision is often used to inspect workers’ protective gear. For example, Zhao et al. [46] identified safety officials on construction sites by extracting features of safety helmets and colored vests. These methods have limitations: sensor loss, high investment cost, and workers’ resistance to sensors restrict the development of sensor-based methods. Computer vision technology can distinguish different construction workers and different construction behaviors and can track pedestrian trajectories, but these methods cannot determine the identity of workers, so face recognition is a necessary means of worker identification. This study presents a new method to verify the identity of workers entering the construction site and to detect their safety helmets. The precision and recall of the algorithm were tested under different visual conditions and met the requirements of the real-name system and PPE. These rules have practical application value in engineering. The method can help prevent accidents at the source and improve the safety performance of construction sites [47].
5. Conclusions
Academics have been working for years to reduce the accident rate in the construction industry, but it remains one of the most dangerous industries. Construction sites contain many hazardous areas, and the construction environment is noisy and complex. Even construction workers with professional safety training cannot fully guarantee their own safety. Non-professional personnel are therefore even more prone to accidents when they enter a construction site, so identification is one of the necessary measures for ensuring site safety. Helmets protect workers’ heads from penetrating or direct impact, but helmets have not yet achieved 100% protection against head injuries in field accidents. The main reason is the lax attitude of site staff, who often enter the construction site without wearing safety helmets.
The limitation of this paper is that it remains theoretical. It proposes a computer-vision-based research framework for recognizing construction workers’ helmets and identities. The framework offers a new approach to on-site real-time monitoring and to improving the safety management of construction workers.
In future research, this framework will be applied to actual projects to realize identity recognition and helmet detection of construction personnel on site, and this information will be integrated into a real-time safety management system to improve on-site safety management.