Geochemical and Geostatistical Studies for Estimating Gold Grade in Tarq Prospect Area by K-Means Clustering Method ()
1. Introduction
In recent years, due to the high dependence of mineral projects on the precise determination of the tonnage of mineral materials, the various methods have been developed to estimate the grade such as geometric methods based on distance and geostatistics [1] [2].
Each method has limitations and disadvantages which affect the accuracy of the estimation [3]. One of the new methods is the grade estimation using the clustering. The cluster analysis methods are widely used in the earth sciences. The cluster grouping method is used to classify geochemical data [4].
The cluster analysis connects the observations to each other which together have many similarities, then, the observations consecutively connect them which are mostly similar to previous observations [5]. In other words, in clustering, we try to divide the data into clusters that the similarity is maximized between the data within each cluster and it is minimized between the data within the different clusters [6]. No classes already exist in the clustering method and in fact, the variables are not divided independently and dependently, but here, the search is performed to access groups of data which are similar to each other and the behaviors can be better identified by discovering these similarities and it can be operated to achieve a better result based on them [7]. The clustering method is an indirect method; this means that it can be used even when there is no previous information from the internal structure of the database. This method can be used to discover hidden patterns and improve the performance of direct methods [8].
The K-means method is one of the methods for clustering the data in data mining. It is an exclusive and planar method which has been widely studied by different researchers and attempts to cluster the samples with the specified number of k classes so that the total Euclidean intervals of each sample are minimized from the center of the class which has been assigned to it [9].
The K-means methods are used to properly analyze the data behavior and available analyses to each other. Some of its applications include: the division of the geological terrain [10] , the classification of the effect of vegetation and the recovery of water health in the Mediterranean coast forests [11] , the presentation of geochemical patterns in mineral areas [12] , predicting the organic carbon in the intelligent systems [13] , and determining the effect of gas diffusion in urban environments [14].
In this article, the behavior of the gold, arsenic, and antimony elements has been evaluated using K-means method, MATLAB and SPSS software based on the data collected from the drainage sediments and then, the gold grade is estimated using the results.
2. The Study Area
The study area is the geochemical sheet 1:100.000 of stream sediments called Tarq which is provided by the Geological Survey of Iran with code 6356 and it is located in the geographic latitudes are 24˚51' to 24˚52', and the latitudes are 55˚32' to 33˚33' [15].
This area is located in Isfahan province as well as the area divides into two distinct eastern and western parts by the Kohroud Mountain range with the west bank and Karkas Mountain with a height of 3895 m. Northeast low altitude plain has an average height of 1000 m which is outside the studied area and the southwest plain called Robat Sultan has nearly 1650 m altitude and there will be farming more or less in it [16].
Kahroud and Garrison Mountains divided the drainages of an area into two distinct networks from each other. Eastern network flows to the sabulous and southwest drainages join the Zayandehrood River in the east of Isfahan. Tame rivers have a constant flow on the eastern hillside of Karkas Mountain and most of the springs are on the hillsides of this mountain (Figure 1). Some springs have Calcium bicarbonate to a great extent which they create the sediment of travertine stone in the area. The annual rainfall was reported at 51.8 mm [17].
General Geology of the Area
The stratigraphy of the zone consists of Precambrian to Quaternary rocks, which
Figure 1. Location of samples and stream in the area [15].
are described from the old to the new, respectively. The upper Precambrian and the former Cambrian units are yellow-tufted dolomites which in the shale and lime layers are observed. This unit has a thickness of about 360 m [17] [18].
The sedimentary rocks are more clastic and the lower part of them is related to Cambrian and Ordovician which are located in the sequence of red sandstone layers. Generally, this unit consists of yellow silicified dolomite rocks, red shale, bright yellow or gray Trilobite Limestone, Red and green sandy shale [18] [19].
The doleritic rocks are formed in the Silurian and Devonian unit and red ce-ment clay and hematite sandstones are located in these dolerites. In the following, the alternation of yellow sandstone layers and dark dolomite are created with interlayers of red shale. In the upper part, the layers of limestone and dolomite have been made with a thickness of 140 which contain brachiopod, trilobite, coral, and tentaculite fossils [18] [19]. This unit is bright gray which turns into black brachiopod limestone in the upper part and it has an interlayer of oolite and pizzolite (Figure 2).
3. The Study Method
637 samples of drainage sediments have been collected from the area and have been analyzed using ICP-MS method. In this paper, in order to estimate the gold grade and to study the behavior of arsenic, antimony, and gold elements, only these three elements have been investigated due to being paragenesis as well as the correlation coefficient. As can be seen from Table 1, the values of the correlation coefficient of these three elements are good and indicate the relationship with these three elements with each other. Also, given that most geochemical data are either closed or compound, if the proper conversion is not performed for data expansion, the wrong results are obtained. Therefore, data must be separated before doing any processing on the data. In this study, ilr method [20] was used to separate the data.
The K-Means Algorithm
The K-means algorithm starts with a given value for K (number of classes) and tries to estimate the following cases.
Finding the points as centers of clusters, in fact, these points are the same average points of each cluster.
Assigning each sample data to a cluster that data has the smallest distance to the center of that cluster [9]. In the simple form of this method, first, the points are selected randomly as much as needed clusters. Then, the data is assigned to one of these clusters according to the similarity and so, new clusters are obtained [21].
New centers can be calculated for them in each of iterations by repeating the same steps and averaging of data and again the data can be attributed to new clusters [22]. The important steps of this algorithm are summarized as follows [23] [24].
Figure 2. Tarq geology sheet 1:100,000, Isfahan [19].
Table 1. Spearman correlation coefficients matrix for gold, arsenic, and antimony elements.
1) First, k members randomly are selected as the number of clusters among the n members (k is the number of clusters)
2) Zj vector is calculated based on Equation (1) which represents the center of each class Cj.
(1)
3) In this equation, x represents the vector of a sample which is a member of Cj and #Cj represents the number of samples which are members of the Cj class. It should be noted that relation (1) is used to calculate the center of each class during solving and usually, k samples are randomly selected at the start of the algorithm and are considered as the center of each class [24].
4) Calculate the target function of the classification
based on Equation (5) which calculates the total distance of samples from the center of the classes.
(2)
5) Minimize the objective function of Equation (2) and find the proper classification on the M set with the number k of classes.
And software has been introduced by the author to speed up the operation above [25].
4. Results and Discussion
4.1. Results and Discussion on the Results
In various studies, such as the relationship of altered diorite with magnetite mineral in the iron belt of Chile [26] , the relationship between copper and molybdenum of porphyry copper ore [27] , and the relationship between elements of the platinum group of porphyry copper ore [28] , the behavior of elements has been measured relative to each other in various methods. In the current study, k optimum value has been calculated using the K-means method for clustering the drainage sediment data in the Tarq area with three grade value of elements of gold, arsenic, and antimony (taking into account the coordinates of the sampling points), because the elements of arsenic and antimony are important elements in determining the geochemical halos of the gold element.
In this study, two appropriate criteria have been used to determine the appropriate value of k to determine the number of clusters. The first used benchmark is the S(i) that the number of clusters is changed from 3 to 10 based on it and then, the obtained results are analyzed to select optimal k using the above benchmark [29].
An appropriate benchmark has been determined according to Equation (3) for determining optimum k. The obtained classifications are measured based on the benchmark.
In the above equation, S(i) expresses the utility rate of the ith sample in its class, the parameter Aveg_Within(i) represents the average distance between the ith sample and the other samples in that class which ith sample is a member of that class and the parameter Aveg_Between(i,k) represents the average distance between the ith sample and the other samples which are members of another class such as k [24] [29].
The results are analyzed by calculating the utility rate as an average utility. The utility rate varies between −1 and +1; as this value approaches to +1, the sample is a member of more appropriate classification and as it approaches −1, it has an inappropriate classification and the zero number means that the presence of the sample is not very important in the current classification or another classification. So, the value of Equation (6) is calculated for each sample and then the obtained results are analyzed by calculating the average numbers as the average utility rate of the classification.
The second used benchmark is the quality function. According to the information, the best cluster maximizes the total similarity between the cluster center and all cluster members and minimizes the total similarity between cluster centers. First, a range is determined for the number of clusters to select the best cluster which is between 3 and 10 in this research. Then p(k) is calculated for each value k.
The k value is selected as the optimal number of clusters which maximizes p(k). In this way, the number of clusters can be selected for which the distance is maximized between cluster centers and the similarity of cluster centers with the members within each cluster. The quality of clustering results is defined with k clusters as follow [30] :
(3)
(4)
(5)
(6)
(7)
(8)
In these equations, O is set of cluster centers; Cn is centers of clusters; On is the set of elements which has not been selected as cluster centers; Tc is the set of all elements which is clustered; ηn is the average similarity the between the center of the cluster Cn and all the cluster elements of On, ηm is the average similarity the between the center of the cluster Cm and all the elements of the cluster Om and finally, δnm is defined as the similarity of Cn and On [30].
4.2. The Monitoring of Gold, Arsenic, and Antimony Elements Relative to Each Other
In order to study the behavior of the elements relative to each other, the cluster profile and the utility rate of each sample were determined in pair for classifications k = 3 and k = 10 for the elements of Gold, Arsenic, and Antimony, and the results of the utility rate of the classes have been compared together, the best class is determined according to the utility rate of the classes, and then the centers of the clusters of each class are determined according to it.
As shown in Figure 3, class 4 is selected as the best class according to the class profile diagrams and the utility rates of the best class for the two gold and arsenic elements, because as much as the utility rate is close to 1, the samples are more correctly located in the class. According to the diagram, the little negative values are also found in this classification.
The utility rate average in this classification is equal to 0.7574 which is greater
Figure 3. The profile of clusters and utility rates with 4 to 6 classes related to two elements of Gold and Arsenic.
than the average utility rate of other classes and based on Figure 4 and Figure 5 for other elements, the best classification is selected regarding the clusters profile and average utility rate. The class 9 and class 3 are selected as the best classification for Gold and Antimony with the average utility rate 0.7136 and for arsenic and antimony with the average utility rate 0.5872, respectively.
The value of k has been increased to 20 due to the changes in the utility rates and to ensure the results, but the utility rate has not exceeded the rate of the best classification of each class, and it has decreased for more than 20.
The diagram of Validation value S(i) can be shown in Figure 6 based on changing the number of clusters to select the optimal cluster number which is easier to compare. In other words, a cluster is selected as the optimal number of clusters which has the highest value of S(i).
Figure 6(a) shows the value of S(i) for the two elements of gold and arsenic and Figure 6(b) shows it for two elements of gold and antimony which has the highest value in the cluster 9. Figure 6(c) shows the value of S(i) for two elements of Arsenic and Antimony which has the highest value in the cluster 3.
Also, the proper number of clusters is determined according to the quality function and using the value of p(k). The value of p(k) has been calculated using
Figure 4. The profile of clusters and utility rates with 7 to 10 classes related to two elements of Gold and Antimony.
Figure 5. The profile of clusters and utility rates with 3 to 6 classes related to two elements of Antimony and Arsenic.
Figure 6. The validation value S(i) based on the number of clusters ((a) Gold-Arsenic, (b) Gold and Antimony, (c) Arsenic and Antimony).
Equation (6) for different k values to determine the number of clusters. As stated, the maximum value of p(k) represents the proper number of clusters. Table 2 shows the values of p(k) corresponding to the number of clusters. The highest value is 6198 in monitoring the two elements of gold and arsenic. Therefore, the most suitable number of clusters is 4 and so the number of clusters is 9 for two elements of gold and antimony. The number of clusters is 3 for two elements of arsenic and antimony. According to its sampling location, the most suitable number of clusters is 3 for the elements of gold, arsenic, and antimony. As can be seen, the proper number of clusters is obtained from the quality function which is consistent with the standard results of S(i).
For the best classification, the centers of classes are shown in Figure 7 for two elements of gold and arsenic per 4 classes and in Figure 8 for two elements of gold and antimony per 9 classes and in Figure 9 for two elements of arsenic and antimony per 3 classes.
Based on this classification shown in Figure 7, the grade of Arsenic element first decreases and then increases by increasing the grade of the gold element. According to this decreasing and increasing, the best curve is second order with positive concavity. The fitted line is
and its correlation coefficient has been obtained R2 = 0.9927.
The grade of the gold and antimony (Figure 8) increases by increasing the grade of the gold element. Based on this increasing, its best fit is a line with a
Table 2. The values of p(k) for the number of different clusters.
Figure 7. The best line fitted to the centers of classes per four classes for gold and arsenic elements.
Figure 8. The best line fitted to the centers of classes per nine classes for gold and Antimony elements.
Figure 9. The best line fitted to the centers of classes per three classes for arsenic and Antimony elements.
positive slope. The line equation is
and the correlation coefficient has been obtained R2 = 0.9235.
Due to the behavior of gold relative to arsenic and antimony, Arsenic decreases and antimony increases by increasing gold, for this reason, we see the unexpected behavior that two elements of arsenic and antimony do not have an absolute definite relationship. Based on the classification shown in Figure 9, the grade of Antimony element increases and then decreases by increasing the grade of Arsenic element. The fitted line equation is
and the correlation coefficient of the equation fitted to the center of the classes is equal to R2 = 1.
4.3. Investigating the Gold Behavior Regarding the Grade of Arsenic and Antimony Elements
The cluster profile and the utility rate of the classification are given in Figure 10 according to the value of k = 3 to k = 10 for the three elements of gold, arsenic, and antimony (considering the length and width of the points).
As observed, based on the obtained results for different values of k (3 to 10), the best classification of the specimens has 5 classes with the characteristic of the
Figure 10. The profile of clusters and utility rates with 3 to 6 classes for gold, arsenic, and antimony (with coordinates).
gold, arsenic, and antimony grade, and the length and width of the samples.
Also, according to Figure 11, the highest value of S(i) belongs to 5 classes. The value of k has been increased to 20 due to the changes in the utility rates and to ensure the results, but the utility rate has not exceeded the rate of this class and it has decreased for more than 20 classes.
Therefore, in order to obtain center characteristics of classes, all input values must be in a standard interval to prevent the error in calculations and obtain accurate estimation value given the fact that coordinates are considered as input features besides the grade of gold, arsenic, and antimony and the interval of coordinates variation and grade values are different. For this reason, all inputs were selected in the interval [1, 0] using Equation (10).
The characteristics of cluster centers are given with five classes in Table 3.
(9)
5. The Prediction of the Gold Grade
In this section, a relationship has been determined between gold, arsenic, and antimony using the multi-variable regression in the SPSS software according to the length and width of the samples selected from all samples or cluster centers, so that the elements grade can be estimated using obtained relationship.
The gold value is introduced to the software as a dependent variable and the values of arsenic and antimony, length, and width of the points are introduced as
Figure 11. The change in the validation value S(i) based on the number of clusters (for three Gold, Arsenic, and Antimony elements).
Table 3. The normalized characteristics of cluster centers.
independent variables. Then, the results of Table 4 and Table 5 were reported as characteristics and multi-variable regression coefficients of Equation (11) were calculated.
(10)
5.1. The Estimation with All Samples
The formula of the multiple regression line is determined (Equation (12)) according to the coefficients in Table 4.
(11)
The obtained R-value represents a parabola equation that the regression model has been able to explain the variations in terms of y (Gold). Here, the R-value is 70 and therefore, 70% of variations of gold values are due to Xs (i.e. arsenic, antimony, length and width of the sampling points). To validate the gold grade estimation based on the obtained equation (Equation (12)), a number of actual data should be compared with the values obtained from the equation to measure the accuracy of the estimator. Therefore, 30% of the samples randomly separated before the multi-variable regression based on the values of the antimony and arsenic elements and the length and width of the sampling points and replacing it in Equation (12) and then, according to these samples, the value of the gold element is estimated and is compared with its actual value. The results are shown in Figure 12 as a dispersion diagram. Figure 12 shows the correlation between the real and estimated grades with a correlation coefficient of 52% which indicates the relative accuracy of the used method.
Figure 12. Estimated value of gold versus actual value.
Considering the actual gold amount with the Kriging method, the area map (Figure 13) is drawn to compare with the resulting map with the grades obtained from Equation (12) (Figure 14). The point of these maps is the normalization of all of the parameters. For this reason, it can only be used visually for the accuracy of the estimation.
5.2. The Estimation with K-Means Clustering Centers
The estimations were performed exactly with the above method but without data and only with the centers of the classes; According to regression, it is suggested that the coordinates be eliminated from the prediction system due to lack of grade justification. Therefore, the coefficient is considered only for the two values of arsenic and antimony in this method and this fit is 100% definitive for centers and the Equation (13) is obtained from the results of Table 5.
Given the generality of estimation, it was tested on 100 data randomly (Figure 15).
This method is about 72% more accurate than the previous one and it can be said that the K-means method is a more efficient method.
Figure 13. Schematic representation of actual Gold grades on the map from the highest (red) to the lowest grade (green).
Figure 14. Schematic representation of predicated Gold grades on the map from the highest (red) to the lowest grade (green).
Table 5. Regression coefficients with the centers of classes.
Figure 15. Estimated value of gold versus actual value.
6. Conclusion
Considering the evidence of gold mineralization in the Khoni zone, investigating the extent of geochemical halos and the behavior of gold paragenesis elements is important in the region. For this purpose, the behavior of the elements of gold, arsenic, and antimony was compared with each other in the area using K-means method and it was examined between two elements. The equations were presented along with correlation coefficients. Then, the relationship of elements was determined using this method by taking into account the latitude and longitude of the samples in order to estimate the grade and more accurate estimation of the appearance and extent of the geochemical halos in the studied area. According to the results obtained from the process of the mentioned elements, the equation is as Equation (13) for the estimation of the gold grade based on four parameters of arsenic grade, antimony grade, length and width of sampling points and the correlation coefficient has been reported 72%. These results can be used in a variety of topics [31] - [37].
Acknowledgements
Thank you Dr. Seyyed Saeed Ghanadpour and Ms. Neda Mahoosh Mohammadi for help on the above topic as well as providing specialized help and creating this method [31] [32].