Regimentation of geochemical indicator elements employing convolutional deep learning algorithm

Recently, deep learning algorithms have been widely developed for identifying multi-element geochemical patterns related to various mineralization occurrences. Effective recognition of multi-element geochemical anomalies is essential for mineral exploration, and it depends strongly on appropriate clustering of the indicator elements. Because they are highly capable of extracting features from complex data, deep learning algorithms can achieve impressive results compared with earlier methods of clustering the indicator elements correlated with mineralization in a region of interest. Although numerous supervised and unsupervised deep learning algorithms have been applied to recognize geochemical anomalies, they have rarely been used to cluster geochemical indicator elements. In this research, a convolutional deep learning (CDL) algorithm was designed to recognize and regiment geochemical indicator elements in the Takht-e Soleyman District, Iran. Various expert opinions and experiments were considered to reach the optimum parameters of this architecture. The achieved root mean square error (RMSE) values fell in an appropriate range (below 0.2), showing that the values of the dependent variables (Pb as the pioneer of the first group and Ag as the pioneer of the second group) predicted from their independent variables are close to their actual values. In addition, the high adjusted coefficient of determination (R²adj above 0.9) calculated for the last stage of regimentation confirms the accuracy and performance of the convolutional deep learning algorithm for clustering the geochemical indicator elements of the study area.


Introduction
Appropriate regimentation of the geochemical indicator elements of mineralization occurrences is challenging because of the complexity of geological features, especially for large regional-scale collections of stream sediment geochemical data (Ghezelbash et al., 2019; Ghezelbash et al., 2020). The division of geochemical indicator elements associated with mineral deposits into efficient and inefficient groups can differ depending on which traditional clustering method or factor analysis technique is employed (Templ et al., 2008; Yang et al., 2016). Thus, a suitable methodology, such as a convolutional deep learning (CDL) algorithm, can regiment large geochemical datasets into meaningful groups of indicator elements. In fact, the complexity and diversity of geological features and the choice of clustering method can influence the number and types of indicator element groups and complicate geochemical anomaly detection. Therefore, clustering the analyzed indicator elements into efficient groups is a fundamental operation in the initial stages of mineral exploration (Ghezelbash et al., 2020). Traditional unsupervised clustering procedures such as fuzzy c-means, K-medoids, and K-means are used to divide geochemical data into homogeneous groups or clusters in order to distinguish background from anomaly populations (Clare and Cohen 2001) or alteration and lithological units (Yang et al., 2016), whereas machine learning algorithms and CDL structures can be applied specifically to the regimentation of geochemical indicator elements. Large stream sediment geochemical datasets are commonly regarded as compositional data that display multivariate behavior (Ghezelbash et al., 2020). Hence, a CDL structure that includes multivariate regression can be highly effective.
Previously, response surface regression, polynomial regression, factorial regression, and multiple regression have been applied by many researchers (Howarth 2001; Coburn et al., 2012; Granian et al., 2015; Sabbaghi 2018). Response surface regression has been widely applied in the earth sciences. In environmental sciences, the function obtained from response surface regression is commonly used to describe the behavior of a dependent variable with respect to the independent variables and to find the optimum settings of the influential variables. Convolutional neural networks (LeCun et al., 2015), deep belief nets (Hinton et al., 2006), artificial neural networks (Anderson 1972), logistic regression (Cox 1959), ensemble learning (Dietterich 2002), random forest (Breiman 2001), and support vector machine (Vapnik 1999) are classified as supervised learning algorithms, which classify situations, problems, or objects on the basis of labeled training data. Moreover, feature extraction (Coates et al., 2011; Sabbaghi and Moradzadeh 2018; Sabbaghi and Tabatabaei 2023), dimensionality reduction (Redlich 1993), and density estimation (Scott and Knott 1974; Silverman 2018) are classified as unsupervised learning algorithms, which detect hidden patterns in large datasets without labeled data (Pal et al., 2022). Machine learning methods such as support vector machines and random forests are considered shallow learning methods, with at most one hidden layer or none at all. Therefore, their ability to generalize to the distributions of large, complex datasets is generally limited. The main difference between such shallow networks and deep learning networks lies in their depth (Chakrabortty et al., 2021; Ruidas et al., 2021; Roy et al., 2022; Saha et al., 2022).
Thus, more complex features can be extracted by deep learning networks, because high-level features are created by combining low-level features. Xiong and Zuo (2016) first applied a deep autoencoder network to map the mineralization zones of an iron polymetallic deposit. Subsequently, deep learning networks became more popular in several fields of mineral exploration (Zuo 2017; Zuo 2020; Zhang et al., 2021). For example, Yang et al. (2021) employed GoogLeNet, a convolutional neural network, to map potential zones of gold deposits. However, purely data-driven deep learning networks that disregard domain knowledge and experiments frequently produce results that are difficult to interpret from a geochemical perspective. Incorporating geochemical knowledge and expert opinions into deep learning networks therefore remains a challenge in this field. This research presents an unsupervised CDL algorithm for clustering the geochemical indicator elements of the Mississippi Valley-type (MVT) Pb-Zn deposit in the Takht-e Soleyman District, Iran. We attached a regression layer to the constructed network and applied a forward strategy of multivariate regression to predict the values of the pioneer element of each cluster.
Study area

Takht-e Soleyman zone
This area is a significant part of the Takab mineralization zone in West Azerbaijan Province, Iran. The Takht-e Soleyman region lies between longitudes 47°00′E and 47°30′E and latitudes 36°30′N and 37°00′N, between the Urumieh-Dokhtar Volcanic Arc (UDVA) and the Sanandaj-Sirjan Zone (SSZ) (Figure 1). The extensional faults of the region commonly trend E-W or NE-SW and are considered a controlling factor for the MVT Pb-Zn deposits and epithermal gold deposits. The geological units of this zone consist mostly of carbonate, metamorphic, and sedimentary rocks; volcanic outcrops are rare.

MVT Pb-Zn mineralization
MVT Pb-Zn deposits typically occur as stratiform bodies in passive-margin settings and form large, continuous orebodies that are weakly associated with alteration (dolomitization and silicification) (Wei et al., 2020). MVT Pb-Zn deposits supply 25% of the world's lead and zinc requirements, which makes them important targets in mineral prospectivity mapping (Sabbaghi and Tabatabaei 2020; Sabbaghi and Tabatabaei 2022). These deposits are hosted by carbonate rocks (dolostone and limestone) in the foreland basins of orogenic belts (Wei et al., 2020). Their simple ore mineralogy consists primarily of Fe sulfides, galena, and sphalerite (Hosseini-Dinani and Aftabi 2016).

Multivariate regression
The regression procedure was introduced as a statistical method for examining relationships between variables. For instance, a dependent variable (Y) can be expressed as a function of independent variables (xᵢ) as follows:

$$Y = f(x_1, x_2, \ldots, x_p) \tag{1}$$

When Y is a linear function of the xᵢ, the regression is called linear; otherwise, it is non-linear and is described by a non-linear function. Vugrinovich (1989), Saunders et al. (1991), and Karathanasis (1999) successfully applied linear and non-linear regression to investigate the behavior of different geoscience variables. Accordingly, the multivariate regression function is expressed as follows:

$$Y = a_0 + \sum_{i=1}^{p} a_i x_i + \varepsilon \tag{2}$$

where a₀ and aᵢ (i = 1, 2, ..., p) are the constant term and the partial coefficients, respectively, and ε represents the random error, that is, the deviation of the predicted Y values from their actual values. Following prior studies (Granian et al., 2015; Karbalaei Ramezanali et al., 2020), when a dataset contains n samples and p variables are measured for each sample, Eq. 2 can be written for the jth sample as follows:

$$Y_j = a_0 + \sum_{i=1}^{p} a_i x_{ij} + \varepsilon_j, \qquad j = 1, 2, \ldots, n \tag{3}$$

where xᵢⱼ is the value of the ith independent variable in the jth sample. Its matrix form is:

$$[Y] = a_0[\mathbf{1}] + [X][A] + [\varepsilon] \tag{4}$$

where [Y] and [ε] are n × 1 vectors, [1] is an n × 1 vector of ones, [X] is the n × p matrix of independent-variable values, and [A] is the p × 1 vector of partial coefficients. The regression coefficients are estimated by the least squares method as follows:

$$[A] = [\Sigma]^{-1}[G] \tag{5}$$

where [G] is the covariance matrix between the independent variables and the dependent variable, [Σ]⁻¹ is the inverse of the variance-covariance matrix of the independent variables, and [A] is the coefficient matrix; the constant term then follows from the variable means as a₀ = Ȳ − Σᵢ aᵢx̄ᵢ.
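Eq. 5 can be checked numerically. The sketch below, assuming NumPy and synthetic data (not the study's samples), estimates the partial coefficients from the covariance matrices and recovers a₀ from the means. It solves the linear system rather than forming [Σ]⁻¹ explicitly, which gives the same coefficients with better numerical stability. The function name `fit_cov_least_squares` is hypothetical.

```python
import numpy as np


def fit_cov_least_squares(X, y):
    """Estimate a_0 and a_1..a_p of Eq. 2 via Eq. 5, [A] = [Sigma]^-1 [G].

    X: (n, p) array of independent variables; y: (n,) dependent variable.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean      # center the data
    n = len(y)
    sigma = Xc.T @ Xc / (n - 1)          # (p, p) variance-covariance matrix of the x_i
    G = Xc.T @ yc / (n - 1)              # (p,) covariances between each x_i and Y
    A = np.linalg.solve(sigma, G)        # partial coefficients a_1..a_p
    a0 = y_mean - x_mean @ A             # constant term from the variable means
    return a0, A


# Usage on synthetic data: Y = 2 + 3*x1 - x2 + small noise
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.05, size=100)
a0, A = fit_cov_least_squares(X, y)
print(round(a0, 1), np.round(A, 1))  # close to 2.0 and [3, -1]
```

Because the covariance form is algebraically the normal-equation solution for centered data, it matches an ordinary least squares fit with an intercept column.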
For the function fitted in a regression analysis to be accepted, the following three criteria should be satisfied: 1) the random error (ε) should have a mean of zero and a constant variance; 2) an analysis of variance should show that the fitted function is significant (a significance level of α = 0.05 can be considered); and 3) the coefficient of determination (R²), calculated using the following equation, should be close to 1:

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^{2}}{\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^{2}} \tag{6}$$

where Ŷᵢ, Ȳ, and Yᵢ are the estimated value of the ith dependent variable, the mean of the dependent variable, and the ith dependent variable, respectively. When the predicted values of the dependent variable (Ŷᵢ) are close to their actual values (Yᵢ), the regression model has been properly fitted and R² is close to 1. Other things being equal, models with a lower degree of complexity have higher priority. R² may be an appropriate parameter for comparing multivariate regression models with the same number of independent variables, but it is not suitable for comparing models with different numbers of independent variables (Granian et al., 2015). Accordingly, the adjusted coefficient of determination (R²adj) should be calculated as follows:

$$R^{2}_{adj} = 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - p - 1} \tag{7}$$

where n and p are the number of samples and independent variables, respectively.
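Eqs. 6 and 7 translate directly into code. The following is a minimal NumPy sketch with made-up values; the function names are hypothetical.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination, Eq. 6."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2, Eq. 7: n samples, p independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y, y_hat)               # 1 - 0.10 / 5.0 = 0.98
print(round(r2, 4), round(adjusted_r_squared(r2, n=4, p=1), 4))  # 0.98 0.97
```

Because (n − 1)/(n − p − 1) grows with p, adding a variable raises R²adj only if it increases R² enough to offset the penalty, which is why R²adj rather than R² is used to compare models with different numbers of independent variables.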

Convolutional neural network
A convolutional neural network (CNN) (Figure 2) typically consists of convolutional layers, pooling layers, and fully connected layers and is a type of feedforward neural network (LeCun et al., 2015). CNNs have been applied to anomaly recognition and image classification of remote sensing data. A key component of the CNN is the convolutional layer, which extracts high-level features from large datasets by means of a convolution operation. The advantages of a convolutional layer include 1) weight sharing, which reduces the number of model parameters, and 2) invariance to the location of an object. The common two-dimensional convolution is expressed by the following equation:

$$G^{*}(x, y) = \sum_{i=1}^{n} \sum_{j=1}^{m} f(i, j)\, G(x - i + c,\; y - j + c) \tag{8}$$

where f is a filter of n × m dimensions that is convolved with the input matrix G, G*(x, y) is the result centered at (x, y), and c is the central coordinate of the filter. Each convolutional layer learns a set of filters such as f and is generally followed by a down-sampling operation in the m and n directions to condense spatial information. This arrangement helps subsequent convolutional layers progressively learn more complex representations (Figure 2A). Pooling layers are generally interleaved with convolutional layers in a CNN to reduce the number of network parameters and the dimensions of the output features. Like convolutional layers, pooling layers operate on neighboring features, and they provide a degree of translation invariance. The most commonly used pooling operations are average pooling and max pooling. For example, a max pooling layer with a 2 × 2 window and a stride of two reduces an 8 × 8 feature tensor to a 4 × 4 feature tensor (Figure 2B). The last layers of a CNN are commonly fully connected layers, which are used for classification and feature fusion (Figure 2C) (Krizhevsky et al., 2017). The main task of a fully connected layer is to flatten the output features into a column vector and map them onto the output classes.

Frontiers in Environmental Science
frontiersin.org
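To make Eq. 8 and the pooling example concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a single-channel input, "valid" padding (no border padding), and window indices that start at the top-left corner rather than at the central coordinate c, which only shifts the output grid. Many deep learning libraries skip the filter flip and compute cross-correlation instead; because the filters are learned, the two are interchangeable in practice. The function names are hypothetical.

```python
import numpy as np


def conv2d_valid(G, f):
    """'Valid' 2-D convolution of input G with filter f, stride 1.

    The filter is flipped, as in true convolution; the same f is reused
    at every position, which is the weight sharing described above.
    """
    G = np.asarray(G, dtype=float)
    f = np.asarray(f, dtype=float)[::-1, ::-1]
    n, m = f.shape
    out = np.empty((G.shape[0] - n + 1, G.shape[1] - m + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            out[x, y] = np.sum(f * G[x:x + n, y:y + m])
    return out


def max_pool2d(A, size=2, stride=2):
    """Max pooling with a size x size window moved by `stride`."""
    A = np.asarray(A, dtype=float)
    out_h = (A.shape[0] - size) // stride + 1
    out_w = (A.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = A[r:r + size, c:c + size].max()
    return out


G = np.arange(64, dtype=float).reshape(8, 8)   # toy 8 x 8 feature map
features = conv2d_valid(G, np.ones((2, 2)))    # 2 x 2 kernel, stride 1
print(features.shape)                          # (7, 7)
print(max_pool2d(G).shape)                     # (4, 4)
```

With the 2 × 2 kernel and stride of one used in this study, an 8 × 8 input shrinks to 7 × 7, whereas a 2 × 2 max pool with a stride of two halves each dimension.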

Deep learning algorithm
Deep learning algorithms are a subset of machine learning algorithms that train deep networks (networks with many processing layers) by iteratively minimizing an objective, such as the contrastive divergence used to train deep belief networks. These algorithms generally encode the training samples and reconstruct them as they are repeatedly presented to the network, and the interlayer connection weights are continuously adjusted accordingly. A well-trained model that converges to a general solution is obtained only with a suitable number of iterations. For conventional machine learning algorithms, the necessary number of iterations can only be found by trial and error, whereas deep learning algorithms can identify a suitable number of iterations, with the lowest loss value, through the loss function embedded in their structure. The data required by deep learning algorithms are typically divided into training, validation, and testing data. The training data are used for gradient descent on the objective function, the validation data are used to monitor the model during training, and the testing data (unseen data) are used to evaluate the final network performance.
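The role of held-out data in choosing the number of iterations can be illustrated with a deliberately simple model. The sketch below assumes NumPy, a linear model instead of a deep network, and plain full-batch gradient descent on the mean squared error; it records the validation loss after every iteration and keeps the weights from the iteration with the lowest value. `train_with_validation` is a hypothetical name, and the data are synthetic.

```python
import numpy as np


def train_with_validation(X_tr, y_tr, X_val, y_val, lr=0.1, iterations=300):
    """Gradient descent on MSE for y ~ X @ w + b, tracking validation loss."""
    n, p = X_tr.shape
    w, b = np.zeros(p), 0.0
    best = {"loss": np.inf, "iteration": 0, "w": w.copy(), "b": b}
    history = []
    for it in range(1, iterations + 1):
        resid = X_tr @ w + b - y_tr
        w = w - lr * (2.0 / n) * (X_tr.T @ resid)  # gradient step for the weights
        b = b - lr * 2.0 * resid.mean()            # gradient step for the bias
        val_loss = float(np.mean((X_val @ w + b - y_val) ** 2))
        history.append(val_loss)
        if val_loss < best["loss"]:                # keep the best iteration so far
            best = {"loss": val_loss, "iteration": it, "w": w.copy(), "b": b}
    return best, history


rng = np.random.default_rng(0)
X = rng.normal(size=(120, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.05, size=120)
best, history = train_with_validation(X[:100], y[:100], X[100:], y[100:])
print(best["iteration"], round(best["loss"], 4))
```

In a deep network the same idea applies per epoch or per validation check; the held-out loss, not the training loss, indicates when further iterations stop helping.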

Data preparation
A total of 868 stream sediment samples were collected from the region of interest and analyzed for 38 elements by the inductively coupled plasma (ICP) method. For every 20 measurements, a duplicate sub-sample was analyzed to assess the precision of the analytical procedure. The analytical error was less than ±10%. Stream sediment data are compositional data and are therefore affected by the data closure problem. Accordingly, outlier values were identified and moderated using the robust Mahalanobis distance procedure. The stream sediment data were then transformed with the isometric log-ratio (ilr) transformation to remove the closure problem and rescaled to the range [0, 1] (Wang et al., 2014).

FIGURE 2
Common CNN framework.
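The ilr step and the rescaling can be sketched as follows. This assumes NumPy, strictly positive concentrations, and the "pivot" orthonormal basis, which is one common choice among several (the basis used in the study is not stated); min-max rescaling to [0, 1] is applied to each ilr coordinate afterwards, and the robust Mahalanobis outlier step is omitted. The function names and the toy concentrations are hypothetical.

```python
import numpy as np


def ilr_pivot(X):
    """Isometric log-ratio transform with pivot coordinates.

    X: (n_samples, D) array of strictly positive parts. Returns (n, D-1).
    Coordinate i (0-based) compares part i with the geometric mean of
    the parts after it.
    """
    logX = np.log(np.asarray(X, dtype=float))
    n, D = logX.shape
    Z = np.empty((n, D - 1))
    for i in range(D - 1):
        k = D - i - 1                                  # number of parts after part i
        log_gmean_rest = logX[:, i + 1:].mean(axis=1)  # log of their geometric mean
        Z[:, i] = np.sqrt(k / (k + 1)) * (logX[:, i] - log_gmean_rest)
    return Z


def minmax_scale(Z):
    """Rescale each column to [0, 1]; constant columns are set to 0."""
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (Z - lo) / span


# Toy concentrations (e.g. ppm) for three elements in four samples
X = np.array([[20.0, 40.0, 0.5],
              [35.0, 60.0, 0.8],
              [10.0, 25.0, 0.2],
              [50.0, 90.0, 1.5]])
Z = ilr_pivot(X)
print(minmax_scale(Z).round(3))
```

Multiplying a sample by a constant (for example, changing units from ppm to ppb) leaves its ilr coordinates unchanged, which is the sense in which the transformation removes the closure effect.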

Results and discussion
The CDL framework generally requires training, validation, and testing data. Hence, the 868 collected samples were divided as follows: 70% training data (608 samples), 10% validation data (87 samples), and 20% testing data (173 samples). The CDL parameters (such as the learning rate and mini-batch size) and the number of network layers were tuned by trial and error to extract high-level features of the multivariate geochemical data. The learning rate controls the size of the weight updates during backpropagation and therefore the speed of the training procedure. Empirical tests showed that the best training was obtained with a learning rate of 0.01. We designed a CDL network with a convolutional layer for clustering geochemical indicator elements using a learning rate of 0.01, a learning rate drop factor of 0.7, a learning rate drop period of 50, a validation frequency of 50, a maximum of 30 epochs, and a mini-batch size of 16. A kernel window of 2 × 2 dimensions with a stride of one was used for the convolutional layer. Prior studies established the essential role of indicator elements such as Pb, Zn, Ag, As, Cd, and Sb in detecting the primary and secondary dispersion halos of MVT deposits (Wang et al., 2017; Li et al., 2018; Williams et al., 2020). As the first step of regimentation, we chose Pb and Ag as the pioneers of the first and second groups, respectively. Based on Pearson's correlation coefficients (Table 1), Pb was highly correlated with Zn, whereas Ag was highly correlated with Cd, As, and Sb. In this research, a forward strategy of multivariate regression was applied to cluster the geochemical indicator elements by training a CDL network. Figure 3A shows the difference between the actual values of Pb (the dependent variable) and the values predicted from Zn (the independent variable) for the seventh training run of the network.
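The split sizes and the learning-rate settings above can be reproduced with a short sketch. It assumes NumPy, a random permutation for the split, and a step-decay schedule in which the learning rate is multiplied by the drop factor once every drop period counted in epochs; the software and the unit of the drop period are not stated in the study, so both are assumptions, and the function names are hypothetical.

```python
import numpy as np


def split_indices(n, frac_train=0.7, frac_val=0.1, seed=0):
    """Randomly split n sample indices into training, validation and test sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = round(frac_train * n)
    n_val = round(frac_val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def step_decay_lr(epoch, base_lr=0.01, drop_factor=0.7, drop_period=50):
    """Learning rate at a 0-based epoch under step decay."""
    return base_lr * drop_factor ** (epoch // drop_period)


train, val, test = split_indices(868)
print(len(train), len(val), len(test))      # 608 87 173
print(len(train) // 16)                     # 38 mini-batches of 16 per epoch
print([step_decay_lr(e) for e in (0, 29)])  # [0.01, 0.01]
```

Under this per-epoch reading, the reported drop period (50) exceeds the maximum of 30 epochs, so the learning rate would stay at 0.01 for the whole run; if the period were counted in iterations instead (38 per epoch with 608 samples and batches of 16), the first drop would occur during the second epoch.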
The dependent variable (Pb) was thus properly predicted from the correlated independent variable (Zn), as supported by the high adjusted coefficient of determination (R²adj of about 0.9) of this process (Table 2). The seventh testing procedure was performed concurrently to examine the same difference in the testing data, which were treated as unseen data (Figure 3B); the R²adj of the testing data is also given in Table 2. The seventh training run reached the best training performance, with the training loss decreasing to 0.1 (Figure 4). The root mean square error (RMSE), which is commonly employed to evaluate the predicted values of a dependent variable, is also plotted in Figure 4. The RMSE (less than 0.2) indicates that Pb values are estimated well from Zn values. The regimented first group can therefore play an essential role in detecting mineralization zones. For the regimentation of the second group, Cd was taken as its second member.
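For reference, the RMSE is the square root of the mean squared difference between actual and predicted values. A minimal NumPy sketch follows; assuming the targets are on the [0, 1] scale produced during data preparation, an RMSE of 0.2 corresponds to 20% of the data range. The values below are made up for illustration and the function name is hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


pb_actual = [0.20, 0.50, 0.90]      # hypothetical scaled Pb values
pb_predicted = [0.30, 0.40, 0.90]   # hypothetical predictions from Zn
print(round(rmse(pb_actual, pb_predicted), 3))  # 0.082
```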
The values of Ag predicted from Cd deviate only slightly from their actual values (Figure 5A), and the R²adj values calculated for the training data (0.869) and testing data (0.852) reflect these minor errors (Table 2). These results were achieved in the fifth training (Figure 5A) and testing (Figure 5B) runs of the network, which agree closely with each other. The loss and RMSE values of the fifth training run are shown in Figure 6: the loss again decreased to about 0.1, and the RMSE is less than 0.2. The regimentation procedure continued by selecting As as the third member of the second group. Figure 7A shows that adding As as an independent variable for estimating Ag values improves the prediction, because the achieved R²adj is higher than that of the previous stage. In the forward strategy of multivariate regression, if adding an element to a group produces a substantial increase in R²adj, the element becomes a permanent member; otherwise, it is eliminated (Granian et al., 2015). Therefore, As is the third permanent member of the second group, because the R²adj values calculated for this stage [training (0.889) and testing (0.878)] are clearly higher than those of the previous stage (0.869 and 0.852). The selection of As as the third element of this group can also be evaluated on the testing data (Figure 7B). These results were obtained in the fourth run of the network and are presented in Figure 8; the loss again decreased to about 0.1, with an RMSE of about 0.2. Finally, Sb was added as the fourth element of the second group. After Sb was added, Ag values were properly estimated in the training data (Figure 9A), and the testing data also achieved acceptable results (Figure 9B).
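The forward strategy can be sketched as a greedy loop. The sketch below is a stand-in, not the CDL network: it assumes NumPy, uses ordinary least squares for the regression step, orders candidate elements by their absolute Pearson correlation with the pioneer element, and accepts a candidate only if R²adj rises by at least a threshold. The threshold `min_gain=0.01` is a hypothetical value chosen for illustration, since the text only requires a substantial increase (the reported training gains were 0.020 for As and 0.045 for Sb). Element names and data are synthetic, and the function names are hypothetical.

```python
import numpy as np


def adjusted_r2_ols(X, y):
    """Fit y on X (with intercept) by least squares and return adjusted R^2."""
    n, p = X.shape
    D = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def forward_group(data, pioneer, candidates, min_gain=0.01):
    """Greedy forward selection of group members for a pioneer element."""
    y = data[pioneer]
    # Try the most strongly correlated candidates first (Pearson's r)
    ordered = sorted(candidates,
                     key=lambda c: -abs(np.corrcoef(data[c], y)[0, 1]))
    members, best = [], -np.inf
    for name in ordered:
        trial = members + [name]
        X = np.column_stack([data[m] for m in trial])
        score = adjusted_r2_ols(X, y)
        if score - best >= min_gain:    # keep only if R^2_adj clearly improves
            members, best = trial, score
    return members, best


rng = np.random.default_rng(7)
n = 200
data = {"Cd": rng.normal(size=n), "As": rng.normal(size=n),
        "Noise": rng.normal(size=n)}
data["Ag"] = data["Cd"] + 0.8 * data["As"] + rng.normal(scale=0.1, size=n)
members, score = forward_group(data, "Ag", ["Noise", "As", "Cd"])
print(members, round(score, 3))
```

In this synthetic case the uncorrelated "Noise" variable is rejected because it barely changes R²adj, mirroring the rule that an element stays in a group only if it clearly improves the prediction of the pioneer.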
Table 2 again shows a clear increase in R²adj for both the training (0.934) and testing (0.926) data, so Sb is retained in the second group. Furthermore, the loss value (about 0.05) and the RMSE (about 0.15) of the third training run of the network (Figure 10) support the regimentation of the second group and allow the clustering process to terminate. The loss and RMSE values have decreased sufficiently to validate the regimentation procedure.

FIGURE 3
Results of running the CDL network for clustering the first group of geochemical indicator elements (Pb-Zn): (A) training data and (B) testing data.

FIGURE 4
Loss value and RMSE of the seventh training run of the CDL network for the first group of geochemical indicator elements (Pb-Zn).

FIGURE 5
Results of running the CDL network for the first stage of clustering the second group of geochemical indicator elements (Ag-Cd): (A) training data and (B) testing data.

FIGURE 6
Loss value and RMSE of the fifth training run of the CDL network for the second group of geochemical indicator elements (Ag-Cd).

FIGURE 7
Results of running the CDL network for the second stage of clustering the second group of geochemical indicator elements (Ag-Cd-As): (A) training data and (B) testing data.

FIGURE 8
Loss value and RMSE of the fourth training run of the CDL network for the second stage of regimentation of the second group (Ag-Cd-As).

FIGURE 9
Results of running the CDL network for the third stage of clustering the second group of geochemical indicator elements (Ag-Cd-As-Sb): (A) training data and (B) testing data.

FIGURE 10
Loss value and RMSE of the third training run of the CDL network for the third stage of regimentation of the second group (Ag-Cd-As-Sb).

Conclusion
In this research, a hybrid procedure combining multivariate regression with a CDL algorithm was developed to cluster the geochemical indicator elements of the MVT Pb-Zn deposit in the Takht-e Soleyman region, West Azerbaijan Province, Iran. The large geochemical dataset was randomly divided into training, validation, and testing samples, which were used to establish the utility and performance of the CDL network. The results showed that a CDL network with a multivariate regression layer can discriminate the significant geochemical indicator elements of a region of interest into appropriate clusters when domain knowledge and experimental results are incorporated into the network. The forward strategy of multivariate regression was used for regimentation by comparing the R²adj calculated before and after adding an indicator element to a group. A CDL framework with optimum model parameters regimented the geochemical indicator elements into two groups: Pb and Zn as the first group, and Ag, Cd, As, and Sb as the second group. At each stage, the R²adj of the testing data was computed to evaluate the performance of the trained network, and all values were in the acceptable range (above 0.8). Furthermore, the loss of all accepted training runs decreased to the lowest expected value (about 0.1 or less). The RMSE, used to validate the values predicted by the regression process, also reached low values and thus supports the training results. The achieved RMSE values were in an appropriate range, showing that the values of the dependent variables (Pb as the pioneer of the first group and Ag as the pioneer of the second group) predicted from their independent variables are close to their actual values.
The high R²adj (above 0.9) calculated for the last stage of regimentation confirms the accuracy and performance of the CDL algorithm for clustering the geochemical indicator elements of the study area. This study thus proposes a new approach, an unsupervised deep learning algorithm that includes a multivariate regression procedure, which may also be applied to clustering and other targeting tasks in other fields of geoscience.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Materials; further inquiries can be directed to the corresponding author.

Ethics statement
Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions
ST supervised the research, and HS wrote the paper.