Research on weighting method of geological hazard susceptibility evaluation index based on Apriori algorithm

Given the inconsistency between the information value and the weight value in the weighted information value model, a weight model based on the Apriori algorithm is established in this paper to analyze the correlation between the second-level intervals of disaster factors and the susceptibility of geological disasters. The objective weight of the second-level intervals of each index factor is calculated by mining association rules with the Apriori algorithm, which removes the subjective uncertainty of existing second-level factor weighting methods. Taking the geological disaster data of Xiangtan urban area as an example, 10 evaluation indexes were selected to establish the entropy weight method-information value (EWM-IV) model and the entropy weight method-Apriori algorithm-information value (EWM-Apriori-IV) model to evaluate geological disaster susceptibility, and the disaster area ratio and the receiver operating characteristic (ROC) curve were used to verify the evaluation results. The results showed that, compared with the EWM-IV model, the EWM-Apriori-IV model increased the disaster area ratio of the high-prone area by 58.3%, decreased the disaster area ratio of the low-prone area by 43.1%, and increased the area under the curve (AUC) by 7.4%, a relative improvement in evaluation accuracy. These results demonstrate the rationality and practicability of the weighting method of the geological hazard susceptibility evaluation index based on the Apriori algorithm.


Introduction
Geological hazard susceptibility evaluation is an important link and basis for disaster prevention and reduction (Chen et al., 2005;Ma et al., 2021). Currently, the commonly used susceptibility evaluation models include analytic hierarchy process model (Chung and Fabbri, 1999;Wang et al., 2009;Xu et al., 2009), weighted information value model (Wang et al., 2014;Jiao et al., 2019;Alsabhan et al., 2022), logistic regression model (Budimir et al., 2015;Tang and Ma, 2015), artificial neural network model (Nourani et al., 2014), support vector machine model (Kavzoglu et al., 2014), etc., among which the weighted information value model is widely used in the research field of geological disaster susceptibility evaluation due to its clear physical significance and simple algorithm. Shen H et al. (Shen et al., 2021) established a weighted information value model based on the weight value and information quantity value of each index obtained by the analytic hierarchy process (AHP) and information value model to conduct a comprehensive assessment of landslide susceptibility. Liang L et al. (Liang et al., 2019) used the certainty factor model to determine the first-level weight of each index factor, and then multiplied the information value to establish a weighted information value model to evaluate the susceptibility of geological disasters. Yang P et al. (Yang et al., 2020) multiplied the weight of each first-level index factor determined by the random forest model and the information value of each second-level index factor determined by the information value model respectively, and established the weighted information value model for the evaluation of landslide susceptibility. All of the above studies are based on the establishment of weighted information value model for susceptibility evaluation by obtaining the weight of first-level factors and combining with the second-level interval information value of each factor. 
However, the weight value of the first-level factors may be inconsistent with the information value of the second-level factors, which affects the accuracy of geological disaster susceptibility evaluation. To address this problem, some scholars (Xie, 2011;Wang et al., 2012) determined the second-level interval weights of indicators from trapezoidal fuzzy numbers based on subjective experience, but the evaluation results were worse than those of mathematical statistics models with objective weighting. In view of this, this paper introduces a weighting method for the geological disaster susceptibility evaluation index based on the Apriori algorithm, which obtains the second-level factor weight values by analyzing objective data. The Apriori algorithm was proposed by R. Agrawal et al. on the basis of earlier work on the AIS algorithm (Agrawal and Srikant, 1994;Yu, 2004). The algorithm is applicable to association rule mining in transaction databases and, through association rules, can reflect the interdependence and correlation between one thing and others. Wu T et al. (Wu and Niu, 2011) used the Apriori algorithm to mine the correlation between disaster scale and various single and multiple factors. Li J et al. (Li and Niu, 2013) obtained the relationship between land use type and landslide stability through the Apriori algorithm. Jie Q et al. (Jie et al., 2015) used the Apriori algorithm to mine the deformation laws of several landslides. These studies used the Apriori algorithm to uncover the internal connection between geological disasters and first-level disaster factors, and demonstrated the feasibility of the Apriori algorithm in the field of geological disasters.
Therefore, because the Apriori algorithm can mine the commonality in historical disaster data, it is integrated into the weighted information value model and used to weight the second-level interval information value of each evaluation factor, which resolves the inconsistency in scale between the factor weight value and the information value.
Based on this, this paper takes Xiangtan urban area in Hunan Province as the research area, introduces the Apriori algorithm to mine and analyze the association rules between historical disaster data and geological disaster susceptibility, establishes the evaluation index system of geological disaster susceptibility in Xiangtan urban area, determines the objective weights of the second-level intervals of each index factor, and combines them with the objective first-level weights of the index factors obtained by the entropy weight method. The EWM-IV model and the EWM-Apriori-IV model were established to evaluate the susceptibility of geological disasters. The feasibility and accuracy of the weight model based on the Apriori algorithm were demonstrated through accuracy verification and comparative analysis.
2 Materials and methods

2.1 Overview of the study area and data source
Xiangtan urban area, Hunan Province, with a total area of 657.4 km², is located in the middle part of the Hengshan Mountain range and belongs to the alluvial plain and red soil terrace on both sides of the Xiangjiang River. The terrain is high and convex in the east and west, and low and concave in the middle. The overall terrain is relatively flat, with an elevation between −82 m and 289 m. The outcropping strata in Xiangtan urban area are relatively complete and are characterized by the wide distribution of red beds and pre-Devonian shallow metamorphic clastic rocks and by a complex and diverse sedimentary environment. From old to new, the Lengjiaxi Group, Banxi Group, Sinian System, Cambrian System, Ordovician System, Devonian System, Carboniferous System, Permian System, Triassic System, Jurassic System, Cretaceous System, Paleogene System and Quaternary System (including alluvium and residual slope layer) are exposed successively. The regional structure can be divided into three units: the Heling-Nanzhushan fault-fold belt in the northwest, the Xiangtan fault depression basin in the middle, and the Baimalong-Shuangmazhen fault-fold belt in the east. The river system in the territory is mainly composed of the Xiangjiang River and its main tributaries, the Lianshui River and the JinJiang River. Under a subtropical humid monsoon climate, the area has abundant sunshine and hot, rainy summers. The annual rainfall is between 1,250 and 1,500 mm and is concentrated in spring and summer, which account for 68% of the annual total. There are 121 geological disaster spots in Xiangtan urban area (Figure 1), mainly landslide, collapse, and ground collapse disasters. The threatened population is about 2,500, and the potential economic loss is nearly 10,000 yuan. The main data sources of this paper are shown in Table 1.

Establishment of weight model based on apriori algorithm
The Apriori algorithm is the most classic algorithm for mining frequent item sets, which can extract association rules from large data sets (Zhang, 2016;Hidayanto et al., 2017). Algorithm steps (Han and Kamber, 2001;Yu, 2004;Li et al., 2020) are shown in Figure 2 below.
The Apriori algorithm finds frequent item sets iteratively. During association rule mining, the frequent (k-1)-item sets are joined to generate candidate k-item sets. The support of each candidate k-item set, namely the probability that X and Y appear together in the data set, is then calculated and compared with a preset minimum support threshold, and the candidates that do not meet the threshold are pruned to obtain the frequent k-item sets. The search is iterated layer by layer until no new k-item sets are generated. Here the Apriori algorithm is applied to mine the correlation between the susceptibility of geological disasters and the second-level intervals of the disaster factors. Because the Apriori algorithm requires Boolean data, the second-level intervals of the disaster factors are expressed in Boolean (binary) form, from which the objective weight of the second-level interval of each disaster factor, reflecting its correlation with geological disaster susceptibility, can be obtained. The specific implementation process is as follows:
(1) Store the data set. The data set contains the historical geological disaster data and all geological data in the study area. After conversion to Boolean data, each transaction in the data set contains one disaster point and the second-level intervals to which all of its disaster factors belong, so that each transaction represents the occurrence of a disaster point together with the second-level intervals of all disaster factors. The data set is scanned and the Apriori algorithm is run over every transaction until the search result is obtained.
(2) Generate the candidate item set. The transaction set is scanned, and all the second-level intervals in the transaction set and the occurrence of geological hazards are defined as members of the candidate set. Each member of the candidate item set is an independent item set.
(3) Calculate the weight of the index factor's second-level interval. Eq. 1 is used to calculate the support of each member of the candidate set. The support of the association rule X→Y is the probability that a combination of second-level intervals in the geographic data set of the study area and the occurrence of geological disasters appear together; each co-occurrence adds one to the support count. The resulting support is the weight of the second-level interval of each index factor.
(4) Determine the weight of the index factor's second-level interval. During association rule mining, the rules that do not meet the minimum support threshold are pruned to improve the efficiency of the algorithm. A support value that meets the minimum threshold is taken as the second-level weight of the index factor.

FIGURE 1
Geological hazard distribution map of Xiangtan urban area.

Frontiers in Earth Science frontiersin.org
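The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function name `interval_weights`, the hazard marker `"H"`, the interval labels and the toy transactions are all hypothetical, and the support is assumed to be the share of all transactions in which an interval and the hazard item occur together.

```python
from collections import Counter

def interval_weights(transactions, hazard_item="H", min_support=0.0):
    """Support of {interval, hazard} for each second-level interval label.

    transactions: list of sets of item labels (hypothetical encoding).
    Returns {interval label: support}, keeping only supports >= min_support.
    """
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        if hazard_item in t:                 # co-occurrence with a disaster
            for item in t:
                if item != hazard_item:
                    counts[item] += 1        # each co-occurrence adds one
    # Prune candidates below the minimum support threshold (step 4).
    return {item: c / n for item, c in counts.items() if c / n >= min_support}

# Toy data: three disaster points and one non-disaster unit (illustrative only).
transactions = [
    {"H", "1.2", "2.1"},
    {"H", "1.2", "2.3"},
    {"H", "1.4", "2.1"},
    {"1.1", "2.5"},
]
weights = interval_weights(transactions, min_support=0.3)
```

With these toy transactions, intervals "1.2" and "2.1" each co-occur with the hazard in two of four transactions and are kept, while "2.3" and "1.4" fall below the threshold and are pruned.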

Entropy weight method
As an objective weighting method with a good evaluation effect, the entropy weight method (EWM) has been widely used in research and practice of geological disaster susceptibility evaluation (Liang et al., 2010;Jiang et al., 2019). The specific calculation method is as follows (Devkota et al., 2013), where $FR_{ij}$ is the occurrence frequency of geological disasters; $P_{ij}$ is the frequency density; $a_{ij}$ and $b_{ij}$ are the number of disasters and the area, respectively, in the j-th second-level interval of the i-th first-level factor; and n is the total number of second-level intervals of the i-th first-level factor. The entropy value of the i-th first-level factor is denoted $H_i$, where $1/\ln n$ is a constant. To ensure that $0 \le H_i \le 1$, it is stipulated that $\ln P_{ij} = 0$ if $P_{ij} = 0$.
Finally, the objective weight $w_i$ of each first-level index factor is obtained, where N is the total number of index factors.
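The definitions above are consistent with a common form of the entropy weight method, which the following sketch assumes: $FR_{ij} = a_{ij}/b_{ij}$, $P_{ij} = FR_{ij}/\sum_j FR_{ij}$, $H_i = -\frac{1}{\ln n}\sum_j P_{ij}\ln P_{ij}$ and $w_i = (1 - H_i)/(N - \sum_i H_i)$. These formulas are an assumption reconstructed from the symbol definitions, not a quotation of the paper's equations, and the function name `entropy_weights` is hypothetical.

```python
import numpy as np

def entropy_weights(counts, areas):
    """First-level weights w_i from per-interval disaster counts and areas.

    counts[i][j] = a_ij, areas[i][j] = b_ij for interval j of factor i.
    Assumes every factor has at least one disaster point (sum of FR > 0).
    """
    entropies = []
    for a, b in zip(counts, areas):
        a = np.asarray(a, dtype=float)
        b = np.asarray(b, dtype=float)
        fr = a / b                               # FR_ij (assumed a_ij / b_ij)
        p = fr / fr.sum()                        # frequency density P_ij
        logp = np.zeros_like(p)
        np.log(p, out=logp, where=p > 0)         # ln P_ij := 0 when P_ij = 0
        entropies.append(-(p * logp).sum() / np.log(len(p)))   # H_i in [0, 1]
    h = np.array(entropies)
    return (1.0 - h) / (len(h) - h.sum())        # w_i, summing to 1

# Toy check: a factor with uniform disaster density carries no information,
# while a factor whose disasters fall in one interval takes all the weight.
w = entropy_weights([[1, 1], [2, 0]], [[1.0, 1.0], [1.0, 1.0]])
```

A factor whose intervals all have the same disaster density has entropy 1 and weight 0, which matches the intuition that it does not discriminate between hazardous and safe ground.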

Information value model
The information value (IV) model is a mathematical statistics method commonly used to analyze data. Through statistical analysis of historical data, the information value of each impact factor is calculated to determine the importance of each impact factor (Fan et al., 2012;Chen et al., 2021). The calculation method is as follows, where $I_{ij}$ is the information value of the j-th second-level interval of the i-th first-level factor; S is the layer area of each evaluation index; $S_{ij}$ is the area of the j-th second-level interval of the i-th first-level factor; N is the total number of disaster points in the study area; and $N_{ij}$ is the number of disaster points in the interval $S_{ij}$.
Combined with the weight model, the weighted information value (WIV) model is constructed. The calculation formula is as follows, where I is the total information value of each evaluation unit in the overlay layer; n is the total number of index factors; m is the total number of second-level intervals in each first-level factor; and $Q_{ij}$ is the index weight value. If the entropy weight method is used for weighting, $Q_{ij}$ is the first-level weight of each index factor calculated by the entropy weight method. If the entropy weight method-Apriori algorithm is used for weighting, $Q_{ij}$ is the second-level comprehensive weight of each index factor: the objective weight $w_i$ of each first-level factor calculated by the entropy weight method is multiplied by the weight $r_{ij}$ of the corresponding second-level interval calculated by the Apriori algorithm, as shown in Eq. 8. Figure 3 shows the calculation process of the EWM-Apriori-IV model based on the entropy weight method-Apriori algorithm.

FIGURE 2
Flow chart of the Apriori algorithm.
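To make the two weighting variants concrete, the sketch below assumes the standard information value form $I_{ij} = \ln\big((N_{ij}/N)/(S_{ij}/S)\big)$ and a per-unit total $I = \sum Q_{ij} I_{ij}$ taken over the interval that each factor occupies in the unit, with $Q_{ij} = w_i$ for EWM-IV and $Q_{ij} = w_i r_{ij}$ for EWM-Apriori-IV. The function names, factor names and numbers are hypothetical, and intervals with no disaster points ($N_{ij} = 0$) would need separate handling because the logarithm is undefined there.

```python
import math

def information_value(n_ij, n_total, s_ij, s_total):
    """I_ij = ln((N_ij / N) / (S_ij / S)); assumed standard form, N_ij > 0."""
    return math.log((n_ij / n_total) / (s_ij / s_total))

def weighted_information_value(unit_intervals, info, w, r=None):
    """Total information value of one evaluation unit.

    unit_intervals: {factor: interval label occupied by this unit}
    info:           {(factor, interval): I_ij}
    w:              {factor: first-level weight w_i}
    r:              {(factor, interval): second-level weight r_ij} or None
    """
    total = 0.0
    for factor, interval in unit_intervals.items():
        q = w[factor]                        # EWM-IV: Q_ij = w_i
        if r is not None:
            q *= r[(factor, interval)]       # EWM-Apriori-IV: Q_ij = w_i * r_ij
        total += q * info[(factor, interval)]
    return total

# Illustrative values only.
info = {("slope", "2.1"): 1.0, ("rain", "7.2"): -0.5}
w = {"slope": 0.6, "rain": 0.4}
r = {("slope", "2.1"): 0.5, ("rain", "7.2"): 0.25}
unit = {"slope": "2.1", "rain": "7.2"}
ewm_iv = weighted_information_value(unit, info, w)
ewm_apriori_iv = weighted_information_value(unit, info, w, r)
```

The only difference between the two models in this sketch is the extra factor $r_{ij}$, which rescales each interval's information value by how strongly that interval is associated with disaster occurrence.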

Data processing
According to the characteristics of historical geological disaster data in the study area and previous research results (Lan et al., 2002;Shahri et al., 2019), 10 geological disaster susceptibility factors including altitude, slope angle, slope aspect, landform, lithology, vegetation coverage, average annual rainfall, distance to faults, distance to roads and land use pattern were initially selected to evaluate the geological disaster susceptibility in the study area (Meng et al., 2010;Chen et al., 2013;Zhao et al., 2021). With 30 m×30 m grid units as evaluation units, the research area was divided into 907,823 units. In order to facilitate data processing by the Apriori algorithm, each factor needs to be classified to convert into Boolean data. The natural breakpoint method was used to classify each factor in ArcGIS software, and the data characteristics of each factor in the study area were analyzed by adhering to the principle of "similar within the interval, different outside the interval". The partitioning results of each evaluation factor were shown in Figure 4.
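The conversion of a continuous factor into Boolean interval data can be sketched as follows. This is only an illustration: the break values are hypothetical placeholders rather than natural-breaks results from ArcGIS, and the function name `to_boolean_items` and the "factor.interval" label format are assumptions.

```python
import numpy as np

def to_boolean_items(values, breaks, factor_id):
    """Classify values into intervals and return labels plus a Boolean matrix.

    breaks: ascending interior break values, giving len(breaks) + 1 intervals.
    Returns (labels such as "1.3", Boolean array of shape (n_values, n_intervals)).
    """
    classes = np.digitize(np.asarray(values), breaks) + 1   # interval index from 1
    labels = [f"{factor_id}.{c}" for c in classes]
    # One Boolean column per interval; exactly one True per row.
    onehot = classes[:, None] == np.arange(1, len(breaks) + 2)
    return labels, onehot

# Hypothetical altitude values (m) and break points for factor 1.
labels, onehot = to_boolean_items([30.0, 120.0, 250.0], [50, 100, 150, 200], 1)
```

Each row of `onehot` has a single True entry, so every grid unit or disaster point contributes exactly one interval item per factor to its transaction.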
According to the historical landslide disaster point and the second-level interval Boolean data of each index factor in the study area (Table 3), the Apriori algorithm is applied for data mining to analyze the association rules that have guiding significance for the construction of geological disaster susceptibility evaluation index system in the study area, as shown in Table 2.
According to Table 2, the confidence level of the association rule "G4, D1, J2, E9, A5, B5, F2" is the highest, indicating that the index factors contained in the disaster-causing condition of this association rule are favorable candidate factors for the evaluation index system of geological disaster susceptibility in the study area. Meanwhile, the four association rules in the table cover all index factors with high confidence, indicating that the 10 index factors initially selected, namely altitude, slope angle, slope aspect, landform, lithology, vegetation coverage, average annual rainfall, distance to faults, distance to roads, and land use pattern, should be selected as index factors for the evaluation of geological disaster susceptibility in the study area.
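For readers unfamiliar with rule confidence, a minimal sketch follows. It uses the usual definition confidence(X→Y) = support(X ∪ Y) / support(X); the toy transactions and the hazard marker `"H"` are hypothetical and are not taken from Table 2.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """confidence(X -> Y) = support(X and Y together) / support(X).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Toy transactions; "H" marks a disaster occurrence (illustrative only).
transactions = [
    {"G4", "D1", "H"},
    {"G4", "D1", "H"},
    {"G4", "D1"},
    {"G4", "H"},
    {"A5"},
]
rule_conf = confidence(transactions, {"G4", "D1"}, {"H"})
```

Here the combination {G4, D1} appears in three transactions and is accompanied by a disaster in two of them, so the rule has a confidence of 2/3.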

Determination of weight based on the apriori algorithm
Through the analysis and processing of the geographical data and historical geological disaster data of the study area, the internal commonness of historical disaster points is explored. For the above 10 susceptibility evaluation indexes, such as altitude, slope angle, slope aspect, lithology, and average annual rainfall, the Apriori algorithm is used to calculate the correlation between the second-level interval of each index factor and the occurrence of geological disasters. This determines the contribution rate (weight) of the different second-level intervals of each index factor to the geological disasters in the study area (Zhang and Jiang, 2004); that is, it explores the statistical relationship between the point values of each index factor located in different second-level intervals and the distribution of geological disasters in the study area.
The second-level intervals of each susceptibility evaluation index factor were numbered: altitude is factor "1", and its five second-level intervals are "1.1, 1.2, 1.3, 1.4, and 1.5"; slope angle is factor "2", and its five second-level intervals are "2.1, 2.2, 2.3, 2.4, and 2.5"; and so on. The second-level intervals of all factors at each historical geological disaster point were numbered in this way. Following the implementation process of the Apriori algorithm in Section 2.2.1, the statistical data set of the disaster points was processed with a program written in Python, and the objective weight value of the second-level interval of each index factor was obtained, as shown in Table 3.

Evaluation of susceptibility
The entropy weight method was used to calculate the first-level objective weight of each evaluation index, and the information value of the second-level factor interval of each evaluation index was obtained by the information value model. The calculation results are shown in Table 3. Using the overlay analysis function of ArcGIS software, and combining the weight results of the above evaluation indicators with the information value calculation results, the EWM-IV model and the EWM-Apriori-IV model were each established according to Eq. 7 to evaluate the susceptibility of geological disasters in the study area. The natural breakpoint method was used to partition the evaluation results of the two models, and the resulting geological disaster susceptibility maps of Xiangtan urban area are shown in Figure 5.

Comparison of evaluation accuracy
At present, the disaster area ratio and the ROC curve are the main methods used to verify the evaluation results of geological disaster susceptibility (Kamp et al., 2008;Bai et al., 2010;Fan et al., 2014). The disaster area ratio verification method compares the ratio between the number of historical disaster points and the area in each prone area. The larger the disaster area ratio in the high-prone area and the smaller the disaster area ratio in the low-prone area, the more accurate and effective the evaluation results (Chen et al., 2019). The ROC curve is the receiver operating characteristic curve, and the area under the curve (AUC) is used to judge the accuracy of model evaluation results. The higher the AUC value, the better the prediction ability of the model (Pradhan, 2013;Wang et al., 2013). Tables 4 and 5 give the disaster area ratio statistics of the EWM-IV model and the EWM-Apriori-IV model.
Comparing the disaster area ratios of the two models, the disaster area ratio of the high-prone area evaluated by the EWM-Apriori-IV model is 1.141, which is higher than the 0.721 of the EWM-IV model, a relative increase of 58.3%. The disaster area ratio of the low-prone area evaluated by the EWM-Apriori-IV model is 0.033, which is lower than the 0.058 of the EWM-IV model, a relative decrease of 43.1%. The results show that using the EWM-Apriori-IV model to evaluate the susceptibility of geological disasters in the study area substantially raises the disaster area ratio of the high-prone area and lowers that of the low-prone area, so the evaluation is more accurate and effective.
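The arithmetic behind these comparisons can be checked with a short script. It assumes that the disaster area ratio is the number of disaster points per square kilometre of each zone; with the zone counts and areas reported for the EWM-Apriori-IV model in the Conclusion, this assumption reproduces the quoted ratios to within rounding. The variable and function names are illustrative.

```python
# (disaster points, area in km^2) per zone for the EWM-Apriori-IV model,
# as reported in the Conclusion.
zones = {"high": (78, 68.4), "middle": (29, 158.4), "low": (14, 430.6)}

# Assumed definition: disaster area ratio = points / area (points per km^2).
ratios = {zone: points / area for zone, (points, area) in zones.items()}

def relative_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100.0

# Ratios quoted in the text for the two models.
high_gain = relative_change(1.141, 0.721)   # high-prone area
low_drop = relative_change(0.033, 0.058)    # low-prone area
```

Rounded to one decimal place, `high_gain` and `low_drop` give +58.3% and -43.1%, which matches the relative changes stated above.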
The ROC curve test was conducted on the geological disaster susceptibility results generated by the two evaluation models. Sensitivity was taken as the vertical coordinate and 1 − specificity as the horizontal coordinate, and the evaluation results were imported into SPSS software for analysis. The results are presented in Figure 6.
The areas under the ROC curve (AUC) of the two models are both between 0.7 and 0.9, indicating that the success rate and prediction rate of both evaluation models used in this paper lie between 70% and 90% when evaluating the susceptibility of geological disasters in the study area, which is a high accuracy. Figure 6 shows that the AUC values of the EWM-IV model and the EWM-Apriori-IV model are 0.753 and 0.809, respectively; that is, the predictive abilities of the models are 75.3% and 80.9%, and the latter is 7.4% higher than the former in relative terms, showing better predictive ability.
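As a rough illustration of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen disaster unit receives a higher susceptibility score than a randomly chosen non-disaster unit, which is equivalent to the area under the ROC curve. The paper's values come from SPSS; the scores here are invented for illustration, and the function name `auc` is hypothetical.

```python
def auc(scores_pos, scores_neg):
    """AUC as the share of (disaster, non-disaster) pairs ranked correctly.

    Ties count as half a correct pair. Both lists must be non-empty.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Invented susceptibility scores for disaster and non-disaster units.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.4])

# Relative AUC improvement of EWM-Apriori-IV over EWM-IV, from the text.
auc_gain = (0.809 - 0.753) / 0.753 * 100.0
```

The last line confirms that the 7.4% figure is a relative improvement: the absolute difference between the two AUC values is 0.056.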

Discussion
The accuracy of geological hazard susceptibility evaluation is affected by the weighting method of the evaluation index. As a commonly used susceptibility evaluation model, the weighted information value model is established in most studies by combining a first-level index factor weighting method with the second-level interval information value of each factor, while the second-level interval weight of the index factor is rarely analyzed. As a result, the information value and the weight value are defined at inconsistent levels when this model is used, which affects the evaluation accuracy. In this paper, the Apriori algorithm is introduced to improve the weighted information value model. By analyzing the correlation between the second-level interval of each disaster-causing factor and the occurrence of geological disasters, the objective weight of the second-level interval of each evaluation index is determined, and the susceptibility evaluation of geological disasters is completed by combining it with the second-level interval information value of each index factor. This resolves the inconsistency between the information value and the weight value in the weighted information value model. According to the evaluation results, the zoning produced by the EWM-Apriori-IV model is better than that of the EWM-IV model. Compared with the EWM-IV model, the disaster area ratio of the high-prone area evaluated by the EWM-Apriori-IV model increased by 58.3%, that of the low-prone area decreased by 43.1%, and the AUC increased by 7.4%. These results support the accuracy and rationality of using the Apriori algorithm to obtain the second-level interval weight of the index factor and combining it with the information value model to predict susceptibility. This paper only analyzes the feasibility of improving the weighted information value model with the Apriori algorithm. Combining other susceptibility evaluation methods with the Apriori-based weighting method for second-level factor weighting and susceptibility evaluation deserves further study.

Conclusion
Taking the Xiangtan urban area of Hunan Province as the research area, this paper selected 10 evaluation indexes, introduced the Apriori algorithm as the weight model of the second-level interval of the evaluation index, constructed two evaluation models to evaluate the susceptibility of geological disasters in the research area, and carried out precision verification and comparative analysis on the evaluation results. The results are as follows.
(1) The Apriori algorithm is introduced to analyze the correlation between the second-level intervals of disaster-causing factors and the susceptibility to geological disasters. A weight model based on the Apriori algorithm is established to achieve objective weighting of the second-level intervals of disaster-causing factors in the susceptibility evaluation of geological disasters and to resolve the inconsistency between the information value and the weight value in the weighted information value model.
(2) The EWM-IV model and the EWM-Apriori-IV model are established to evaluate the susceptibility of geological disasters in the study area. The results show that when the weight model based on the Apriori algorithm is used to assign weights, the AUC of the susceptibility evaluation increases by 7.4%, the disaster area ratio of the high-prone area increases by 58.3%, and the disaster area ratio of the low-prone area decreases by 43.1%, indicating that the EWM-Apriori-IV model is more accurate and rational in evaluation.
(3) According to the susceptibility evaluation results of the EWM-Apriori-IV model, the study area was divided into high, medium, and low susceptibility areas. The high-prone area covers 68.4 km², accounting for 10.4% of the Xiangtan urban area, and contains 78 geological disaster points, accounting for 64.4% of the total. The middle-prone area covers 158.4 km², accounting for 24.1% of the Xiangtan urban area, and contains 29 geological disaster points, accounting for 24.0% of the total. The low-prone area covers 430.6 km², accounting for 65.5% of the Xiangtan urban area, and contains 14 geological disaster points, accounting for 11.6% of the total.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.