A Spatial Assessment of Wildfire Risk for Transmission-Line Corridor Based on a Weighted Naïve Bayes Model

In order to improve the wildfire prevention capacity of transmission lines, this paper proposes a wildfire risk assessment method for transmission-line corridors based on Weighted Naïve Bayes (WNB). First, the importance of 14 types of collected wildfire-related factors is analyzed using the information gain ratio. Then, the optimal factor set and the most accurate sampling table are constructed by deleting the least important factors one at a time. Finally, the performance of the WNB model is compared with that of the Naïve Bayes (NB) and Bayesian Network (BNW) models using ROC curves and risk-map visualization. A total of 76.36% of fire events in 2020 fell in high-risk and very-high-risk regions, indicating that the proposed wildfire risk assessment method has acceptable accuracy.


INTRODUCTION
China's continuously growing economy has raised the demand for electricity in recent years. Under the "power transmission from west to east" strategy, cross-regional high-voltage transmission lines meet the demand for long-distance power transmission. However, transmission-line corridors often have to extend into or through areas with a high risk of wildfire (Song et al., 2012; Zeng, 2009). Once a wildfire is ignited near a transmission line, the burning vegetation produces high temperatures and dense soot, which dramatically reduces the insulation strength of the air beneath the line. Under these conditions, phase-to-phase or phase-to-ground breakdowns are likely to occur and trip the transmission line (Huang et al., 2015). While the fire keeps burning, automatic reclosing rarely succeeds, and repeated trips of transmission lines may even induce a cascading grid failure (Hu et al., 2014; Wu et al., 2012).
To reduce the effect of wildfire on transmission lines, a series of wildfire prevention measures have been proposed and implemented in China and abroad, such as quantitative forecasting, wide-area real-time satellite monitoring, and fire-fighting measures (Lu et al., 2017; Ye et al., 2014). Compared with passive prevention methods, assessing the wildfire risk level allows key prevention measures to be concentrated more efficiently in high-risk areas, reducing the hazards and economic losses caused by wildfires (Wang and Fan, 2016; Liu et al., 2016).
The outbreak of a wildfire generally stems from the combined effects of multiple wildfire-related factors, and scholars have formulated a series of quantitative risk assessments based on these factors. Early forest fire risk assessments in China focused only on meteorological factors: temperature, humidity, precipitation, and wind speed were used to predict fire weather and wildfire behavior (Xu et al., 2016). However, besides meteorological factors, other wildfire-related factors, such as vegetation type, land-usage type, and fire-spot density, also contribute significantly to fire risk. State Grid Corporation of China has issued guidelines for drawing regional distribution maps of wildfires near overhead transmission lines, in which fire-spot density factors combined with vegetation burning hazard levels are used to assess and classify the wildfire risk level of transmission-line corridors (QGDW11643, 2016). However, these three wildfire-related factors are assumed to contribute equally to the risk assessment. The Analytic Hierarchy Process is widely used to differentiate the importance of factors, but it relies too heavily on subjective experience gathered through questionnaires (Zhu et al., 2016). A back-propagation (BP) neural network can improve model accuracy by continuously correcting the factor weights; Liu et al. proposed a wildfire risk assessment method based on a BP neural network (Liu C. X. et al., 2017), but it requires a large amount of data for modeling. The Bayesian Network (BNW) can effectively integrate prior knowledge and objective evidence, using mathematical statistics and graph theory to handle the uncertainty of wildfire risk assessment. However, its complex network structure easily reduces computational efficiency (Dlamini, 2010).
Naïve Bayes (NB), which is based on conditional independence, improves computational efficiency, but neglecting the relationships between wildfire-related factors could bring the curse of dimensionality and reduce evaluation accuracy (Chen et al., 2021).
In this paper, a method based on Weighted Naïve Bayes (WNB) is established for assessing the wildfire risk of transmission-line corridors. First, 14 types of wildfire-related factors are selected, and their data are collected and pre-processed on 1 km × 1 km grids for four southern provinces of China (Yunnan, Guizhou, Guangxi, and Guangdong). Then, the factor weights are obtained with the entropy method to weaken the independence assumption of NB. Combined with the most accurate sampling table (MAST), an optimal WNB model is constructed to calculate the wildfire risk probability of each grid in the research area. For visualization, the wildfire risks are graded using geometric interval classification. Finally, the performance of WNB is compared with that of the BNW and NB models.

Study Area
Four provinces in southern China, Yunnan, Guizhou, Guangxi, and Guangdong, were selected as the study area. They have subtropical and tropical monsoon climates that are generally rainy and hot, which suits the growth of a wide variety of forests. In particular, the Yunnan-Guizhou plateau has high vegetation coverage and extensive karst landforms with a sparse population. From January to December 2020, a total of 825 fire spots were monitored. Once a fire is ignited, it is difficult to extinguish quickly and is prone to spread.

Wildfire-Related Factors
Wildfire ignition requires three preconditions: a fire source, sufficient combustibles, and an environment suitable for fire. Fire sources are generally divided into artificial and natural sources. Statistics show that more than 90% of wildfires are caused by human activities, either intentionally or unintentionally. Therefore, this paper chooses five factors to represent the impact of human activities: Distance from Settlements (DS), Distance from Roads (DR), fire-spot density, Gross Domestic Product (GDP), and population density. Combustibles refers to the vegetation types and their coverage on the underlying surface; four factors, land-usage type, vegetation type, fuel load, and Normalized Difference Vegetation Index (NDVI), are selected to represent the influence of the underlying surface. The fire environment mainly refers to weather conditions and topography: annual precipitation and annual temperature represent the weather conditions, and elevation, slope, and aspect represent the topography. The chosen factors are listed in Figure 1.

Data Pre-Processing and Discrete Classification
The study area is first divided into 1 km × 1 km grids. The data of wildfire-related factors are provided by the Resource and Environmental Science and Data Center and the National Meteorological Center and are extracted using ArcGIS software. Among them, land-usage type and vegetation type are discrete variables, and the remaining factors are continuous variables. The fire-spot sample set is formed from the latitude and longitude of fire spots monitored from 2010 to 2019, provided by the National Meteorological Center. The non-fire-spot sample set is constructed by random sampling within the study area. To avoid overlap between fire-spot and non-fire-spot samples, only grids at least 3 km away from any fire-spot sample can be used as non-fire-spot samples.
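The 3 km exclusion rule for non-fire-spot samples can be illustrated with simple rejection sampling. This is a minimal sketch, not the authors' procedure: it assumes fire spots are given in a projected coordinate system in kilometres, draws continuous points rather than grid cells, and uses a rectangular bounding box as a stand-in for the study-area boundary; `sample_non_fire_points` is a hypothetical helper.

```python
import numpy as np

def sample_non_fire_points(fire_xy, bounds, n_samples, min_dist_km=3.0,
                           seed=0, max_tries=1_000_000):
    """Draw random points that lie at least min_dist_km from every fire spot.

    fire_xy: (k, 2) fire-spot coordinates in projected km (assumed).
    bounds: (xmin, xmax, ymin, ymax) rectangle standing in for the study area.
    """
    rng = np.random.default_rng(seed)
    fire_xy = np.asarray(fire_xy, dtype=float)
    xmin, xmax, ymin, ymax = bounds
    accepted = []
    tries = 0
    while len(accepted) < n_samples:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place enough non-fire samples")
        candidate = np.array([rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)])
        # Reject the candidate if it lies within min_dist_km of any fire spot
        if np.linalg.norm(fire_xy - candidate, axis=1).min() >= min_dist_km:
            accepted.append(candidate)
    return np.array(accepted)
```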
During factor collection, the data of DS, DR, elevation, slope, and aspect are calculated using a Digital Elevation Model (DEM), while the fire-spot density requires further calculation (Chen et al., 2021), as follows.
Step 1 The study area is meshed with 0.25 km × 0.25 km precision. The area of a single grid is calculated from its longitude and latitude, as shown in formulas (1)-(3),
where d1 and d2 are the distances (km) spanned by the grid along the longitude and latitude directions, δ is the latitude span of the grid, R0 = 6,371 km is the average radius of the Earth, and α is the central latitude of the grid.
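Formulas (1)-(3) are not reproduced here. A common spherical approximation consistent with the symbols above multiplies a north-south extent R0·δ by an east-west extent R0·cos α·Δλ. The sketch below assumes that form; the longitude span Δλ (`lon_span_deg`) is not defined in the text and is an assumption, as is the helper name.

```python
import math

R0_KM = 6371.0  # average Earth radius given in the text

def grid_cell_area_km2(center_lat_deg, lat_span_deg, lon_span_deg):
    """Approximate area of a latitude/longitude grid cell on a sphere.

    Assumed form: north-south extent = R0 * delta (radians) and
    east-west extent = R0 * cos(alpha) * longitude span (radians).
    """
    alpha = math.radians(center_lat_deg)
    d_ns = R0_KM * math.radians(lat_span_deg)
    d_ew = R0_KM * math.cos(alpha) * math.radians(lon_span_deg)
    return d_ew * d_ns
```

Because the east-west extent scales with cos α, a cell of the same angular size at 60° latitude has about half the area of one at the equator.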
Step 2 The number of fire spots falling in each grid is counted as Fx, and the fire-spot density Dx is calculated by formula (4),
where Y is the number of years covered by the fire-spot records.
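Formula (4) is likewise missing from the text. Assuming it expresses density as fire count per unit area per year, Dx = Fx / (S · Y) with S the grid area from Step 1, a vectorized sketch is shown below; the function name is hypothetical.

```python
import numpy as np

def fire_spot_density(fire_counts, cell_areas_km2, n_years):
    """Annual fire-spot density per grid: D = F / (S * Y) (assumed form).

    fire_counts: F_x for each grid; cell_areas_km2: grid areas S;
    n_years: Y, the number of years of fire-spot records.
    """
    counts = np.asarray(fire_counts, dtype=float)
    areas = np.asarray(cell_areas_km2, dtype=float)
    return counts / (areas * n_years)
```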
Step 3 To meet the spatial resolution requirement, the fire-spot density is interpolated to a resolution of 1 km × 1 km using the Kriging interpolation algorithm.
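Step 3 does not state which Kriging variant or variogram was used. The sketch below implements ordinary Kriging under an assumed exponential semivariogram with zero nugget; `sill` and `range_km` are illustrative parameters, not values from the paper. Each prediction solves the ordinary Kriging system, in which a Lagrange multiplier forces the weights to sum to one.

```python
import numpy as np

def exp_variogram(h, sill=1.0, range_km=5.0):
    # Exponential semivariogram with zero nugget (assumed model)
    return sill * (1.0 - np.exp(-h / range_km))

def ordinary_kriging(xy, values, targets, sill=1.0, range_km=5.0):
    """Predict values at target points from scattered samples."""
    xy = np.asarray(xy, float)
    values = np.asarray(values, float)
    targets = np.atleast_2d(np.asarray(targets, float))
    n = len(xy)
    # Left-hand matrix: semivariances between samples plus the unbiasedness row/column
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = exp_variogram(d, sill, range_km)
    a[n, n] = 0.0
    preds = []
    for t in targets:
        b = np.ones(n + 1)
        b[:n] = exp_variogram(np.linalg.norm(xy - t, axis=1), sill, range_km)
        w = np.linalg.solve(a, b)  # n weights followed by the Lagrange multiplier
        preds.append(w[:n] @ values)
    return np.array(preds)
```

With a zero nugget, the estimator reproduces the observed value exactly at a sample location, and because the weights sum to one, a constant field is returned unchanged.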
To improve the computational efficiency of the Bayes models, the data of all wildfire-related factors for both fire-spot and non-fire-spot samples are graded into four classes. The grading standard is to maximize the difference between the two groups of samples. Taking elevation as an example, first, the frequency distributions of each factor are calculated and compared for the two groups of samples, as shown in Figure 2. Second, all intersections of the distribution curves are found. These intersections divide the data range into several intervals, and each intersection marks a change in which group is more frequent, that is, a change in the probability of wildfire occurrence. For example, when the elevation is lower than 605 m or between 1,105 m and 1,745 m, wildfire occurrence is more probable. Third, for factors with more than three intersections, adjacent intervals are combined to reduce the final number of classes to four.
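The intersection step can be sketched as follows, assuming both frequency curves are sampled at the same bin centres and that a crossing is located by linear interpolation wherever the sign of (fire − non-fire) changes. The rule for merging adjacent intervals down to four classes is not specified in the text, so it is left out; both function names are hypothetical.

```python
import numpy as np

def intersection_breaks(centers, freq_fire, freq_nonfire):
    """Locate crossings of the fire and non-fire frequency curves.

    A crossing is placed by linear interpolation between neighbouring bin
    centres whose (fire - non-fire) differences have strictly opposite signs.
    """
    x = np.asarray(centers, float)
    diff = np.asarray(freq_fire, float) - np.asarray(freq_nonfire, float)
    breaks = []
    for i in range(len(x) - 1):
        d0, d1 = diff[i], diff[i + 1]
        if d0 * d1 < 0:
            t = d0 / (d0 - d1)
            breaks.append(x[i] + t * (x[i + 1] - x[i]))
    return np.array(breaks)

def assign_classes(values, breaks):
    # Class index 0..len(breaks) for each value (value == break goes to the upper class)
    return np.digitize(values, breaks)
```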
For wildfire-related factors with discrete data, the classes are formed directly from their natural categories or their contribution to wildfire occurrence. The specific grading standards are summarized in Supplementary Tables S1, S2, S3.

IMPORTANCE OF WILDFIRE-RELATED FACTORS
Among the 14 selected wildfire-related factors, some may contribute little to the risk assessment and cause data redundancy, which increases model complexity and decreases accuracy. Thus, the factors are ranked by importance using the information gain ratio. By deleting the lowest-ranked factors one at a time, the optimal factor set is selected according to the accuracy of the resulting model.

Information Entropy and Information Gain Entropy
In 1948, the mathematician C. E. Shannon first proposed the concept of Information Entropy. The larger the information entropy, the greater the uncertainty of the information source it represents. However, the size of the information entropy alone often cannot reflect the importance of the information contained in the system.
To reflect how much information a feature brings to the system, the Information Gain Entropy is used, which is the difference between the entropy of the set to be classified and the conditional entropy given a selected feature.

Information Entropy
To calculate the information entropy, all fire-spot samples and the same number of non-fire-spot samples are selected. The information entropy of wildfires is then calculated by formula (5):
$$H(Y) = -\left[p(y_1)\log p(y_1) + p(y_0)\log p(y_0)\right] \tag{5}$$
where p(y1) and p(y0) represent the occurrence and nonoccurrence probabilities of wildfires, respectively, and p(y1) + p(y0) = 1.
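Because equal numbers of fire and non-fire samples are used, p(y1) = p(y0) = 0.5 and H(Y) reaches its maximum. A minimal check of formula (5), assuming base-2 logarithms (the base is not stated in the text):

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy H = -sum p log p, with 0 * log 0 taken as 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Equal numbers of fire and non-fire samples: p(y1) = p(y0) = 0.5
print(entropy([0.5, 0.5]))  # 1.0
```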

Information Gain Entropy
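In a wildfire event, the information gain entropy, denoted G(Y|X), represents the reduction in uncertainty brought by the information of factor X:
$$G(Y|X) = H(Y) - H(Y|X)$$
$$H(Y|X) = -\sum_{i} p(x_i)\left[p(x_{i1})\log p(x_{i1}) + p(x_{i0})\log p(x_{i0})\right]$$
where H(Y|X) is the conditional information entropy of the wildfire event Y given factor X, p(x_i) is the distribution probability of class x_i of the factor, and p(x_i1) and p(x_i0) are the probabilities of fire and non-fire within class x_i.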

Information Gain Ratio
When the data of a factor take many distinct values, its information gain entropy also tends to be larger. To eliminate this influence, the information gain ratio is used (Xiong et al., 2014); it offsets the complexity of the factor variables and thus avoids overfitting to the factor data:
$$G_R(Y|X) = \frac{G(Y|X)}{H_X(Y)}, \qquad H_X(Y) = -\sum_{i} p(x_i)\log p(x_i)$$
where G(Y|X) is the information gain entropy of factor X, and H_X(Y) is the information entropy of the wildfire event Y with respect to factor X.
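The three quantities above can be computed directly from class counts of a graded factor. This sketch assumes base-2 logarithms and takes H_X(Y) as the entropy of the sample distribution over the factor's classes (the C4.5 "split information"); the function names are illustrative.

```python
import math

def _entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def gain_ratio(fire_counts, nonfire_counts):
    """Information gain G(Y|X) and gain ratio for one graded factor.

    fire_counts[i], nonfire_counts[i]: numbers of fire and non-fire samples
    falling in class x_i of the factor.
    """
    class_totals = [f + nf for f, nf in zip(fire_counts, nonfire_counts)]
    n = sum(class_totals)
    h_y = _entropy([sum(fire_counts), sum(nonfire_counts)])
    # Conditional entropy H(Y|X) = sum_i p(x_i) * H(Y | x_i)
    h_y_given_x = sum(
        t / n * _entropy([f, nf])
        for t, f, nf in zip(class_totals, fire_counts, nonfire_counts) if t > 0
    )
    gain = h_y - h_y_given_x
    split_info = _entropy(class_totals)  # H_X(Y) = -sum_i p(x_i) log p(x_i)
    return gain, (gain / split_info if split_info > 0 else 0.0)
```

A factor that separates fire from non-fire perfectly with two classes has a gain ratio of 1. The same perfect separation spread over four equally populated classes has the same gain but a ratio of only 0.5, which is exactly the penalty on many-valued factors that the ratio is meant to impose.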

Importance Rank of Wildfire-Related Factors
The calculated information gain entropy and information gain ratio of each factor are listed in Table 1. Vegetation type, land-usage type, and historical fire-spot density are the three most important factors affecting the occurrence risk of wildfire. This is because vegetation type and land-usage type indicate the fuel conditions, whereas historical fire-spot density reflects where wildfires occur frequently. DR and aspect are the factors with the least influence on wildfire occurrence.

METHODOLOGY
Weighted Naïve Bayes
Bayes theorem calculates the posterior probability of events by combining the prior probability and conditional probability, as follows (Qu et al., 2016;Zhao et al., 2013;Ma et al., 2013).
$$P(X_i|Y) = \frac{P(Y|X_i)\,P(X_i)}{P(Y)}$$
where P(X_i) is the prior probability of event X_i, which is obtained from the difference in data distribution between fire-spot and non-fire-spot samples; P(Y|X_i) is the conditional probability of event Y given that event X_i occurs; P(Y) = Σ_i P(Y|X_i)·P(X_i) is the total probability of event Y over all events X_1, X_2, …, X_i; and P(X_i|Y) is the posterior probability that the given Y is caused by event X_i.
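As a small numerical illustration of the theorem, the evidence P(Y) is obtained by the law of total probability and each joint term is divided by it. The helper below is illustrative only.

```python
def posterior(priors, likelihoods):
    """Bayes theorem: P(X_i|Y) = P(Y|X_i) P(X_i) / sum_k P(Y|X_k) P(X_k)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # total probability P(Y)
    return [j / evidence for j in joint]

# Equal priors: the posterior simply follows the likelihoods
print(posterior([0.5, 0.5], [0.8, 0.2]))
```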
Owing to the conditional independence assumption, the NB model has a simpler structure and higher computational efficiency and speed than BNW. However, complete independence rarely exists in reality. The WNB model assigns different weights to the nodes to strengthen the connections between them, thereby weakening the independence assumption of NB (Huang et al., 2015; Lee, 2015; Liu R. et al., 2017; Tang et al., 2018; Ji et al., 2019). The structures of the BNW, NB, and WNB models are shown in Figure 3.
The posterior probability in WNB is defined as the weighted product of the conditional probabilities of factors, as shown in formula (10).
Following the NB-based wildfire risk assessment of transmission corridors [16], the conditional probability of each factor under fire or non-fire conditions is obtained as
$$P(x_{ij}|y) = \frac{n_{ij} + 100}{\sum_{k=1}^{4} n_{ik} + 400}$$
where x_ij represents factor x_i at the j-th level and n_ij is the sample size of x_ij within the samples of class y. P(x_ij|y0) and P(x_ij|y1) are the conditional probabilities under a non-fire event and a fire event, respectively.
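The estimator above is additive smoothing with a pseudo-count of 100 per level. With four levels this gives the 400 in the denominator, and every level keeps a non-zero probability even when no training sample falls in it. A sketch (the function name is illustrative):

```python
import numpy as np

def smoothed_conditional(counts, pseudo=100):
    """P(x_ij | y) = (n_ij + pseudo) / (sum_k n_ik + K * pseudo).

    counts[j]: number of training samples of one class y (fire or non-fire)
    in level j of factor x_i; K = len(counts) (4 levels in the text).
    """
    counts = np.asarray(counts, float)
    return (counts + pseudo) / (counts.sum() + len(counts) * pseudo)
```

For counts of 10, 20, 30, and 40 this gives 0.22, 0.24, 0.26, and 0.28. The large pseudo-count pulls sparse levels strongly toward the uniform value of 0.25.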
Finally, the weighted Bayesian posterior probability of wildfire occurrence P(Y) is determined.
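Formula (10) is not reproduced in the text. The sketch below assumes the common attribute-weighted form, in which each conditional probability is raised to its weight: P(y|x) ∝ P(y) ∏ P(x_i|y)^(w_i). The two class scores are then normalized. The default prior of 0.53 is the fire prior reported in the Results; all names are illustrative.

```python
import numpy as np

def wnb_fire_probability(levels, cond_fire, cond_nonfire, weights,
                         prior_fire=0.53):
    """Normalized WNB posterior probability of fire for one grid.

    levels[i]: graded level (0..3) of factor i in the grid.
    cond_fire[i][j], cond_nonfire[i][j]: P(x_ij | fire), P(x_ij | non-fire).
    weights[i]: entropy weight w_i, applied as an exponent (assumed form).
    """
    log_fire = np.log(prior_fire)
    log_non = np.log(1.0 - prior_fire)
    for i, j in enumerate(levels):
        log_fire += weights[i] * np.log(cond_fire[i][j])
        log_non += weights[i] * np.log(cond_nonfire[i][j])
    # Normalize the two scores in log space to avoid underflow
    m = max(log_fire, log_non)
    pf, pn = np.exp(log_fire - m), np.exp(log_non - m)
    return pf / (pf + pn)
```

Setting all weights to 1 recovers plain NB. Setting them all to 0 returns the prior, because the factors then carry no evidence.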

Weight Calculation
The weights are often obtained by subjective or objective methods. With subjective methods, the value of weights is strongly affected by the knowledge and experience of surveyed experts, which may introduce larger errors to the model. Therefore, the entropy method is used to objectively evaluate and weight the factors in the WNB model. In the information entropy theory, a smaller information entropy indicates a larger variation of the factor value with more information. Based on the information entropy theory, the weight is determined by the information entropy of the factor.
Step 1 The data of each factor are normalized by
$$x'_{ij} = \frac{x_{ij} - x_{\min}}{x_{\max} - x_{\min}}$$
where x_ij represents the value of the i-th factor for the j-th object, and x_max and x_min are the maximum and minimum values among the x_ij of that factor.
Step 2 The entropy value e_i of the i-th factor is:
$$e_i = -(\ln n)^{-1}\sum_{j=1}^{n} P_{ij}\ln P_{ij} \tag{14}$$
where P_ij = x'_ij / Σ_j x'_ij represents the proportion of the j-th evaluation object under the i-th factor, and n is the number of evaluation objects.
Step 3 According to the information entropy e_i of each factor, the initial weight f_i is determined, and the entropy weight w_i is obtained after normalization.
The entropy weight w_i of the i-th factor is calculated as:
$$w_i = \frac{f_i}{\sum_{k} f_k}$$
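Steps 1-3 can be sketched in a few lines. The text does not give the formula for the initial weight, so this sketch assumes the usual entropy-method choice f_i = 1 − e_i (the "degree of divergence"). It also assumes no factor is constant, since min-max normalization would otherwise divide by zero.

```python
import numpy as np

def entropy_weights(x):
    """Entropy-method weights; x has shape (n_factors, n_objects).

    Assumes f_i = 1 - e_i as the initial weight and non-constant factors.
    """
    x = np.asarray(x, float)
    n = x.shape[1]
    # Step 1: min-max normalization of each factor (row)
    xmin = x.min(axis=1, keepdims=True)
    xmax = x.max(axis=1, keepdims=True)
    z = (x - xmin) / (xmax - xmin)
    # Step 2: proportions P_ij and entropy e_i, with 0 * ln 0 treated as 0
    p = z / z.sum(axis=1, keepdims=True)
    plogp = np.zeros_like(p)
    mask = p > 0
    plogp[mask] = p[mask] * np.log(p[mask])
    e = -plogp.sum(axis=1) / np.log(n)
    # Step 3: initial weights f_i = 1 - e_i, normalized to w_i
    f = 1.0 - e
    return f / f.sum()
```

A factor whose values are concentrated in a few objects has low entropy and therefore receives a larger weight, in line with the principle stated above.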

Framework Conceptualization
The framework of the WNB-based wildfire risk assessment model is shown in Figure 4.
1) Initialize the sample table, that is, the proportion n of fire-spot samples and the proportion m of non-fire-spot samples, where n + m = 100%, and set the maximum number of sampling times, Mis = 100.
2) Extract the fire-spot samples and non-fire-spot samples, and the value of wildfire-related factors in 1 km × 1 km grids by using ArcGIS 10.4.
3) Establish a training set and a test set for the model. The training set is formed by randomly selecting 75% of the fire-spot and non-fire-spot samples. The test set is formed by the remaining 25% of the samples. Naïve Bayes conditional probability is estimated based on the training set. And the optimal threshold is determined by observing the Receiver Operating Characteristic (ROC) curve to establish the confusion matrix, as shown in Supplementary Table S4.
The accuracy P_a, recall P_r, precision P_p, and F_β score of the test set under this sampling ratio are calculated according to Eqs 18-21, where
$$F_\beta = \frac{(1 + \beta^2)\,P_r\,P_p}{\beta^2 P_p + P_r}$$
Because the power grid tolerates false wildfire alarms better than missed wildfires, β = 3 is used, which weights recall more heavily than precision.
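Eqs 18-20 are not reproduced in the text. The sketch below uses the standard confusion-matrix definitions of accuracy, recall, and precision, together with the F_β expression above; β = 3 is the value stated in the text.

```python
def classification_metrics(tp, fn, fp, tn, beta=3.0):
    """Accuracy, recall, precision and F-beta from a 2x2 confusion matrix."""
    p_a = (tp + tn) / (tp + fn + fp + tn)
    p_r = tp / (tp + fn)
    p_p = tp / (tp + fp)
    b2 = beta ** 2
    f_beta = (1 + b2) * p_r * p_p / (b2 * p_p + p_r)
    return p_a, p_r, p_p, f_beta
```

With β = 3, F_β sits much closer to recall than to precision, so a model that misses fires is penalized more than one that raises false alarms.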
4) Set the thresholds of P_a, P_r, P_p, and F_β to 0.75 and find the Most Accurate Sampling Table (MAST) using the forward sorting algorithm. Only sampling tables whose performance meets all the thresholds are included in the sampling table database. The conditional probabilities of factors under fire and non-fire conditions are then evaluated.
5) Calculate the factor weights using the entropy method.
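The threshold filter in step 4 can be sketched as follows. The "forward sorting algorithm" is not described in the text, so this sketch substitutes a simple choice: among all tables that pass every threshold, keep the one with the highest F_β. The `evaluate` callback is hypothetical. It stands for drawing samples at a given fire-spot proportion, splitting them 75/25, training the model, and returning (P_a, P_r, P_p, F_β) on the test set.

```python
import random

def find_mast(evaluate, fire_ratios, threshold=0.75, n_iter=100, seed=0):
    """Return (f_beta, ratio, scores) of the best qualifying sampling table.

    evaluate(ratio, rng) -> (p_a, p_r, p_p, f_beta) is a hypothetical callback.
    Returns None if no sampling table meets all thresholds.
    """
    rng = random.Random(seed)
    database = []  # sampling tables whose four scores all meet the threshold
    for ratio in fire_ratios:
        for _ in range(n_iter):
            scores = evaluate(ratio, rng)
            if all(s >= threshold for s in scores):
                database.append((scores[3], ratio, scores))
    if not database:
        return None
    return max(database, key=lambda item: item[0])
```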

RESULTS AND DISCUSSION
According to the importance rank based on the information gain ratio, some factors contribute little to the wildfire risk assessment. Therefore, the factor with the lowest information gain ratio is deleted one at a time to obtain the Optimal Factor Set (OFS), from which the optimal WNB model is established. For comparison, an NB model and a BNW model with the same factors are also established.
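The one-at-a-time deletion amounts to a backward elimination along the gain-ratio ranking. In this sketch, `evaluate` is a hypothetical callback that trains and scores a model on a given factor list, for example returning its test accuracy.

```python
def backward_eliminate(factors_by_gain_ratio, evaluate, min_factors=1):
    """Drop the lowest-ranked factor one at a time; keep the best-scoring set.

    factors_by_gain_ratio: factor names sorted from highest to lowest gain ratio.
    evaluate(factor_list) -> score is a hypothetical model-scoring callback.
    """
    current = list(factors_by_gain_ratio)
    best_set, best_score = list(current), evaluate(current)
    while len(current) > min_factors:
        current = current[:-1]  # remove the factor with the lowest gain ratio
        score = evaluate(current)
        if score > best_score:
            best_set, best_score = list(current), score
    return best_set, best_score
```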

The Optimal Factors Set and the Most Accurate Sampling Table
The entropy weights of wildfire-related factors during the formation of OFS are shown in Table 2.
The performance of the WNB models under different factor sets is compared by using the 825 new fire-spot samples and an equal number of non-fire-spot samples in 2020. With the optimal threshold, the performance measures of the model under different factor sets are shown in Figure 5.
As Figure 5 shows, after the aspect and DR factors are deleted, the P_a, P_r, P_p, and F_β scores of the model all improve to some extent and reach their optimum. As more factors are removed, the P_r and F_β scores decrease stepwise. Therefore, the top 12 factors by information gain ratio are selected to form the OFS, and the MAST is obtained through 100 iterations. Using the OFS and MAST, the final accuracy P_a is 0.7566, recall P_r is 0.7661, precision P_p is 0.7728, and F_β score is 0.7667. The prior probabilities of fire and non-fire are 0.53 and 0.47, respectively. The conditional probability table is shown in Figure 6.
The conditional probabilities of the factors differ between fire and non-fire conditions, indicating that the factor levels have different effects on wildfire occurrence. Therefore, these 12 types of wildfire-related factors can be used to evaluate the wildfire risk of the research area.

Comparison of BNW, NB, and WNB
By combining prior and conditional probabilities with the relationships between factors, the BNW can handle the uncertainty of wildfire evaluation (Jiang et al., 2016; Albuquerque et al., 2017; Bates et al., 2021). However, the complexity of the network can make the model time-consuming and storage-consuming. The NB, based on the assumption of conditional independence, improves computational efficiency but sacrifices predictive accuracy. By assigning different weights to the factors, the WNB accounts for the different influences of the factors on the results and forms a compromise method with both plausibility and efficiency. To compare the assessment performance of the proposed WNB model, a BNW model and an NB model are also established based on the OFS.

Wildfire Risk Assessment Model Based on BNW and NB
To build a BNW-based wildfire risk assessment model, an appropriate BNW structure must first be established (Sevinc et al., 2020; Penman et al., 2020; Wu et al., 2018). The BNW structure consists of factor nodes, connecting lines, and arrows. The data of factor nodes are provided by the most accurate sampling table, which guarantees the optimal state of the nodes. The connecting lines and arrows are determined by the causal correlation between two nodes. The relationships between wildfire-related factors are obtained by Pearson correlation analysis (Supplementary Table S5): when the absolute value of the Pearson coefficient between two factors is higher than 0.1, the two factors are considered related. For example, the Pearson coefficient between GDP and population density is 0.719, indicating a strong positive correlation between the two factors. According to the literature, population density affects the distribution of GDP, so the arrow points from population density to GDP. The connecting lines and arrows between the factors in BNW, as well as the prior probabilities of the factor nodes, are obtained as shown in Figure 7.
FIGURE 6 | The conditional probability table of fire and non-fire (Red is fire and gray is non-fire).
FIGURE 7 | The model framework based on BNW.
Frontiers in Energy Research | www.frontiersin.org February 2022 | Volume 10 | Article 829934
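The |r| > 0.1 screening of candidate BNW links can be sketched with NumPy's correlation matrix. The sketch only lists related factor pairs. Edge directions, such as population density → GDP, come from the literature in the text and are not inferred here. The function name is illustrative.

```python
import numpy as np

def candidate_edges(data, names, threshold=0.1):
    """Factor pairs whose absolute Pearson coefficient exceeds the threshold.

    data: array of shape (n_samples, n_factors), one column per factor.
    """
    r = np.corrcoef(np.asarray(data, float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(r[i, j]), 3)))
    return pairs
```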
The structure of the NB model is much simpler: all factor nodes point only to the fire-event node, as shown in Figure 8.

Receiver Operating Characteristic Curve
The Receiver Operating Characteristic (ROC) curve is drawn from a series of binary classification results, with the True Positive Rate (TPR) as the ordinate and the False Positive Rate (FPR) as the abscissa. The Area Under the ROC Curve (AUC) represents the classification performance of the model: the larger the AUC, the better the classifier. The TPR and FPR are calculated as:
$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}$$
where TP, FN, FP, and TN are the numbers of true positives, false negatives, false positives, and true negatives, respectively.
To compare the performance of the three Bayesian models, fire spots from 2010 to 2019 and non-fire spots are randomly extracted in equal proportions to calculate the probability of wildfire. The ROC curves are constructed by varying the threshold between fire and non-fire, as shown in Figure 9.
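A ROC curve and its AUC can be computed from predicted wildfire probabilities as sketched below. Each distinct score serves as a threshold, and the area is integrated with the trapezoidal rule. How the Optimal Division Threshold is chosen is not stated in the text, so it is not included.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC points (FPR, TPR) over all score thresholds and trapezoidal AUC.

    labels: 1 for fire, 0 for non-fire.
    """
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, int)
    pos, neg = labels.sum(), (1 - labels).sum()
    # Thresholds from "predict nothing as fire" down to the lowest score
    thresholds = np.r_[np.inf, np.unique(scores)[::-1]]
    tpr = np.array([((scores >= t) & (labels == 1)).sum() / pos for t in thresholds])
    fpr = np.array([((scores >= t) & (labels == 0)).sum() / neg for t in thresholds])
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
    return fpr, tpr, auc
```

A classifier that ranks every fire above every non-fire reaches an AUC of 1. One that assigns the same score to all samples gives 0.5.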
It can be clearly seen that the AUC of the NB model is the smallest, indicating the worst classification performance. The AUCs of the BNW, NB, and WNB classifiers are 0.8446, 0.7973, and 0.8383, respectively. The points marked on the curves are the Optimal Division Thresholds (ODT). Under the ODT, the TPRs of the WNB and BNW models are nearly identical and higher than that of the NB model, while the FPRs of WNB, NB, and BNW are relatively similar.
According to the ROC analysis, both the WNB and BNW models classify fire and non-fire events well. However, because the factors are coupled with one another, establishing and computing the BNW is complicated and time-consuming, and its classification performance depends strongly on how reasonable its structure is. The WNB strengthens the effects of important factors and weakens the influence of redundant relationships on model performance by assigning different weights to the factors. It thus considers the interrelationships of factors while reducing computational complexity. Therefore, the classification performance of the WNB model is similar to that of BNW, but with a simpler structure and shorter computation time.

Wildfire Risk Assessment
Using the established WNB, NB, and BNW models, the posterior probabilities of wildfire risk for the grids in the research area are estimated. Statistics of the results are shown in Supplementary Table S6.
The distributions of probabilities estimated by the different models differ considerably across the research area, indicating that the grading method may have an important impact on the wildfire risk assessment. Therefore, four grading methods are compared: equal interval, quantile, natural breaks, and geometric interval. The equal interval method divides the range of wildfire risk probability into four equal sub-ranges. The quantile method assigns an equal proportion of grids to each class. The natural breaks method, proposed by Jenks (Anchang et al., 2016), divides the distribution into groups of similar nature at the natural turning points or breakpoints in the data; it maximizes the similarity within groups and the difference between groups. The principle of the geometric interval method is to establish segmentation hyperplanes by maximizing the distance between the hyperplane and the nearest sample (Peng and Wang, 2009). Using these grading methods, the wildfire risk of the grids is classified into four levels, Low-risk, Medium-risk, High-risk, and Very-high-risk, based on the posterior probabilities. For comparison, the fire spots in 2020 are used as the criterion, checking whether each falls into a region with a High-risk or Very-high-risk level. The accuracy of the four grading methods is shown in Figure 10.
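Two of the four grading schemes and the accuracy criterion are straightforward to sketch: the share of 2020 fire spots whose grids fall in the top two of four levels. Natural breaks and geometric interval need iterative optimization (as implemented in GIS software) and are omitted. The function names are illustrative.

```python
import numpy as np

def equal_interval_breaks(probs, k=4):
    # k equal-width classes between the minimum and maximum probability
    lo, hi = np.min(probs), np.max(probs)
    return np.linspace(lo, hi, k + 1)[1:-1]

def quantile_breaks(probs, k=4):
    # k classes holding (about) the same number of grids
    return np.quantile(probs, np.linspace(0, 1, k + 1)[1:-1])

def high_risk_hit_rate(fire_cell_probs, breaks):
    """Share of fire spots whose grid falls in the top two of four levels."""
    levels = np.digitize(fire_cell_probs, breaks)  # 0 = Low ... 3 = Very-high
    return float(np.mean(levels >= 2))
```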
Across the four grading methods, the accuracy of the BNW model differs markedly. It is highest (0.7636) with the quantile method but only 0.6642 with the equal interval and natural breaks methods, which means the predictive performance of the BNW-based wildfire risk assessment model is sensitive to the grading method. In contrast, the accuracies of the NB and WNB models are relatively stable across grading methods. In particular, the accuracy of the WNB model is above 0.76 with every grading method, indicating that the WNB model is stable and adaptable.
Compared with the other three grading methods, the geometric interval method ensures approximately the same number of values in each class and a consistent variation between intervals. It is regarded as a compromise among the equal interval, quantile, and natural breaks methods. Therefore, the geometric interval method is selected as the grading standard for visualizing the wildfire risks of the research area. With the geometric interval method, the accuracies of the three Bayesian models are 0.7418, 0.7515, and 0.7636, respectively.

The Visualization of Wildfire Risk Assessment Models
Based on the grading results of the geometric interval method, the distributions of wildfire risk levels from the BNW, NB, and WNB models are visualized using ArcGIS software, as shown in Figure 11.
Regardless of which Bayesian model is adopted, the overall risk distribution of the study area is roughly similar. This is mainly because all three Bayesian models use the same OFS and MAST, and the sampling results objectively reflect the actual situation of the study area. The High-risk and Very-high-risk regions are generally distributed in the south and southeast of the study area. However, the results of the models differ in some local areas. For instance, in the BNW result, regions a, b, and c contain larger areas assessed as Very-high-risk, whereas the Very-high-risk areas in the NB and WNB results are more scattered, as shown in Figure 12. This may be caused by the mutual coupling of factors considered in the BNW model: most factors have spatial continuity and are correlated with their surroundings, so the BNW results are smoothed. The NB model treats the factors independently, so its results are more scattered. Since WNB emphasizes certain wildfire-related factors, the dispersion of its results lies between those of the NB and BNW models.
The proportions of grid area and of fire spots from 2015 to 2021 in each of the four risk levels are summarized in Table 3. When the posterior probabilities are graded by the geometric interval method, the Low-risk and Very-high-risk levels of the BNW model occupy larger proportions than the other two levels, together accounting for 60.36% of the whole region. The area proportions of the four risk levels are more evenly distributed in the results of the NB and WNB models. In the WNB-based risk assessment, 86.28% of the fire spots fall in the High-risk and Very-high-risk regions, which is higher than for BNW (76.43%) and NB (82.78%).
Considering area and accuracy together, the BNW model has the worst assessment performance. In total, 48.57% of fire spots fall in its High-risk and Very-high-risk areas, which account for 47.35% of the total research region. Defining the predicting efficiency as the ratio of this accuracy to the area proportion of the High-risk and Very-high-risk areas, the predicting efficiencies of the BNW, NB, and WNB models are 1.61, 1.69, and 1.73, respectively. The WNB model exhibits the highest efficiency for wildfire risk assessment, which is very helpful for the monitoring, inspection, and prevention of wildfires by the relevant departments.
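The predicting efficiency defined above can be computed from graded risk levels. This sketch assumes equal-area grids, so the area share is a count share, and encodes High-risk and Very-high-risk as levels 2 and 3. Both are illustrative assumptions.

```python
import numpy as np

def predicting_efficiency(cell_levels, fire_cell_levels, high_levels=(2, 3)):
    """Share of fire spots in High/Very-high grids divided by the share of such area.

    cell_levels: risk level of every grid (equal-area grids assumed).
    fire_cell_levels: risk level of the grid containing each fire spot.
    """
    area_share = np.isin(np.asarray(cell_levels), high_levels).mean()
    hit_rate = np.isin(np.asarray(fire_cell_levels), high_levels).mean()
    return hit_rate / area_share
```

An efficiency above 1 means fire spots are concentrated in the High-risk and Very-high-risk areas more than their area share alone would predict.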

CONCLUSION
1) The importance of 14 types of wildfire-related factors is ranked by the information gain ratio. Vegetation type, land-usage type, and fire-spot density are the three most important factors affecting the occurrence of wildfires, whereas aspect and DR have little effect on the risk assessment of wildfire occurrence.
2) Deleting less important factors and establishing a Most Accurate Sampling Table improve the assessment performance of the WNB model. The best accuracy of the WNB model, 0.7566, is obtained after deleting the aspect and DR factors.
3) Compared with the BNW and NB models, the WNB model has the best predicting efficiency for fire-spot assessment. Using the geometric interval grading method, a total of 86.44% of the fire spots fall in the High-risk and Very-high-risk regions, and the predicting efficiency is 1.72.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Materials; further inquiries can be directed to the corresponding author.