Assessment of Landslide Hazard in Jiangxi Using Geo-information Technology

Landslides constitute a severe environmental problem in Jiangxi, China. This research aimed to conduct a landslide hazard assessment to provide technical support for disaster reduction and prevention in the province. Fourteen geo-environmental factors, e.g., slope, elevation, road, river, fault, lithology, rainfall, and land cover types, were selected for this study. Two cases were tested: (1) using only the main linear features, e.g., main rivers and roads, and (2) using detailed, complete linear features including all levels of roads and rivers. After buffering of the linear features, an information value (IV) analysis was applied to quantify the distribution of the observed landslides for each subset of the 14 factors. The results were input into the binary logistic regression model (LRM) for landslide risk modeling, taking the known landslide points as a training set (70% of the total 9,525 points). The calculated probability of a landslide was further classified into five grades with an interval of 0.2 for hazard mapping: very high (3.70%), high (4.05%), moderate (18.72%), low (27.17%), and stable zones (46.36%). The accuracy was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) against the validation set (30%, the remaining landslides). The final results show that as the completeness of the linear features increased, the modeling reliability also increased significantly. We hence concluded that the tested methodology is capable of achieving landslide hazard prediction at regional scale, and the results may provide technical support for geohazard reduction and prevention in the studied province.


INTRODUCTION
Landslides are a worldwide natural hazard, especially in Southern and Southeastern Asia including South China, and cause huge damages to human life and property, e.g., destroying houses, farmland, roads, and various infrastructures; killing livestock; and even amplifying existing disasters (Wu and Ai, 1995;Nadim et al., 2006;Assilzadeh et al., 2010;Froude and Petley, 2018).
It is of critical importance to conduct landslide risk prediction, zoning, and assessment to provide scientific advice and technical support for disaster prevention and early warning.
A large number of institutions and scientists have implemented projects or undertaken research to find solutions to the problem of landslide disasters, including landslide mechanism analysis, risk mapping, and assessment (Montgomery and Dietrich, 1994;Guzzetti et al., 1999;Aleotti and Chowdhury, 1999;Ayalew and Yamagishi, 2005;Ruff and Czurda, 2008;Fan et al., 2016;Arabameri et al., 2017;Zhang Y. et al., 2020; see text footnote 1). These authors have proposed different qualitative and quantitative assessment approaches involving a set of indicators (Guzzetti et al., 1999;Corominas et al., 2014;Goetz et al., 2015;Furlani and Ninfo, 2015;Li et al., 2017;Zhu et al., 2019;Zhang Y. et al., 2020; see text footnote 1), and these studies laid a solid foundation for our landslide hazard assessment study in Jiangxi, which ranked second in China in terms of geohazard occurrence frequency in 2019.
Actually, landslides are the result of the interaction of multiple geo-environmental factors and human activity, including geological lithology, structure (e.g., fractural zones, faults, and joints), elevation, slope, aspect, river, regolith, soil, land cover, rainfall, roads, and housing development. Djukem et al. (2020) and Zhang Y. et al. (2020) have discussed and successfully applied these factors for landslide hazard assessment. Hence, these geo-environmental factors will be necessary and taken into account for achieving our purpose in this study.
Combining the knowledge of different disciplines can effectively improve the assessment accuracy of geohazards in practical applications (Kilburn and Pasuto, 2003). The assessment of landslide hazard refers to the prediction of the probability of its occurrence in a specific area by studying the combined effects of multiple geo-environmental factors (Tian et al., 2020;Zhang Y. et al., 2020; see text footnote 1). To achieve an assessment, two types of approaches, i.e., knowledge-driven and data-driven methods, are at present available. The knowledge-driven method relies on expert experience and knowledge. It involves a degree of subjectivity and uncertainty and is suitable for areas with simple geo-environmental conditions or areas with limited data. This method has limited assessment accuracy for areas with complex conditions and unknown landslide mechanisms. For the data-driven method, the landslide assessment factors are selected by quantitative analysis, and risk assessment is conducted by employing artificial intelligence approaches. Hence, theoretically and practically, the data-driven method seems to be more robust and reliable, although more computing power is required (Zhang and Jiang, 2004;Zhang et al., 2017).
As a matter of fact, a number of scientists have made efforts on this research topic. For the time being, statistical analysis, especially machine learning (Wu et al., 2018), is the common approach for landslide hazard risk assessment, for example, the information value (IV) analysis (Gao et al., 2006;Chen et al., 2012;Sharma et al., 2015;Feng et al., 2016;Ren et al., 2018), the logistic regression model (LRM) (Carrara, 1983;Lee and Min, 2001;Ayalew and Yamagishi, 2005;Xing et al., 2004;Bai et al., 2010;Feng et al., 2016), artificial neural networks (ANNs) (Pradhan and Lee, 2007;Yilmaz, 2009;Lee et al., 2010;Feng et al., 2016;Kalantar et al., 2018), support vector machines (SVMs) (Yao et al., 2008;Peng et al., 2014;Kumar et al., 2017;Xia et al., 2018;Wang et al., 2019), and random forests (RFs) (Li et al., 2014;Kim et al., 2018;Dang et al., 2018;Zhang Y. et al., 2020; see text footnote 1). Recently, some authors have even attempted to employ a combination of LRM with IV analysis (Feng et al., 2016;Du et al., 2017;Fan et al., 2018;Zhang Q. et al., 2020) or LRM with certainty factor (CF) analysis (Yang et al., 2019;Zhang Q. et al., 2020) for achieving landslide risk assessment. Although these different techniques have been proven to be effective, there is no consensus on which technique and method are the best (Wang et al., 2005;Zhang, 2019). It can be seen from the above brief review that data-driven approaches, in particular, machine learning approaches, have great potential in geohazard risk prediction and assessment. Zhang (2019) and Zhao et al. (2019) noted that IV-based LRM is well capable of addressing the problem of binary variables (e.g., presence or absence of landslides) and has been applied to the assessment of landslide hazards.
In view of these, the objective of this study was to realize a landslide risk assessment by the combined approaches of IV and LRM in order to provide technical support for disaster reduction and prevention of the local authorities, taking Jiangxi, China, as an example. With more and more regional-scale studies on geohazard assessment being required to meet the need of Disaster Reduction and Prevention actions of governments, one may immediately think to use coarse-resolution data with major features of the geo-environmental factors for this purpose. Thus, one specific objective was to test the influence of data completeness and detailed extent on such regional-scale modeling and prediction and to check whether coarse-resolution data and major features alone are able to successfully achieve this task.

Study Area
Jiangxi is a province located in Southeast China, extending from 24°29′14″N to 30°04′41″N in latitude and from 113°34′36″E to 118°28′58″E in longitude, covering an area of 166,900 km². Situated in the south of the middle reaches of the Yangtze River Watershed, the overall terrain of Jiangxi looks like a horseshoe-type or dustpan-type basin. The Poyang Lake basin is situated between the Yangtze River in the north and a series of NE- or NNE-striking mountain ranges such as Huaiyu and Baiji in the east, Wuyi in the southeast, Jiulian in the south, and Mufu, Jiuling, Wugong, Wangyang, and Zhuguang in the west. There are five main rivers, namely, Xinjiang and Raohe from the east, Fuhe and Ganjiang from the south, and Xiushui from the west, all flowing into the Poyang Lake and then joining the Yangtze River (Figure 1). Jiangxi belongs to the subtropical climate zone. Rainfall is abundant, and monsoon rain is predominant in spring and summer, in particular in June and July. The annual precipitation is more than 1,500 mm. The average annual temperature is about 16.3-19.5°C, generally increasing from 16.3-17.5°C in the north to 19.0-19.5°C in the south. The northwest wind prevails in winter, and it is relatively cold. In summer, it is humid and hot with an average temperature of 24-29°C (and the extreme maximum temperature is more than 40.0°C because of the Pacific subtropical monsoon).
Geologically, as shown in Figure 1, Jiangxi crosses over two geotectonic units: the Yangtze Plate in the north (I) and the Cathaysia Massif (South China Plate) in the south (II₃), where the Qian-Hang Tectonic Belt (II₁) is the joint belt between the two plates (Yang, 2003). Though active faults are rarely observed nowadays, the landform shaped by such geotectonic settings, characterized by a mountains-basin pattern, facilitates the occurrence of geohazards, especially landslides, in Jiangxi. Up to 2020, a total of 9,525 landslides that took place in the past decades had been collected. The economic losses caused by geohazards are second only to those caused by floods and droughts, and their casualties even exceed those of floods (Jiangxi Geological Disaster Emergency Center, 2014). Research on zoning of landslide susceptibility will hence help us understand the overall situation of landslide disasters and provide technical support for decision making in hazard prediction, prevention, and early warning in the province.

Landslide Inventory Data
Apart from the field survey by ourselves in July-October 2019 and August 2020, the majority of the landslide data in Jiangxi were obtained from the Environmental Science Data Center, Institute of Geographical Sciences and Natural Resources Research (IGSNRR) of the Chinese Academy of Sciences (CAS). A total of 9,525 landslide points were made available for this research. The spatial distribution of these landslides is shown in Figure 2.

Geo-Environmental Factors
Based on the field survey and general understanding of the landslide mechanism, the following geo-environmental parameters were utilized for risk assessment, e.g., slope; aspect; elevation derived from the digital elevation model (DEM); rainfall, including mean annual rainfall and accumulated monthly rainfall of March-July, March-June, and May-July; roads; rivers; faults; lithologies of strata; normalized difference vegetation index (NDVI); and land cover (Table 1). These factors include both continuous and discrete data and are described as follows.

Slope
The slope is an important factor in the occurrence of landslides, which take place only when the slope reaches a certain degree. The geometric characteristics of the slope determine the stress distribution and hence the stability of the slope (Lan et al., 2002). Human activity such as road construction reduces the slope resistance and exacerbates instability (Yu, 2003). Derived from the DEM product ASTGTM (V003, 30 m), slope ranges from 0° to 75° in Jiangxi and is presented in Figure 3A.

Aspect
The aspect is the direction of the slope-surface normal projected onto the horizontal plane. Jiangxi is situated to the north of the Tropic of Cancer. The southern slopes receive more solar radiation, leading to a higher temperature, a bigger contrast in day-night temperature, and stronger evapotranspiration than on the northern ones. Such differences in physicochemical conditions result in differences in vegetation development and weathering between the southern and northern slopes. Su (2006) noted that landslides occur more frequently on the southern slopes than on the northern ones. For this reason, the aspect information of Jiangxi was extracted from the DEM based on spatial analysis (Figure 3B) and used for landslide hazard assessment.

Relief Degree of Land Surface (RDLS)
The RDLS is a parameter that describes surface morphology. It is one of the most important factors for determining the topographic conditions and characterizing the potential energy of surface erosion and material movement of the slope (Yin et al., 2010;Su et al., 2017). It is useful for quantitative analysis of the landform and erosion degree of the regional surface (Guo et al., 2008). Based on the spatial analysis, the RDLS was calculated from the DEM, with values ranging from 0 to 588 m (Figure 3C).
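The text does not state how the RDLS was derived from the DEM. As a minimal sketch, assuming the commonly used definition (maximum minus minimum elevation inside a square moving window), the calculation could look as follows. The function name `relief_degree`, the window size, and the small elevation grid are hypothetical and serve only as an illustration.

```python
def relief_degree(dem, half_window=1):
    """Return, for each cell, max - min elevation inside a square window.

    Assumption: RDLS is taken as the local elevation range; the window is
    clipped at the grid edges. half_window=1 gives a 3 x 3 window.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - half_window), min(rows, r + half_window + 1)
            c0, c1 = max(0, c - half_window), min(cols, c + half_window + 1)
            window = [dem[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = max(window) - min(window)
    return out


# Hypothetical 3 x 3 elevation grid in meters (not data from the study).
dem = [
    [100, 110, 130],
    [105, 120, 160],
    [100, 125, 200],
]
rdls = relief_degree(dem)
for row in rdls:
    print(row)
```

A larger window yields larger relief values, so the window size would need to be fixed before the RDLS subsets are defined.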

Distance From the Linear Features: Roads, Rivers, and Faults
The study area is situated in the south of the Yangtze River and is composed of a series of hills and mountains (see section "Study Area"), leading to the development of the five important rivers and their tributaries and subtributaries. They have been modifying the landscape and breaking up the rocks and, at the same time, generating instability of slopes and landslides. Generally, the closer to the river, the higher the slope instability and risk of landslide. Faults are geological structures in which the rock blocks of the two sides are displaced against each other along the fracture surface, destroying the integrity of the rock formations. The degree of joint development in geological bodies is often controlled by faulting. The occurrence of several geohazards is closely associated with faults, especially active faults (Huang and Li, 2009).
Road construction and other housing development engineering have led to slope cutting and destruction of the stability of slopes composed of rocks and soils. Hence, road networks and slope housing are a landslide indicator as well (Xu, 2005;Meten et al., 2015;Zhang Y. et al., 2020; see text footnote 1).
Linear features like roads (Figure 3D), rivers (Figure 3E), and faults (Figure 3F) have the same proximity effect; that is, the closer the slope to the linear feature, the higher the risk of landslide it may have. Moreover, scale may also play a role, as the larger the scale of the faults, roads, and rivers, the stronger their impacts on the stability of the slope.
With a specific purpose to test the impacts of completeness of roads and rivers, we set up two groups of these two linear features for modeling: (1) main roads (highways and railways) and big rivers and their major tributaries and (2) main roads, secondary roads (provincial and county levels) and countryside roads, and big rivers with their major tributaries, subtributaries, and streams.
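The proximity effect of the linear features is captured by buffering. A minimal sketch of that step, assuming a raster representation and hypothetical buffer breaks (the actual buffer distances are not given in this section), is to compute the distance from every cell to the nearest feature cell and assign a buffer class. The function names and the toy road below are illustrative only.

```python
import math


def distance_to_features(feature_cells, rows, cols, cell_size):
    """Brute-force Euclidean distance (m) from each cell to the nearest feature cell."""
    dist = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nearest = min(math.hypot(r - fr, c - fc) for fr, fc in feature_cells)
            dist[r][c] = nearest * cell_size
    return dist


def buffer_class(distance, breaks):
    """Index of the first break the distance does not exceed; len(breaks) if beyond all."""
    for k, upper in enumerate(breaks):
        if distance <= upper:
            return k
    return len(breaks)


# Hypothetical road running along the first column of a 3 x 4 grid of 30 m cells.
road = [(0, 0), (1, 0), (2, 0)]
dist = distance_to_features(road, rows=3, cols=4, cell_size=30.0)
breaks = [50.0, 100.0]  # hypothetical buffer limits in meters
classes = [[buffer_class(d, breaks) for d in row] for row in dist]
print(classes)
```

Adding minor roads or streams to `road` shortens the distances for many cells, which is how the completeness of the linear features changes the subsets that enter the IV analysis.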

Rainfall
Rainfall is a salient triggering factor for landslides as it constitutes slope runoff, leading to soil erosion and lubrication of potential sliding surfaces. Rainfall may increase the load of rock and soil and reduce the resistance from underlying rocks. Usually, after a continuous rainfall reaches a certain threshold, landslides take place (Guzzetti et al., 2008). The distribution of heavy rainfall affects the concentration of landslides (Shan et al., 2004). Because of the subtropical monsoon climate, Jiangxi receives abundant rainfall and frequent heavy rains, which strongly provoke the occurrence of landslides. The annual rainfall in Jiangxi ranges from 1,361.6 to 2,037.4 mm (Figure 2). As landslides occur mainly in spring and early summer from March to July, especially from June to July, the mean annual rainfall and the mean accumulated March-June, March-July, and May-July rainfall of the period 1981-2010 were produced for hazard analysis.

Lithology of Strata
Lithology plays a certain role in landslide events as it constitutes different resistances and degrees of propensity to this hazard (Yu, 2003). In addition, different lithologies may be weathered into different regoliths and soils. Lithological properties of the study area can be largely divided into weathering crust (including soil), sandstone, metamorphic rocks, conglomerate, shale, granodiorite, limestone, volcanic rocks, basic rocks, and granitic rocks (Figure 3G).

NDVI
The NDVI reflects the growth status and coverage of vegetation (Tucker, 1979). It is a widely used vegetation index for land cover characterization (Tucker, 1979;Huete et al., 1997;Walsh et al., 2001;Wu, 2014;Wu et al., 2016). The soil and water conservation effect of vegetation reduces surface runoff and soil erosion. At the same time, the biological weathering caused by vegetation also damages rocks and soils to a certain degree (e.g., rock breakup by plant rooting). Though it is not absolute, slopes with more abundant vegetation are more resistant to landslides than bare soils. Therefore, the occurrence of landslides is often related to vegetation coverage. The NDVI was calculated from the November MODIS data from 2005 to 2010; values below 0 (mostly water bodies) were replaced by zero, so that the final values range from 0 to 0.92 (Figure 3H).
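As a brief illustration, the standard NDVI definition, (NIR − Red) / (NIR + Red), combined with the replacement of negative values by zero described above, can be sketched as follows. The reflectance values are hypothetical, and returning 0 when both bands are zero is an assumption made here to avoid division by zero.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), with negative values replaced by zero."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat empty pixels like water (no vegetation signal)
    return max(0.0, (nir - red) / total)


# Hypothetical reflectances (not MODIS values from the study).
forest = ndvi(nir=0.50, red=0.10)   # dense vegetation gives a high NDVI
water = ndvi(nir=0.02, red=0.05)    # negative raw NDVI is clamped to zero
print(round(forest, 3), water)
```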

Land Cover
Different types of land cover have different vegetation properties and different effects on surface water and soil conservation. These are associated with the stability of the slope surface and thus influence, to a certain extent, the occurrence of landslides. The main land cover types are croplands, grasslands, forests, woodlands, wetlands, artificial lands, barelands, and water bodies (Figure 3I).

IV Analysis
IV was used in the fields of geology and mineral prospecting in early times, converting the measured values reflecting various influencing factors into the IV. Later, some scholars transformed it into a bivariate statistical analysis, which can combine the subjective estimation of experts with objective data. The IV analysis is to analyze the possibility of landslides under similar conditions by counting the information of past landslides. Hence, Gao et al. (2006), Sharma et al. (2015), and Ren et al. (2018) considered that this IV represents regional stability.
Landslide information is calculated as the IV of each subset of the selected indicator. The IVs of the individual factors are overlaid to calculate the possibility of landslides in the study area. The greater the IV, the higher the possibility of a landslide, and vice versa (Dai, 2013). The possibility of landslides can be evaluated by the amount of information or IV in the prediction process. The formula of the information model is as follows:
$$I(y, x_1 x_2 x_3 \ldots x_n) = \log \frac{P(y \mid x_1 x_2 x_3 \ldots x_n)}{P(y)} \tag{1}$$
where $I$ is the IV, $P(y \mid x_1 x_2 x_3 \ldots x_n)$ is the probability of landslide hazard occurrence for the evaluated indicators, and $P(y)$ denotes the probability of landslide occurrence in the normalized treatment area; $x_1, x_2, x_3, \ldots, x_n$ are the influencing factors of geohazards. The final form of Equation 1 can be further simplified and presented as follows:
$$I(y, x_1 x_2 x_3 \ldots x_i) = \log \frac{N_i / N}{S_i / S} \tag{2}$$
where $N$ is the total number of landslides (points or sites) in the study area; $N_i$ is the number of landslides in each subset of the given factor; $S$ is the total pixel number of assessment units in the study area; $S_i$ is the total pixel number of each subset of the given factor; and $I(y, x_1 x_2 x_3 \ldots x_i)$ is the IV of each factor contributing to the landslide hazard. The IV can be either positive (favorable) or negative (unfavorable). Taking the geo-environmental factor slope as an example, we consider a slope of <3° to be stable and select it for non-risk sampling; subsetting starts at 3° and proceeds upward at an interval of 5° up to >38°. The IV was calculated using Equation 2 and is shown in Table 2.
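To make the symbol definitions concrete, the following minimal sketch computes the IV of each subset as the logarithm of the ratio between its share of landslides ($N_i/N$) and its share of area ($S_i/S$). The natural logarithm, the function name `information_value`, and all counts are assumptions for illustration; they are not values from Table 2.

```python
import math


def information_value(n_i, n_total, s_i, s_total):
    """IV of one subset: log of landslide share over area share (natural log assumed)."""
    if n_i == 0:
        raise ValueError("IV is undefined for a subset without landslides")
    return math.log((n_i / n_total) / (s_i / s_total))


# Hypothetical subsets of one factor: name -> (landslide count N_i, pixel count S_i)
subsets = {"A": (100, 4000), "B": (400, 4000), "C": (500, 2000)}
n_total = sum(n for n, _ in subsets.values())  # N = 1000
s_total = sum(s for _, s in subsets.values())  # S = 10000

iv = {
    name: information_value(n, n_total, s, s_total)
    for name, (n, s) in subsets.items()
}
for name, value in iv.items():
    print(f"{name}: IV = {value:+.3f}")
```

In this toy case, subset A holds fewer landslides than its area share would suggest (negative IV), subset B holds exactly its share (IV of zero), and subset C is over-represented (positive IV). Subsets with no landslides need separate treatment, which the sketch only flags with an error.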

LRM
LRM is a non-linear statistical model in which the variables can be either continuous or discrete. In the assessment of geological hazards, the data combined with continuous and discrete variables are to be comprehensively processed (Nandi and Shakoor, 2010;Zhao et al., 2019). LRM has been widely used in land cover change estimation (Mertens and Lambin, 2000;Serneels and Lambin, 2001;Wu, 2003) and disaster prediction (Nandi and Shakoor, 2010;Zhao et al., 2019) and is able to reveal the relationship between the dependent variable, i.e., change or disaster occurrence (with 1 indicating that an event occurred and 0 indicating that no event occurred), and multiple independent variables, i.e., spatial determinants or hazard factors.
When the probability of an event is $P$, with a value range of (0, 1), the probability of the event not occurring is $1 - P$. If $P$ is close to 0 or 1, it is difficult to capture its value, and thus, it is necessary to apply a logarithmic transformation, which is called a logit transformation:
$$\operatorname{logit}(P) = \ln\left(\frac{P}{1 - P}\right) = Z \tag{3}$$
in which
$$Z = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \tag{4}$$
$$P = \frac{\exp(\alpha + \beta_1 x_1 + \cdots + \beta_n x_n)}{1 + \exp(\alpha + \beta_1 x_1 + \cdots + \beta_n x_n)} = \frac{1}{1 + e^{-(\alpha + \beta_1 x_1 + \cdots + \beta_n x_n)}} \tag{5}$$
where $P$ is the probability of an event occurrence, $e$ the base of the natural logarithm, $\alpha$ the intercept (a constant), and $\beta_i$ ($i$ = 1, 2, 3, ..., $n$) the regression coefficient corresponding to the independent variable $x_i$ ($i$ = 1, 2, 3, ..., $n$). In our case, since the IV was calculated based on the subsets of each geo-environmental factor, for the $i$th factor ($i$ = 1, 2, ..., $n$) and $j$th subset ($j$ = 1, 2, ..., $k$), Equation 5 can be further specified as follows:
$$P = \frac{1}{1 + \exp\left[-\left(\alpha + \sum_{i,j} \beta_{ij} x_{ij}\right)\right]} \tag{6}$$
where $P$ is the probability of landslide occurrence and $\beta_{ij}$ the regression coefficient of the variable $x_{ij}$ ($i$ = 1, 2, ..., $n$, the factor number; $j$ = 1, 2, ..., $k$, the subset number of factor $i$), i.e., the IV of subset $ij$. The LRM is thus coupled with the IV analysis. This modeling is able to solve the problem of determining the weight of assessment factors and integrating different types of factor data. It may also reduce the influence of the subjectivity of a single model. The specific operation of model coupling is to obtain the IV of each subset of the geo-environmental factors through the IV analysis and then to input these IVs into the LRM as independent variables to establish the regression equation, in which the regression coefficient of each assessment factor is calculated.
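The logistic form of Equation 5 can be illustrated with a short sketch in which the independent variables are the IVs of the subsets a cell falls into. The intercept, coefficients, and IVs below are hypothetical and are not the fitted values of this study.

```python
import math


def landslide_probability(alpha, betas, ivs):
    """Logistic probability P = 1 / (1 + exp(-Z)) with Z = alpha + sum(beta_i * x_i)."""
    z = alpha + sum(b * x for b, x in zip(betas, ivs))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical intercept, coefficients, and IVs for three factors.
alpha = -0.5
betas = [0.8, 0.4, 0.6]
stable_cell = [-1.2, -0.3, -0.9]  # IVs of the subsets a stable cell falls into
prone_cell = [0.9, 0.5, 1.1]      # IVs of the subsets a landslide-prone cell falls into

p_stable = landslide_probability(alpha, betas, stable_cell)
p_prone = landslide_probability(alpha, betas, prone_cell)
print(round(p_stable, 3), round(p_prone, 3))
```

Because the logistic function is monotonic in $Z$, a cell whose subsets carry positive IVs and positive coefficients receives a higher probability than a cell whose subsets carry negative IVs.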

Modeling and Prediction of Landslide Hazard
For assessing the landslide hazards in Jiangxi, 14 geo-environmental factors were selected. The 9,525 landslide disaster points collected were randomly divided into a training set (TS) and a validation set (VS) at a ratio of 7:3. In addition, non-landslide points were chosen from the relatively flat areas (such as cultivated land and urban areas) with a slope of <3°, following Miao et al. (2016) and Zhang Y. et al. (2020), and they were integrated into the TS and VS. The raster calculator within the spatial analysis tool of GIS was used to realize the superposition and calculation of IV for each geo-environmental factor.
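The random 7:3 division can be sketched as below. The seed, the function name `split_points`, and the use of integer identifiers in place of real landslide records are assumptions for illustration; the actual sampling tool used in the study is not stated.

```python
import random


def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle a copy of the points and cut it into training and validation sets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Integer identifiers stand in for the 9,525 landslide points.
landslide_ids = list(range(9525))
training_set, validation_set = split_points(landslide_ids)
print(len(training_set), len(validation_set))
```

The same function could be applied to the non-landslide points so that both classes are divided in the same proportion.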

Calculation of the IVs
The landslides in the study area are mostly small and are represented as points. The attribute values of the geo-environmental factors corresponding to each landslide point were extracted. Based on the division of the subsets, the IV of each subset of each factor was calculated using Equation 2 as mentioned above, and the results for a selection of the factors are presented in Table 2.

Correlation Test of the Assessment Factors
To avoid collinearity among the geo-environmental factors, a correlation analysis was performed. As shown in Table 3, the correlation coefficients among all the factors are less than 0.3, indicating that these factors and the division of their subsets are reasonable.
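A minimal sketch of such a screening is given below, assuming Pearson correlation (the type of coefficient is not specified in the text) and flagging any factor pair whose absolute coefficient reaches 0.3. The factor values are hypothetical samples, not data from Table 3.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical IV samples for three factors at the same four points.
factors = {
    "slope": [1.0, 2.0, 3.0, 4.0],
    "rdls": [2.0, 4.0, 6.0, 8.0],   # deliberately collinear with slope
    "ndvi": [3.0, 1.0, 4.0, 2.0],
}

threshold = 0.3
flagged = [
    (a, b)
    for a, b in combinations(factors, 2)
    if abs(pearson(factors[a], factors[b])) >= threshold
]
print(flagged)
```

A flagged pair would indicate that one of the two factors should be dropped or re-subset before the regression.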

LRM
All IVs of subsets of each geo-environmental factor were outputted in DBF format and then converted into an Excel file. Taking the attributes of landslides and non-landslide points in the TS as dependent variables and all predictive factors as independent variables, the binary LRM was realized within SPSS 25, a software package for statistical analysis.
Modeling was conducted in two cases: one with only the major linear features, e.g., big rivers, roads, and faults, and the other with both major and minor scales of linear features including also subtributaries of big rivers and streams, small roads (county level and commune level), and faults. The modeling results are presented in Tables 4, 5.

Calculation of Landslide Risk
The calculated regression coefficients (β) were inputted into Equation 5 to get the LRM:
$$Z = \cdots + 0.896x_6 + 0.126x_7 + 0.159x_8 - 0.263x_9 + 0.672x_{10} + 0.465x_{11} + 0.392x_{12} + 0.742x_{13} + 0.294x_{14} - 0.041 \tag{7}$$
where $x_1$ is the IV of land cover, $x_2$ of lithology, $x_3$ of road, $x_4$ of river, $x_5$ of NDVI, $x_6$ of the mean annual rainfall, $x_7$ of May-July rainfall, $x_8$ of March-July rainfall, $x_9$ of March-June rainfall, $x_{10}$ of fault, $x_{11}$ of slope, $x_{12}$ of elevation, $x_{13}$ of RDLS, and $x_{14}$ of aspect. $P$ is the probability of landslide occurrence, with a value of 0-1. With the use of the raster calculator tool within GIS and Equation 5, the probability of landslide hazard in the study area was obtained.
Different approaches were used to analyze and compare the results of landslide hazard modeling. One was to check the rationality of the distribution of the actual disaster points across the risk grades; the second was to assess the accuracy of hazard zoning through the receiver operating characteristic (ROC) curve, which is an effective method to assess the performance of classification algorithms. The area under the ROC curve (AUC) is the area between the ROC curve and the horizontal axis. The larger the AUC value, the better the prediction accuracy (Wang, 2013).
Frontiers in Earth Science | www.frontiersin.org

RESULTS
Based on the above analysis and modeling, the results obtained are presented in this section.

Landslide Hazard Models and Maps
Tables 4, 5 show the results of LRM for landslide hazard and the related coefficients of each geo-environmental factor. The β value represents the weight of each factor in a landslide event.
The significance of each factor is judged by comparing the Wald or Sig. values: the larger the Wald value or the smaller the Sig. value, the higher the significance (Liang and Cui, 2010). Clearly, in comparison with Table 5, the LRM of Table 4 is not logical, as the roles of lithologies and RDLS are exaggerated and those of roads, rivers, and slopes are underestimated; this is examined in more detail in the discussion. The probability-based hazard zoning map derived from the results in Table 5 is shown in Figure 4. According to the statistics, the stable, low, moderate, high, and very high hazard areas cover 76,282.21 km² (46.36%), 44,713.33 km² (27.17%), 30,802.22 km² (18.72%), 6,659.40 km² (4.05%), and 6,091.74 km² (3.70%), respectively.
From this risk map, we learn that the areas prone to landslides are those with a slope of 12-23°, within 150 m of the rivers and 100 m of the roads, and with an RDLS of 60-140 m, where annual rainfall is greater than 1,700 mm. In addition, landslides occur more frequently in low-altitude areas of mountainous and hilly slopes, where human activities are relatively intense.
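The grading of the modeled probability into five classes at an interval of 0.2 can be sketched as follows. Assigning each boundary value (0.2, 0.4, 0.6, 0.8) to the higher grade is an assumption made here, as the treatment of boundary values is not stated.

```python
GRADES = ["stable", "low", "moderate", "high", "very high"]
UPPER_LIMITS = (0.2, 0.4, 0.6, 0.8)  # upper limits of the first four grades


def hazard_grade(probability):
    """Map a landslide probability in [0, 1] to one of five hazard grades."""
    for upper, name in zip(UPPER_LIMITS, GRADES):
        if probability < upper:  # assumption: a boundary value goes to the higher grade
            return name
    return GRADES[-1]


# Hypothetical probabilities for a few raster cells.
for p in (0.05, 0.35, 0.60, 0.95):
    print(p, hazard_grade(p))
```

Explicit thresholds are used instead of dividing by 0.2 because floating-point division can place boundary values such as 0.6 in the wrong class.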

Reliability of the Risk Map
(1) Assessment of the risk maps from the LRMs of Tables 4, 5 against the VS (field points not used for training) demonstrates a significant difference in the prediction of the very high and high risk zones (Tables 6, 7). These two zones from the LRM of Table 4 (33.15%) are much larger than those from the LRM of Table 5 (7.75%).
The observed VS landslide points were projected onto the different risk zones, and we found that a large proportion falls in the very high and high hazard zones, whereas only a small percentage falls in the stable zone. If the prediction is reliable, the ratio ($R_{ei}$) between the percentage of VS landslide points falling in each grade ($G_{ei}$) and the percentage of the area of each grade relative to the entire study area ($S_{ai}$) should increase clearly from grade to grade (Tian et al., 2016). From the calculation results (Tables 6, 7A), $R_{ei}$(I) < $R_{ei}$(II) < $R_{ei}$(III) < $R_{ei}$(IV) < $R_{ei}$(V) meets this requirement, but Table 7 from the LRM with complete linear features seems much better, as the very high and high risk zones are much narrower or, rather, more accurate than those from Table 6.
(2) The ROC curve is an efficient approach to assess the performance of classification algorithms. For a series of different classification thresholds, the curve is drawn with sensitivity as the ordinate and 1 − specificity as the abscissa, reflecting the trade-off between sensitivity and 1 − specificity (Tian et al., 2016). The AUC is a standard measure of the quality of a classification model. The AUC value is the area between the ROC curve and the horizontal axis. The larger the AUC value, the better the prediction accuracy (Wang, 2013). Based on the GIS interface, the landslide hazard map was sampled at the points in the VS, and the ROC curve and AUC value of the model are shown in Figure 5.
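The ratio check described under item (1) can be sketched as follows. The area shares are those reported for the five hazard grades; the shares of VS landslide points per grade are hypothetical placeholders (chosen only so that the high and very high grades sum to the reported 74.33%), because Tables 6 and 7 are not reproduced here.

```python
# Area share of each grade in percent, as reported for the hazard zoning map.
area_share = {
    "stable": 46.36, "low": 27.17, "moderate": 18.72, "high": 4.05, "very high": 3.70,
}
# Hypothetical shares of validation landslide points per grade in percent.
point_share = {
    "stable": 2.00, "low": 6.00, "moderate": 17.67, "high": 24.33, "very high": 50.00,
}

# Ratio of point share to area share for each grade, from stable to very high.
ratios = {grade: point_share[grade] / area_share[grade] for grade in area_share}
values = list(ratios.values())
is_increasing = all(a < b for a, b in zip(values, values[1:]))

for grade, ratio in ratios.items():
    print(f"{grade}: {ratio:.3f}")
print("strictly increasing:", is_increasing)
```

A ratio well above 1 means that a grade captures far more landslides than its area share alone would predict, which is the behavior expected of the high and very high grades.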
The AUC from LRM of Table 5 is 0.863. The accuracy of the model is more than 86%, indicating that the IV-based LR modeling for landslide risk prediction and zoning allows us to achieve satisfactory results of high reliability, in particular with complete linear features.
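For illustration, the AUC can be computed without plotting the ROC curve, using its equivalence to the probability that a randomly chosen landslide point receives a higher modeled probability than a randomly chosen non-landslide point (ties counted as one half). The labels and scores below are hypothetical and are not the validation data of this study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute the AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical validation points: 1 = landslide, 0 = non-landslide.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]  # modeled probabilities sampled at those points
auc = roc_auc(labels, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported value of 0.863 should be read.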

Rationality of the IV-Based LRM Approach
The above research shows that the results of risk zoning are basically consistent with the distribution of the regional historical landslides. With a rather complete inventory of the landslide data, our approach, composed of IV analysis and LRM with complete linear features, allowed us to achieve landslide risk prediction with high reliability, with an accuracy of >86% against the VS. Du et al. (2017), Fan et al. (2018), and Tian et al. (2020) have conducted landslide susceptibility assessment at the local scale using similar approaches but with an accuracy of about 79-84%. We hence believe that the proposed methodology has improved the reasonableness of regional-scale studies and should be extendable to other similar provincial and regional landslide risk assessments.
Actually, Tian et al. (2016) and Zhang (2019) have, respectively, employed CF-based LRM for landslide risk analysis in Guangdong and Shaanxi. Their results show that 57.99 and 60% of the field-observed landslides fall in the very high risk areas (22.15% of the total area, with an AUC of 0.782, and 10.03% of the total area with an AUC of 0.890, respectively). However, our analysis revealed that 74.33% of the landslides of the VS are located in the high and extremely high risk zones (7.75% of the total study area). This indicates that our analysis provides a more accurate prediction in locality of the potential hazards as we have used more geo-environmental factors and a better sampling scheme, e.g., division of the subsets, and used a higher-resolution DEM with more detailed factors.
Though rational weight assignment in terms of the propensity to landslide, as applied by Chu (2012), Wang and Wang (2017), and Zhang Y. et al. (2020; see text footnote 1), appears plausible, a significant advantage of our approach lies in the fact that the combination of expert knowledge-driven and data-driven approaches avoids the subjective weight assignment to the geological strata and linear features after buffering. At the same time, the prior knowledge of experts obtained in the field is also considered important to achieve modeling and prediction with higher reasonableness.

Importance of the Predictive Variables
The six most important independent variables revealed by the LRM with complete linear features are roads, annual rainfall, slope, faults, rivers, and NDVI. These variables are more or less similar to those obtained by Zhang Y. et al. (2020; see text footnote 1). For both local and provincial landslide hazard predictions, roads, rainfall, and slope are always the most important factors.

Findings and Existing Difficulties
As expected, one finding is that, with the other planar factors held the same, more complete linear features lead to a more reliable prediction of landslide hazard. It is essential to use complete linear features, or rather, linear features that are as detailed as possible, even for regional- and provincial-scale assessments. Modeling and prediction based on coarse-resolution data and major features only may lead to strong bias and even failure. This finding should help governments implement disaster reduction and prevention measures.
Another, surprising, finding is that landslides in Jiangxi most probably occur on gentle slopes of about 3-23°, where 83.3% of the total landslides have taken place (Table 2). This is much lower than the threshold of 28-35° proposed for natural landslides by Fan et al. (2016). Zhang Y. et al. (2020) and the study cited in text footnote 1 reported a similar result. This may be due to (1) the smoothing of the real relief by the moderate-resolution (30 m) DEM, so that the DEM-derived slope is lower than the real one, and (2) human activity, especially the development of the road network and urbanization through slope cutting, which has led to slope failure and lowered the slope threshold for landslides.
While conducting IV analysis, we noted that areas with a slope of <3° also contain 3.36% of the total landslides (Table 2). This may result from the slope homogenization mentioned above. Hence, for regional- and even national-scale landslide hazard assessments, coarser-resolution DEM data (such as SRTM, 90 m) are not recommended, as they may hide most of the small-scale landslides (e.g., several tens to several hundreds of square meters in surface area).
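The homogenization effect can be reproduced with a toy example. The sketch below samples a hypothetical one-dimensional terrain profile, flat ground interrupted by a 10 m high cut slope that is 10 m wide (45°), at 1, 30, and 90 m spacing and reports the steepest slope seen at each spacing. The profile, the function names, and the simple finite difference between neighboring cells are all illustrative assumptions; GIS slope tools use a two-dimensional neighborhood instead.

```python
import math


def elevation(x):
    """Hypothetical profile (m): flat ground with a 10 m cut slope from x = 100 to 110 m."""
    if x <= 100:
        return 0.0
    if x >= 110:
        return 10.0
    return x - 100.0  # 45-degree cut face


def max_slope_deg(cell_size, length=360):
    """Steepest slope between neighboring samples taken every cell_size meters."""
    xs = range(0, length + 1, cell_size)
    z = [elevation(x) for x in xs]
    steepest = max(abs(b - a) / cell_size for a, b in zip(z, z[1:]))
    return math.degrees(math.atan(steepest))


for cell in (1, 30, 90):
    print(cell, round(max_slope_deg(cell), 1))
# prints:
# 1 45.0
# 30 18.4
# 90 6.3
```

In this toy case a 45° cut face appears as about 18° at 30 m spacing and about 6° at 90 m spacing, which is the same direction of bias as that discussed above; the exact values depend on how the grid happens to align with the feature.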
Another problematic issue in our study is the use of the MODIS NDVI, which has a resolution of 250 m. This factor is not as important as the DEM, but such a resolution is clearly not ideal for representing vegetation greenness and coverage, in particular of forests and woodlands, because of their heterogeneity. Nevertheless, for an often cloudy province, it is extremely difficult to obtain cloud-free November Landsat images over a 5-year period for such a large area. Hence, using the MODIS NDVI was the only choice.

Comparative Verification
Zhang Y. et al. (2020) and the study cited in text footnote 1 used the RF algorithm to assess landslides in Guixi and Ruijin in Jiangxi, respectively. We compared our provincial/regional-scale IV-based LR modeling results with the local risk maps of Guixi and Ruijin and found good agreement between the percentages of the observed landslides falling in the predicted high and very high risk zones. In Guixi, 81.69% of the total landslide points and 79.27% of the VS are distributed in the very high and high risk areas, whereas the study cited in text footnote 1 showed that 92.67% of the total landslide points and 86.59% of the VS fall in these two zones in Ruijin. This indicates that our risk modeling and prediction results for Jiangxi Province are reliable and of practical value in providing technical support for disaster reduction and prevention in the province.
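The percentages compared here are obtained by grading the modeled probability at each landslide point and counting the points that fall in the two highest grades. The sketch below shows one way to do this with the 0.2-interval grading used for the hazard map; the lower-inclusive class boundaries, the function names, and the ten probabilities are assumptions made for illustration only.

```python
from bisect import bisect_right

GRADES = ["stable", "low", "moderate", "high", "very high"]
THRESHOLDS = [0.2, 0.4, 0.6, 0.8]  # assumed lower-inclusive class boundaries


def grade(probability):
    """Map a modeled landslide probability in [0, 1] to one of five hazard grades."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return GRADES[bisect_right(THRESHOLDS, probability)]


def share_in_high_zones(probabilities):
    """Fraction of landslide points graded high or very high."""
    hits = sum(1 for p in probabilities if grade(p) in ("high", "very high"))
    return hits / len(probabilities)


# Hypothetical modeled probabilities at ten observed landslide points
points = [0.95, 0.85, 0.72, 0.61, 0.45, 0.30, 0.88, 0.15, 0.66, 0.91]
share = share_in_high_zones(points)  # 7 of 10 points, i.e., 0.7
```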

CONCLUSION
In this paper, IV-based LRM for regional-scale landslide hazard mapping was applied to a complex environment of disaster development and occurrence, namely Jiangxi Province. The reliable results may provide technical support for landslide hazard reduction and prevention action at local and provincial scales.
One may think that the main, large-scale linear features are sufficient for regional-scale landslide risk modeling. However, this study reveals that it is essential to employ factors at all scales, or as detailed as possible, to achieve a reliable and accurate prediction.
Both the local and the regional/provincial-scale studies indicate that slope instability is mainly caused by road construction and housing development through slope cutting and is triggered by rainfall. Hence, it is particularly important for engineers to select sites with stable geological and environmental conditions for road system development and urban planning to minimize landslide risk. This is a precondition for the holistic and optimal design of infrastructure and urban areas, which is necessary for regional- or provincial-scale socioeconomic development.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available upon reasonable request to the corresponding author (WW: wuwch@ecut.edu.cn), without undue reservation.