Contemporary Remotely Sensed Data Products Refine Invasive Plants Risk Mapping in Data Poor Regions

Invasive weeds are a serious problem worldwide, threatening biodiversity and damaging economies. Modeling potential distributions of invasive weeds can prioritize locations for monitoring and control efforts, increasing management efficiency. Forecasts of invasion risk at regional to continental scales are enabled by readily available downscaled climate surfaces together with an increasing number of digitized and georeferenced species occurrence records and species distribution modeling techniques. However, predictions at a finer scale and in landscapes with less topographic variation may require predictors that capture biotic processes and local abiotic conditions. Contemporary remote sensing (RS) data can enhance predictions by providing a range of spatial environmental data products at fine scale beyond climatic variables only. In this study, we used the Global Biodiversity Information Facility (GBIF) and empirical maximum entropy (MaxEnt) models to model the potential distributions of 14 invasive plant species across Southeast Asia (SEA), selected from regional and Vietnam’s lists of priority weeds. Spatial environmental variables used to map invasion risk included bioclimatic layers and recent representations of global land cover, vegetation productivity (GPP), and soil properties developed from Earth observation data. Results showed that combining climate and RS data reduced predicted areas of suitable habitat compared with models using climate or RS data only, with no loss in model accuracy. However, contributions of RS variables were relatively limited, in part due to uncertainties in the land cover data. We strongly encourage greater adoption of quantitative remotely sensed estimates of ecosystem structure and function for habitat suitability modeling. Through comprehensive maps of overall predicted area and diversity of invasive species, we found that among lifeforms (herb, shrub, and vine), shrub species have higher potential invasion risk in SEA. Native invasive species, which are often overlooked in weed risk assessment, may be as serious a problem as non-native invasive species. Awareness of invasive weeds and their environmental impacts is still nascent in SEA and information is scarce. Freely available global spatial datasets, not least those provided by Earth observation programs, and the results of studies such as this one provide critical information that enables strategic management of environmental threats such as invasive species.

Invasive weeds are a serious problem worldwide, threatening biodiversity and damaging economies. Modeling potential distributions of invasive weeds can prioritize locations for monitoring and control efforts, increasing management efficiency. Forecasts of invasion risk at regional to continental scales are enabled by readily available downscaled climate surfaces together with an increasing number of digitized and georeferenced species occurrence records and species distribution modeling techniques. However, predictions at a finer scale and in landscapes with less topographic variation may require predictors that capture biotic processes and local abiotic conditions. Contemporary remote sensing (RS) data can enhance predictions by providing a range of spatial environmental data products at fine scale beyond climatic variables only. In this study, we used the Global Biodiversity Information Facility (GBIF) and empirical maximum entropy (MaxEnt) models to model the potential distributions of 14 invasive plant species across Southeast Asia (SEA), selected from regional and Vietnam's lists of priority weeds. Spatial environmental variables used to map invasion risk included bioclimatic layers and recent representations of global land cover, vegetation productivity (GPP), and soil properties developed from Earth observation data. Results showed that combining climate and RS data reduced predicted areas of suitable habitat compared with models using climate or RS data only, with no loss in model accuracy. However, contributions of RS variables were relatively limited, in part due to uncertainties in the land cover data. We strongly encourage greater adoption of quantitative remotely sensed estimates of ecosystem structure and function for habitat suitability modeling. Through comprehensive maps of overall predicted area and diversity of invasive species, we found that among lifeforms (herb, shrub, and vine), shrub species have higher potential invasion risk in SEA. Native invasive species, which are often overlooked in weed risk assessment, may be as serious a problem as non-native invasive species. Awareness of invasive weeds and their environmental impacts is still nascent in SEA and information is scarce. Freely available global spatial datasets, not least those provided by Earth observation programs, and the results of studies such as this one provide critical information that enables strategic management of environmental threats such as invasive species.

INTRODUCTION
Invasive plants have emerged as a serious problem for global biodiversity. Their infestations can lead to the extinction (Groves et al., 2003) and endangerment (Wilcove et al., 1998;Pimentel et al., 2005) of native species and the alteration of ecosystem processes (Vitousek and Walker, 1989;Simberloff, 2000). Although invasive species that are introduced to a region receive the greatest attention, it is not necessary for a species to be non-native to be invasive. Invasive species are defined as those that are expanding their range (Valéry et al., 2008). Under global climate change and human disturbance, some native species have also become aggressive invasive weeds (Avril and Kelty, 1999;Wang et al., 2005;Hooftman et al., 2006;Valéry et al., 2009;Le et al., 2012). Given the large impacts that invasive species can have and the limited possibilities for eradication, early detection and prevention of the establishment of invasive species should be a priority in conservation policies (Genovesi, 2005). Identification of areas that are at potential invasion risk, to either non-native or native invasive species, can be an effective way to guide efficient management and prevent further incursion (Kulhanek et al., 2011).
Species distribution models (SDMs) are currently a popular method for predicting the geographic distribution of species (Peterson, 2006). They are developed statistically from the known occurrences of the species and characteristics of the environment to identify similar suitable habitats and, thereby, predict the geographic distribution in unknown regions (Guisan and Zimmermann, 2000;Peterson and Vieglais, 2001;Peterson, 2006;Pearson, 2010). Given these modest data requirements, they are especially useful in cases of poorly studied taxa (Kearney and Porter, 2009). Therefore, SDMs have become an important tool to investigations of invasibility that aim to predict the potential distributions of invasive species (Peterson, 2003;Thuiller, 2005). Since the early study of Peterson et al. (2003) in predicting the potential distribution of four invasive plants in North America, SDMs have been increasingly and widely applied all over the world to predict biological invasions (Guisan and Thuiller, 2005;Underwood et al., 2013), especially exotic plants (Zhu et al., 2007;Andrew and Ustin, 2009;Barik and Adhikari, 2011;Fernández et al., 2012;Rameshprabu and Swamy, 2015). In SDMs, the environmental variables used vary at different scales (Bradley et al., 2012). At regional to continental scales, forecasts of invasion risk are often mainly driven by climatic factors (Pearson and Dawson, 2003). Predictions at a finer scale and in landscapes with less topographic variation may require predictors that capture biotic processes (e.g., vegetation productivity) and local abiotic conditions (e.g., topography, soil type) (Pearson and Dawson, 2003). However, continuous spatial measurements of these finer-scaled environmental variables are difficult to acquire at large spatial extent (Bradley et al., 2012).
Contemporary remote sensing (RS) now provides widely available data products at multiple spatial and temporal resolutions that characterize a range of ecologically relevant patterns and processes (Andrew et al., 2014). These data can be used to measure habitat properties over a larger area than can easily be covered by field surveys (Estes et al., 2008) and augment the array of spatial environmental variables available to SDMs to characterize abiotic and biotic niche axes beyond simply climatic factors. Table 1 provides an overview of the remotely sensed information that has been incorporated into SDMs as environmental predictor variables, to date, giving an indication of the evenness of research efforts and the capabilities of RS that are still relatively under-utilized. The most commonly used variable extracted from RS data is topography/elevation (42% of 39 reviewed studies that have developed SDMs of plant species using RS predictors). Besides, other abiotic predictors have been developed such as remotely sensed estimates of climate and weather, including surface temperature from sensors such as MODIS and rainfall estimates from TRMM and, more recently, the Global Precipitation Measurement mission, although studies applying these predictors are limited ( Table 1). Soil properties, one of the most important factors for plant distributions and species invasion (Radosevich et al., 2007), is rarely studied (He et al., 2015), although several recent studies have explored the use of remotely sensed indicators of soil characteristics in SDMs ( Table 1).
In addition to abiotic properties of the environment, biotic characteristics also play an important role in shaping species' spatial patterns (Wisz et al., 2013). RS can estimate many properties of the vegetated environment, and applications of products such as land-cover data or vegetation proxies to SDMs are on the rise (Table 1). Land cover has been considered as the primary determinant of species occurrences at a finer spatial resolution than climate (Pearson et al., 2004). Various studies (20% of 39 reviewed studies; Table 1) have applied land cover products derived from a variety of sensors (especially MODIS and Landsat) to SDMs. However, most of the current land cover information is in categorical format, which can lead to the propagation of classification errors (Cord and Rödder, 2011;Tuanmu and Jetz, 2014) and may not effectively represent the classes most relevant to the species of interest. In contrast, remotely sensed estimates of continuously varying ecosystem properties related to land cover and novel continuous land cover products can be used in SDMs and may avoid these limitations. Recent studies have found better performance from continuous estimates of vegetation properties and land cover rather than categorical representations (Wilson et al., 2013;Cord et al., 2014b;Tuanmu and Jetz, 2014). A range of remotely sensed measures of vegetation has been explored in SDMs, such as vegetation indices (Normalized difference vegetation index (NDVI), Enhanced Vegetation Index), phenology, and canopy moisture in order to evaluate variation in habitat quality at fine scales and in climatically homogenous regions (Table 1). Of vegetation metrics, NDVI, a useful measure of vegetation properties, has been extensively used as a predictor in SDMs (25.6%; Table 1). It represents photosynthetic activity and biomass in plants and is indirectly related to net primary production (Bradley and Fleishman, 2008). However, a study of  noted that while NDVI had high correlation with MODIS GPP (Gross primary production) and NPP (Net primary production), it was a less effective surrogate of productivity in areas of either sparse or dense vegetation. They found GPP to be better able to predict biogeographic patterns of species richness , but we know of no studies that have used GPP in SDMs. Valueadded science products, such as the MODIS primary productivity products, may provide more meaningful depictions of vegetation processes and improved environmental predictor variables for spatial models of biodiversity .
In addition to the typical niche axes used to inform variable selection for SDMs of plant species, there is a large body of literature determining the ecosystem properties that influence invasibility of a system, and these can be used to guide applications of SDMs to evaluating invasion risk. Resource availability (e.g., light, CO 2, water, nutrients) often facilitates successful invasion. Invasibility is predicted to be greater in sites with more unused resources (Davis et al., 2000). By damaging the resident vegetation, disturbance reduces resource uptake and competition, increasing resource availability (Hobbs, 1989;D'Antonio, 1993). Therefore, invasions by invasive plant species are often associated with disturbance (e.g., Fox and Fox, 1986). However, distributions of invasive species are typically modeled using static environmental datasets that may poorly proxy these dynamic processes (Franklin, 2010b;Dormann et al., 2012). Temporal summaries of GPP may provide useful indicators. GPP estimates total ecosystem photosynthesis, the cumulative response of the vegetation to its environment, and may be used as a spatial proxy of resource ability. As well, the variability of GPP over time can reflect disturbance processes (Goetz et al., 2012). Hence, quantitative spatial measurements of GPP are expected to be relevant predictor variables for modeling invasibility. Also, including soil properties in SDMs may be useful as numerous studies have shown that soil properties, including nutrient availability, relate to invasibility (Huenneke et al., 1990;Burke, 1996;Harrison, 1999;Suding et al., 2004).
In this study, we hypothesize that the inclusion of recently developed global remotely sensed data products providing quantitative estimates of vegetation productivity and its dynamics, land cover, and soil properties, in addition to climatic layers, will enable a more complete representation of species' ecological niches by SDMs. To test the hypothesis, bioclimatic data and RS data were used in isolated and combined models predicting the distribution of selected invasive plants across Southeast Asia (SEA).
Southeast Asia is an important region to global biodiversity; it has four of the world's 25 biodiversity hotspots (Sodhi et al., 2004). However, much biodiversity is being lost (Peh, 2010) due to threatening processes such as habitat loss, degradation, climate change, and pollution (Pallewatta et al., 2003). In addition, and operating in synergy with these anthropogenic changes,
The goal of this study is to provide an overview of potential invasibility to 14 priority invasive plants in SEA. To generalize estimates of invasion risk across species traits that may require different management approaches, we divided studied species into different life forms (herb, vine, and shrub). Such groupings based on life-history attributes have been widely used to understand the invasion process and propose tailored management strategies (McIntyre et al., 1995;Bear et al., 2006;Garrard et al., 2009). In addition, species were grouped by their origin status (native and non-native invasive species). Through evaluating SDMs by life forms and origin status, and using different environmental predictor variable sets, our study addresses the following questions: (i) Which life forms of invasive plant species pose the greatest risk to SEA? (ii) Are native weeds as great of a potential threat as nonnative invasive species? (iii) Do remotely sensed environmental predictor variables improve predictions of invasion risk over models constructed with climate variables alone? (iv) Do the benefits of incorporating remotely sensed predictors in invasion risk models differ by species life form or by origin status?

MATERIALS AND METHODS
In order to evaluate the potential distributions of selected invasive plant species in SEA and to assess the contributions of remotely sensed environmental predictors to SDMs, we developed three model sets: models constructed along climate data only (CLIM), models with RS only (RS) and models with both climate and RS data (COMB). CLIM models used well-established bioclimatic datasets. The compiled RS predictor set covered a diverse range of surface parameters, namely topography, soil properties, global land cover, and vegetation productivity (GPP). Models used the Maximum Entropy (MaxEnt) algorithm. Model comparisons were based on the AUC score of model performance, average predicted areas, the level of spatial agreement in predicted distributions between model results, and the usage of RS and CLIM variables. The evaluation of invasion risk across life forms and origin status used predictions of suitable habitat area for individual species and predicted maps of invader richness. These datasets and methods are described in more detail below.

Study Species and Occurrence Data
In this study, we modeled the potential distributions of 14 invasive species (Table 2) identified from the lists of native and non-native invasive species known in SEA (Matthews and Brand, 2004) and Vietnam (Ministry of Natural Resources and Environment and Ministry of Agriculture and Rural development, 2013). Species occurrences were collected from the Global Biodiversity Information Facility 1 . Records were cleaned for obvious spatial errors (e.g., points that occurred in the ocean for terrestrial species) in ArcMap and duplicate records in the dataset were discarded (following Barik and Adhikari, 2011). All species modeled had more than ten occurrence records within the study area. The species occurrence records span lengthy collection periods. For each of the 14 species studied, the median years of the observations occurred in the period 1956-2005.

Climate Data
Bioclimatic variables were obtained from the WorldClim database (Version 1.4), interpolated from measurements recorded during the period 1960 to 1990 from ∼46,000 climate stations worldwide (Hijmans et al., 2005). Eleven temperature and eight precipitation metrics, at 1 km resolution, were used, including annual means, seasonality, and extreme or limiting climatic conditions (Table 3). This dataset has been widely used for studies of plant species distributions (Pearson et al., 2007;Hernandez et al., 2008;Cord and Rödder, 2011;Zhu et al., 2017).

Remote Sensing Data
A Digital Elevation Model (DEM) was derived from GTOPO30 2 at 30 arc second resolution (approximately 1 km) (USGS, 1996). Ten soil layers representing soil physical and chemical properties (Hengl et al., 2014) (Table 3) at 1 km resolution were extracted from ftp://ftp.soilgrids.org/data/archive/12.Apr.2014/. This dataset was empirically developed from global compilations of publicly available soil profile data (ca. 110,000 soil profiles) and a selection of ∼75 global environmental covariates representing soil forming factors (mainly MODIS images, climate surfaces, Global Lithological Map, Harmonized World Soil Database and elevation) (Hengl et al., 2014).
We also included the consensus land cover layers developed by Tuanmu and Jetz (2014). They provide a continuous estimate of the probability of the occurrence of each of 12 land cover classes in each pixel, calculated from the agreements between four global land cover products. These estimates have been shown to have a greater ability to predict species distributions than the original categorical land cover products (Tuanmu and Jetz, 2014). These land cover data have a 1 km spatial resolution and are available online at http://www.earthenv.org/landcover. They represent consensus conditions incorporating estimates from the  (Tuanmu and Jetz, 2014).
To quantify spatial and temporal variation in vegetation productivity, we used global annual MODIS17A3 (version 005) Gross primary productivity (GPP) data for 14 years (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014) at 1 km resolution (Running et al., 2004). The Primary Production products are designed to provide an accurate regular measure of the yearly growth of the terrestrial vegetation (Heinsch et al., 2003). Data were downloaded from the Numerical Terradynamic Simulation Group (NTSG) at the University of Montana 3 . The mean and coefficient of variation of GPP (inter-annual variability) were calculated over the time series at each pixel and supplied to the SDMs.
All predictor variable layers were aligned to a common 1 km grid and projected in the Asia South Albers Equal Area Conic system using nearest neighbor resampling. Spatial environmental layers were pre-processed in the TerrSet software (Eastman, 2015).

Selection of Environmental Predictions
To minimize predictor multicollinearity and its impact on subsequent analyses, we evaluated the inter-correlations among the 44 variables for all terrestrial pixels and retained a subset of uncorrelated (|r| < 0.75) predictor variables for species distribution modeling. Including too much flexibility may make it difficult for the model to distinguish noise from the true species response in real data sets (Baldwin, 2009;Merow et al., 2013). Minimizing correlation among variables, therefore, is assumed to increase the performance of species modeling (Austin, 2002). In this way, we reduced the number of predictors used per species to 7 climatic (out of 19) and 14 RS (out of 24) variables. All soil estimates were highly correlated across the study area, so only one was retained. See Table 3 for the full list of initial variables, and those that were retained for modeling.

Modeling Habitat Suitability of Species
To model habitat suitability, we used MaxEnt (version 3.3.3), a general-purpose machine learning method (Phillips et al., 2006). Among species distribution modeling techniques, MaxEnt is one of the most popular algorithms due to its predictive accuracy and ease of use (Elith et al., 2006;Phillips and Dudík, 2008). There are some characteristics that make MaxEnt highly suitable to modeling species distributions such as use of presence-only species data, flexibility in the handling of environmental dataincluding both continuous and categorical variables, and an ability to fit complex responses to the environmental variables (Phillips et al., 2006). Notably, MaxEnt is less sensitive to sample size, which makes MaxEnt a preferred predictive model across all sample sizes (Wisz et al., 2008).
In this study, we developed SDMs based only on the less-correlated climate and/or remotely sensed predictors with MaxEnt. To reduce overfitting, the regularization multiplier was set at 4. This parameter determines how strongly increases in model complexity are penalized during model optimization; higher values produce simpler models that are less overfit to the training data. Radosavljevic and Anderson (2014) found that regularization multiplier values from 2.00 to 4.00 were generally appropriate to minimize overfitting. For all 14 species, we created 10 random data partitions with 70% of the point localities assigned for training and 30% for testing and ran the three scenarios (see below) with each of these replicate partitions. Random samples of 10,000 background points were also used to develop each model. MaxEnt model performance was evaluated using the area under the receiver operating characteristic curve (AUC) assessed on the withheld set of test points. AUC values range from 0 to 1. Values of 0.5 indicate that the model performs no better than expected by chance, while an AUC of 1 suggests perfect discriminatory abilities. Models with AUC > 0.7 are considered to achieve acceptable performance (Swets, 1988). Mean values, averaging across the 10 replicate runs and across species, of the resulting AUC distributions were used to compare the model scenarios run with different predictor sets. Continous MaxEnt outputs were converted to binary maps of habitat suitability using the tenth percentile training presence threshold (Escalante et al., 2013) in order to estimate the area of suitable habitat for each species predicted by each model. Variable usage by the models was determined with (1) a variable importance measure estimated as the decrease in model performance when a given variable was randomized, and (2) marginal variable response curves, which plot the predicted suitability for a species across the range of values for a given variable while all other variables are held at their mean values.
To test the contribution of RS data to modeling invasive species distributions, we ran MaxEnt with climate and satellite layers in separation and combination. Three scenarios were evaluated: MaxEnt runs with (1) climate data only (CLIM), these include the three temperature and four precipitation layers from the final reduced subset; (2) remote sensing data only (RS), with two GPP layers, one soil layer (pH) and eleven land cover classes from the reduced subset; and (3) climate and RS data combined (COMB) using all 21 layers of the reduced subset (see Table 3). The evaluation was based on (i) the AUC score; (ii) average predicted areas; (iii) % agreement in predicted distributions between model results; and (iv) differences in variable importance for the RS and CLIM variables. These comparisons were performed for all species overall, and when grouping by life forms and origin status. For the assessment of invasion risk, binary maps of habitat suitability for each species from the COMB model runs were used to determine the predicted habitat area and combined into maps of invader richness to compare the relative level of invasion risk among plant life forms and native/non-native invasive species.

Model Performance
Overall, species distributions were generally predicted successfully. All species were successfully modeled (AUC > 0.7) by at least one predictor set ( Table 4). Species with few occurrence records (less than 20), such as Bauhinia touranensis, Mimosa pigra, and Merremia boisiana, tended to be less successfully modeled in some of the model scenarios (AUC < 0.7). The remaining species with greater data availability achieved "good" (AUC > 0.8) to "excellent" (AUC > 0.9) performance (Table 4), according to the classification of Swets (1988).
Across all species, the performance of the CLIM and COMB models was roughly equivalent (test AUC = 0.84 ± 0.08). Thus, along this metric alone, CLIM models may be preferable, as they are more parsimonious. On average, the RS models  were the least successful (test AUC = 0.75 ± 0.12) ( Table 4). However, the rankings differed somewhat for individual species and between species categories. CLIM models were preferred for 8 species, RS for 2, and COMB for the remaining 4 ( Table 4). RS models were found to perform worst in predicting vine species (Figure 1) and native invasive species (Figure 1). COMB models generally predicted smaller areas of suitable habitat than either CLIM or RS models. This pattern was consistent across life forms and origin status, but strongest for herbs, shrubs, and non-native invasive species (Figure 2). CLIM and RS models tended to predict similar areas of suitable habitat, except for the case of vines and native invasive species. The RS models for these groups predicted larger areas of suitable habitat than did CLIM models (Figure 2).
In general, spatial agreement in predicted habitat was greatest for pairwise comparisons with the COMB models (Figure 3).
As an exception to this pattern, the agreement between COMB and RS was as low as between CLIM and RS for vines and native invasive species. At the individual species level (Supporting Information S2), COMB tended to be most similar to the individual model set (CLIM or RS) that performed better in the AUC evaluations (Table 4) -typically CLIM.
The average relative variable importance varied considerably among the predictors within the variable sets. In the CLIM set, mean diurnal temperature range (importance = 32.5% ± 22.0 and precipitation of warmest quarter (importance = 23.8% ± 17.4) were most important ( Table 5). On average, other temperature variables (isothermality and annual mean temperature) have an importance around 12-13% and other variables contributed less than 10%. Of the variables in the RS predictor set, herbaceous vegetation land cover (importance = 16.7% ± 8.8) was the most important. Evergreen broadleaf tree, cultivated vegetation and GPP_CV were also important variables, with permutation importance ranging from 10 to 12% on average. In the COMB predictor set, the contribution of variables was similar to the CLIM and RS scenarios ( Table 5). All variables had reduced importance in COMB than in either CLIM or RS, due to the inclusion of a larger number of variables in these models, but the rankings of variables within each predictor were generally consistent.

Habitat Suitability
To assess the habitat suitability of species, we used results from COMB models. Response curves of each species (response curves are provided in Figure 4 for a selected species of each life form that was best modeled by the COMB variable set, and for all species in Supporting Information S1) in COMB models reveal that, across species, sites were generally predicted to have high suitability (>0.6) in areas with low mean diurnal temperature range and moderate to high isothermality. The highest suitability (0.9-1) was also generally found in areas with high precipitation in the warmest season. Many modeled species (Chromolaena odorata, Cenchrus echinatus, Eichhornia crassipes, Lantana camara, Mimosa diplotricha) were not predicted to invade closed areas such as forests (negative responses to high canopy land-cover classes), although the aggressive vine Pueraria montana is a notable exception. In addition, for species models with important contributions from the productivity variables, suitability was generally found to be highest in environments with high GPP and low variability of GPP (Supporting Information S1).
Herb species receive the greatest area predicted to be at risk of invasion by one or more species (5.3 million km 2 , versus 4.9 million km 2 and 4.3 million km 2 for shrubs and vines, respectively), however, the area vulnerable to the greatest invader richness is fairly concentrated around the north and north center of Vietnam (Figure 5). Response curves of herb species (Ageratum conyzoides, Cenchrus echinatus, Microstegium ciliatum, and Parthenium hysterophorus) indicate they prefer high rainfall in the warmest quarter (more than >1500 mm), however, this variable was generally less important for herbs than it was for other life forms (Supporting Information S1). Additionally, herb species prefer habitat with diurnal temperature ranges less than 10 • C and isothermality from 20 to 70%. Of the land cover variables, invasibility to herbs was more strongly related to the evergreen broadleaf and mixed forest classes, and to the cultivated class than were the other life forms. Response curves indicated that relationships with these cover classes were generally negative (Supporting Information S1).
Shrub species were predicted to have the greatest area at risk from multiple invaders: 1.3 million km 2 were predicted  to be suitable for four or more shrub species, as opposed to only 0.6 million km 2 for herbs and 86 thousand km 2 for vines (although note that only four vine species were modeled). Unlike the other life forms, regions suitable for multiple shrub invaders extended into countries in the south of the region such as Indonesia, Malaysia, and Philippines, as well as west to Bangladesh (Figure 5). Diurnal temperature range and precipitation of the warmest quarter were the most important factors for the distribution of these shrub species (e.g., Chromolaena odorata, Lantana camara, Leucaena leucocephala). Overall, models were more influenced by RS variables, especially land cover, for shrub species than for the other life forms. Shrubs exhibited generally negative associations with forested habitat (for all classes except the mixed forests) as well as with herbaceous land cover (Supporting Information S1).
In contrast to the other groups, large areas were predicted to be invasible to a single vine species. Areas vulnerable to greater richness of invasive vines were much more restricted, tending to occur in north and north-central Vietnam and Taiwan (Figure 5). While Mikania micrantha and Pueraria montana have less predicted area in SEA, Bauhinia touranensis and Merremia boisiana were predicted to invade much of the region (Supporting Information S2), especially in south China and north Vietnam. Unlike herbs and shrubs, distributions of vine species were generally unrelated to land cover (except for moderate influences of herbaceous land cover). Vine species received greater importance of climate factors, especially variables related to precipitation, than did the other life forms (Supporting Information S1).
Results of average predicted area at the species level showed that as large areas are vulnerable to invasion by native as non-native invasive species (ca. 2 million km 2 ) over the whole region (Figure 2). Cumulative levels of invasion risk are difficult to compare, since over twice as many non-native than native species were modeled, but substantial areas are at risk of invasion by one or more species of each origin status (6 million km 2 and 4.3 million km 2 , for non-native and native invasive species, respectively). Native invasive species richness was mainly concentrated in the north and north center of Vietnam; nonnative species had wider range of distribution and may potentially invade the whole region (Figure 6).
Comparing the total area predicted by the COMB models to be susceptible to the invasion of the 14 invasive species suggests which of the modeled species may be the greatest threats to the region. Ageratum conyzoides, Eichhornia crassipes, Leucaeana leucocephala and Microstegium ciliatum had the highest predicted area. Lantana camara and Mimosa diplotricha followed next. Parthenium hysterophorus had the lowest predicted area (Supporting Information S2).

Model Performance
Quantitative comparisons of models with various predictor sets showed that models built with incorporation of RS and climatic data layers substantially reduced predicted areas across all life forms and origin status compared to models with climate and RS data alone (Figure 2). The mapped predictions for individual species reflect this pattern spatially (Supporting Information S2). Suitable habitat modeled with climate variables alone are quite smooth and generalized, while the inclusion of remotely sensed predictor variables adds more nuanced spatial detail to this overall pattern. The most widely used bioclimatic predictors, including those evaluated in this study, are derived from station data; interpolation introduces smoothing, producing generalized portrayals of environmental variability. As well, climate generally varies continuously over broad spatial scales. Thus, exclusively climate-based distribution models are unable to capture variations of species diversity at the landscape level . As a consequence, large areas of predicted suitability are often seen (Thuiller et al., 2004). In contrast, while the biotic niche axes estimated by RS can further inform distribution models and enable dynamic models, they are unable to replace climatic factors in identifying suitable habitat as bioclimatic conditions are still essential driving factors for species distributions (Thuiller et al., 2004;Cord and Rödder, 2011). The high percentage agreement of spatial predictions between models based on climatic predictors only and climatic and RS predictors found in this study, as well as the high variable importance scores given to climatic predictors in the combined models, also supports the indispensability of climate in shaping the distribution of invasive plant species. Similar studies have also found that using either climatic-derived or RS-derived predictors alone often leads to the overprediction of species distributions Saatchi et al., 2008;Cord and Rödder, 2011;Cord et al., 2014a). By incorporating complementary limiting environmental conditions, combined models of climatic and remotely sensed predictor variables reduce predicted areas, thereby refining modeled species distributions. Although clearly refining the spatial patterns of predicted species distributions, in general, COMB models did not achieve higher accuracy than models with climate variables alone; RS models were often relatively poor. These results are in line with other studies (Zimmermann et al., 2007;Cord and Rödder, 2011;Cord et al., 2014a) that found that models based on RS data had the lowest AUC, compared to models with climate-derived predictors and climatic and RS predictors. Some explanations can be proposed for this. First, there may be temporal mismatch between occurrence data and environmental data. This is likely to be a more severe problem for remotely sensed predictors, which generally capture snapshots in time, rather than climatological averages, and which often describe environmental conditions, such as vegetation patterns, that vary over shorter time frames than does climate. Many of the occurrence records within museum or herbarium collections, comprising GBIF, are older; the land cover and vegetation productivity present at those sites at the time of the species' presence may not be represented by remotely sensed current conditions. To test for this problem, we repeated our models with recent records only (collected after 1992). Removing older species records reduced model performance overall, likely due to the much smaller samples available to train the models. Remotely sensed predictors received slightly higher importance values in the COMB models than previously, but were still secondary to climatic variables (Supporting Information S3). Although temporal correspondence among species occurrences and environmental variables is a concern and should be considered in further studies, it does not seem to contribute to our conclusions.
Alternatively, the quality and information content of the RS products may influence model performance. The consensus land cover product was used in this study because it was expected to be more reliable than traditional global land cover datasets. Additionally, its continuous estimates of the probability of class presence may avoid errors associated with categorical data and provide some level of subpixel land cover information. However, it still has limitations related to the input datasets. Global land cover products are constrained to a relatively simple legend, with broad classes. The consensus product is further constrained to FIGURE 4 | Marginal response curves of Ageratum conyzoides (a non-native herb best modeled by COMB), Leucaena leucocephala (a non-native shrub best modeled by COMB) and Mikania micrantha (a non-native vine best modeled by COMB) for variables with importance >5% for each species in COMB models. The orange curve in each plot is average response curve and the blue is standard deviation across all 10 partition runs. See other species in Supporting Information S1. a simplified legend that harmonizes each of the input products. The generality of these classes may not capture regionally relevant differences and limit their usefulness to SDMs. The consensus land cover product is also limited by quality of the individual products it integrates (Tuanmu and Jetz, 2014). In land cover products, classification errors are not evenly distributed across space and classes (Strahler et al., 2006). For instance, lower accuracy for land cover classes of GlobCover products was found  in some areas with limited data coverage (e.g., some areas in Amazonia) or in rugged terrain such as Laos (Bicheron et al., 2008). Also, cloud cover reduces the quality of the RS data, especially in tropical regions (Bradley and Fleishman, 2008). Classification errors do seem to be contributing to the performance of RS variables in our study. Unexpectedly, species associations with land cover classes, when they were found to be important to models, were overwhelmingly negative. There is no ecological or logical reason for this. Instead, because the consensus land cover product estimates the certainty that a class is present, given the individual land cover datasets, this suggests that habitat suitability tends to be greatest for the modeled species in areas with high land cover uncertainty. Such uncertainty may be due to inadequacies in the class definitions in this region, fine-scaled mosaics of land cover classes within a 1 km pixel, or simply poor classification performance. Indeed, using the maximum estimated probability of class membership as an indicator of certainty supports this interpretation. Large areas of SEA, including many of the same locations with high-predicted invasibility, exhibit low certainty of the land cover information (Figure 7). Further work is necessary to validate the consensus land cover products in SEA and, especially, to determine the meaning of areas with great class uncertainty. This is troubling and argues against the use of global land cover products in SDMs. Quantitative remotely sensed estimates of ecosystem structure and function may overcome some of the problems of categorical datasets, and we strongly advocate for their expanded use and continued evaluation in SDM contexts. Interestingly, the quantitative measures of vegetation productivity used in this study, while making important contributions to the RS model set, generally dropped out of the COMB models. This may be because of interdependencies between climate variables and the photosynthetic efficiency term used in the MODIS GPP product, which relies on both temperature and moisture (Running et al., 2004), and thus would not be detected by the simple univariate correlation analysis used to screen input variables. Another limitation to model performance in this study is the sample size of the species occurrence records. Performance of SDMs in the study varied among species. Species with few occurrence records occurring in a wide range of habitats, such as Mimosa pigra, have lower performance than others. This is because SDMs perform better with larger sample sizes and for species occupying a narrow environmental niche than for generalist species (Hernandez et al., 2006). Although Mimosa pigra has been recorded as one of the most invasive plants in many countries in SEA (Thi, 2000;MacKinnon, 2002;Vanna and Nang, 2005;Nghiem et al., 2013), the number of occurrence records of this species in SEA is still limited. This reflects lack of research and awareness of the public and government for invasive species detection in the region, which should be more encouraged. Also, using hyperspectral RS to detect invasive species occurrences (Andrew and Ustin, 2008;Hestir et al., 2008) can be a solution for developing high-quality, unbiased occurrence data inputs (He et al., 2015), and also may reduce temporal mismatch between species occurrences and environmental variables. In addition to model development, sample size influences model evaluation. Performance measures such as the AUC provide a single spatial summary value. AUC has been criticized for its inability to convey information about the spatial pattern of predictions or uncertainty (Franklin, 2010a). Yet spatial variation can be considerable. Because AUC is often calculated from a tiny proportion of the pixels modeled, wildly different spatial predictions can receive similar, and indeed very high, AUC estimates (Synes and Osborne, 2011). For this reason, we prefer to present a suite of evaluation tools, including total predicted area and estimates of spatial agreement, in addition to the AUC.

Habitat Suitability
Both non-native and native invasive species were predicted to occur across large areas of SEA, and thus may pose similar risk to the region. Among life forms, shrub species potentially pose greater risk because of the predictions of high shrub invader richness over large areas, based on the set of species assessed. Most countries in the region have suitable habitat for these species. In general, shrubs exhibited weaker environmental associations than the other life forms (as seen in the lower variable importance scores), suggesting they may be tolerant of a broader range of conditions. Relative to shrub and herb species, vine species' distributions were most strongly driven by climatic factors. This may facilitate their spread under climate change. Invasive species may disproportionately benefit from global climate change (Dukes and Mooney, 1999), and vines may be a good example of these concerns. Climate projections for the region include increases in annual temperature and in summertime precipitation (Christensen et al., 2007), the latter variable was important to nearly all vine species distributions, all of which showed positive associations. Without strong controls by biotic factors such as land cover, vines may invade valuable evergreen broadleaf trees forests in SEA. A native vine, Merremia boisiana is an example. In the past decade, the vine has spread dramatically over South China (Wang et al., 2005;Wu et al., 2007) and the north and center of Vietnam (Le et al., 2012) and our results reveal that more than 1.6 million km 2 are invasible to this species, largely concentrated in China and Vietnam. These findings suggest that awareness of invasive species and prevention and eradication efforts should not overlook the life form or origin status of the species of concern.
Interestingly, in contrast to our expectations, we found that for some species (Microstegium ciliatum and Mimosa diplotricha) suitability was negatively related to the variability of GPP (GPP_CV), which was used to proxy disturbance processes. This suggests that invasion is possible even with low disturbance, contradicting knowledge summarized by Lozon and MacIsaac (1997) that the establishment and spread of invasive plants are associated with disturbance. Although disturbance is certainly a factor in many invasions, an over-generalization that invasion requires disturbance can lead to low awareness of invasion in intact areas. Further field-based studies about invasibility of these species under difference disturbance levels should be conducted. The effectiveness of GPP variability as an indicator of diverse disturbance processes and diverse ecosystems should also be evaluated. The relatively short duration of the satellite archive from which it was computed is certainly a limitation.
Given that many of the study species were identified from Vietnam's invasive weed list, it is not surprising that we found, within the region, north and north central Vietnam were most susceptible to the invasion of weeds (Figures 5, 6). However, it is worth emphasizing that many of the invasive weeds predicted in this region also have high invasibility in China, where outbreaks have been recorded (Yan et al., 2001). Biological invasions are a trans-border issue. Similarly, provinces (Guangxi, Quangdong, and Yunnan) sharing borders with Vietnam, Lao, and Myanmar are listed as areas with a high number of invasive species in China (Xu et al., 2012). Effective management requires that invasions be considered in the context of the region (SEA), rather than a country (Paini et al., 2010). Studies such as ours can help the Vietnamese and other governments to prioritize management actions for invasive species within the country and also to inform biosecurity policy across borders.

CONCLUSION
This study demonstrated that although the environmental attributes derived from RS data did not strongly improve the accuracy of SDM predictions, they did provide more landscapelevel detail that refined species distribution predictions in space. Therefore, the inclusion of remotely sensed variables in SDMs likely is worthwhile. Furthermore, our results highlight shortcomings of land cover products, which are widely used in SDMs. There are widespread uncertainties in global land cover products and, disconcertingly, those sites with the greatest uncertainty also seem to be consistently ecologically important to the modeled species. We caution against continued use of land cover information in SDMs, which may propagate errors and confound interpretation. Greater adoption of quantitative remotely sensed datasets estimating ecosystem structure and function may mitigate the weaknesses and limited utility of RS observed in this study. From the standpoint of biodiversity management, our findings have implications in targeting management to susceptible areas, providing initial data for invasive species risk assessments, and proposing biosecurity policy in the region.

AUTHOR CONTRIBUTIONS
TT prepared input data, performed models and interpreted results, wrote manuscript and acted as corresponding author. MA supervised development of work, provided guidance throughout the project, and edited manuscript. GH contributed to editing manuscript.

FUNDING
TT was supported by Australia Awards Scholarship for her Ph.D. studies.