Evidence-Driven Approach for Assessing Social Vulnerability and Equality During Extreme Climatic Events

Climate change adaptation policy requires assessing a community's vulnerability based on its socio-economic characteristics. A predominant approach to vulnerability assessment is indicator-based, wherein variables are aggregated to assess the vulnerability of units in a system (e.g., neighborhoods in a city). Here we show that a particular evidence-based predictive statistics approach can address two shortcomings of the most commonly-used indicator-based approach: lack of a means of validation and problematic weighting of individual indicators. We demonstrate how robust evidence-based models can produce frameworks that overcome these limitations. Using the case study of Hurricane Sandy in the State of New Jersey, we conducted two-pronged validated vulnerability assessments, based on insurance claim payouts and assistance grants. The latter needs-based assessment shows that “Minorities” are substantially more vulnerable than others based on a significant negative association with assistance approval rate (approved claims divided by all claims). Our findings highlight issues discussed in the literature within the context of climate justice and equity. Such an approach is helpful locally, but also when adaptation plans are developed over broad scales of time and space considering disparities between regions or across multiple jurisdictions.

environmental justice issues related to climate change; both need to be addressed for efficient adaptation planning (Kim et al., 2018; van den Berg and Keenan, 2019). Despite its urgency, the practicalities of assessing the vulnerability of human communities and populations, which lays the foundation for designing adaptation plans and for allocating the resources necessary to make plans a reality, need further investigation.
A commonly used vulnerability assessment approach uses indicators (hereafter "indicator-based assessment") (Zhang et al., 2018). This approach has been widely implemented by governments as part of adaptation policy efforts (e.g., Rowan et al., 2013; Boston, 2016). In the social vulnerability domain, the predominant indicator-based vulnerability assessment uses socio-economic variables (social indicators), such as age, race, and income, and should enable the prediction of the susceptibility of communities to the negative effects of climate-driven events, whether they be physical, financial, or psychological (Benevolenza and DeRigne, 2019). However, such assessments are based largely on theoretical assumptions of what is perceived to reflect vulnerability and therefore are less accurate than they would be if they were based on empirical findings.
Evidence-based approaches to vulnerability assessment could supplement and improve the standard "indicator-based" approach and thus lead to better allocation of resources for adaptation. Here we show that an evidence-based predictive statistics approach, often claimed infeasible in the past (Hinkel, 2011), can indeed be used as a possible solution for two shortcomings of the most commonly-used indicator-based approach: (1) the lack of a means of validation, and (2) problematic weighting (Tonmoy et al., 2014; Nguyen et al., 2016).
Due to the availability of large quantities of socio-economic data to choose from as indicators (e.g., as a result of increased data from national census programs), it is a common practice to remove correlating indicators, or to perform some type of dimension reduction statistical technique. The most common of these methods is Principal Component Analysis (PCA). As a case in point, one of the most influential works in the field of indicator-based vulnerability assessments introduced the PCA-based social vulnerability index over a decade and a half ago and trademarked it as SoVi®, the Social Vulnerability Index.
Methods such as PCA minimize redundancy and produce a lower and more manageable number of indicators (called "components" in PCA) for the assessment. While the PCA approach is sometimes mistaken for a predictive data-driven approach, in practice it only analyzes variability in the explanatory dataset while offering no evaluation of its predictive power. Like other works over the years, the original introduction of the social vulnerability index approach explicitly acknowledged the problematic nature of equal-weighting practices back in 2003 (Cutter et al., 2003). Now, with new types of information and with the relative abundance and accessibility of big data, previously unforeseen research opportunities have become available and can be used to remedy this situation.
We propose validation of common theoretical assumptions by utilizing harm indicators, i.e., harm experienced by subjects during a climate-related event (e.g., heat waves or hurricanes), in robust predictive statistical models. PCA and other dimension reduction techniques are an important first step in analyses that utilize an initially large number of variables (especially correlating variables). However, these unsupervised approaches (i.e., approaches for which there is no outcome/dependent variable) only analyze the explanatory data (social indicators in this case). They do not analyze how these social indicators come into play in real events, which can themselves be analyzed by using a supervised predictive approach, i.e., one that uses an outcome variable.
Predictive supervised statistical models (as opposed to, for example, the unsupervised PCA) tell us whether certain vulnerability indicators are appropriate for predicting de facto vulnerability, which is always measured based on harm indicators. Furthermore, the results of such predictive models denote the relative importance of each vulnerability indicator and thus can help in (a) deciding on the final set of indicators to include in the assessment (i.e., the social indicators shown to be significant in predicting the outcome), and (b) assigning different weights to each indicator in that set.
Such ideas have been addressed in the literature in the past (see Discussion). However, indicator-based vulnerability assessments in general, and indicator-based social vulnerability assessments in particular, have rarely used a predictive approach based on empirical observation of outcomes (i.e., harm indicators). Furthermore, they usually employ equal-weighted aggregation, wherein indicators are considered equally important without justification (Tonmoy et al., 2014; Nguyen et al., 2016). As a result, quantitative vulnerability assessments that are available to policy-makers today are largely not based on real-world experience; they are not sufficiently modified or improved based on recent and actual climate events, and thus they remain limited.
The methodology we present here follows several evidence-based predictive statistics studies while addressing some technical and practical limitations of these studies (see Discussion). We demonstrate, using a case study, how robust predictive statistical analysis can produce a validated evidence-based vulnerability assessment. Our case study is based on the impact of Hurricane Sandy (2012) on the State of New Jersey (NJ), USA. We analyzed the relationship between observed harm, as reflected by insurance payout data and FEMA assistance data, and various social indicators, while controlling for environmental factors. We hypothesize that certain indicators are significant in predicting harm and that the level of impact varies across indicators.

Harm Indicators (Outcome Variables)
Harm indicators (the outcome/dependent variables) were derived from two main datasets (Figure 1). The first, containing data related to insurance payouts after Hurricane Sandy, was provided by the NJ Department of Banking and Insurance at the zip code level in NJ (Request Number: C115955). It reflects over four billion US dollars paid to subjects who experienced financial damage as a result of Hurricane Sandy. The second dataset contains information about the Federal Emergency Management Agency's (FEMA) Individual and Housing Assistance Program (IHP), which amounted to over 400 million US dollars during Hurricane Sandy (hereinafter: "FEMA assistance" or "government assistance"), and is also available at the zip code level (FEMA, 2014).
IHP provides assistance to those who had necessary expenses and significant needs, and only if they are unable to meet those needs through other means. It provides temporary housing assistance as well as other grant money that assists in activities, such as the replacement of lost furniture and clothing (Lindsay, 2017). Some typos were identified in the FEMA dataset's zip code numbers (invalid numbers or numbers outside the relevant states) and were subsequently removed from the database before it was used in the analysis.

Social Indicators (Explanatory Variables)
Initially, 15 social indicators (explanatory variables) were considered (see Table 1). These were consistent with the literature and were obtained from the US Census Bureau's American Community Survey (ACS) aggregated for the years of 2008-2012. Since Hurricane Sandy occurred toward the end of 2012, it was assumed that the majority of samples are relevant for the pre-event conditions as required for the analysis.

Exposure Indicators (Explanatory/Control Variables)
Three exposure indicators were used: distance from the storm track, maximum wind speed, and flood extent, as presented in Figure 2 (see Supplementary Material for additional information).

Spatial Resolution
The availability of data at the zip code level offers a sufficient number of observations (a sample size of 516-583 areas) at a relatively fine geospatial resolution for implementing predictive statistical modeling. Consequently, the association between socio-economic characteristics (indicators) of different communities (based on zip codes) and observed harm (insurance payouts and FEMA assistance) could be empirically explored.

Statistical Methodologies
General
Harm indicators derived from the two aforementioned datasets (insurance and FEMA assistance) were used as dependent (outcome) variables in various predictive statistical models in order to explore socio-economic risk and vulnerability patterns. Three main statistical methodologies were used. First, Partial Least Squares Regression (known as PLS or PLSR) was implemented for selecting the relevant social indicators. PLS is a methodology that performs dimension reduction like PCA, but it also considers one or more outcome variables in addition to the explanatory variables. Then, multivariate linear regression (hereinafter: "simple regression"), as well as spatial autoregressive regression (hereinafter: "spatial regression" or "spatial lag regression"), was used to explore the relationship between the indicators in the models (i.e., the direction and the estimated coefficients, which serve as weights).
Following the results of the statistical models, further analysis was performed to explore potential disparities in approval rates (i.e., approved claims divided by number of claims). The models' results were used to create weighted vulnerability indices as described below. Furthermore, the weighted vulnerability index that was based on the FEMA dataset analysis was validated using available data concerning other neighboring states that experienced harm as a result of Hurricane Sandy (New York, Connecticut, Rhode Island, and Maryland). Notably, the insurance data were only available for NJ, and thus the insurance-based index could not be validated in a similar manner. The study workflow is presented in Figure 3.

PLS: Reducing the Number of Social Indicators
A total of 19 PLS models were created by a combination of various harm indicators and different datasets. The pre-selected social indicators were used as the independent variables among all models. Each of these indicators was used in several separate models, using different datasets, as follows:
1. All claims aggregated by zip code.
2. Only FEMA assistance claims.
3. Only private insurance's residential property claims.
4. Private insurance's residential property claims and FEMA assistance claims aggregated.
All variables were log-transformed to fulfill the assumptions of normality and linearity and centered by their means for the PLS analysis. The PLS models' results demonstrated several dominant social indicators that thus were selected to be used in the linear and spatial regression models as discussed below (see Supplementary Material for detailed results).
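The preprocessing and supervised dimension reduction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic, the function name `first_pls_weights` is hypothetical, and only the first weight vector of a single-outcome PLS (PLS1) is computed, which is the normalized covariance direction between the centered predictors and the centered outcome.

```python
import numpy as np

def first_pls_weights(X, y):
    """Return the first PLS1 weight vector for predictors X and outcome y.

    X and y are log-transformed and mean-centered first, mirroring the
    preprocessing described in the text. Inputs must be strictly positive.
    """
    Xc = np.log(np.asarray(X, dtype=float))
    Xc = Xc - Xc.mean(axis=0)
    yc = np.log(np.asarray(y, dtype=float))
    yc = yc - yc.mean()
    w = Xc.T @ yc                    # covariance direction between X and y
    return w / np.linalg.norm(w)     # unit-length weight vector

# Synthetic, illustrative data: 200 areas and 3 hypothetical indicators.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(200, 3))
# The outcome depends strongly on indicator 0 and not at all on indicator 2.
noise = rng.normal(scale=0.1, size=200)
y = np.exp(1.5 * np.log(X[:, 0]) + 0.2 * np.log(X[:, 1]) + noise)

weights = first_pls_weights(X, y)
dominant = int(np.argmax(np.abs(weights)))  # index of the dominant indicator
```

Indicators with large absolute weights on the leading components are candidates for the subsequent regression step. A full analysis would extract several components and would typically rely on an established PLS implementation rather than this one-component sketch.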

Multivariate Linear Regression: Finding Weights
Linear regression models were used to assess the direction (indicated by plus or minus) and relative weight or importance (through standardized coefficients) of the social indicators. The social indicators selected through the PLS analysis were used as the independent (explanatory) variables in the regression models. The three exposure indicators described above were also added to the regression models as independent variables, as well as an additional variable: the number of households. The latter indicator was added in order to control for the various sizes of areas captured in a single zip code.
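To illustrate how standardized coefficients make indicators measured on different scales comparable, the following sketch fits an ordinary least squares model after z-scoring every variable. The data are synthetic, and the variable names (`income`, `density`, `harm`) and the function name are hypothetical placeholders rather than the study's actual variables.

```python
import numpy as np

def standardized_coefficients(X, y):
    """OLS coefficients after z-scoring every column of X and the outcome y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    # No intercept is needed because all variables are centered.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta

# Synthetic data: two hypothetical indicators on very different scales.
rng = np.random.default_rng(1)
n = 500
income = rng.normal(size=n)             # unit scale
density = 10.0 * rng.normal(size=n)     # ten times larger scale
harm = 2.0 * income - 0.1 * density + rng.normal(scale=0.1, size=n)

beta = standardized_coefficients(np.column_stack([income, density]), harm)
```

The sign of each coefficient gives the direction of the association, and the absolute values can be compared across indicators regardless of their original units.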
Two outcome (dependent) variables were used in several models: number of approved claims and actual payouts/assistance amounts. These variables were assumed to reflect experienced harm (harm indicators). Approval rate was used in a post-analysis discussed separately below.
The two dependent variables were modeled using four datasets (a total of 8 linear and 8 spatial regression models):
- All aggregated.
- Private insurance's residential property and FEMA assistance aggregated.
As in the PLS analysis, the socio-economic independent variables and the outcome variables were log-transformed. The exposure indicators were not transformed since two of them are categorical and it was not necessary to transform them to satisfy the regression model assumptions. Furthermore, Variance Inflation Factor (VIF) analysis indicated that multicollinearity was not a concern in the models.
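A VIF check of the kind mentioned above can be sketched as follows, using the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all other predictors. The data are synthetic and the function name `vif` is a hypothetical helper, not the routine used in the study.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus the remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

rng = np.random.default_rng(3)
base = rng.normal(size=(300, 3))          # three independent predictors
collinear = base.copy()
# Make the third predictor almost a copy of the first.
collinear[:, 2] = base[:, 0] + 0.05 * rng.normal(size=300)

vif_independent = vif(base)
vif_collinear = vif(collinear)
```

Values close to 1 indicate little shared variance among predictors, while large values flag predictors that are nearly linear combinations of the others. The threshold at which multicollinearity is considered problematic is a modeling choice that the text does not specify.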

Spatial Regression: Correcting for Autocorrelation
The regression models were tested for spatial autocorrelation (Moran's I test) and their results were compared with the results of the spatial lag regression models. These analyses revealed that spatial autocorrelation was present in all the regular (non-spatial) models.
To solve this problem, we used spatial lag regression models with a log-likelihood function. For the application of this method, spatial weights are assigned to each observation and considered in generalized linear regression models. The weights list is created through two steps. First, a neighbors list is built based on regions with contiguous perimeters that are sharing one or more boundary points. Then the weighting list is created based on the neighbor list, by summing row-standardized values over links among regions. Detailed results and additional details about the methodologies are provided in the Supplementary Material section.

FIGURE 4 | Social indicators' weights according to the predictive model results. These weights were used to create the example vulnerability assessments. Notably, the minority indicator had a negative coefficient for payouts and a significant negative coefficient for assistance approval rate.
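The two-step weights construction and the autocorrelation check described in this section can be sketched as follows. This is an illustrative example under stated assumptions: the four-area adjacency matrix is hypothetical, the helper names are not from the study, and the global Moran's I statistic is computed directly on example values, whereas in a regression setting it would be applied to the model residuals.

```python
import numpy as np

def row_standardize(adjacency):
    """Scale each row of a binary contiguity matrix so that it sums to 1."""
    A = np.asarray(adjacency, dtype=float)
    row_sums = A.sum(axis=1, keepdims=True)
    # Areas without neighbors keep an all-zero row.
    return np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)

def morans_i(values, W):
    """Global Moran's I for values under the spatial weights matrix W."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = z.size
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Hypothetical layout: four areas along a line (0-1-2-3), where areas
# sharing a boundary are marked with 1.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
W = row_standardize(adjacency)

clustered = morans_i([1.0, 1.0, 5.0, 5.0], W)    # similar values adjacent
alternating = morans_i([1.0, 5.0, 1.0, 5.0], W)  # dissimilar values adjacent
```

Positive values indicate that neighboring areas tend to have similar values, and negative values indicate that they tend to differ. Testing significance, as in the Moran's I test used here, additionally requires a reference distribution, which this sketch omits.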

General
The results of the main spatial regression models (standardized coefficients) are presented in Figure 4, wherein the upper graph represents the overall payouts as the dependent variable and the lower graph approval-rate as the dependent variable. The main influential social indicators, as selected through PLS, were mean income, density, and rates of poverty, unemployment, minority population, and immigration.
Using these coefficients as aggregation weights (Table 2) of the actual values by zip code (modified as discussed below), we demonstrate the creation of two vulnerability assessments (Figure 5): net-value based (meaning that the models used the net paid claims, i.e., all payouts) and need-based (meaning that the models used only the FEMA assistance paid grants). Beyond their general importance for setting adaptation policy, net-value vulnerability may be of use to entities such as insurance companies and real-estate organizations for anticipating losses and for planning investments. Need-based vulnerability will likely be most useful for governments and aid organizations seeking to assist communities at high risk.
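A weighted aggregation of this kind can be sketched as follows. The indicator values and weights below are made-up numbers for illustration only and are not taken from Table 2, and the min-max rescaling is an assumption of this sketch rather than the modification applied in the study.

```python
import numpy as np

def weighted_index(indicators, weights):
    """Aggregate indicator columns into one index per area.

    Each column is min-max scaled to [0, 1] (an assumption of this sketch)
    and then combined using the supplied weights.
    """
    X = np.asarray(indicators, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    # Constant columns are left at zero instead of dividing by zero.
    scaled = (X - X.min(axis=0)) / np.where(span == 0, 1.0, span)
    return scaled @ np.asarray(weights, dtype=float)

# Three hypothetical areas (rows) and two hypothetical indicators (columns).
indicators = [
    [10.0, 0.1],
    [20.0, 0.3],
    [30.0, 0.2],
]
weights = [0.6, 0.4]   # illustrative weights, not model estimates

index = weighted_index(indicators, weights)
```

With model-derived weights, indicators with larger standardized coefficients contribute more to the index, in contrast to equal-weights aggregation, where every indicator contributes the same amount.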
From the needs-based assessment, it became clear that the variable "Minorities," which had a negative coefficient in the FEMA payouts model, actually reflects a substantially higher vulnerability than others since this indicator also demonstrated a significant negative association with assistance approval rate (Figure 4, lower graph).

Validation
An important part of the study presented here, and one that directly addresses one of the two shortcomings of the most commonly used indicator-based approaches mentioned in the introduction, is the facilitation of validation. As another means of validating the methodology used in our study (and thus the vulnerability indexes we produced), we extrapolated the selected social indicators' weights (Table 2, Need-based index) to create a need-based vulnerability index for neighboring states that were also affected by Hurricane Sandy. We used the need-based vulnerability index for these other states in regression models to investigate the index's predictive power and did the same using the traditional PCA-based equal-weights approach. In the latter, we used the same initial list of indicators, selected a smaller number of indicators according to the result of a PCA model (four factors), and aggregated their values into a single index (using equal weights). Three spatial regression models were produced. In all of them, the dependent variable was FEMA assistance and the explanatory variables were the physical exposure variables along with the newly produced indexes, as follows: one using our proposed need-based vulnerability index, one using the PCA equal-weights vulnerability index, and one using both. We found that our proposed method can better predict harm using fewer variables, as shown in Table 3. This may signal to researchers and policy-makers that there is higher value in monitoring specific social indicators over others.
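The logic of comparing two competing indices by their predictive power can be sketched as follows. This is a simplified stand-in and not the study's procedure: it uses non-spatial ordinary least squares instead of spatial regression, the data are synthetic, and the weights of the "weighted" index are set by construction rather than estimated from a separate region.

```python
import numpy as np

def r_squared(predictors, y):
    """Coefficient of determination of an OLS fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

# Synthetic validation region: harm depends on exposure and on
# hypothetical indicator A, but not on hypothetical indicator B.
rng = np.random.default_rng(2)
n = 400
exposure = rng.normal(size=n)
ind_a = rng.normal(size=n)
ind_b = rng.normal(size=n)
harm = exposure + 2.0 * ind_a + rng.normal(scale=0.5, size=n)

weighted_idx = 2.0 * ind_a + 0.0 * ind_b   # weights reflecting the evidence
equal_idx = ind_a + ind_b                  # equal-weights aggregation

r2_weighted = r_squared(np.column_stack([exposure, weighted_idx]), harm)
r2_equal = r_squared(np.column_stack([exposure, equal_idx]), harm)
```

In this constructed example the equal-weights index dilutes the informative indicator with an uninformative one, so it explains less of the variation in harm. Whether the same holds for real data is an empirical question, which is what the out-of-region comparison reported in Table 3 addresses.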

DISCUSSION
The shortcomings of the common indicator-based methodological approaches often used to conduct vulnerability assessments, such as lack of validation frameworks and the unjustified equal-weighting approach, have been acknowledged in the literature as described above. Only a few studies have taken on the task of validating the relationship between social indicators and observed climate change-driven impact (harm indicators) using robust predictive statistical models (Tonmoy et al., 2014). Even fewer use the results of such models to modify how vulnerability is assessed (i.e., what weight is given to individual indicators in the assembly of the vulnerability index). However, the grave consequences of lethal climate events recently experienced lead us to contend that these common indicator-based methodological limitations must be addressed and that methods can and should be improved. We demonstrate how robust evidence-based models can produce frameworks that overcome these limitations.
Two explanations for not using studies that are based on evidence and predictive statistics are usually offered. The first explanation highlights the lack of proper data at the required geographic resolution used for analysis (Hinkel, 2011).
The second explanation originates in the difficulties related to communicating the results of complex methodologies (Beccari, 2016), an argument which renders simplistic approaches preferable over those that could provide more accurate results.
The few studies that implemented predictive statistical techniques that we reviewed (e.g., Burton, 2010) introduce some statistical shortcomings that may bias results. Particular methodological issues include: not including environmental/exposure variables as possible predictors in the model (e.g., Finch et al., 2010; Burton, 2015); lack of transparency or misreporting of model results, such as missing information concerning model results and the preprocessing of the data (e.g., Flanagan et al., 2011); not accounting for geographic dependencies in the data (spatial autocorrelation) (e.g., Myers et al., 2008; Fekete, 2009); reliance on correlation without consideration of causation (e.g., Borden et al., 2007; Finch et al., 2010); use of spatial units that may be too large to reflect socio-economic variability (e.g., Fekete, 2009); use of simulated results (e.g., Schmidtlein et al., 2011) or political decisions as outcome variables (e.g., Borden et al., 2007), both of which do not serve as evidence of vulnerability; and other statistical assumption violations.
The first shortcoming mentioned above, which is particularly grave and common, results in a particularly low explanatory power of the model. This leads to biases, especially when performing an analysis based on a single climatic event with its unique physical features. The physical exposure (e.g., flood level) would carry a high explanatory power of the climatic event's consequences. Thus, including exposure in the statistical model allows a better examination of the other factors (socio-economic indicators/variables) that impact vulnerability.
This work has several limitations. It used only one case study. It also used similar datasets (though with different variables) for the first (supervised dimension reduction) and second (regression) steps of the study due to the relatively small sample size (though with different target variables to overcome this limitation), and it explored only a single statistical approach for variable weighting (standardized coefficients). We therefore recommend that additional evidence-based regional vulnerability assessments use data from several hurricane/flooding events and explore possible modifications to the model design by using additional statistical techniques, including those incorporating interactions between variables and standardizing model coefficients differently. Furthermore, we suggest exploring the normalization of indicators within a spatial unit using additional types of data from myriad sources, keeping in mind that the interpretability of models is especially important in such cases for driving adaptation policy.
In any case it is important to point out that a crucial aspect of this study that is seldom performed in other studies in this field is the validation of the proposed vulnerability index using a different geographical area. Other methods of validation can be explored, such as holding off some of the internal units (zip codes in this case) for validation when there is a sufficient sample size.
Perhaps most notable in our analysis results is the negative coefficient associated with the minority indicator for approval rate (i.e., successful assistance application rate). This result highlights issues that have been discussed in the literature, particularly within the context of justice and equity when facing the consequences of climate change (Rydin, 2006; Kim et al., 2018). Such an approach could be helpful on a local scale, but also when climate change adaptation plans are developed over broader scales of time (i.e., for long-term planning) and space, considering disparities between regions or across multiple jurisdictions (Barbier, 2014; van den Berg and Keenan, 2019).

CONCLUSION
Our analysis suggests a strong association between social indicators and observed vulnerability, empirically demonstrating that some indicators are more meaningful than others. Consequently, adaptation planning should consider and prioritize the most vulnerable communities, as reflected by the indicators, with consideration to the indicators' weights. Most importantly, this work sets another stepping stone for methodological advancement in the assessment of hurricane-related vulnerability to climate change. Moving consideration of social vulnerability to climatic events forward, especially with regard to events related to storm surge and flooding, is of vast importance as new data reveal increased risk of damage to extensive areas and the crucial consequences such damage involves, especially among already vulnerable communities (Flavelle et al., 2020).
Researchers, policy-makers, and other climate change adaptation practitioners should promote additional implementations of the evidence-based predictive statistics approach, thus expanding knowledge for adaptation planning and increasing the likelihood that appropriate and supportive policies for such planning will be put in place. In view of this position, we call on others to build upon, as well as to question, the proposed vulnerability assessment methodology, consequently improving adaptation planning and mitigating harm caused by climate change to communities at risk.