Differential Impact of COVID-19 Risk Factors on Ethnicities in the United States

The coronavirus disease (COVID-19) has revealed existing health inequalities in racial and ethnic minority groups in the US. This work investigates and quantifies the non-uniform effects of geographical location and other known risk factors on various ethnic groups during the COVID-19 pandemic at a national level. To quantify the geographical impact on various ethnic groups, we grouped all the states of the US into four regions (Northeast, Midwest, South, and West) and considered Non-Hispanic White (NHW), Non-Hispanic Black (NHB), Hispanic, and Non-Hispanic Asian (NHA) as the ethnic groups of interest. Our analysis showed that infection and mortality among NHB and Hispanics are considerably higher than among NHW. In particular, the COVID-19 infection rate in the Hispanic community was significantly higher than their population share, a phenomenon we observed across all regions in the US but is most prominent in the West. To gauge the differential impact of comorbidities on different ethnicities, we performed cross-sectional regression analyses of statewide data for COVID-19 infection and mortality for each ethnic group using advanced age, poverty, obesity, hypertension, cardiovascular disease, and diabetes as risk factors. After removing the risk factors causing multicollinearity, poverty emerged as one of the independent risk factors in explaining mortality rates in NHW, NHB, and Hispanic communities. Moreover, for NHW and NHB groups, we found that obesity encapsulated the effect of several other comorbidities such as advanced age, hypertension, and cardiovascular disease. At the same time, advanced age was the most robust predictor of mortality in the Hispanic group. Our study quantifies the unique impact of various risk factors on different ethnic groups, explaining the ethnicity-specific differences observed in the COVID-19 pandemic. The findings could provide insight into focused public health strategies and interventions.


INTRODUCTION
Numerous researchers have found various comorbidities and other risk factors affecting the spread and prognosis of coronavirus disease. Recent work has also demonstrated that the COVID-19 pandemic has affected marginalized ethnicities more severely. We thus hypothesize that the risk factors for COVID-19 have affected different ethnic groups in distinct ways. In this paper, we aim to quantify the differential effect of risk factors on different ethnicities.

COVID-19 and Ethnicity
The public health crisis created by the COVID-19 has uncovered the historical inequalities (1)(2)(3)(4) between ethnic groups in certain countries, in particular in the UK and US, which are countries with ethnically diverse populations. These observations and consistent fatal outcomes in the minority ethnic groups (5,6) have led to speculations about why patients from these groups are susceptible to infections, followed by severe complications. These trends could be due to different rates of COVID-19 infections, underlying health conditions, living conditions including housing density, having jobs as essential workers, access to health care, quality of care, and a mixture of multiple factors among these groups. The United States national data (7) from states and municipalities reports disproportionate COVID-19 infections, hospitalizations, and deaths among minority ethnic groups. Dobin and Dobin (8) showed that the infection rate is 4-fold for the Black and Hispanic population in selected counties in New York state. Moore et al. (9) observe a disproportionate number of COVID-19 cases among underrepresented racial/ethnic groups in the United States. Adhikari et al. (10) show that the racial and ethnic disparities in COVID-19 infections and deaths existed beyond those explained by income inequality.

Effects of Geographical Location on COVID-19
The impact of COVID-19 varies widely across countries and even within a country or a region. For example, Sun et al. (11) showed a negative correlation between the number of provincial COVID-19 cases and latitude, as well as altitude. Breen and Ermisch (12) use spatial autoregressive regression to show that the relation of COVID-19 mortality to the social composition of geographical areas in England is distinct from that of non-COVID mortality. A number of factors, including societal awareness and culture, public health measures, healthcare infrastructure, and more recently vaccination coverage, are known to underlie the variation in COVID-19 infection rates and adverse health outcomes (13).
Although multiple studies have confirmed that Black and Hispanic populations in the US are more vulnerable to COVID-19, to our knowledge, no data are available on whether they are equally susceptible across geographical locations. Stephens-Davidowitz (14) uses search data from Google to show that there is wide variation in racism across the 50 US states. Thus, we surmise that the impact of COVID-19 on various ethnicities may not be uniform in all regions of the US. Hence, we are interested in understanding whether geographical location plays a part in the variation of COVID-19 impact on minorities.

Comorbidities for COVID-19
Emerging evidence highlights that comorbid conditions such as obesity, cardiovascular disease (CVD), and type 2 diabetes are directly linked to the severity of the COVID-19 disease (15)(16)(17). A meta-analysis including 76,993 patients with COVID-19 showed that diabetes, CVD, smoking, malignancy, chronic kidney disease, hypertension, and chronic obstructive pulmonary disease (COPD) are associated with poor prognosis (18). This conclusion was further supported by Richardson et al. (19) and Sun et al. (20). Using logistic regression, the authors of (21) show that obesity was a risk factor for the severity of the COVID-19 disease. Furthermore, in a retrospective cohort study, Busetto et al. (22) conclude that despite their young age, overweight patients were more likely to need assisted ventilation and access to intensive care units than patients with normal weight. The connection between obesity and pulmonary function is well-established, e.g., Sharp et al. (23) observe that obese patients have significantly decreased total respiratory compliance. Moreover, Li et al. (24) find that reduction in functional residual capacity and diffusion impairment are the most common abnormalities in obese patients. Yan et al. (18) show that diabetic patients experienced more mortality than non-diabetic patients. Finally, just as in the case of the SARS epidemic (25), COVID-19 has disproportionately affected the older population (26). In fact, in the US, 92% of the COVID-19 deaths recorded up to June 2020 were in the age group 55 years and above (27). In summary, the main comorbidities for COVID-19 include obesity, diabetes, advanced age, hypertension, and cardiovascular disease.
However, these risk factors affect different ethnicities differently. For example, Paeratakul et al. (28) find that among obese individuals, the prevalence of hypertension was higher in NHB subjects than other groups. Sturm and Hattori (29) observe that the prevalence of obesity is about double among NHB than among Hispanics or NHW. Kuzawa and Sweet (30) note that NHB suffer from a disproportionate burden of CVD relative to NHW. Thus, we are motivated to understand whether or not these comorbidities affect different ethnicities differently.

Impact of Poverty on COVID-19 Prognosis
Patel et al. (31) note that economically disadvantaged people are vulnerable to COVID-19 due to a combination of factors. A time-series analysis conducted by Elgar et al. (32) reveals that income inequality is associated with a higher number of deaths due to COVID-19 in 84 countries. In particular, in the US, the states with higher income inequality experienced a higher rate of infection as well as the number of COVID-19 related deaths (33). This pattern could be because the comorbidities associated with COVID-19 are linked to poverty.
A longitudinal study involving 600,662 adults from Taiwan's National Healthcare Insurance database indicates that diabetes incidence is associated with poverty (34). This finding is particularly notable since the subjects from this study had access to universal healthcare. However, the subjects were from an ethnically homogeneous population. Thus, we are motivated to investigate the differential role of poverty among various races.

Objectives
For this study, we choose Non-Hispanic White (NHW), Non-Hispanic Black (NHB), Hispanic, and Non-Hispanic Asian (NHA) as the four ethnic groups. The risk factors we choose to focus on in this work are advanced age, obesity, cardiovascular disease, diabetes, hypertension, and poverty. We aim to investigate the following in this study:

Selection of Variables and Data Sources
Given the discussion in the sections 1.3, 1. In our work, Relative Infections % and Relative Mortality % were considered as the response variables. For brevity, we write infection rate instead of Relative Infections %, and so on. The use of relative percentages allows a direct comparison with the population percentages of that ethnicity. For example, in a state with a 5% NHB population, relative mortality of 15% in the NHB community indicates disproportionately large mortality compared to the NHB population. The "relative" percentage is independent of the population of the state itself. This measure also allows us to compare the impact on a certain ethnic group in two states with a similar proportion of the minority population. As a concrete example, California and Texas have a similar percentage of Hispanic population, 39.5 and 40%, respectively. However, the relative mortality percentages for the Hispanic group in California and Texas are 48.3 and 56.1%, respectively. The data on the percentage of people aged 60 or more in each state and ethnicity were obtained from the CDC dataset (41). Race- and state-wise data on adults who reported being told by a health professional that they have diabetes (excluding prediabetes and gestational diabetes) were obtained from America's Health Rankings (42). We used the body mass index (BMI) as a measure of obesity following (43) and define obesity as a condition of having a BMI of 30.0 or higher. The dataset (44) was used to obtain the obesity data for each state and for the races NHW, NHB, Hispanic, and NHA. We acquired the percentage of adults who were informed by a health care professional that they had coronary heart disease, myocardial infarction, or a stroke from AHR CVD data (45). This was gathered for each state and ethnicity of interest.
We obtained the race- and state-wise data on adults who reported being informed by a health professional that they have high blood pressure from AHR HBP data (46). The US Census Bureau defines the "poverty threshold" for a family with two adults and one child as $20,578 in 2019. We extracted the data on poverty, as defined by the "poverty threshold," from KFF Poverty data (47). We obtained these data for each state and ethnicity of interest. For each state, and each of the four ethnicities of interest (NHW, NHB, Hispanic, and NHA), we defined the variables Age60+, BMI30+ (a measure of obesity), CVD, Diabetes, HBP, and Poverty. For a state S and an ethnicity E, we defined the relative percentage of people with age 60 or over, Age60+, as follows:

$$\text{Age60+}(S, E) = \frac{\text{Number of people of ethnicity } E \text{ with age over 60 in state } S}{\text{Total number of people with age over 60 in state } S} \times 100.$$
We use the variable name Age60+ instead of Relative Age60+ % for conciseness, and so on. We define the relative percentage variables Obesity, CVD, Diabetes, HBP, and Poverty in a similar manner.
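The relative percentage defined above can be illustrated with a minimal Python sketch. The counts are hypothetical and chosen only to reproduce the 5% population share and 15% relative mortality example from the text; the helper `disproportion_ratio` is an illustrative addition and not a quantity used in the study.

```python
def relative_percentage(group_count, total_count):
    """Share of a state-level total attributable to one ethnic group, in percent."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * group_count / total_count


def disproportion_ratio(relative_pct, population_pct):
    """Hypothetical helper: relative percentage divided by population percentage.

    Values above 1 indicate a burden larger than the group's population share.
    """
    if population_pct <= 0:
        raise ValueError("population_pct must be positive")
    return relative_pct / population_pct


# Hypothetical counts for one state (not taken from the study's data).
state_population = 1_000_000
nhb_population = 50_000
state_deaths = 2_000
nhb_deaths = 300

nhb_population_pct = relative_percentage(nhb_population, state_population)
nhb_relative_mortality = relative_percentage(nhb_deaths, state_deaths)
ratio = disproportion_ratio(nhb_relative_mortality, nhb_population_pct)
```

With these numbers the NHB population share is 5%, the relative mortality is 15%, and the ratio is 3, which is the kind of disproportion the relative measure is meant to expose.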

A Note on Unavailability of Data From Some States
We encountered a few irregularities during our data collection process in the format the data was made available by various states (36)(37)(38)

Description of Data
The data for this study are state-level demographics based on four ethnic groups. We depict the relative infection % and relative mortality % for the NHW, NHB, and Hispanic groups in the map in Figure 1. As described in section 2.2, some states do not make the ethnicity-wise data public. The states with no color in Figure 1 are those for which the ethnicity-wise infection and mortality data were not available. We calculated state-wise descriptive statistics for the relative infection and mortality percentages and population for each ethnic group. We performed a descriptive analysis to explore the region-specific, state-wise characteristics of the relative infection %, relative mortality %, and population by calculating their medians and first and third quartiles, which are presented in Figure 2.

Quantifying the Regional Variability of COVID-19 on Various Races
The infection and mortality rates for various ethnic groups are disproportionate to their share of the population in the US (7,8). We aim to understand this phenomenon and its severity across various regions in the US. To this effect, we employed the Kruskal-Wallis (KW) test (48), a non-parametric equivalent of the one-way analysis of variance. Since the test does not identify the groups that differ in their distributions, we followed it with Dunn's multiple comparisons test (49) for cases for which the KW test yielded statistically significant results. We used the combination of KW test, and Dunn's comparison test for the groups NHW, NHB, Hispanic, and NHA separately for all four regions of the US, as well as the whole country.
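As a rough illustration of the KW test, the sketch below computes the tie-corrected H statistic in plain Python. It assumes that the three samples being compared for a given ethnicity and region are the state-wise population share, infection rate, and mortality rate, which is our reading of the pairs later reported for Dunn's test. With exactly three samples the chi-square approximation has 2 degrees of freedom, so the p-value reduces to exp(-H/2). The data are hypothetical, and Dunn's post hoc test is not sketched here.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h):
    """Chi-square survival function with 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2.0)


# Hypothetical state-wise percentages for one ethnicity in one region.
population = [10.0, 12.0, 11.0, 9.0]
infection = [20.0, 25.0, 22.0, 19.0]
mortality = [13.0, 15.0, 14.0, 16.0]

h_stat = kruskal_wallis_h([population, infection, mortality])
p_val = p_value_three_groups(h_stat)
```

In this toy example the three samples do not overlap at all, so H takes its largest possible value for this design and the p-value falls below 0.01. In practice an established statistics library would be preferable to a hand-written implementation.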

Correlation Analyses
In order to quantify the association between the impact of COVID-19 on the ethnic groups and the risk factors across the country, we consider each state, for which the data are available, as a data point. We computed the pairwise Pearson's correlation coefficients between various risk factors for the racial groups NHW, NHB, and Hispanic, along with their 2-tailed statistical significance values. We summarized the comparisons between the variables in correlation matrices.
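A correlation matrix of this kind can be sketched in a few lines of Python. The variable names and values below are hypothetical placeholders for state-level risk factor percentages. The t statistic shown is the standard one used to obtain a 2-tailed p-value from a t distribution with n - 2 degrees of freedom; the p-value lookup itself is left out.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two samples of equal length with at least 3 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of name -> list of values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values, one entry per state (not taken from the study's data).
data = {
    "BMI30+": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Diabetes": [2.0, 4.0, 5.0, 4.0, 5.0],
}
matrix = correlation_matrix(data)
r = matrix["BMI30+"]["Diabetes"]
t = t_statistic(r, len(data["BMI30+"]))
```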

Constructing Robust Linear Models With Infection and Mortality Rates as Response Variables
In order to elucidate the role of the explanatory variables on a specific aspect of the COVID-19 burden, linear models are employed. For these linear models, we considered each state as a data point. From Figure 2, we observe that the rates of infection and mortality in the NHA group are consistently lower when compared to their population. Thus, we consider building linear models for NHW, NHB, and Hispanic groups only to elucidate the contributions of risk factors considered in this study. However, infection and mortality rates, the response variables for our model, showed skewness in their distributions. Since logarithmic (log) transformation of data is one of the most commonly used techniques to conform to normality (50), we implemented it on infection and mortality rates. The log transformation was effective in correcting the skewness and introducing normality (Supplementary Figures S1, S2). As log transformation of infection and mortality rates improved their normality behavior, we used the log-transformed form of these variables exclusively for model construction. Thus, when we refer to infection rate and mortality rate in the context of linear models, they denote log-transformed infection rate and mortality rate, respectively. For NHW, NHB, and Hispanic groups, we built our preliminary linear models with infection rate and mortality rate as our response variables and the risk factors defined in section 1.3, i.e., advanced age Age60+, BMI30+ (a measure of obesity), CVD, Diabetes, HBP, and Poverty, as the explanatory variables. However, conditions of advanced age, cardiovascular disease, diabetes, obesity, and hypertension are interrelated. This interrelation can also be observed from the correlation Tables 3-5. Multicollinearity among the explanatory variables can lead to unstable and unreliable estimates of regression coefficients (51). We used the variance inflation factor (VIF) to assess the multicollinearity between the explanatory variables (52). Following Kutner et al. (53, p. 409), an upper cut-off value of VIF for explanatory variables is set as 10 to minimize the contribution of multicollinearity in our model.
FIGURE 2 | Box plots of population, relative infection %, and relative mortality % in each of four US regions, and combining all regions for NHW, NHB, Hispanic and NHA groups. Horizontal bars represent medians. "*" significance at p < 0.1, "**" significance at p < 0.05, "***" significance at p < 0.01, NS, not significant (Kruskal-Wallis tests followed by Dunn's tests).
Frontiers in Public Health | www.frontiersin.org 5 December 2021 | Volume 9 | Article 743003
Starting from the preliminary model for ethnicity of interest, we propose the procedure outlined below to construct our final model: After constructing the linear models, we checked the normality of the residuals of the regression models with Lilliefors normality test (54).
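The VIF screening can be illustrated with a short sketch. It computes each variable's VIF as 1 / (1 - R²) from a regression of that variable on the others, then repeatedly drops the variable with the largest VIF until all values are at or below the cut-off of 10. This greedy rule is a common strategy and is only an assumption here; it is not necessarily the exact procedure used to arrive at the final models. The data are synthetic, and NumPy is assumed to be available.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other variables
        coef = np.linalg.lstsq(A, y, rcond=None)[0]
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def drop_high_vif(X, names, cutoff=10.0):
    """Greedily remove the column with the largest VIF until all are <= cutoff."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        values = vif(X)
        worst = int(np.argmax(values))
        if values[worst] <= cutoff:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


# Synthetic example: two nearly collinear variables and one independent variable.
rng = np.random.default_rng(0)
bmi = rng.normal(size=40)
hbp = bmi + 0.05 * rng.normal(size=40)   # almost a copy of bmi
poverty = rng.normal(size=40)            # unrelated to the other two
X = np.column_stack([bmi, hbp, poverty])

X_kept, kept = drop_high_vif(X, ["BMI30+", "HBP", "Poverty"])
```

In this synthetic case one of the two nearly collinear variables is removed and the independent variable is retained, mirroring how a single comorbidity can end up standing in for several correlated ones.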

Geographically Weighted Regression
Linear regression yields stationary and global regression coefficients. However, it is conceivable that these coefficients might have local variability. To find the geographical variability in the coefficients, we employed the geographically weighted regression (GWR) (55). Rather than producing global regression results, GWR yields "local" regression coefficients in terms of geographically varying functions. For our analysis, we used the infection rate and mortality rates as response variables and the variables obtained from section 2.4.3 as the explanatory variables.
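The idea behind GWR can be sketched as a weighted least squares fit repeated at every location, with weights that decay with distance from that location. The sketch below assumes a Gaussian kernel with a fixed bandwidth and Euclidean distances between state coordinates; the kernel, bandwidth, and distance measure used in the study are not stated in the text, so these are illustrative choices only. The data are synthetic, and NumPy is assumed to be available.

```python
import numpy as np


def gwr_local_coefficients(coords, X, y, bandwidth):
    """Local regression coefficients (intercept first) at every location.

    At location i the fit solves a weighted least squares problem with
    Gaussian weights exp(-0.5 * (d / bandwidth) ** 2), where d is the
    distance from location i to each observation.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    betas = np.empty((n, A.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        sw = np.sqrt(w)
        # Scaling rows by sqrt(w) turns weighted least squares into ordinary least squares.
        betas[i] = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return betas


# Synthetic example: ten locations on a line, with an effect that grows eastward.
coords = np.array([[float(i), 0.0] for i in range(10)])
x = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0])
true_slope = 1.0 + 0.2 * np.arange(10)
y = true_slope * x

betas = gwr_local_coefficients(coords, x, y, bandwidth=2.0)
```

Here the local slope estimated at the westernmost location is smaller than the one at the easternmost location, which a single global regression coefficient could not express.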

Regional Variation of COVID-19 Impact on Various Ethnicities
The boxplots in Figure 2 summarize the relative impact of COVID-19 on various ethnicities across the four regions of the US and all regions as an aggregate. In Figure 2 we present various descriptive statistics of the population, infection, and mortality rates for the NHW, NHB, Hispanic, and NHA groups across various regions. As noted in section 2.2, not all the states are included in the analyses. Thus, the statistics shown in this plot do not correspond closely to those of the whole country. We describe the Kruskal-Wallis test results in Table 1. We see in Table 1 that the KW test for NHW is statistically significant in the Northeast and the West with p < 0.1. For the NHB group, the KW test is significant in the Northeast and the Midwest with p < 0.1. The KW test was statistically significant for the Hispanic group in all four regions, with p < 0.1 in the Northeast, p < 0.05 in the Midwest and the West, and p < 0.01 in the South. The NHA data yielded significant results with the KW test only in the South with p < 0.05.
When we considered all four regions in the US together, the KW test was statistically significant for all ethnicities with p < 0.01 for NHW and Hispanic communities. The KW test was significant for the NHB and NHA when all regions were combined with p < 0.1.
We followed the significant KW tests with Dunn's multiple comparison test to identify the factors differing in their distributions. We depict the results from Dunn's test in Table 2. In particular, we obtained statistically significant results (with p < 0.05) in the South, Midwest, and West for the Hispanic population between the pairs "infection and mortality rates" and "infection rate and population share."

Results of the Correlation Analyses
The KW test provides evidence of geographical impact on various ethnicities. In this section we provide the results of correlation analysis between other risk factors. In Table 3 we see the Pearson correlations between the variables along with the 2-tailed significance values for the NHW group. The same statistics are provided in Tables 4, 5 for the NHB and Hispanic communities respectively. All the variables are strongly (p < 0.01) and positively correlated with every other variable for all ethnicities, with poverty being the sole exception. To be precise, for the NHW group, poverty is positively correlated
"*" significance at p < 0.1, "**" significance at p < 0.05, "***" significance at p < 0.01.
Table 8 depicts the linear models with infection rate and mortality rate as response variables for the Hispanic group. The preliminary model for infection rates in the Hispanic community accounts for 67% [R² = 0.67, adjusted R² = 0.60, F(6, 26) = 8.97, p < 0.01] of the variability in the infection rates for the Hispanic population. The final model for the Hispanic infection rates consists of diabetes and poverty as the explanatory variables. This model accounts for 51% [R² = 0.51, adjusted R² = 0.48, F(6, 25) = 9.98, p < 0.01] of the variability in the Hispanic infection rates. The preliminary model with mortality rates among the Hispanic community as the response variable accounts for 71% [R² = 0.71, adjusted R² = 0.63, F(6, 25) = 9.98, p < 0.01] of the variability in the Hispanic mortality rates. The final model for Hispanic mortality consists of advanced age and poverty as the explanatory variables. Note that advanced age is the most significant explanatory variable in the final mortality model in the Hispanic group, whereas having diabetes was the most significant variable predicting infection in the Hispanic community. This final model accounts for 55% [R² = 0.55, adjusted R² = 0.53, F(2, 39) = 23.93, p < 0.01] of the variability in the Hispanic mortality rates. We note that the final model for the Hispanic mortality includes 42 states, whereas the preliminary model has only 32 states due to lack of data availability.
The adjusted R² value for the regression model for NHW mortality was much higher (0.77) than for NHB (0.65) and Hispanic (0.53). However, all six models showed statistical significance and satisfied normality tests for the residual values. Indeed, the Lilliefors normality test applied to the residuals obtained from each of these models revealed that the residuals were normally distributed with p > 0.001. The histograms and the QQ plots for the residuals are provided in Figure 3.
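The relation between R², adjusted R², and the F statistic can be checked with the standard textbook formulas. The sketch below applies them to the values reported for the final Hispanic mortality model (R² = 0.55, 42 states, two explanatory variables); the function names are illustrative.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("not enough observations for this many predictors")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


def f_statistic(r2, n_obs, n_predictors):
    """Overall F = (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("not enough observations for this many predictors")
    return (r2 / n_predictors) / ((1.0 - r2) / dof)


# Final Hispanic mortality model as reported in the text: R^2 = 0.55, n = 42, p = 2.
adj = adjusted_r2(0.55, 42, 2)
f_value = f_statistic(0.55, 42, 2)
```

This gives an adjusted R² of about 0.53, matching the reported value, and an F statistic of about 23.8 on (2, 39) degrees of freedom, close to the reported 23.93. The small gap is plausibly due to the reported R² being rounded to two decimals.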

Results From the Geographically Weighted Regression
The geographically weighted regression yields coefficients for each risk factor for every state. We show the state-wise coefficients for the most significant explanatory variable for each ethnicity in Figure 4. Empty spaces for states in Figure 4 indicate that ethnicity-wise data was not available for those states for the

DISCUSSION
Our analysis of the nationwide data revealed that geographical location and other COVID-19 risk factors affect different ethnicities in dissimilar ways. We observed that the disparate burden of the pandemic was most prominent on the NHB and Hispanic communities. This observation is supported by Anyane-Yeboa et al. (56) and Escoba et al. (57), among other studies.
In particular, the rate of infection was exceptionally high for the Hispanic community compared to their population share. Discordant impact on NHB and Hispanic populations has been reported by the Centers for Disease Control (58) and studied using data from metropolitan cities and from selected states combined, but nationwide studies are limited. In our work, this effect was observed in the four US regions separately and also when all the states' data were aggregated. When we considered the four regions individually, we found that the excessive infection rate in the Hispanic community was most prominent in the South region. However, compared to the Hispanic group's infection rates, their mortality rates were statistically lower in all regions of the US. This apparent discrepancy could be because the Hispanic community is the youngest of the four ethnic groups considered in our study (59). The infection rate of the NHB population relative to their population share was higher in the Midwest and the Northeast than in other regions. The correlation analysis confirmed that the COVID-19 related risk factors such as advanced age, cardiovascular disease, diabetes, hypertension, and obesity are highly interrelated. This finding is consistent with numerous studies. For example, Mokdad et al. (60) show that obesity (BMI ≥ 30) was significantly associated with diabetes, hypertension, high cholesterol, asthma, and arthritis. Wilson and Kannel (61) conclude that obesity and diabetes are associated with atherogenic risk factors. Abdullah et al. (62) also conclude that obesity is associated with type 2 diabetes. We also found that "within" an ethnic group, poverty was uncorrelated or weakly correlated with infections and mortality for all three ethnic groups, implying that poverty is an "independent" risk factor for COVID-19. This finding is supported by Elgar et al. (32) and Oronce et al. (33), which we discussed in section 1.4.
After eliminating variables with high multicollinearity, we formulated robust and parsimonious linear models for NHW, NHB, and Hispanic populations. The linear models described in section 3.3 reveal that "obesity" encapsulates many other codependent risk factors for the infection and mortality in NHW and NHB groups. This finding is expected in light of numerous studies (61,63). Obesity and diabetes are well-established risk factors for COVID-19. In these two conditions, adipose tissue is compromised, which can directly or indirectly get involved in interaction with SARS-CoV-2, the pathogen responsible for COVID-19 disease (64). Thus, it is not surprising that obesity highly influences the regression models for NHW and NHB with death rate as a response variable. However, the degree of influence of obesity on infection rates and mortality is noteworthy, with obesity emerging as the most significant factor contributing to the infection rates and mortality for the NHW and NHB groups.
The Hispanic community markedly differs from NHW and NHB with respect to the results of the linear models. Diabetes was the most significant factor for infection rate in Hispanics, while advanced age emerged as most significant for mortality. The effect of advanced age on Hispanic mortality could also be due to the relatively younger, and thus working-age, population of Hispanics (59) in the US.
The regression models indicate a strong association of poverty with a high infection rate, followed by death, for all ethnic groups studied. This finding is in agreement with several studies focusing on low socioeconomic status, which increases the exposure to COVID-19 (31,65). People with low socioeconomic status often access healthcare services only at an advanced stage of illness and thus experience a worse prognosis. The disease burden associated with obesity is linked to socioeconomic status and race (28). Ethnic minorities and populations with low socioeconomic status have been disproportionately affected in previous pandemics (5,6,8). The COVID-19 pandemic is no exception. To this end, public health strategies to control the current and future pandemics need to take these ethnicity-specific effects into account to mitigate the spread and severity of the disease.
The linear regression furnishes global and static coefficients for the explanatory variables. However, the geographically weighted regression gives coefficients that are geographically varying. We see in Figure 4 the variability in the coefficients of the GWR. We note that the neighboring states seem to have similar coefficients, indicating similarity in the risk factors in nearby states. Obesity is the most prominent risk factor amongst the NHW and NHB populations, and diabetes and advanced age seem to be more influential in the Hispanic community. The GWR results for the Hispanic group show more variability than the NHW group, which could be due to the higher percentage of people of Hispanic origin in southern and western states. The local R² map in Supplementary Figure S5 also indicates that the GWR model fits the NHW and NHB groups better than the Hispanic group. We plan to explore the geographical variation of the risk factors in more detail in future work.

Limitations
As discussed in section 2.2 and noted by other researchers (36,38), there is a lack of consistency and availability of COVID-19 related data. Our study does not include data from the state of New York, since the state does not make the ethnicity-wise data available. Moreover, the data we use are state-wise statistics of the various risk factors. However, we note that the observations made using such data are consistent with those made by other researchers.

Practical Implications of the Study
Although racial and ethnic disparities in COVID-19 infections and mortality are becoming increasingly clear from several studies based on available data, drivers of these disparate outcomes remain less understood at a national level. Our models, based on the nationwide data, indicate that "obesity" effectively encapsulates the effect of other co-dependent factors for NHW and NHB populations (section 3.3). The link between COVID-19 infection severity and obesity was noted by Watanabe et al. (66) even in the early stages of the pandemic. Similarly, studies of the H1N1 pandemic of 2009 (67,68) observed that obesity was associated with higher mortality.
Another implication of our work is that the Hispanic community is more susceptible to COVID-19 infection. This observation is valid throughout the US. This situation could be remedied via public policy changes and awareness of the issue. The disproportionate impact of COVID-19 on the minority population is largely attributed to existing socioeconomic inequities. Low-income minority populations are often compelled to work in environments with a higher risk of disease exposure, live in crowded accommodation, and lack adequate access to healthcare. Government support to low-income families in the form of the CARES Act, the Consolidated Appropriations Act, 2021 [Department of Treasury US (69)], and the American Rescue Plan Act of 2021 (70) is critical but might not be sufficient to fully mitigate the observed disparity in infection and mortality rates. Our analysis indicates that certain subpopulations of the minority population are at higher risk of COVID-19 infection and mortality. Identifying these vulnerable subpopulations, such as Hispanics with diabetes or aged over 60 years, and prioritizing additional attention to these populations could enable a more efficient allocation and utilization of resources. Increased effort toward educating and raising awareness on COVID-19 and associated risk factors could also be an effective method to develop community resilience. One potential avenue to improve awareness of COVID-19 is recruiting volunteers to educate the vulnerable population. For example, "Philly counts" (71), a program supported by the Philadelphia Department of Public Health and initially created for Census 2020, currently helps direct community engagement efforts for the COVID-19 vaccine. Extending similar initiatives to populations with major risk factors such as obesity could result in a major beneficial impact on the overall COVID-19 burden.

CONCLUSION
Several researchers have concluded that several health conditions, poverty, and geographical location affect the COVID-19 prognosis. Studies have shown that the COVID-19 pandemic has impacted some minorities in the US more severely than other groups. Our work focused on quantifying this distinct effect of various COVID-19 risk factors on different ethnicities in the US during the first pandemic wave.
To this effect, we included Non-Hispanic White, Non-Hispanic Black, Hispanic, and Non-Hispanic Asian groups. Our work has revealed differences in the way the COVID-19 pandemic affected various ethnic groups. We observed that the infection rates in the Hispanic population were disproportionately larger than the share of their population across all regions of the US. This effect was most prominent in the South region. The NHA populations consistently had lower infection rates and mortality rates compared to their population. Furthermore, we studied the following risk factors in this work: advanced age, obesity, cardiovascular diseases, diabetes, hypertension, and poverty for NHW, NHB, and Hispanic populations. We aimed to quantify the different effects of these risk factors on various ethnicities. To this end, we constructed linear models with infection and mortality rates as the response variables. We eliminated variables causing multicollinearity from our models, leading to robust linear models. Our models indicate that "obesity" parsimoniously describes the impact of other co-dependent comorbidities for NHW and NHB populations (section 3.3). However, for the infection rates in the Hispanic group, the factor leading to the robust linear model was the prevalence of diabetes. On the other hand, advanced age was more significant for COVID-19 related mortality for the Hispanic community. We also established "poverty" as an independent risk factor for infection and mortality amongst the three ethnicities: NHW, NHB, and Hispanics. The findings in this study quantified ethnicity-specific effects of COVID-19 risk factors, which we hope could be mitigated with public policy interventions and community engagement.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/prashantva/Covid-19-Ethnicity.

AUTHOR CONTRIBUTIONS
PA: writing-review, conceptualization, editing, investigation, and analysis. JC: data collection. VK: data curation, formal analysis, visualization, and coding. SM: writing-original draft, methodology, formal analysis, and project administration. SS: supervision, conceptualization, validation, and editing. All authors contributed to the article and approved the submitted version.