Comparing Genetic and Socioenvironmental Contributions to Ethnic Differences in C-Reactive Protein

C-reactive protein (CRP) is a routinely measured blood biomarker for inflammation. Elevated levels of circulating CRP are associated with response to infection, risk for a number of complex common diseases, and psychosocial stress. The objective of this study was to compare the contributions of genetic ancestry, socioenvironmental factors, and inflammation-related health conditions to ethnic differences in C-reactive protein levels. We used multivariable regression to compare CRP blood serum levels between Black and White ethnic groups from the United Kingdom Biobank (UKBB) prospective cohort study. CRP serum levels are significantly associated with ethnicity in an age and sex adjusted model. Study participants who identify as Black have higher average CRP than those who identify as White, CRP increases with age, and females have higher average CRP than males. Ethnicity and sex show a significant interaction effect on CRP. Black females have higher average CRP levels than White females, whereas White males have higher average CRP than Black males. Significant associations between CRP, ethnicity, and genetic ancestry are almost completely attenuated in a fully adjusted model that includes socioenvironmental factors and inflammation-related health conditions. BMI, smoking, and socioeconomic deprivation all have high relative effects on CRP. These results indicate that socioenvironmental factors contribute more to CRP ethnic differences than genetics. Differences in CRP are associated with ethnic disparities for a number of chronic diseases, including type 2 diabetes, essential hypertension, sarcoidosis, and lupus erythematosus. Our results indicate that ethnic differences in CRP are linked to both socioenvironmental factors and numerous ethnic health disparities.


INTRODUCTION
C-reactive protein (CRP) is synthesized by hepatocytes and secreted to the bloodstream in response to inflammation. CRP is employed as a serum biomarker for both acute and chronic inflammation, with important implications for immune response and overall health (Pepys and Hirschfield 2003;Black et al., 2004). Elevated levels of CRP have been shown to be associated with an increased risk of diabetes (Dehghan et al., 2007), cardiovascular disease (Danesh et al., 2004), psychological stress (Wium-Andersen et al., 2013), and all-cause mortality (Zacho et al., 2010).
CRP blood serum levels vary across ethnic groups (Nazmi and Victora 2007), with a number of studies showing that Black patients have higher average levels of circulating CRP than White patients (Wong et al., 2001;Abramson et al., 2002;Ford 2002;Danner et al., 2003;Khera et al., 2005;Matthews et al., 2005;Alley et al., 2006;Lakoski et al., 2006). Ethnic differences of this kind are likely to have multifactorial causes, including contributions from genetic, socioeconomic, and environmental factors. Given the fact that ethnicity co-varies with all of these classes of risk factors, it is difficult to tease apart the genetic and socioenvironmental contributions to ethnic health disparities. This is further complicated by the fact that socially defined ethnicity is an imprecise proxy for genetic diversity.
We use genetic ancestry inference as a means to disambiguate genetic and socioenvironmental effects on ethnic health disparities. Genetic ancestry refers to patterns of genetic diversity that are linked to the geographical origins of human populations (Mathieson and Scally 2020). Individuals who share common ancestors have genetic similarities, and distinct ancestry groups show correlated allele frequency differences (Bergstrom et al., 2020;Genomes Project et al., 2015). Genetic ancestry can be defined objectively, using comparative genomic analysis, without relying on socially defined ethnic groups (Yudell et al., 2016). Patterns of genetic ancestry can be compared to self-identified ethnicity to understand the extent to which they overlap and how they may differ (Tang et al., 2005;Banda et al., 2015;Fang et al., 2019). Modelling of health outcomes with genetic ancestry and socioenvironmental factors as independent (predictor) variables can be used to assess how each contributes to health disparities and how they may interact (Choudhry et al., 2007;Nagar et al., 2021).
The objective of this study was to characterize the effects of genetic ancestry, socioenvironmental factors, and health conditions on ethnic disparities in CRP serum levels. Participants from the United Kingdom Biobank (UKBB) prospective cohort study who self-identified as belonging to Black or White ethnic groups were characterized with respect to CRP levels, genome-wide genotypes, and a variety of socioenvironmental factors and health conditions that show prevalence disparities across groups. Multivariable regression and relative importance analysis were used in an effort to decompose genetic and socioenvironmental contributions to CRP ethnic disparities. Age and sex were considered as covariates in all models given their known associations with CRP serum levels. Given that genetic and socioenvironmental contributions to health outcomes can covary across ethnic groups, we hypothesized that the inclusion of socioenvironmental covariates could attenuate the association between ethnicity, genetic ancestry, and CRP levels.

Study Cohort
Study participants and data were taken from the United Kingdom Biobank (UKBB), a prospective cohort study on the effects of demography, environment, and genetics on health and disease (Bycroft et al., 2018). The UKBB database contains phenotypic, clinical, and genetic information on more than 500,000 participants between the ages of 40 and 70, enrolled from 2006 to 2010. Ethics approval for the UKBB was obtained from the North West Multi-centre Research Ethics Committee (MREC) for the United Kingdom, the Patient Information Advisory Group (PIAG) for England and Wales, and the Community Health Index Advisory Group (CHIAG) for Scotland. UKBB participants self-identified as belonging to a single ethnic group upon enrollment, and we included participants who identified as Black or White for this study. It should be noted that the UKBB ethnic group labels used here correspond directly to racial group labels from the United States. This study was conducted under the United Kingdom Biobank project #65206 to LMR and IKJ.

Participant Data
UKBB participants completed questionnaires, nurse-led interviews, and medical assessments upon enrollment and provided access to their electronic health records. We accessed participant information on age (Field 21003: Age when attended assessment centre), BMI (Field 21001: Body mass index (BMI)), ethnicity (Field 21000: Ethnic background), ICD-10 disease diagnosis codes (Field 41270: Diagnoses-ICD10), insomnia (Field 1200: Sleeplessness/insomnia), recruitment year (Field 53: Date of attending assessment centre), sex (Field 31: Sex), smoking status (Field 20116: Smoking status), and Townsend deprivation index (Field 189: Townsend deprivation index at recruitment) from the UKBB data portal. If multiple instances of participant data were available owing to follow-up visits, only data collected during the initial assessment visit (2006-2010) were used for analysis.
The Townsend deprivation index is a widely used measure of socioeconomic deprivation known to be associated with poor health outcomes (Foster et al., 2018). It combines four variables (unemployment, non-car ownership, non-home ownership, and household overcrowding) to generate a numerical score (Townsend et al., 1988), which ranges from −6.26 to 11.0 in the UKBB study cohort. Negative values indicate less socioeconomic deprivation and relative affluence, whereas higher scores indicate greater socioeconomic deprivation.
UKBB participants provided whole blood samples for characterization of protein biomarkers and DNA as previously described (Elliott et al., 2008). C-reactive protein (CRP) blood serum levels were measured in mg/L units using the immunoturbidimetric method with the Beckman Coulter AU5800 clinical chemistry analyzer. This procedure corresponds to the high-sensitivity (hs) CRP test. DNA was extracted from 850 μl buffy coat blood aliquots, and participant genome-wide genotypes were characterized using the UKBB Axiom Array or UK BiLEVE Array (Welsh et al., 2017).

Disease Case/Control Cohorts
Disease (or health condition) diagnoses for study participants were taken from UKBB ICD-10 diagnosis codes, which were then converted into disease-specific phenotype codes (phecodes) using the scheme developed by the PheWAS consortium (Carroll et al., 2014). Phecodes have been manually curated and validated by disease experts, and they are widely used for the analysis of electronic health record data (Wu et al., 2019). The phecode scheme provides ICD-10 code inclusion and exclusion criteria for each individual disease in order to define disease-specific case/control cohorts that can be confidently compared. For example, when studying participants with type 2 diabetes, participants with type 1 diabetes are removed from the control cohort to avoid any overlapping genetic or environmental signals that might be common to both. This approach improves power for the detection of disease-specific association signals when modelling case/control status. Phecode case/control cohorts were curated for a total of 1,537 diseases or health-related conditions.
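The inclusion and exclusion logic described above can be illustrated with a minimal sketch. It is not the PheWAS consortium mapping itself: the function name, the toy participant records, and the use of ICD-10 code prefixes (E11 for type 2 diabetes, E10 for type 1 diabetes) as stand-ins for a phecode definition are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def build_case_control(
    diagnoses: Dict[str, Set[str]],
    include_prefixes: Tuple[str, ...],
    exclude_prefixes: Tuple[str, ...],
) -> Tuple[Set[str], Set[str]]:
    """Split participants into cases and controls for one disease.

    Cases carry at least one inclusion code. Controls carry neither an
    inclusion code nor an exclusion code. Participants who only carry an
    exclusion code are dropped from the comparison altogether.
    """
    cases: Set[str] = set()
    controls: Set[str] = set()
    for participant, codes in diagnoses.items():
        is_case = any(code.startswith(include_prefixes) for code in codes)
        is_excluded = any(code.startswith(exclude_prefixes) for code in codes)
        if is_case:
            cases.add(participant)
        elif not is_excluded:
            controls.add(participant)
    return cases, controls


# Hypothetical records: participant ID -> set of ICD-10 diagnosis codes.
toy_diagnoses = {
    "p1": {"E11.9"},  # type 2 diabetes: case
    "p2": {"E10.9"},  # type 1 diabetes: removed from the control cohort
    "p3": {"I10"},    # essential hypertension only: control
    "p4": set(),      # no recorded diagnoses: control
}

t2d_cases, t2d_controls = build_case_control(
    toy_diagnoses, include_prefixes=("E11",), exclude_prefixes=("E10",)
)
```

In this toy example the participant with type 1 diabetes appears in neither cohort, which mirrors the exclusion rule described in the text.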

Genetic Ancestry Inference
UKBB participant genome-wide genotypes were merged and harmonized with whole genome sequence data from global reference populations characterized as part of the 1000 Genomes Project (1KGP) and the Human Genome Diversity Project (HGDP) (Bergstrom et al., 2020;Genomes Project et al., 2015). Global reference populations were grouped into six regional ancestry groups based on their genetic and geographic affinity, including African (sub-Saharan) and European reference population groups (Supplementary Table 1).
UKBB, 1KGP, and HGDP genomic variant data were merged to include variants present in all three data sets. Minor allele frequency >1% and variant sample missingness <5% filters were used for merging, with variant strand flips and identifier inconsistencies corrected as needed. The merged genome variant data set was pruned for linkage disequilibrium using the program PLINK v2 (Chang et al., 2015).
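The two variant filters named above can be sketched in a few lines. This is an illustration of the thresholds only, not the PLINK workflow used in the study; the genotype encoding (0, 1, or 2 copies of the alternate allele, with None for a missing call) and all function names are assumptions.

```python
from typing import List, Optional

# Assumed encoding: alternate-allele count per sample, None when the call is missing.
Genotype = Optional[int]


def minor_allele_frequency(genotypes: List[Genotype]) -> float:
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def missingness(genotypes: List[Genotype]) -> float:
    """Fraction of samples with no genotype call for this variant."""
    if not genotypes:
        return 1.0
    return sum(g is None for g in genotypes) / len(genotypes)


def passes_filters(
    genotypes: List[Genotype], min_maf: float = 0.01, max_missing: float = 0.05
) -> bool:
    """Keep variants with MAF > 1% and sample missingness < 5%."""
    return (
        minor_allele_frequency(genotypes) > min_maf
        and missingness(genotypes) < max_missing
    )


# Toy variants (ten samples each).
common_variant = [0, 1, 2, 1, 0, 1, 0, 0, 1, 2]       # MAF 0.4, no missing calls
monomorphic_variant = [0] * 10                         # MAF 0
sparse_variant = [1, None, 0, None, 1, 0, 1, 0, 0, 0]  # 20% missing calls

kept = [passes_filters(v) for v in (common_variant, monomorphic_variant, sparse_variant)]
```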
Principal component analysis (PCA) of the harmonized UKBB, 1KGP, and HGDP genome variant dataset was performed using the FastPCA program implemented in PLINK v2 (Galinsky et al., 2016). Data for the first ten PCs were used to infer UKBB participant genetic ancestry fractions for African, European, and other regional ancestry groups. Our PCA-based genetic ancestry inference approach compares PCA data from UKBB participants to PCA data from reference population individuals using non-negative least squares to assign genetic ancestry fractions for regional ancestry groups as previously described (Conley et al., 2017;Jordan et al., 2019). Participants showing >5% non-African or non-European ancestry fractions were excluded from the study cohort.
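The idea of assigning ancestry fractions with non-negative least squares can be sketched as follows. This is a simplified illustration rather than the published procedure: it assumes each regional ancestry group is summarized by a single centroid in PC space, solves the non-negative least squares problem with a plain coordinate descent, and rescales the weights to sum to one. The function names and the two-dimensional toy coordinates are hypothetical.

```python
from typing import List


def nnls_coordinate_descent(
    columns: List[List[float]], target: List[float], n_iter: int = 500
) -> List[float]:
    """Minimize ||sum_j x_j * columns[j] - target||^2 subject to x_j >= 0."""
    weights = [0.0] * len(columns)
    residual = list(target)  # target minus the current fit (fit is zero at start)
    norms = [sum(v * v for v in col) for col in columns]
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            if norms[j] == 0.0:
                continue
            # Best unconstrained value for coordinate j, clipped at zero.
            step = sum(c * r for c, r in zip(col, residual)) / norms[j]
            new_weight = max(0.0, weights[j] + step)
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                weights[j] = new_weight
    return weights


def ancestry_fractions(
    reference_centroids: List[List[float]], participant_pcs: List[float]
) -> List[float]:
    """Non-negative weights on reference centroids, rescaled to sum to one."""
    weights = nnls_coordinate_descent(reference_centroids, participant_pcs)
    total = sum(weights)
    if total == 0.0:
        return [0.0] * len(weights)
    return [w / total for w in weights]


# Hypothetical centroids in a two-PC space: [African, European].
centroids = [[2.0, 0.5], [-1.0, 0.5]]
# A toy participant placed exactly at 25% African + 75% European.
participant = [0.25 * 2.0 + 0.75 * -1.0, 0.25 * 0.5 + 0.75 * 0.5]

fractions = ancestry_fractions(centroids, participant)
```

With ten PCs and six regional groups the same functions apply unchanged; only the lengths of the centroid and participant vectors differ.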

Statistical Modeling
All statistical analyses were performed using the R statistical language v3.6.1 (R Core Team, 2013). Forest plots were generated using the forestmodel R package (Kennedy 2020). Other plots were generated using the ggplot2 R package (Wickham 2009).
CRP blood serum levels are measured in mg/L units by the UKBB. Log transformed CRP blood serum levels were modeled as a continuous outcome using linear regression, and clinically elevated CRP (>3 mg/L) was modeled as a binary outcome using logistic regression with the "glm" function in R. Independent (predictor) variables for the base models included ethnicity, age, and sex. Ethnicity was modeled as a binary variable (Black or White), age was mean centered and modeled as a continuous variable, and sex was modeled as a binary variable (female or male). In addition to the base model covariates, independent (predictor) variables for the fully adjusted models included BMI, insomnia, major depressive disorder, PC1, PC2, recruitment year, smoking status, and the Townsend deprivation index. BMI, PC1, PC2, and the Townsend index were modeled as continuous variables, insomnia and smoking status were modeled as categorical variables, and major depressive disorder was modeled as a binary variable. Relative importance analysis was used to assess the relative contributions of the different covariates in the fully adjusted model, while accounting for multicollinearity. The R package "relaimpo" was used for this analysis.
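The variable transformations described above can be sketched as a small preparation step. The study itself used R; this Python sketch only illustrates the transformations. The natural logarithm is an assumption (the text says only "log transformed"), CRP values are assumed to be strictly positive, and the function and variable names are hypothetical.

```python
import math
from typing import List, Tuple

ELEVATED_CRP_MG_L = 3.0  # clinical cutoff stated in the text (>3 mg/L)


def prepare_outcomes_and_age(
    crp_mg_l: List[float], ages: List[float]
) -> Tuple[List[float], List[int], List[float]]:
    """Return log CRP, an elevated-CRP indicator, and mean-centered age."""
    # Continuous outcome for linear regression (natural log is an assumption).
    log_crp = [math.log(value) for value in crp_mg_l]
    # Binary outcome for logistic regression: 1 if strictly above the cutoff.
    elevated = [1 if value > ELEVATED_CRP_MG_L else 0 for value in crp_mg_l]
    # Mean centering makes the intercept refer to a participant of average age.
    mean_age = sum(ages) / len(ages)
    centered_age = [age - mean_age for age in ages]
    return log_crp, elevated, centered_age


# Toy measurements for three hypothetical participants.
toy_crp = [0.5, 3.0, 7.2]
toy_ages = [45.0, 57.0, 66.0]

log_crp, elevated_crp, centered_age = prepare_outcomes_and_age(toy_crp, toy_ages)
```

Note that a value of exactly 3.0 mg/L is not flagged as elevated here, following the strict inequality in the text.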
The odds of prevalence of specific diseases or health conditions were modeled as the dependent (outcome) variable with multivariable logistic regression computed using the "glm" function in R. Independent (predictor) variables for disease models included ethnicity, mean centered age, sex, and log transformed CRP levels.
Model specifications and detailed results for all statistical models are provided in the Supplementary Material. For each model, we provide the regression equation and model coefficients, along with effect size estimates, standard errors, z-values, and p-values for each model coefficient.

C-Reactive Protein, Ethnicity, Age, and Sex
The study cohort is made up of 433,298 United Kingdom Biobank participants who self-identify as Black (n = 6,456) or White (n = 426,842) (Table 1). Males make up 45.7% of the cohort compared to 54.3% females, and the mean age of cohort participants is 57. C-reactive protein (CRP) blood serum levels vary by ethnicity, age, and sex. Black participants show a mean CRP level of 2.75 mg/L, and White participants show a mean CRP level of 2.59 mg/L (t = 2.72, p = 0.01; Table 1 and Figure 1A). Clinically elevated CRP is observed for 25% of Black participants compared to 22.6% of White participants (Table 1). Participant CRP levels increase with increasing age (Figure 1B), and females show higher mean CRP levels than males (t = 17.22, p < 2.2 × 10^−16; Figure 1C). When CRP levels are modeled by ethnicity, age, and sex, Black ethnicity and age show significant positive associations with CRP, whereas male sex shows a significant negative association with CRP (ethnicity: β = 0.08, p ≈ 0; age: β = 0.02, p ≈ 0; sex: β = −0.06, p ≈ 0; Figure 1D and Supplementary Table 4).
Inclusion of interaction terms in the CRP linear regression model revealed a significant interaction between ethnicity and sex (Supplementary Table 5). A likelihood ratio test showed a significantly better fit for a model with the ethnicity-sex interaction term compared to the model with no interaction term, providing additional support for the interaction (Supplementary Table 6). The observed ethnicity-sex interaction results from higher CRP for Black female participants compared to White female participants and lower CRP for Black male participants compared to White male participants (Figure 2).
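The likelihood ratio test for a single added interaction term can be sketched as follows. Because ethnicity and sex are both binary, the interaction adds one parameter, so the test statistic is compared with a chi-square distribution with one degree of freedom, whose upper tail has the closed form erfc(sqrt(x/2)). The log-likelihood values below are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def likelihood_ratio_test_1df(loglik_reduced: float, loglik_full: float) -> Tuple[float, float]:
    """Likelihood ratio statistic and p-value for one extra parameter.

    The statistic is 2 * (loglik_full - loglik_reduced). For one degree of
    freedom, the chi-square upper tail equals erfc(sqrt(statistic / 2)).
    """
    statistic = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for the models without and with the interaction.
lr_statistic, lr_p_value = likelihood_ratio_test_1df(
    loglik_reduced=-1005.0, loglik_full=-1000.0
)
```

A small p-value indicates that the model with the interaction term fits significantly better than the model without it.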

Ethnicity and Genetic Ancestry
Black and White participants differ with respect to mean levels of African and European genetic ancestry (Table 1). Black participants show averages of 87.8% African ancestry and 11.7% European ancestry compared to averages of 99.8% European and 0.03% African ancestry for White participants (African ancestry: t = 671.91, p < 2.2 × 10^−16; European ancestry: t = 672.38, p < 2.2 × 10^−16).
The relationship between ethnicity and genetic diversity for Black and White participants can be visualized via principal components analysis of genome-wide genotype data (Figure 3A). Principal components one and two separate participants by ethnicity along a continuum of genetic ancestry (Figure 3B). The probability of participant self-identification as Black or White shifts in the range of 23-44% African ancestry (Figure 3C). Participants are more likely to identify as Black, and less likely to identify as White, if they have ≥29% African ancestry.

Genetic and Socioenvironmental Effects on C-Reactive Protein Ethnic Disparities
Multivariable regression and relative importance analysis were used to model the effects of ethnicity, genetic ancestry, socioenvironmental factors, and health conditions that disproportionately affect different ethnic groups on CRP serum levels. Genetic ancestry was modeled using the first two principal components, which show the greatest separation between African and European ancestry components. Socioeconomic deprivation was modeled using the Townsend index; BMI, insomnia, and major depressive disorder (MDD) were included as disparate health conditions; and recruitment year and smoking were included as environmental exposures. The effect of recruitment year was considered to account for possible psychosocial stress associated with the financial crisis of 2007-2008. The effects of all covariates on CRP were considered separately using age and sex adjusted models and together with a fully adjusted model (Table 2 and Supplementary Table 7). The significant positive association between Black ethnicity and CRP levels is attenuated in the fully adjusted model, and ethnicity shows the lowest relative importance among the 11 covariates included in the model. The effects of genetic ancestry are similarly attenuated in the fully adjusted model and have the next lowest relative importance (ranks: PC1 = 9, PC2 = 10). BMI has the highest relative effect on CRP in the fully adjusted model, followed by age, smoking, socioeconomic deprivation measured by the Townsend index, and sex. Insomnia, recruitment year, and MDD all show significant but marginally important effects on CRP. Considered together, these results are consistent with the hypothesis that ethnic differences in socioenvironmental exposures and health outcomes explain the association between ethnicity, genetic ancestry, and CRP levels.
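One common way to rank correlated covariates by relative importance is to average each covariate's incremental contribution to R² over all orderings in which covariates can enter the model (the LMG metric, which is among those offered by the relaimpo package). The text does not state which metric was used, so the choice of LMG here is an assumption, as are the function names and the toy data, which are not UKBB values.

```python
from itertools import permutations
from typing import Dict, List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors: List[List[float]], y: List[float]) -> float:
    """R-squared of an ordinary least squares fit with an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def lmg_importance(data: Dict[str, List[float]], y: List[float]) -> Dict[str, float]:
    """Average incremental R-squared of each covariate over all entry orders."""
    names = list(data)
    shares = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        included: List[str] = []
        previous = 0.0  # the intercept-only model has R-squared 0
        for name in order:
            included.append(name)
            current = r_squared([data[v] for v in included], y)
            shares[name] += current - previous
            previous = current
    return {name: total / len(orderings) for name, total in shares.items()}


# Toy data for eight hypothetical participants.
toy_covariates = {
    "bmi": [22.0, 25.5, 31.0, 28.2, 35.1, 24.3, 29.9, 27.0],
    "townsend": [-3.1, 1.2, 4.0, -1.5, 2.2, 0.3, -2.0, 3.5],
}
toy_log_crp = [0.1, 0.6, 1.4, 0.8, 1.9, 0.4, 1.0, 1.1]

importance = lmg_importance(toy_covariates, toy_log_crp)
```

The shares sum to the R² of the full model, which is what allows them to be read as a decomposition of explained variance. Enumerating all orderings is only practical for a handful of covariates; the number of orderings grows factorially.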

C-Reactive Protein and Ethnic Health Disparities
The relationship between CRP serum levels and ethnic health disparities was evaluated by independently modeling the effect of CRP and the effect of ethnicity on disease outcomes and comparing the results. There are 116 out of 1,537 diseases analyzed where both CRP and ethnicity showed significant associations with disease status, after correcting for multiple tests using the Bonferroni correction (Figure 4 and Supplementary Table 8). The effect size estimates for all diseases with significant CRP and ethnicity associations were evaluated to identify diseases where differences in CRP serum levels are associated with ethnic disparities. Top ranked diseases are shown in Table 3, and data for all diseases are shown in Supplementary Table 8. The top ranked diseases include examples of infectious disease (tuberculosis and HIV), metabolic diseases (type 2 diabetes and hypoglycemia), circulatory system diseases (hypertensive chronic kidney disease, hypertensive heart disease, and essential hypertension), schizophrenia, genitourinary diseases (nephrotic syndrome and chronic kidney disease), and dermatologic diseases (lupus erythematosus and sarcoidosis).
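The screening step described above, keeping diseases where both the CRP and the ethnicity associations survive Bonferroni correction, can be sketched as follows. The family-wise alpha of 0.05 and the use of 1,537 as the number of tests are assumptions (the text does not state them), and the disease labels and p-values are hypothetical.

```python
from typing import Dict, List, Tuple


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def jointly_significant(
    p_values: Dict[str, Tuple[float, float]], n_tests: int, alpha: float = 0.05
) -> List[str]:
    """Diseases whose CRP and ethnicity p-values both pass the threshold.

    p_values maps a disease label to (p for CRP, p for ethnicity).
    """
    threshold = bonferroni_threshold(alpha, n_tests)
    return [
        disease
        for disease, (p_crp, p_ethnicity) in p_values.items()
        if p_crp < threshold and p_ethnicity < threshold
    ]


# Hypothetical p-values for three diseases out of 1,537 tested.
toy_p_values = {
    "disease_a": (1e-8, 1e-6),  # both associations pass
    "disease_b": (1e-8, 1e-3),  # ethnicity association fails
    "disease_c": (0.2, 1e-9),   # CRP association fails
}

threshold_1537 = bonferroni_threshold(0.05, 1537)
selected = jointly_significant(toy_p_values, n_tests=1537)
```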

Ethnic Differences in Inflammation are Explained by Socioenvironmental Factors
C-reactive protein (CRP) is a widely used clinical marker of inflammation. Our study of the UKBB found that CRP blood serum levels differ according to participants' self-identification as belonging to Black or White ethnic groups, and ethnicity in our cohort is highly correlated with genetic ancestry. Given these results, it could be naively expected that differences in genetic ancestry between the Black and White ethnic groups explain differences in CRP levels. However, comparisons between ethnic groups with distinct genetic ancestry profiles can be confounded by gene-environment correlations. Indeed, genetic ancestry, socioenvironmental factors, and health conditions can all covary among ethnic groups. We used multivariable regression and relative importance analysis in an effort to decompose the contributions of genetic ancestry, socioenvironmental factors, and health conditions to CRP ethnic disparities.
When differences in socioenvironmental exposures and health outcomes are accounted for, the associations between ethnicity, genetic ancestry, and CRP levels are almost completely attenuated. These results indicate that the environment plays a more important role than genetics in shaping ethnic disparities in inflammation for this cohort. Possible socioenvironmental factors leading to higher levels of CRP observed for Black participants could include psychosocial stress linked to racial discrimination and poverty (Sanders-Phillips et al., 2009;Williams and Mohammed 2009;Lewis et al., 2010;Beatty et al., 2014). The high relative effects of BMI and smoking on CRP suggest that aspects of diet and lifestyle associated with socioeconomic deprivation could also be linked to ethnic differences in inflammation (Feinstein 1993;Inglis et al., 2005;Govil et al., 2009).
Although we are making a semantic distinction between genetic effects, as measured by genetic ancestry, and the effects of socioenvironmental factors and health conditions, it should be noted that the socioenvironmental and health covariates modeled here are also likely to be influenced by genetics. For example, BMI and smoking are both highly heritable traits. The effect of sex also likely includes a genetic component based on chromosomal differences between males and females. In other words, our approach to decomposing genetic and environmental contributions to health disparities is limited by the pervasive contributions of both genetics and the environment to human traits.

Interaction Between Ethnicity and Sex
UKBB participant CRP blood serum levels vary by ethnicity, age, and sex. Modeling CRP levels with all of these factors revealed a highly significant interaction effect between ethnicity and sex. Black females show higher CRP levels than White females, whereas Black males have lower CRP than White males. Thus, Black females are at the highest risk of chronic inflammation, suggesting the possibility of exposure to particularly high levels of stress for this group. This finding is consistent with previous studies showing that Black women can experience worse health outcomes than Black men, White women, or White men owing to their relatively subordinate position in both ethnic and gender hierarchies (Woods-Giscombe 2010;Farmer et al., 2021). This perspective underscores the importance of an ethnic health disparities analysis framework that includes multiple, interacting demographic, genetic, and socioenvironmental factors (Bowleg 2012;Brown and Hargrove 2013;Bauer 2014;Richardson and Brown 2016).

Inflammation and Ethnic Health Disparities
We related inflammation and ethnic health disparities by independently modeling the effect of CRP and ethnicity on disease status and then looking for diseases that showed significant associations with both factors. There were 109 out of 1,537 diseases that showed significant associations with both CRP and ethnicity, and we explored the diseases that showed the strongest effects for both. This approach uncovered a number of diseases linked to immune response and inflammation, including infectious diseases and complex, common diseases. This suggests the possibility that ethnic differences in inflammation, related to environmental exposures and psychosocial stress, could be broadly related to ethnic health disparities.
It is important to note, however, that our observational study design and statistical modelling do not allow for unambiguous causal inference regarding the relationship between CRP and disease (Davey Smith et al., 2005;Pingault et al., 2018). For infectious diseases, CRP levels are expected to be elevated after infection, which would entail a kind of reverse causality with respect to how our regression models are specified. For chronic diseases, systemic inflammation could precede disease or contribute to disease progression, but it could also reflect the presence of disease. Our models cannot distinguish between these possibilities, and it is not known whether participant CRP levels measured at recruitment precede or follow the diagnosis and course of disease. Thus, it is possible that the observed ethnic differences in CRP reflect a higher overall burden of disease for ethnic minorities in the UKBB, linked to higher levels of socioeconomic deprivation, rather than a causal risk factor for ethnic health disparities.

DATA AVAILABILITY STATEMENT
The datasets analyzed for this study can be found on the United Kingdom Biobank website (https://www.ukbiobank.ac.uk/register-apply/).

FIGURE 5 | Effects of C-reactive protein (CRP) and ethnicity on disease. Effect sizes for statistically significant CRP-disease associations (β_CRP, y-axis) and significant ethnicity-disease associations (β_Ethnicity, x-axis). β_Ethnicity > 0 shows diseases that are positively associated with Black ethnicity, and β_Ethnicity < 0 shows diseases that are positively associated with White ethnicity. Select disease examples are annotated as shown.