The use of race terms in epigenetics research: considerations moving forward

The field of environmental epigenetics is uniquely suited to investigate biologic mechanisms that have the potential to link stressors to health disparities. However, it is common practice in basic epigenetic research to treat race as a covariable in large data analyses in a way that can perpetuate harmful biases without providing any biologic insight. In this article, we i) propose that epigenetic researchers open a dialogue about how and why race is employed in study designs and think critically about how this might perpetuate harmful biases; ii) call for interdisciplinary conversation and collaboration between epigeneticists and social scientists to promote the collection of more detailed social metrics, particularly institutional and structural metrics such as levels of discrimination that could improve our understanding of individual health outcomes; iii) encourage the development of standards and practices that promote full transparency about data collection methods, particularly with regard to race; and iv) encourage the field of epigenetics to continue to investigate how social structures contribute to biological health disparities, with a particular focus on the influence that structural racism may have in driving these health disparities.


Introduction
One of the goals of researchers focused on human health in response to environmental stressors is to equitably promote excellent health for all. The epigenome is an interface between the genome and our environment, and as such is a prime target for exposures that can impact our health. The growing field of social epigenetics has demonstrated that adverse social environments and exposures can be associated with alterations in the epigenome (Cunliffe, 2016; Burris et al., 2016; Shields, 2017; Tung and Gilad, 2013; Evans et al., 2021). Given this, the epigenome is a mechanism that could be key in linking systemic racism to health disparities (Non, 2021; Martin et al., 2022). In human epigenetic studies, there are two common ways that race has been used in data analysis: as a covariate, or as a variable of interest. When race is assessed as a variable of interest, it is generally for the purpose of identifying health disparities. It is imperative that researchers keep assessing data with regard to race and other social factors in order to strengthen our understanding of social epigenetics and links between the epigenome and health disparities. In epigenetic research that is not focused on health disparities per se, such as the study of environmental exposures on epigenetics or associations of epigenetic patterning with disease outcomes, it is common practice to treat race as a covariate when looking for differences in epigenetic marks like DNA methylation or histone modifications. In this article, we call for the careful re-evaluation of the practice of using race as a covariate in epigenetic and epigenomic studies.
Despite the frequent use of the term "race" in biological and genomic research, it is more accurately defined as a socio-political rather than a biological classification (Yudell et al., 2016). The concept of biological races would indicate that humans are biologically distinct from one another but share a common ancestor. Humans do not have biological races, as human biological diversity does not align with racial populations (Graves, 2023). Modern use of the term "race" relies on social definitions that vary across cultures and over time rather than the identification of distinct biological groups (Roberts, 2011; Fuentes et al., 2019). In this paper, we exclusively discuss socially defined race. Terms frequently used in a biologic context such as ancestry or ethnicity reflect one's relationship to other individuals in the context of genealogical history (Yudell et al., 2016). In contrast, race is a social concept that connects someone to a larger constructed group, often based on having physical, geographical, or social characteristics similar to others in that group (Yudell et al., 2016).
Given that racial classification systems have been developed from our socially constructed systems of stratification, power, and ideology, it is unlikely that an individual's socially defined race per se influences epigenetic mechanisms (Smedley and Smedley, 2005; Yudell et al., 2016; Baker et al., 2017). This is because someone's racial classification does not accurately represent the experiences of each and every individual within that racial grouping. However, the association of an individual's socially defined race with many social determinants of health may result in broad differences in epigenetic markers across races. In this scenario, the category of race as a construct cannot be the ultimate driving factor influencing the epigenome, but is potentially associated with causative factors such as physical or social aspects of an individual's environment. For any such superficial associations between socially defined racial categories and differences in epigenetic marks (Xia et al., 2014; King et al., 2015; Song et al., 2015), it is imperative that the field focus its efforts on a deeper understanding of the actual underlying causal mechanisms that lead race to be associated with health disparities, or at least explicitly discuss the possibilities. The field of epigenetics must seek to understand how social and environmental exposures combine with biology to affect the social distribution of disease (Williams and Sternthal, 2010).
In many epigenetic studies, socially defined race is used as a proxy for socioeconomic and sociodemographic variables, including financial stability, healthcare access and social stress, among others. However, individuals are not bound to experience certain stress levels or financial status based on their race alone, making race a poor proxy for many social metrics. This practice of substituting race for other social metrics may confound and distort data analyses. Using race as a proxy for socioeconomic status (SES) rather than collecting data on SES not only fails to accurately capture the relevant social factors intended for study, but can also reinforce negative racial stereotypes (Williams and Sternthal, 2010).
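The proxy problem described above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not an analysis from this article. It assumes a hypothetical two-category group label, a hypothetical continuous stress exposure whose average differs between groups for structural reasons, and a methylation value that responds only to the exposure. The helper names (`ols`, `simulate`) and all effect sizes are illustrative assumptions.

```python
import random

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows (each row a list of floats, including an
    intercept column); y is a list of floats. Returns the coefficients.
    """
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def simulate(n=4000, seed=0):
    """Generate (group, stress, methylation) triples; all values synthetic."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = rng.randint(0, 1)  # hypothetical socially defined group label
        # Assumption: structural factors shift the average exposure between
        # groups, but individuals within each group still vary widely.
        stress = rng.gauss(1.0 if group else 0.0, 1.0)
        # Assumption: methylation responds to the exposure only; the group
        # label has no direct effect.
        meth = 0.5 * stress + rng.gauss(0.0, 1.0)
        rows.append((group, stress, meth))
    return rows

rows = simulate()
y = [m for _, _, m in rows]
# Model 1: the group label stands in for the unmeasured exposure.
proxy_fit = ols([[1.0, g] for g, _, _ in rows], y)
# Model 2: the exposure itself is measured and included.
full_fit = ols([[1.0, g, s] for g, s, _ in rows], y)

print("group coefficient, label used as proxy:", round(proxy_fit[1], 3))
print("group coefficient, exposure measured:  ", round(full_fit[1], 3))
print("exposure coefficient:                  ", round(full_fit[2], 3))
```

Under these assumptions, the first model attributes a difference to the group label, while the second shows that the difference is carried by the measured exposure. The simulation cannot say what the relevant exposure is in any real cohort; it only shows why a label-only model may hide it.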
Socially defined race is also often used in epigenetic and epigenomic research as a proxy for ethnicity or ancestry. Given the large amount of genomic diversity within socially defined races, the racial category with which an individual identifies will not accurately align with their genetics (Lewis et al., 2022). For example, Black people in the USA who are descended from enslaved individuals have varying degrees of European ancestry across regions within the USA: more than 30% European ancestry in West Virginia compared to less than 16% in South Carolina (Bryc et al., 2015). This is further confounded by the inclusion of first-generation Africans from all over the continent of Africa, who live in the USA and are considered by the US census to be Black or African American, though their ancestry will vary greatly from that of descendants of enslaved individuals. Additionally, researchers must be mindful that socially defined racial categories differ between cultures when attempting to conduct research involving individuals from different regions of the world. For instance, between the USA and the UK, socially defined racial categories may carry different meanings, and members of certain socially defined races may have entirely different genealogical history and ancestry in the USA compared to the UK.
Analyses that include genomic sequencing to categorize individuals into ethnic or ancestral groupings are also faced with the limitations of discretizing a continuous and very complex variable (Lewis et al., 2022). The ways in which ancestry data have been used in research are largely ambiguous and vary across studies, indicating that the field lacks consistent use and definitions of ancestry terms and interpretations (Dauda et al., 2023). The field of epigenetics should continue to conduct empirical studies on the role of genetics in epigenomics. Current efforts that conduct genomic sequencing in order to perform principal component analysis (PCA) for statistical corrections are more useful in that they do not rely solely on reported race. However, researchers who use PCA plots to determine genetically similar groupings based on statistics alone should recognize that the social experiences of individuals in these groups may not align, which may confound the data analyses. Additionally, using sequencing technologies to assign people to a single ancestral grouping and then inferring the influence this may have on epigenetics currently has inherent limitations, given that researchers still do not have a good grasp of how individual and cumulative genetic differences influence epigenetic outcomes. Conflating socially defined race and ancestry through the practice of using race as a proxy for ancestry also has the potential to perpetuate the false notion that race has a genetic basis.
Propagating the false narrative of race as a biologic or genetic construct rather than a social construct can perpetuate racist ideologies and exacerbate or create further disadvantages for members of specific socially defined races (Cerdeña et al., 2020). For example, associating race with genetics in healthcare can directly harm patients. When medical training leaves physicians with the idea that substantive genetic differences exist between races, this facilitates the implicit and explicit rationalization and justification of treating patients differently based on their race. This ultimately causes harm to individuals, as it can result in alarming outcomes such as the administration of less pain medication due to false beliefs about racially linked differences in pain tolerance (Hoffman et al., 2016), fewer preventative bone density screenings in Black women, the use of race-adjusted glomerular filtration rate to assess kidney function, which results in underdiagnosis of kidney disease in Black patients (Ahmed et al., 2021; Uppal et al., 2022), and many more examples. Additionally, attributing racial health disparities to the false premise of racially based genetic differences conflates the cause and the effect of health disparities and detracts from examining the underlying problems driving these differences, such as racism and systemic inequality.
Biomedical researchers, in their role as educators of students, can directly influence the training of future medical doctors and any biases they hold. In academic settings, many researchers and medical professionals may be unaware that race is a social construct and not quantifiably related to genetic ancestry. Basic biological research influences policy makers and the training of medical doctors, and has the potential to perpetuate harmful biases among the general public. When race is treated as a biological phenomenon rather than a social construct, it risks further perpetuating the incorrect notion that race has a genetically defined rather than a socially defined origin.
Here we contend that the field of epigenetics needs to scrutinize its data collection methodology to target the environmental along with the social factors that contribute to the establishment, maintenance, and alterations of the epigenome. We also call for environmental epigenetic and epigenomic research to better understand how environmental exposures and experiences of social forces that differ between socially defined races can cause epigenetic changes resulting in racial health disparities. Such research efforts would be fortified by collaboration with social scientists and thoughtful data collection. This is particularly important to the field of epigenetics as these mechanisms could be the key to understanding how racism elicits a very real biologic response that could be, in part, responsible for the establishment of health disparities (Snyder-Mackler et al., 2020; Martin et al., 2022).

Shortcomings in the use of race in epigenetic research
Scientists often fail to use race accurately in biological and genomic research in two key ways: first, by neglecting to distinguish between self-identified racial categories and assigned or assumed racial categories; second, by the haphazard use and reporting of racial/ethnic variables in genetic research, that is, reliance on race without clearly articulating exactly what race represents (Yudell et al., 2016; Dauda et al., 2023). These oversights risk not only perpetuating a misperception of race as genetically based, but, by misclassifying race, may also reduce the validity and reliability of the scientific research. It is important to be aware that an individual's reported race may differ based on whether it was assumed by someone else collecting demographic data or was self-reported. Additionally, an individual's answer might change if their race is not accurately reflected among the answer options of a survey question. The increasing number of people identifying as mixed race further complicates this type of data collection. Without concrete knowledge of the influence of genetics and social factors on the epigenome, the practice of classifying an individual's socially defined race and using it as a covariate in statistical models is problematically simplistic.
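Where a cohort holds both a self-reported category and a category assigned by someone else, the gap between the two can be quantified before either is used in a model. The sketch below computes raw agreement and Cohen's kappa, a standard chance-corrected agreement statistic. The records are entirely hypothetical placeholders, and the choice of kappa is an illustrative assumption rather than a method proposed in this article.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two labelings."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed proportion of records on which the two labelings agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each labeling's marginal counts.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelings use one identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical records: (self-reported category, category assigned by staff).
# "A", "B", "C" are placeholder labels, not real categories or real data.
records = [
    ("A", "A"), ("A", "A"), ("A", "B"), ("B", "B"), ("B", "B"),
    ("B", "B"), ("multiracial", "A"), ("multiracial", "B"),
    ("C", "other"), ("C", "C"),
]
self_reported = [s for s, _ in records]
assigned = [o for _, o in records]

agreement = sum(s == o for s, o in records) / len(records)
kappa = cohens_kappa(self_reported, assigned)
print("raw agreement:", round(agreement, 2))
print("Cohen's kappa:", round(kappa, 2))
```

In this toy example the disagreements cluster where the assigned scheme has no matching option, which is the situation the paragraph above warns about. Reporting such a statistic alongside the collection method is one possible way to make the limitation visible to readers.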
While it is useful to assess epigenetic patterning in a way that allows identification of health disparities (e.g., running multivariate models stratified by socially defined race), it is also important to address how race is employed in study design. Oftentimes race is simply included as one of many covariates in a multivariate model.
The following examples are worth considering in conceptualizing why socially defined race as a covariate might confound data. It was common practice in the past for doctors to record race for their patients rather than have patients self-report. Race was therefore documented based solely on skin color, within limited categories. Rather than skin color, what if hair color or eye color were instead used in these models (Neal, 2008)? Would there be any biological rationale for doing so? Consider using the racial category Asian and Pacific Islander as a proxy for genetic ancestry. Individuals in these two groups have very different ancestry yet are often grouped together. Race is either inadequately defined in these situations or not defined at all. Particularly consequential oversights that follow from these poorly conceived categorizations include failing to acknowledge how outcomes would differ if race were misidentified, if information were missing for individuals who are multiracial, if race were defined incongruously across individuals, or if an individual's race were simply not one of the input options in the study design. Where race is used as a proxy for levels of stress, it would be far better to intentionally collect and use actual stress data, given that not all members of the same socially defined race will experience stress in the same way or to the same degree. The practice of using race as a covariate fundamentally limits and may even confound data interpretation.

Structural racism drives racial health disparities
Studying the implications of socially defined race for health disparities should include explicit acknowledgement of the causal pathways through which race is associated with negative health outcomes. Disproportionate health outcomes are driven by factors associated with structural racism, not race itself (Phelan et al., 2010; Bailey et al., 2017; Bailey et al., 2021). Structural racism involves the "systemic racial exclusion from power, resources, opportunities, and wellbeing that is embedded in societal institutions" (Brown and Homan, 2022). A good example comes from the field of birth outcomes, in which a great deal of research indicates that race itself is not the driving risk (Chantarat et al., 2022; Ross, 2014; McAfee, 2017; Collier et al., 2021). In other words, Black women are not more likely to experience pre-term birth and higher rates of infant mortality because they are Black, but due to chronic stressors like racism and poverty. Structural racism may operate through disproportionate exposure to risks, such as social stressors, environmental racism, or through unequal access to material and social resources, including social and economic capital, freedom, autonomy, power and prestige (Brown and Homan, 2022).
It is important to note that the goal of research on socio-political experiences and their effects on health is not to place blame on the communities that are experiencing health disparities, but rather to identify the institutional structures and institutional policies that can be altered to help alleviate these unjust health disparities. Throughout history, research has often focused on attempting to connect health disparities with individual-level underlying health conditions and behavioral risk factors rather than institutional-level policies and experiences (McClure et al., 2020). Focusing on behavioral risk factors can result in blame being placed on individuals and their personal behaviors, such as poor diet or stress levels, rather than the actual structural issues that are to blame (Roberts, 2011). Attention is then drawn away from how racialized communities are more likely to experience greater levels of workplace hazards, experience low wage work, and lack access to high quality healthcare due to institutional policies (McClure et al., 2020). When researchers use a hyper-individual approach to identifying risk factors, the social causes of disease may be obscured. This approach allows society to ignore how policy and inequality create a system in which not everyone can thrive, masks systemic oppression as a root cause of health inequities, and can further perpetuate false notions of inherent differences between socially defined races (McClure et al., 2020). In other words, the focus should be less on race in patient classification and in identifying causal mechanisms, and more on race in terms of discerning the ways in which structural racism produces and exacerbates health inequities (McAfee, 2017).

Toward achieving adequate measures of important social metrics
Quantifying structural racism is an important component of research that seeks to elucidate how policy and institutions impact health. Previous studies have demonstrated how structural racism and other state-level structural inequalities such as sexism, individually and jointly, shape health outcomes (Homan et al., 2021). As quantification of discrimination at the structural level becomes more sophisticated, researchers will have access to state and regional level estimates of structural inequality, which will allow them to model the ways socially defined race, gender, class, ability, and other individual-level characteristics interact with institutions to affect health outcomes (Brown and Homan, 2022; Homan et al., 2021; Hardeman et al., 2022; Atkins, 2014; James, 2022).
Social scientists and physical scientists must work together toward improving our understanding of how structural inequalities affect health outcomes. Social scientists' development of measurements of structural discrimination could improve epigeneticists' estimations of individual health outcomes. Others have already called for a more explicit paradigm shift with regard to scientists' use of racial categories, including requiring journals to explain the use of classificatory terminology in studying human genetic diversity (Yudell et al., 2016). Epigenetics and social science research could also benefit from replicating existing studies that have relied on race, with substitution of more accurate and meaningful variables associated with socially defined race. For example, if stress is what race is intended to capture, then including measures of perceived racism, socioeconomic status, allostatic load, birth zip code, measurements of neighborhood deprivation, some measure of wealth or financial safety net, social network, and primary language would serve to better capture and characterize levels of stress.

Epigenetics offers tools for understanding health disparities
The field of epigenetics promises to reveal much about the relationship between social, political, or environmental factors and their influence on health disparities. Epigenetics research has the potential to improve our understanding of how chronic stress, nutritional status, and socio-political structures such as structural racism and environmental racism can influence the health of individuals and impact health across generations (Breton et al., 2021; Salas et al., 2021; Chan et al., 2023). Epigenetic research is already examining the influence of nutritional status on the epigenome (Ideraabdullah and Zeisel, 2018; Gomez-Verjan et al., 2020), the links between DNA methylation and higher incidence rates of chronic pain in African Americans (Aroke et al., 2019), and epigenetic alterations associated with racial trauma (McDade et al., 2017; Grossi, 2020). Health disparities in cardiovascular disease (Kuzawa and Sweet, 2009) and exposures to environmental contaminants (Majnik and Lane, 2014; House et al., 2019) are of major interest in the field and can help provide insight into how structural racism embedded in our society can have negative health outcomes mediated through epigenetic mechanisms.

Recommendations
Given the rapidly progressing nature of the field of health disparities research, best practices for incorporating socially defined race into epigenetics research will likely change over time. However, the intention here is to prompt researchers to think more critically as they design, plan, and carry out experiments, and to encourage funding agencies, when deciding where to allocate time and resources, to acknowledge the limitations and implications of oversimplifying the role of race. We and others (Krieger, 2020; Chan et al., 2023; Lewis et al., 2023; National Academies of Sciences, Engineering, and Medicine, 2023) have put forth recommendations for carrying out genetic and epigenetic research in the context of socially defined race and health disparities. Our recommendations include:
1. Researchers should engage in dialogues about the definitions of race, what socially defined race does and does not represent, and why socially defined race is included as a covariate in study design. Bringing this topic to the forefront of pedagogy will help young scientists continue to push this field and think critically about this issue. Dedicating sessions in scientific conferences to discuss, learn, and understand these issues will help formalize best practices in the field of epigenetics. The hope is that this will stimulate conversations about how socially defined race is used in the field of environmental epigenetics research, how race in the resulting research may be interpreted by lawmakers and clinicians, and how this might influence biases held by decision makers and further perpetuate health disparities.
2. If researchers choose to use socially defined race as a covariate in models, then a justification should be provided in the context of the scientific question being asked. In the case of using data from a previously established cohort with limited demographic data, researchers should acknowledge the weaknesses associated with the data available to them and be fully transparent about the methods of data collection regarding social metrics. Depending on the demographic data available for each cohort, some metrics could be extrapolated from zip code data, such as levels of neighborhood deprivation. However, we recommend that researchers recognize the imperfections in extrapolating these data and work closely with social scientists when doing so.
3. Where socially defined race is used as a proxy for variables like stress, other data likely to be associated with stress should be collected; for example, levels of perceived racism, regional levels of structural discrimination, socioeconomic status, or zip code (Williams and Sternthal, 2010; The Use of Racial, 2005), any of which may be the explanatory variable that drives the association between socially defined race and alteration of the epigenome. Collaboration with social scientists may give more insight into tools researchers can use to collect better measurements of structural racism. Where socially defined race is used as a proxy for ancestry, researchers should confront the inherent limitations of this oversimplification. When utilizing genetic ancestry in epigenetic studies, researchers should engage with ethical frameworks for incorporating ancestry data in their analyses (Lewis et al., 2022; Lewis et al., 2023). Embracing an interdisciplinary framework in study designs will allow for the incorporation of the social and cultural drivers of health disparities and provide a better understanding of how the epigenome is being modified.
4. New cohorts should be inclusive, representative of many backgrounds, and include as much demographic data as possible so that researchers can assess interactions of multiple social and environmental stressors with chemical exposures. However, inclusion of enough minority participants in new cohorts to achieve sufficient statistical power will require overcoming distrust between communities and researchers. Establishing trust can be difficult and will be best accomplished when researchers are actively engaging with communities and addressing topics of concern to them (Masuda et al., 2011; Han et al., 2021; Mikesell et al., 2013; Cook, 2008; Wallerstein and Duran, 2010; Gilmore-Bykovskyi et al., 2022). It is important that scientists approach community engagement with the goal of not just communicating science to the public but working to understand their concerns and include them in the scientific process. Research topics should connect with the social, cultural, and political context in which they reside.
5. In designing new studies, self-reported race (Lorusso and Bacchini, 2015) and ethnicity data should still be collected so that data stratified across socially defined races can be used to help identify racial disparities. With regard to these data, it is most important that researchers are transparent about how race and ethnicity variables are obtained, particularly regarding whether race data were codified by others or were self-reported. Researchers should provide detailed information about the demographic data collection process.
6. More research on the topic of epigenetics and racism is needed to understand the mechanistic links through which institutional racism can exert direct impacts on health, perpetuating health disparities. Funds, time, and resources need to be dedicated to research focused on how institutionalized racism and environmental racism can influence epigenetics and potentially influence health for generations.