Population-enriched innate immune variants may identify candidate gene targets at the intersection of cancer and cardio-metabolic disease

Both cancer and cardio-metabolic disease disparities exist among specific populations in the US. For example, African Americans experience the highest rates of breast and prostate cancer mortality and the highest incidence of obesity. Native and Hispanic Americans experience the highest rates of liver cancer mortality. At the same time, Pacific Islanders have the highest death rate attributed to type 2 diabetes (T2D), and Asian Americans experience the highest incidence of non-alcoholic fatty liver disease (NAFLD) and cancers induced by infectious agents. Notably, the pathologic progression of both cancer and cardio-metabolic diseases involves innate immunity and mechanisms of inflammation. An individual's innate immunity is shaped by genetic inheritance and by external stimuli, enabling responses to environmental threats and stresses such as pathogen exposure. Further, individual genomes contain characteristic genetic markers associated with one or more geographic ancestries (ethnic groups), including protective innate immune genetic programming optimized for survival in the corresponding ancestral environment(s). This perspective explores evidence related to our working hypothesis that genetic variations in innate immune genes, particularly those that are common but unevenly distributed between populations, are associated with disparities between populations in both cancer and cardio-metabolic diseases. Identifying conventional and unconventional innate immune genes that fit this profile may provide critical insights into the underlying mechanisms that connect these two families of complex diseases and offer novel targets for precision-based treatment of cancer and/or cardio-metabolic disease.


Introduction
Double-edged swords: important factors connecting metabolic disorders and cancer development
The following perspective was written in response to an invitation to contribute to a Frontiers research topic exploring methods, mechanisms, and hypotheses that may ultimately identify and exploit biological processes contributing to complex disease progression, as well as molecular interactions enabling cross-talk between cancer and cardio-metabolic disease. Based on our hypothesis that differences in innate immunity contribute to observed population disparities in cancer and metabolic disorders, we apply a functional genomics approach to identify specific innate immune genes as potential therapeutic targets at the intersection of these two complex disease families.

Framing precision drug target discovery in the context of health disparities
Defining health disparities
The US National Institute on Minority Health and Health Disparities (NIMHD) defines a health disparity as "a health difference (compared with the general population), based on one or more health outcomes (such as the overall rate of disease incidence, prevalence, morbidity, mortality or survival) that adversely affect disadvantaged populations." In the US, such populations include Blacks/African Americans, Hispanics/Latinos, Asians, American Indians/Alaska Natives, and Native Hawaiians/other Pacific Islanders (1). Diverse sources, from sponsored websites (such as 2 and its associated links) to peer-reviewed articles summarizing disparities in one or more diseases between two or more populations, provide ample evidence for differences in cancer (3), cardio-metabolic disease (4), and overall health risks and outcomes (5) based on ethnic background/geographic ancestry. By way of illustration, Tables 1 and 2 summarize, respectively, disparities in cancer incidence and mortality among US ethnic populations (adapted from 6) and population differences in overall mortality rates of cancer and cardio-metabolic diseases (adapted from 7).
Assessing health differences between populations is complicated because results may vary depending on the size and granular composition of the populations being compared. Evaluating larger, more heterogeneous populations improves statistical reliability, but this approach may mask disparities among subpopulations. For example, among Asians in the US (8) and in Asia (9), the incidence of liver cancer varies widely with geography and/or geographic ancestry. Further, trends in incidence and/or mortality may change with cohort differences in age, exposure to risk, and geographic location, as is the case for liver (10) and breast cancer incidence (11) in the US and for global cancer mortality rates (e.g., 12).
Defining and distinguishing populations is a critical aspect of evaluating health disparities. Many analyses have been based on self-identified ethnicity, an approach that is likely to align more closely with social determinants of health. In contrast, a relatively precise biological assessment of geographic ancestry can be obtained by using genetic markers to identify ethnic origins. In this approach, selected ancestry informative markers (AIMs) were initially used to evaluate genetic admixture and geographic ancestry, and they provide valuable background information when comparing individuals representing different populations (13). Improved methods and more extensive and complete reference datasets have since refined admixture mapping (14).
For the purposes of this perspective, we refer to populations as they are defined by individual authors; populations in Section 3 are defined according to Karczewski (15). The interested reader is referred to a recent book chapter entitled "Using Population Descriptors in Genetics and Genomics Research: A New Framework for an Evolving Field" written by the National […]
EA, European American, non-Hispanic White; AA, African American, non-Hispanic Black; ASN/PI, Asian American/Pacific Islander; NA/AN, Native American/Alaskan Native; HISP, Hispanic/Latino.

Considering geographic ancestry in the development of effective treatments
The human genome is highly variable. According to a 2016 meta-analysis of 60,706 individuals of diverse ancestries, an average of 1 in 8 bases of the coding sequence was variant, and 72% of these variants had not been previously identified and/or characterized (35). Genetic variation within populations is at least as great as genetic variation between populations (36). This finding implies that not all genetic variation contributes to putative biological differences between populations.
Genetic differences associated with geographic ancestry, such as AIMs, may result in an uneven distribution of gene variants among populations. In many cases, these variants are uncommon, and/or their impact on protein expression, function, or disease is either insignificant or unknown. However, an intriguing study by Ahsan et al. (37) identified 65 "minor" drug response alleles that were present in more than 50% of individuals in at least one population; in other words, in some populations the variant was more common than the wild-type/canonical form. Consistent with this, a body of clinical evidence shows that specific drug responses vary according to geographic ancestry, with outcomes ranging from lack of efficacy to drug-related pathology and death in one or more minority populations (38-40). Therefore, we sought to identify population-specific potential therapeutic targets at the intersection of cancer and cardio-metabolic disease, in part by hand-curating gene variants with "minor" alleles that were common in at least one major population (as defined by 15) but significantly less common in at least one other major population.

Innate immunity as a biological driver of health disparities
Gene variants that confer protective immunity are retained in each population to optimize survival. For example, gene variants retained in the pan-African genome have been identified that provide defense against indigenous pathogens such as those causing malaria and African trypanosomiasis (sleeping sickness). The selective pressure imposed by pathogens on gene variation is impressive; in the case of malaria, variants of at least 40 different genes are thought to protect against one or more species of Plasmodium (41, 42).
Unfortunately, immune protection frequently involves a tradeoff where protective innate immune variants may introduce new pathologies.For example, among the gene variants that protect against malaria, HbS also promotes sickle cell anemia, HbE promotes thalassemia, G6PD variants promote hemolytic anemia, and Duffy antigen receptor (DARC) variants are associated with increased breast cancer metastasis and mortality (43,44).Similarly, the same APOL1 variants shown to protect against severe trypanosomiasis are also associated with nephropathy (45,46).
Several lines of evidence indicate that innate immune genes are highly adaptable and optimized to respond to local pathogens. First, within the human genome, genes associated with immunity are under the strongest selective pressure (47, 48). Second, selective pressure on immune genes is pathogen-driven (49, 50). Third, the geographic distributions of populations bearing the highest frequencies of HbS (51) and DARC (52) gene variants closely resemble the geographic distributions of the malarial parasites they protect against. Finally, populations of different geographic ancestry differ in their susceptibility to infectious disease (53), in their immune responses to pathogens (54), and even in their macrophage function and circulating cytokine levels (55-57). Together, these findings indicate that protective innate immune variants are distributed among individuals according to their geographic ancestry.
It is important to note that genes associated with innate immunity are structurally and functionally diverse. Some are well-characterized participants in inflammation, including but not limited to cytokines, chemokines, and pattern recognition receptors (lectins, Toll-like receptor (TLR) family members, and NOD-like receptors (NLRs)) and their related pathways. However, as illustrated by the variety of genes that protect against malaria (summarized in Table 3), others are pleiotropic, expressed in non-immune tissues, and/or better known for their "day jobs". Most of the protective variants listed in Table 3 can be tied directly to immunity, but a few (such as those in APOE, G6PD, glycophorin (GYP), hemoglobin (HB), and haptoglobin (HP)) would be considered unconventional innate immune genes. In their seminal review, Hanahan and Weinberg described six hallmarks of cancer, many of which are enabled by mechanisms of immunity, including inflammation (58). Their observations are particularly relevant to this perspective because subsequent research has established reprogrammed energy metabolism and immune evasion as additional hallmarks (58, 59).
In a previously published perspective, we presented evidence for an association between breast and prostate cancer disparities in African Americans (AAs) and classic innate immune gene variants (interleukins, Toll-like receptors, monocyte activity) that are more common in AAs (60). Since 2019, Google Scholar (accessed 4/18/23) has listed more than 18,000 publications with titles that include "cancer" together with "inflammation," "infection," "immune," "immunity," or "innate." These publications address a wide range of topics that are beyond the scope of this perspective, including immune escape by cancer cells, the contribution of chronic inflammation to tumor progression, and immune-based cancer therapies. Notably, fewer than 40 of these publications (about 0.2%) include the terms "disparity" or "disparities" in their titles. Among this small set are descriptions of population differences in tumor microenvironment and immune signatures in breast (61, 62), head and neck (63-65), lung (66, 67), and colorectal (68, 69) cancers, as well as in cancer generally (70). Of particular interest is a recent exploration of the link between racial differences in mitochondrial metabolism and the tumor immune microenvironment (71).

Cardio-metabolic disease and inflammation
The constellation of inter-related cardio-metabolic diseases has been collectively referred to as metabolic syndrome (MetS), and their cumulative effect on global health is massive (reviewed in 72-74). Clinical definitions of MetS vary depending on which disease(s) are of primary interest (reviewed in 75-77). The National Heart, Lung, and Blood Institute (NHLBI) lists the following MetS risk factors: abdominal obesity and/or insulin resistance, elevated triglycerides and LDL-cholesterol, reduced HDL-cholesterol, hypertension, elevated glucose, and pro-thrombotic or pro-inflammatory states (78). Several metabolic diseases have been associated with these risk factors, including hypertension, obesity, atherosclerotic cardiovascular disease, type 2 diabetes (T2D), non-alcoholic fatty liver disease (NAFLD), and stroke.
Genetic and environmental factors affect cardio-metabolic diseases, and their risk, morbidity, and mortality vary with age, gender, and race/ethnicity (4, 76). Moreover, the effects of MetS are not confined to cardio-metabolic co-morbidities: MetS is also associated with increases in the incidence and/or mortality of arthritis, chronic kidney disease, schizophrenia, depression, and cancer (79, 80).
Inflammation is a key contributor to MetS and associated co-morbidities (81-83), just as MetS pathologies affect inflammation (cf. 84). In general, low-grade chronic inflammation evoked during metabolic disease stimulates the production of pro-inflammatory cytokines, immuno-modulatory proteins, lipids, and other mediators of inflammation that affect systemic and/or localized tissue inflammation (82, 85). The treatment of metabolic diseases is complicated by cross-talk between pro- and anti-inflammatory mechanisms at work among MetS co-morbidities (cf. 77, 86-88). Further, inflammation from one metabolic disease can also exacerbate other MetS co-morbidities.
As in almost all tissues, organs that regulate systemic metabolism possess innate immune response capabilities. Notably, some organs that regulate overall metabolic homeostasis also influence systemic inflammation. Both adipose tissue (89-91) and liver (92-94) harbor and partner with resident macrophages (adipose tissue macrophages (ATMs) and Kupffer cells, respectively) in inflammation. Further, adipose tissue and liver produce distinctive immunologically active biomolecules, such as adipokines (86, 95) and bile acids (96-99). Perhaps less appreciated are two additional organs associated with metabolic homeostasis that control systemic levels of immunologically active biomolecules: the gallbladder regulates bile acid levels, and the pancreas controls levels of insulin, which has known anti-inflammatory effects (100).
Just as mediators of metabolism can affect inflammation, mediators of immunity can affect metabolism. For example, innate immune receptors have demonstrated roles in metabolic disease progression (101), and pro-inflammatory cytokines produced in the adipose tissue of obese individuals contribute to the development of T2D (102). Significantly, biomolecules such as adipokines, insulin, and bile acids mediate both metabolism and inflammation. Further, besides their widely recognized roles in lipid transport and cellular metabolic homeostasis, serum lipids and lipoproteins also provide innate immune protection (103, 104).

A functional genomics approach to novel target discovery
Using functional genomics, we and others have observed associations between specific innate immune gene variants and cancer or metabolic disease risk or outcome that differ according to geographic ancestry (57, 60, 105). Given that immunity, including inflammation, contributes to the progression of both complex disease families, we have hypothesized that population differences in genetic (and epigenetic) innate immune programs contribute to complex disease disparities between populations. Based on this conceptual framework, this perspective seeks to identify innate immune gene candidates, associated with both cancer and cardio-metabolic disease, whose variants differ in frequency between populations.
Genome-wide association studies (GWAS) in general (106) and the Genome Aggregation Database (gnomAD) in particular (107) give researchers the capacity to compare genetic variation among thousands of individuals across broadly defined populations. These resources catalog genetic variants, notably single nucleotide polymorphisms (SNPs), across the genome. SNPs are located not only in protein-coding genes (including coding exons as well as non-coding introns and remote, upstream, downstream, and mid-stream regulatory sites) but also in regions associated with short and long non-coding RNAs, chromosomal architecture, and other essential functions that were previously underappreciated and mislabeled as "junk DNA" (108). The number of genes, and the percentage of the human genome they occupy, varies depending on how a gene is defined (109). Notably, most SNPs associated with disease states or changes in phenotype (95%) are located outside coding exons (110).
Nevertheless, in this perspective we focus on widely occurring gene variants that change the canonical amino acid (aa) sequence, also referred to as missense variants or nonsynonymous SNPs, as a first step toward accelerating the development of optimally safe and active drugs that target understudied protein variants widely found in patients of diverse geographic ancestries. Importantly, nonsynonymous SNPs can affect protein conformation, activity, and/or protein-protein interactions, potentially altering disease states and phenotypes. For simplicity, we have also excluded synonymous SNPs (exonic point mutations that do not alter the aa sequence), despite mounting evidence that they can influence isoform selection (protein size and sequence), transcript expression levels and stability, co-translational folding rate, overall conformation, and post-translational modifications, all of which can have functional consequences for cell behavior and disease risk (111-113).
This perspective identifies conventional and unconventional innate immune genes (summarized in Section 3) that meet the following criteria.First, there is evidence that each gene participates in, is a target of, or is associated with innate immunity including inflammation.Second, there is evidence that each gene is associated with at least one form of cancer and at least one cardio-metabolic disease.Finally, each gene occurs among the global population as at least one population-enriched variant, which we define as a widely occurring missense variant distributed unevenly among populations.
We have employed a hand-curated discovery process to identify population-specific innate immune genes at the intersection of cancer and metabolic disease.From the primary and secondary literature, gene lists associated with innate immunity (49,114,115), cancer (116,117), or cardio-metabolic disease (118,119) were vetted for the following characteristics: 1) Evidence in the primary or secondary literature (accessed through Google Scholar) indicated that the candidate gene was involved in all three disease categories: innate immunity/inflammation, cancer, and cardiometabolic disease.
2) Indication in gnomAD that the candidate gene occurs as at least one nonsynonymous SNP/missense variant with
a. a high minor allele frequency (MAF ≥ 0.2) in at least one of the six major populations defined by 15: African/African American (AFR/AA), East Asian (E ASN), non-Finnish European (EUR), Latino/Latina (LAT), Middle Eastern (MID E), and South Asian (S ASN); and
b. a difference of ≥ 0.2 between the highest and lowest MAF among these major populations.
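The two frequency criteria in item 2 can be expressed as a simple filter. The sketch below is illustrative only; the selection described here was hand-curated, not automated. It assumes that per-population frequencies of the variant (non-reference) allele are already available as a dictionary, and it reads the text's "MAF" as that variant-allele frequency. The function name, population labels, and input format are hypothetical. As input it uses the gnomAD frequencies quoted for APOB variants later in this perspective, which report only two populations per variant.

```python
from typing import Dict

# Thresholds from criteria 2a and 2b (hypothetical parameter names).
MIN_PEAK_FREQ = 0.20   # variant allele frequency required in at least one population
MIN_FREQ_RANGE = 0.20  # required gap between highest and lowest population frequency


def is_population_enriched(freqs: Dict[str, float],
                           min_peak: float = MIN_PEAK_FREQ,
                           min_range: float = MIN_FREQ_RANGE) -> bool:
    """Return True if a variant meets both frequency criteria.

    `freqs` maps a population label to the frequency of the variant allele
    in that population.
    """
    if len(freqs) < 2:
        raise ValueError("need frequencies for at least two populations")
    highest = max(freqs.values())
    lowest = min(freqs.values())
    return highest >= min_peak and (highest - lowest) >= min_range


# gnomAD frequencies quoted in the APOB subsection; only the two populations
# reported there are included for each variant.
APOB_EXAMPLES = {
    "rs1042034": {"AFR": 0.85, "EAS": 0.26},
    "rs676210": {"EAS": 0.73, "AFR": 0.15},
    "rs679899": {"EAS": 0.85, "AFR": 0.17},
    # 0.007 is the upper bound reported for all non-East Asian populations.
    "rs13306194": {"EAS": 0.1343, "OTHER": 0.007},
}

if __name__ == "__main__":
    for rsid, freqs in APOB_EXAMPLES.items():
        print(rsid, is_population_enriched(freqs))
```

With these inputs, the first three APOB variants pass both criteria, whereas rs13306194 fails because its highest reported frequency (0.1343) is below 0.2. With all six populations available, the same function would simply receive six entries per variant.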
Note that among genes with missense variants, we chose only those with common variants that occur widely in one or more populations, i.e., missense variants with a minor allele frequency (MAF) ≥ 0.20 in at least one population (so that at least one in five copies of the gene in that population carries the variant) and whose frequencies vary widely among populations. This approach was based on our rationale that variants selected and retained in the human genome provide a survival benefit for the population(s) in which they occur, even as they may also paradoxically contribute to complex disease, as discussed above for the HbS and APOL1 variants (see Section 1.3).
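The relationship between an allele frequency such as MAF and the share of individuals who carry the variant is not one-to-one. The following sketch converts an allele frequency into expected genotype fractions. It assumes Hardy-Weinberg equilibrium at a biallelic site, an assumption that may not hold in admixed or structured populations; the function name is hypothetical.

```python
from typing import Dict


def genotype_fractions(q: float) -> Dict[str, float]:
    """Expected genotype fractions for a biallelic site with variant allele
    frequency q, assuming Hardy-Weinberg equilibrium (p = 1 - q)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p = 1.0 - q
    return {
        "non_carrier": p * p,       # no copies of the variant allele
        "heterozygous": 2 * p * q,  # exactly one copy
        "homozygous": q * q,        # two copies
        "carrier": 1.0 - p * p,     # at least one copy (heterozygous + homozygous)
    }


if __name__ == "__main__":
    for q in (0.05, 0.20, 0.50):
        f = genotype_fractions(q)
        print(f"MAF={q:.2f}: carriers={f['carrier']:.2f}, "
              f"homozygotes={f['homozygous']:.2f}")
```

Under these assumptions, an allele frequency of 0.20 corresponds to roughly 36% of individuals carrying at least one copy of the variant and about 4% carrying two copies.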

Candidate innate immune genes at the intersection of cancer and cardiometabolic disease disparities
Among the candidate innate immune genes that we identified at the intersection of cancer and cardio-metabolic disease, we found both "conventional" innate immune genes, such as cytokines and cytokine receptors, pattern recognition receptors, and other genes with widely acknowledged roles in immune cell function, and "unconventional" genes with pleiotropic functions that include innate immunity, such as apolipoproteins, biomolecule transporters, and transcription regulators. Using the approach described in Section 2, we generated three lists of innate immune genes implicated in cancer and cardio-metabolic disease. Each gene listed in the three tables below possesses at least one population-enriched variant with an amino acid replacement that differs in its distribution among populations, suggesting a potential role in both cancer and cardio-metabolic disparities. The 52 genes identified provide a representative but not exhaustive list of candidates and thus serve as preliminary data for further investigation. Section 3.1 summarizes conventional innate immune genes and their corresponding population-enriched variants previously shown to affect disease or biological function. Similarly, Section 3.2 summarizes unconventional innate immune genes (better known for their non-immune functions) and their corresponding population-enriched variants previously shown to affect disease or biological function. Finally, Section 3.3 summarizes genes associated with innate immunity, cancer, and cardio-metabolic diseases whose population-enriched variants have not yet been shown to affect disease or biological function.

Conventional innate immune genes with previously characterized population-enriched variants
Table 4 includes 14 genes best known for their roles in immunity, including inflammation, that are present as at least one population-enriched variant shown to affect biological function. Among these are cytokines and cytokine receptors, including macrophage inhibitory cytokine 1 (MIC-1/GDF15), interleukin 3 and the alpha subunit of its receptor (IL3 and IL3RA), subunits of the interleukin 4, 6, and 7 receptors (IL4R, IL6R, and IL7R), and the leptin adipokine receptor (LEPR). Additional immune receptors include the soluble receptor for MHC class I antigens (leukocyte immunoglobulin-like receptor A3, LILRA3/CD85E) and two pattern recognition receptors: the intracellular receptor nucleotide-binding oligomerization domain containing 2 (NOD2) and the five-transmembrane stimulator of interferon response cGAMP interactor 1 (STING1/TMEM173). Also included are indoleamine 2,3-dioxygenase 2 (IDO2), which catalyzes the rate-limiting step of the kynurenine pathway during inflammation; the temperature-sensitive cation channel TRPM8; and two adhesion molecules, one expressed in lymphocytes (integrin alpha L, ITGAL/LFA-1/CD11A) and the other in leukocytes (junctional adhesion molecule-like, JAML/AMICA).

Interleukin 3 and interleukin 3 receptor alpha chain
IL-3 is a growth factor produced by activated T-cells (129) that regulates the growth of hematopoietic progenitor cells and activates mature neutrophils and macrophages (208).IL-3 is also implicated in priming (131) and activating (130) basophils.Intriguingly, increased serum levels of IL-3 have recently been associated with the onset of type 2 diabetes in African American women as determined by serum levels of glucose and HbA1c (133).Genetic variations in IL3 have been noted in colon and rectal cancers (132).The Pro27Ser variant (5-132060785-C-T) has been associated with protection against malaria (134) but also with an increase in miscarriages following in vitro fertilization (IVF) in women of various populations (209).
The interleukin 3 receptor is a heterodimer composed of an interleukin 3-specific alpha chain (IL-3RA, CD123) and the common cytokine beta chain CSF2RB (another candidate, listed below in Section 3.3), which also forms dimers with the alpha chains of both the GM-CSF and IL-5 receptors. High-affinity IL-3 binding induces heterodimerization of IL-3RA and CSF2RB, and subsequent disulfide linkage of these receptor chains is required for receptor activation and CSF2RB phosphorylation (210). IL-3RA expression varies among CD34+ hematopoietic cell types, with negative/low expression in primitive hematopoietic cells and little or no surface expression in early erythroid progenitors, but high expression in B-lymphoid and myeloid progenitors (135). The X-linked IL3RA Val323Leu variant (X-1378751-G-C) was associated with non-complete response to neoadjuvant chemotherapy for locally advanced rectal cancer in Hong Kong patients (138).
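Variants in this section are cited by gnomAD-style identifiers of the form chromosome-position-reference allele-alternate allele (for example, 5-132060785-C-T for IL3 Pro27Ser and X-1378751-G-C for IL3RA Val323Leu). The minimal parser below splits such an identifier into its fields. The class and function names are hypothetical, and because the genome build is not part of the identifier, positions should not be compared across builds without conversion.

```python
from typing import NamedTuple

# Nucleotide alphabet accepted for reference/alternate alleles.
VALID_BASES = frozenset("ACGT")


class VariantId(NamedTuple):
    """A variant identifier split into its four fields (hypothetical helper)."""
    chrom: str
    pos: int
    ref: str
    alt: str

    def is_snv(self) -> bool:
        """True for a single-nucleotide substitution (one base replaced by one base)."""
        return len(self.ref) == 1 and len(self.alt) == 1


def parse_variant_id(text: str) -> VariantId:
    """Parse a 'chrom-pos-ref-alt' identifier such as '5-132060785-C-T'."""
    parts = text.strip().upper().split("-")
    if len(parts) != 4:
        raise ValueError(f"expected chrom-pos-ref-alt, got {text!r}")
    chrom, pos, ref, alt = parts
    if not pos.isdigit() or int(pos) == 0:
        raise ValueError(f"position must be a positive integer, got {pos!r}")
    if not ref or not alt or not set(ref + alt) <= VALID_BASES:
        raise ValueError(f"alleles must be non-empty A/C/G/T strings, got {ref!r}, {alt!r}")
    return VariantId(chrom, int(pos), ref, alt)


if __name__ == "__main__":
    # Identifiers quoted in this section.
    for vid in ("5-132060785-C-T", "X-1378751-G-C"):
        v = parse_variant_id(vid)
        print(v, "SNV" if v.is_snv() else "indel/MNV")
```

Keeping the fields separate makes it straightforward to group variants by chromosome or to join them against a frequency table keyed on the same four fields.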

Interleukin 4 receptor alpha chain
The IL-4R alpha chain (IL4R, CD124) forms heterodimers with at least two partners. Type 1 IL-4 receptors are composed of IL-4R complexed with the common cytokine receptor gamma chain (IL2RG, CD132), which can alternatively dimerize with the IL-2, IL-7, and IL-21 cytokine receptors, so that these receptors compete with IL-4R for binding to IL2RG. Type 2 IL-4 receptors are composed of IL-4R complexed with IL-13RA1 (IL13Ra1, CD213A1). Thus, IL-4 activates both Type 1 and Type 2 IL-4 receptors, whereas IL-13 activates only Type 2 IL-4 receptors. IL-4 and IL-13 signaling through the IL-4R mediates type 2 (humoral, as opposed to type 1 cellular) immunity against helminths, toxins, and tropical parasites such as Plasmodium (malaria) and trypanosomes (African sleeping sickness/Chagas disease) (139-141, 211). Both IL-4Ra and IL-13Ra1 have also been implicated in cancer progression, and their nuclear localization was recently identified as a prognostic indicator in patients with soft-tissue sarcoma (142). IL-4 regulates lipid metabolism (143), and recent findings highlight an intriguing relationship in which non-hematopoietic IL-4Ra activation of a non-canonical signaling pathway regulates high-fat, high-carbohydrate diet-driven obesity and influences the severity of obesity-associated sequelae in mice (212). Numerous genetic epidemiological studies have also shown that IL4, IL4R, and their gene polymorphisms play important roles in asthma in various populations. Notably, individuals carrying one or two copies of the IL4R Glu400Ala (16-27362551-A-C) minor allele were at higher risk of allergy (145) and asthma (144, 213).

Interleukin 7 receptor alpha chain
The integral membrane interleukin 7 receptor (IL-7R) transmits pro-inflammatory signals initiated by IL-7 binding at the cell surface. The functional IL-7 receptor is a heterodimer composed of the IL-7 receptor alpha chain (IL7R, IL7Ra, CD127) and the same common cytokine receptor gamma chain (IL2RG, CD132) that dimerizes with the IL-4R alpha chain. The IL-7R alpha chain also contributes to recognition of thymic stromal lymphopoietin (TSLP), in that case by pairing with cytokine receptor-like factor 2 (CRLF2) rather than IL2RG; IL-7 and TSLP are both four-α-helical-bundle cytokines (214). Multiple transcriptional and post-transcriptional mechanisms regulate expression of the IL-7R protein (215). Some of these mechanisms are homeostatic, molecular, and cytokine-mediated; for example, IL7R transcription decreases in CD4+ and CD8+ T cells once naïve T cells become activated. Notably, IL-7 binding to IL-7R activates the Janus kinase/signal transducer and activator of transcription (JAK/STAT) pathway, which plays an essential role in lipid metabolism (216). In addition, peripheral blood mononuclear cells (PBMCs) from breast cancer patients show defects in STAT5 phosphorylation and altered expression of IL-7Ra that ultimately affect memory T cell development (156).
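The shared-chain relationships described above for the IL-3, IL-4, and IL-7 receptors can be summarized as a small lookup table. The sketch below encodes only the chain compositions and ligands stated in these subsections (other partners of IL2RG and CSF2RB, such as the IL-2, IL-21, GM-CSF, and IL-5 receptors, are left out), and the class and function names are hypothetical. It is a bookkeeping aid, not a model of binding affinity or signaling.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Receptor:
    """A cytokine receptor described by its chains and ligands (hypothetical helper)."""
    name: str
    chains: FrozenSet[str]
    ligands: FrozenSet[str]


# Only the compositions stated in this section; other receptors that share
# IL2RG or CSF2RB are deliberately omitted.
RECEPTORS: List[Receptor] = [
    Receptor("IL-3R", frozenset({"IL3RA", "CSF2RB"}), frozenset({"IL-3"})),
    Receptor("type 1 IL-4R", frozenset({"IL4R", "IL2RG"}), frozenset({"IL-4"})),
    Receptor("type 2 IL-4R", frozenset({"IL4R", "IL13RA1"}), frozenset({"IL-4", "IL-13"})),
    Receptor("IL-7R", frozenset({"IL7R", "IL2RG"}), frozenset({"IL-7"})),
]


def receptors_for_ligand(ligand: str) -> List[str]:
    """Names of receptors in RECEPTORS that the given cytokine activates."""
    return sorted(r.name for r in RECEPTORS if ligand in r.ligands)


def receptors_using_chain(chain: str) -> List[str]:
    """Names of receptors in RECEPTORS that contain the given chain."""
    return sorted(r.name for r in RECEPTORS if chain in r.chains)


if __name__ == "__main__":
    print("IL-4 ->", receptors_for_ligand("IL-4"))
    print("IL-13 ->", receptors_for_ligand("IL-13"))
    print("IL2RG ->", receptors_using_chain("IL2RG"))
    print("IL4R ->", receptors_using_chain("IL4R"))
```

Querying IL2RG returns both the type 1 IL-4R and IL-7R, and querying IL4R returns both IL-4R types; this is one way to see why a variant in a shared chain could, in principle, influence more than one cytokine pathway.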

Unconventional innate immune genes with previously characterized population-enriched variants
Table 5 includes 18 genes representing several classes of proteins primarily associated with non-immune functions that occur as population-enriched variants shown to affect biological function. These genes include membrane transport proteins: the multidrug resistance pump (ABCB1), the Niemann-Pick cholesterol transporter 1 (NPC1, SLC65A1), and the Na+-dependent multivitamin transporter (SLC5A6). Among the regulatory metabolic enzymes are alcohol dehydrogenase (ADH1C), mitochondrial dihydroorotate dehydrogenase (DHODH), hydroxysteroid (17-beta) dehydrogenase 4 (HSD17B4), involved in peroxisomal fatty acid beta-oxidation, and glycogen phosphorylase B (PYGB), involved in regulating glycogen mobilization. Among the genes that participate in signal transduction are the membrane glycoprotein signaling coreceptor neuregulin (NRG1), phosphodiesterase 10A (PDE10A, which regulates cAMP concentrations), and the small bioactive neuropeptide neuromedin B (NMB). Transcription factors and/or nucleic acid binding proteins that occur as population-enriched variants include hypoxia-inducible factor 2A (EPAS1, HIF2A), Iroquois homeobox 2 (IRX2), mismatch repair MutL homolog 3 (MLH3), the novel intracellular and extracellular ribonuclease T2 (RNASET2), and the SURP and G-patch domain containing 1 (SUGP1) splicing factor. Also included are the lipid transport protein apolipoprotein B (APOB), the triacylglycerol lipase patatin-like phospholipase domain containing 3 (PNPLA3), and the cadherin family adhesion molecule desmoglein 2 (DSG2).

Multidrug resistance gene
The ATP binding cassette subfamily B member 1 (ABCB1) gene is commonly known as the first of two multidrug resistance (MDR1) genes in humans and is one of 48 ABC family members (217). ABCB1 functions at the plasma membrane as a 170-kDa monomer with 12 transmembrane domains (TMs); it is glycosylated on the first extracellular loop (between TM1 and TM2) and has two intracellular ATP binding sites (one between TM6 and TM7, and the other in the carboxy terminus downstream of TM12). ABCB1 is expressed in a wide range of tissues (such as intestine, colon, placenta, liver, and the blood-brain barrier), where it protects vulnerable cells and organs against the intracellular build-up of xenobiotic molecules by expelling toxins, including chemotherapeutics, from the cell interior. Thus, ABCB1 has become a widely known source of and marker for chemoresistance (cf. 219). ABCB1 also functions as a broad-specificity lipid translocase (326). In a Chinese cohort, a variant in the ABCB1 promoter showed pleiotropic effects related to T2D and lipid metabolism (221). Notably, the ABCB1 Ser893Ala variant (7-87531302-A-C, rs2032582) has been correlated with obesity in a Japanese population (220) and with increased susceptibility to lung cancer in a Spanish cohort (223). This ABCB1 variant occurs in 91% of Africans/African Americans but in only 35-62% of other populations (gnomAD), and it was shown to affect the efficacy of etanercept in Chinese Han patients with ankylosing spondylitis (222).

Mismatch repair protein MutL homolog 3
MLH3 is a homolog of the mismatch repair protein MutL. DNA mismatch repair (MMR) proteins play a vital role in maintaining genome integrity and in antibody maturation during class switch recombination and somatic hypermutation (276). Tumors with microsatellite instability often display somatic mutations in MLH3, while hereditary nonpolyposis colorectal cancer type 7 (HNPCC7) has been associated with germline mutations in the same gene (276, 327). Further, reduced MLH3 expression was observed in individuals diagnosed with grade II and III breast cancer, suggesting that MLH3 may serve as a susceptibility marker (278, 328). The MLH3 Pro844Leu variant (14-75047125-G-A, rs175080, most common in Middle Eastern populations) showed no correlation with susceptibility to colorectal cancer in a predominantly white cohort (279). However, in Chinese patients this variant was associated with both cervical cancer (280) and hepatocellular carcinoma (281).

Apolipoprotein B
Lipoproteins are particles that package otherwise insoluble lipids for transport through the blood to various tissues; each consists of a central core of cholesterol esters and triglycerides and an outer layer of phospholipids, free cholesterol, and apolipoproteins (329). Apolipoprotein B (APOB) serves as the primary carrier protein for several classes of serum lipid particles, including chylomicrons, low-density lipoprotein (LDL), very low-density lipoprotein (VLDL), intermediate-density lipoprotein, and lipoprotein(a). In LDL particles, APOB binds the apoB/E (LDL) receptor, facilitating the removal of LDL cholesterol from the circulation via cellular uptake followed by intracellular LDL breakdown. In a small Japanese study correlating variants of genes related to lipid regulation (including apolipoproteins), the population-enriched missense APOB variant 2-21002409-C-T (rs1042034) correlated with HCV infection (235). This variant has an allele frequency of 0.85 in African American populations but only 0.26 in East Asian populations (gnomAD). Another population-enriched missense APOB variant, 2-21008652-G-A (rs676210), present in 73% of East Asians vs. 15% of Africans/African Americans (gnomAD), correlated with the occurrence of initial non-cardioembolic ischemic stroke in a small European cohort (239). A third population-enriched missense APOB variant, 2-21028042-G-A (rs679899), present in 85% of East Asians vs. 17% of Africans/African Americans (gnomAD), was protective against acute coronary syndrome in a Mexican population (238). This variant was also associated with both hypertension and chronic kidney disease in a cohort of 3696 Japanese individuals (240).
Functional effects of additional APOB missense variants have also been reported. The Arg3638Gln variant (2-21005955-C-T, rs1801701), which is present in no more than 10% of any population, was associated with survival outcomes in non-small cell lung cancer (NSCLC) patients (236). Additionally, two nonsynonymous variants largely restricted to Asian populations, namely 2-21006289-G-A (rs144467873; MAF = 0.001253 and 0.0003594 in East and South Asians, respectively, but < 0.00008 in all other populations; gnomAD v2.1.1) and 2-21029662-G-A (rs13306194; MAF = 0.1343 in East Asians but < 0.007 in all other populations), were evaluated for their association with lipid profiles, metabolic syndrome, and risk of diabetes in a large Taiwan Biobank study (237). Both variants were independently associated with total, LDL, and non-HDL cholesterol levels; in addition, rs144467873 (Arg3527Trp) was associated with elevated lipid levels and metabolic syndrome, whereas rs13306194 (Arg532Trp) was linked with serum triglyceride levels.

Dihydroorotate dehydrogenase
Dihydroorotate dehydrogenase (DHODH), which catalyzes the fourth, rate-limiting step of the de novo pyrimidine pathway, is located on the inner mitochondrial membrane (330). DHODH has been a therapeutic target for the treatment of rheumatoid arthritis, psoriasis, other autoimmune disorders, and Plasmodium, bacterial, and fungal infections (241). For over five decades, elevated DHODH expression has been known to promote tumor progression. De novo pyrimidine synthesis becomes essential when rapidly dividing cells have an increased demand for nucleic acid precursors; cancer cells are therefore highly dependent on DHODH, suggesting that this enzyme is a strategic target for cancer therapy (245). Recently, DHODH was also shown to protect against mitochondrial ferroptosis by preventing the lipid peroxidation that triggers this form of cell death (244). Notably, this protection is particularly important in cancer cells that express low levels of glutathione peroxidase 4 (GPX4). In addition, inhibition of DHODH hinders respiration, boosts glycolysis, and enhances GLUT4 translocation to the plasma membrane (246). These metabolic effects are accompanied by activation of the tumor suppressor p53, which elevates levels of GDF15/MIC1 (another candidate listed in Table 4), a cytokine known for its appetite-reducing effects and ability to extend lifespan. Depletion of pyrimidine ribonucleotides by DHODH inhibition is also thought to be responsible for reduced RNA virus replication and slowed growth of rapidly dividing cells, such as activated T cells and, as just mentioned, cancer cells (243). Interestingly, uridine, a pyrimidine nucleoside present in RNA, has been shown to modulate insulin activity and glycogen synthesis through its interaction with uridine diphosphate (UDP)-glucose (247). The nucleotide sequence of the DHODH gene is remarkably conserved; one exception is a prevalent Lys7Gln missense polymorphism (16-72008783-A-C, rs3213422) in its first exon (248). This variant is found in 75% of individuals in East Asia vs. 34% of individuals in the Middle East (gnomAD) and has been linked with response to leflunomide in rheumatoid arthritis (248-250).

Population-enriched variants with unknown/uncharacterized function
No known effect on gross phenotype or evidence of association with disease has yet been reported for the population-enriched variants identified in the 20 genes listed in Table 6. However, a newly released resource, GWAS Central (457), was accessed to retrieve phenotype associations for a subset of the variants in Table 6. In addition, disease disparities related to each parent gene and/or its other variants were identified, and the predicted impact of each population-enriched variant on protein function was evaluated; both are listed in Table 6.
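The selection rule stated in the table legends (a missense variant qualifies if it reaches MAF ≥ 0.2 in at least one population) and the gnomAD-style variant identifiers used throughout (chromosome-position-reference-alternate, e.g., 7-87531302-A-C) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it treats the per-population allele frequency reported by gnomAD as the quantity being thresholded, uses the table population labels as dictionary keys, and the helper names and the frequencies in the usage example are hypothetical rather than gnomAD data.

```python
from dataclasses import dataclass
from typing import List, Mapping

MAF_THRESHOLD = 0.2  # "at least 20% ... of one or more populations"


@dataclass(frozen=True)
class VariantId:
    chrom: str
    pos: int  # GRCh38 position
    ref: str
    alt: str


def parse_variant_id(text: str) -> VariantId:
    """Parse a 'chrom-pos-ref-alt' identifier such as '7-87531302-A-C'."""
    chrom, pos, ref, alt = text.split("-")
    return VariantId(chrom, int(pos), ref, alt)


def enriched_populations(freqs: Mapping[str, float],
                         threshold: float = MAF_THRESHOLD) -> List[str]:
    """Return (sorted) the populations whose frequency meets the threshold."""
    return sorted(pop for pop, freq in freqs.items() if freq >= threshold)


def is_population_enriched(freqs: Mapping[str, float],
                           threshold: float = MAF_THRESHOLD) -> bool:
    """True if the variant reaches the threshold in at least one population."""
    return bool(enriched_populations(freqs, threshold))


if __name__ == "__main__":
    variant = parse_variant_id("7-87531302-A-C")
    # Hypothetical frequencies for illustration only (not gnomAD values).
    freqs = {"AFR/AA": 0.21, "E ASN": 0.04, "EUR": 0.08,
             "LAT": 0.05, "MID E": 0.03, "S ASN": 0.06}
    print(variant, is_population_enriched(freqs), enriched_populations(freqs))
```

Returning the list of qualifying populations, rather than only a yes/no flag, keeps the "which population is enriched" information that the tables report alongside each variant.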

Understudied genes SIPA1L2 and TVP23C
Among the 20 genes in Table 6, six remain understudied: the exosomal CCDC105/TEKTL1, the putative protein disulfide isomerase CRELD2, FAM131C (a protein of unknown function), the putative immune checkpoint membrane protein ITPRIPL1, the presumptive neural GTPase activator SIPA1L2, and the putative vesicular protein transporter TVP23C. Notably, evidence of a functional impact does exist for one of the two population-enriched variants of SIPA1L2 and for one of the three population-enriched variants of TVP23C. In the case of SIPA1L2, the characterized and uncharacterized variants occur at the same high frequency (MAF = 0.48) in East Asians, but Gly1639Ser increases the number of potential phosphorylation sites, whereas Thr1322Ala reduces them, which may result in different functional outcomes (e.g., changes in activation status and/or protein-protein interactions). For both SIPA1L2 variants, eight of nine possible transcripts carry a missense change, whereas for TVP23C the variant results in a missense change only in the canonical transcript, out of five (Ser256Arg) or twelve (Trp202Arg and Ser199Thr) possible isoforms, some of which are read-through fusions with CDRT4 (CMT1A Duplicated Region Transcript 4). The TVP23C Trp202Arg and Ser199Thr variants likely co-occur, given their proximity to one another on the gene and their matching frequency distributions, with MAFs ranging from 0.54 in East Asians to 0.28 in South Asians. Thus, one might speculate that the functional impact of Ser199Thr, which is unknown, matches that of Trp202Arg, a variant found in a choriocarcinoma patient (458). Notably, choriocarcinoma shows a geographical disparity, occurring at a ten-fold greater frequency in Southeast Asia than in the West (reviewed in 439). The third TVP23C variant, Ser256Arg, is most common among Africans/African Americans (MAF = 0.24) and involves the loss of a potential phosphorylation site about 50 amino acid residues downstream of the other two TVP23C variants.
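The phosphorylation-site reasoning applied above to the SIPA1L2 and TVP23C variants (a gained or lost Ser/Thr residue) can be made explicit with a small helper. The sketch below is illustrative only: it assumes three-letter protein-change notation such as `Gly1639Ser`, counts only whether the reference or alternate residue is a canonical phospho-acceptor (Ser, Thr, or Tyr), and the names `parse_protein_change` and `acceptor_delta` are hypothetical. Dedicated phosphosite predictors also weigh kinase motifs, structure, and disorder, which this count ignores.

```python
import re
from typing import Tuple

# Ser, Thr and Tyr are the canonical phospho-acceptor residues in eukaryotic
# proteins. This sketch ignores kinase consensus motifs and 3D context.
PHOSPHO_ACCEPTORS = {"Ser", "Thr", "Tyr"}

# Three-letter protein change, e.g. "Gly1639Ser" -> ("Gly", 1639, "Ser").
_CHANGE = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_protein_change(change: str) -> Tuple[str, int, str]:
    """Split a change such as 'Thr1322Ala' into (ref, position, alt)."""
    match = _CHANGE.match(change)
    if match is None:
        raise ValueError(f"unrecognized protein change: {change!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def acceptor_delta(change: str) -> int:
    """+1 if the change adds a potential S/T/Y site, -1 if it removes one, else 0."""
    ref, _pos, alt = parse_protein_change(change)
    return int(alt in PHOSPHO_ACCEPTORS) - int(ref in PHOSPHO_ACCEPTORS)


if __name__ == "__main__":
    for change in ("Gly1639Ser", "Thr1322Ala", "Ser256Arg", "Trp202Arg", "Ser199Thr"):
        print(change, acceptor_delta(change))
```

For the variants discussed above, the helper returns +1 for Gly1639Ser, -1 for Thr1322Ala and Ser256Arg, and 0 for Trp202Arg and Ser199Thr. Ser199Thr swaps one acceptor residue for another, so a simple count cannot distinguish it from a neutral change, which is one reason such a count is only a first-pass screen.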

Additional representative genes of interest
The remaining 14 genes in Table 6 are better characterized; notably, many have pleiotropic functions beyond those initially attributed to them. ATPase Phospholipid Transporting 10D (ATP10D) codes for the catalytic subunit of a glycosylceramide flippase complex at the endoplasmic reticulum (ER), nucleoplasm, and plasma membrane. DnaJ Heat Shock Protein Family (Hsp40) Member B11 (DNAJB11) codes for an ER-resident and secreted co-chaperone of BiP/GRP78/HSPA5. Desmocollin 1 (DSC1) codes for an adhesive glycoprotein of the cadherin family. The Immunoglobulin Like Domain Containing Receptor 1 protein (ILDR1) maintains structural […] DNA metabolism, nuclear import, and response to UV light. The Semaphorin 6D (SEMA6D) gene codes for an integral membrane protein of the semaphorin family, whose members collectively sculpt axonal paths, branching, conduction, and target selection; the distribution of nine SEMA6D transcript isoforms varies with developmental stage and tissue type. Tre-2/BUB2/CDC16 (TBC) Domain Family Member 4 (TBC1D4, also referred to as Akt Substrate of 160 kDa or AS160) is a Rab-GTPase activator with multiple transcript variants; isoform 2 promotes SLC2A4/GLUT4 presentation at the plasma membrane to increase cellular glucose uptake (344). Thymocyte Expressed, Positive Selection Associated 1 (TESPA1) interacts with the COP9 and TCR signalosomes and participates in T cell differentiation and T cell receptor signaling. Three zinc finger (ZNF) proteins, ZNF23, ZNF267, and ZNF628, localize to the nucleus and regulate transcription. The parent genes and corresponding population-enriched variants of the common cytokine receptor beta chain CSF2RB and the transcriptional co-repressor RB1 are discussed below.

Predicted Impact: variant alters potential phosphorylation status (-Thr) in a disordered region of this transcription activator [344] between two zinc finger clusters of the canonical protein that bind DNA independently [456]
Genes listed have been associated with innate immunity/inflammation, cancer, and cardio-metabolic disease and have at least one variant in the human genome that occurs in at least 20% (Minor Allele Frequency (MAF) ≥ 0.2) of one or more populations. Missense variants are described by their location in the GRCh38 reference genome (accessed from gnomAD v3.1.2), rs number (reference SNP cluster ID), and amino acid location numbers and identities of the original and coded replacement. Populations are as defined by Karczewski 2020 (15): African/African American (AFR/AA), East Asian (E ASN), non-Finnish European (EUR), Latino/Latina (LAT), Middle Eastern (MID E), and South Asian (S ASN). The number of affected transcripts listed includes total transcripts (first number) and transcripts with missense mutations (in parentheses) that contain the gene variant, but does not include transcripts of any overlapping genes.

CSF2RB
Colony stimulating factor 2 receptor beta (CSF2RB, CD131) forms dimers with the cytokine-specific alpha receptor subunits for IL-3, IL-5, and GM-CSF (CSF2). As noted above, a population-enriched variant of the IL3RA subunit also exists, although the population distributions of the two variants differ markedly: the IL3RA Val323Leu variant is least frequent among Africans/African Americans (MAF = 0.06, Table 4), whereas the CSF2RB Glu249Gln variant is more common in Africans/African Americans than in any other population (MAF = 0.21).
CSF2RB is associated with pulmonary alveolar proteinosis (PAP), which involves the accumulation of surfactant and macrophage dysfunction in the alveoli (reviewed in 462). Although studies so far have not suggested geographic or population differences in PAP occurrence, the most common PAP co-morbidities include cardiovascular disease, type 2 diabetes, and hypertension, all of which are unevenly distributed among populations. Further, a rare CSF2RB Arg461Cys variant (MAF < 0.001, not listed in Table 6) was found in individual patients with leukemia (355) and breast cancer (356). Notably, both of these cancers show racial and ethnic disparities (430 and 352, respectively).

RB1
The retinoblastoma gene RB1 was one of the first tumor suppressors to be identified. Alterations in the expression and sequence of RB1 have been implicated in several cancers besides retinoblastoma, in which they were originally characterized (reviewed in 391). More than 40 years of extensive research indicate that regulation of and by RB1 is highly complex, linked with multiple signaling pathways, and context dependent. Not surprisingly, more than 30 proteins have been shown to interact with RB1 as curated by UniProt (344), and more than 150 as curated in BioGRID (463) and IntAct (464). The functional diversity of the binding partners of RB1 is consistent with its pleiotropic effects, which extend beyond transcription and cell cycle control to include progenitor maturation, terminal differentiation, and immune evasion (391).
Five protein-coding transcripts of RB1 have been identified. These include 1) the MANE Select (canonical) transcript, composed of 27 exons encoding a protein of 928 aa residues; 2) a closely related transcript whose protein is 5 aa shorter and differs from the canonical protein in 18 of its last 19 C-terminal residues; and 3) three much shorter transcripts (coding for 53, 103, or 110 aa peptides) that include all or portions of only 2 or 3 exons of the canonical transcript. Of these shorter transcripts, the two shortest are derived from the N-terminal portion of RB1. In contrast, the 110 aa non-canonical isoform begins with an unidentified N-terminal residue at the position equivalent to Ser501 of the canonical protein and then aligns with all canonical residues up through Ser565; the remaining non-canonical residues (aa 66-110) are located downstream of the canonical C-terminal residue 928. The Leu99Ser population-enriched variant, which introduces a potential phosphorylation site, lies in this extra-exonic portion of the 110 aa isoform. Although a large number of residues (n ≥ 105) in the canonical RB1 protein are known to be post-translationally modified, only two potential ubiquitination sites, in the vicinity of aa 550, have been identified within the aa 501-565 range that overlaps with the first 65 residues of the 110 aa isoform (391).
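The coordinate relationship described above for the 110 aa RB1 isoform can be written as a simple position mapping. This is a hypothetical sketch that encodes only what the text states (isoform residues 1-65 align with canonical residues 501-565, and residues 66-110 have no canonical counterpart); the function name `canonical_position` is illustrative, and the mapping should be checked against the actual transcript alignments before any real use.

```python
from typing import Optional

ISOFORM_LENGTH = 110    # short non-canonical RB1 isoform (aa)
ALIGNED_END = 65        # isoform residues 1-65 align with the canonical protein
CANONICAL_OFFSET = 500  # isoform residue 1 <-> canonical residue 501


def canonical_position(isoform_pos: int) -> Optional[int]:
    """Map a residue of the 110 aa isoform to its canonical (928 aa) position.

    Returns None for residues 66-110, which lie downstream of canonical
    residue 928 and therefore have no canonical equivalent.
    """
    if not 1 <= isoform_pos <= ISOFORM_LENGTH:
        raise ValueError(f"position {isoform_pos} outside 1-{ISOFORM_LENGTH}")
    if isoform_pos <= ALIGNED_END:
        return isoform_pos + CANONICAL_OFFSET
    return None


if __name__ == "__main__":
    print(canonical_position(1))   # 501
    print(canonical_position(65))  # 565
    print(canonical_position(99))  # None: Leu99Ser falls in the extension
```

Because `canonical_position(99)` returns None, post-translational modification annotations curated for the canonical protein cannot be transferred to the Leu99Ser site, consistent with the text's observation that this variant lies in the isoform-specific extension.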

Conclusion
Population studies have traditionally focused on querying individual diseases or combinations of diseases, including cancer and cardio-metabolic disease, which frequently show disparate prevalence and/or severity in non-European populations. In this perspective, we have introduced a complementary approach that explores the intersection of innate immunity, cancer, and cardio-metabolic diseases. The effective elimination of disease disparities will involve not only addressing the profound social and behavioral determinants of health but also identifying and treating the biological contributors to disease, including novel genes as well as previously characterized genes that participate in novel pathways.
We suggest that careful evaluation of population differences in conventional and unconventional innate immune genes and their related pathways will provide key insights into the underlying mechanisms that connect cancer and cardio-metabolic diseases. At the same time, the genes we have identified in this study that are associated with both cancer and cardio-metabolic diseases may play critical roles in under-appreciated facets of innate immunity and their contribution to disease disparities. Further, we predict that the geographic ancestral distribution of innate immune gene variants will match the geographic distribution of the environmental stressors (including but not limited to infectious agents) that these variants help to mitigate, as described above for the HbS and DARC variants and malaria (Section 1.3).
The genes we have identified serve as potential targets for diagnostic and/or therapeutic interventions. Notably, the development and clinical use of therapeutics targeting these candidate genes is likely to require a nuanced approach, since variants of these genes across different global populations are likely to alter the activity and/or expression of the encoded proteins and thereby affect therapeutic outcomes. Assessing the prevalence of specific target variants in one or more major populations and, more precisely, the presence of these variants in individual patients is an important step toward increasing the safety and effectiveness of emerging therapies. This perspective highlights the importance of 1) considering genetic diversity when identifying and developing treatments and 2) continuing to incorporate findings from ongoing GWAS projects as they identify and characterize new or understudied genes and their population-enriched variants associated with complex and infectious diseases.

TABLE 1 Ethnic Disparities in US Cancer Incidence and Mortality.

"Table 9. Incidence and Mortality Rates for Selected Cancer by Race and Ethnicity, US" (6). Standard font indicates the most frequently occurring cancer among aggregate populations; italics indicate the most frequently occurring cancer for a specific ethnic group (not aggregate).

TABLE 2
Deaths from Cancer, Cardio-Metabolic, and Infectious Diseases in the US as of 2018.

TABLE 3
Innate immune genes that provide protection against malaria (adapted from 41, 42).

TABLE 4
Candidate Conventional Innate Immune Genes at the Intersection of Cancer and Cardio-Metabolic Disease.
Genes listed have been associated with innate immunity/inflammation, cancer, and cardio-metabolic disease and have at least one variant in the human genome that occurs in at least 20% (Minor Allele Frequency (MAF) ≥ 0.2) of one or more populations. Missense variants are described by their location in the GRCh38 reference genome (accessed from gnomAD v3.1.2), rs number (reference SNP cluster ID), and amino acid location numbers and identities of the original and coded replacement. Populations are as defined by Karczewski 2020 (15): African/African American (AFR/AA), East Asian (E ASN), non-Finnish European (EUR), Latino/Latina (LAT), Middle Eastern (MID E), and South Asian (S ASN). The number of affected transcripts listed includes total transcripts (first number) and transcripts with missense mutations (in parentheses) that contain the gene variant, but does not include transcripts of any overlapping genes.

TABLE 5
Candidate Unconventional Innate Immune Genes at the Intersection of Cancer and Cardio-Metabolic Disease.

TABLE 5 Continued
Genes listed have been associated with innate immunity/inflammation, cancer, and cardio-metabolic disease and have at least one variant in the human genome that occurs in at least 20% (Minor Allele Frequency (MAF) ≥ 0.2) of one or more populations. Missense variants are described by their location in the GRCh38 reference genome (accessed from gnomAD v3.1.2), rs number (reference SNP cluster ID), and amino acid location numbers and identities of the original and coded replacement. Populations are as defined by Karczewski 2020 (15): African/African American (AFR/AA), East Asian (E ASN), non-Finnish European (EUR), Latino/Latina (LAT), Middle Eastern (MID E), and South Asian (S ASN). The number of affected transcripts listed includes total transcripts (first number) and transcripts with missense mutations (in parentheses) that contain the gene variant, but does not include transcripts of any overlapping genes.

TABLE 6 Continued
[440]
Disease Disparity: reduced expression of this tumor suppressor gene in ovarian and endometrial cancers [440]; ovarian cancers show ethnic disparities [364]
Predicted Impact: variant occurs in a putative N-terminal strong transcriptional repressor KRAB domain [442]; loss of Ser may alter activity and/or binding interactions
Note: the ZNF23 KRAB domain is truncated and does not appear to alter repressor activity [440]; however, not all ZNF23 interactors (such as mitochondrial ATPAF2, keratin-associated KRTAP10-8, myelin-associated MOBP, growth factor signaling regulators SPRED1 and SPRY1, and TNFR-associated adaptor TRAF1) are transcription factors