Role of Genetic Ancestry in 1,002 Brazilian Colorectal Cancer Patients From Barretos Cancer Hospital

Background: Colorectal cancer (CRC) is the third most frequent and the second deadliest cancer worldwide. The ethnic structure of the population has been gaining prominence as a factor in cancer. The purpose of this study was to determine the genetic ancestry of Brazilian CRC patients and to assess its impact on patients' clinicopathological features. Methods: Retrospective observational cohort study with 1,002 patients with CRC admitted from 2000 to 2014 at Barretos Cancer Hospital. Following tumor DNA isolation, genetic ancestry was assessed using a specific panel of 46 ancestry informative markers. Survival rates were obtained by the Kaplan–Meier method, and the log-rank test was used to compare the survival curves. Multivariable Cox proportional hazards regression models were used to estimate hazard ratios (HRs). Results: We observed considerable admixture in the genetic composition, with the following average proportions: European 74.2%, African 12.7%, Asian 6.5%, and Amerindian 6.6%. The multivariate analysis for cancer-specific survival showed that clinical stage, lymphovascular invasion, and the presence of recurrence were associated with an increased relative risk of death from cancer (p < 0.05). High African proportion was associated with younger age at diagnosis, while high Amerindian proportion was associated with the mucinous histological subtype. Conclusions: This represents the largest assessment of genetic ancestry in a population of Brazilian patients with CRC. Brazilian CRC patients exhibited clinicopathological features similar to those described in Western countries. Impact: Genetic ancestry components corroborated the significant admixture, and importantly, patients with high African proportion develop cancer at a younger age.


INTRODUCTION
confirmed Lynch syndrome cases, 1.5% (15/1,002) confirmed familial adenomatous polyposis (FAP) syndrome cases, and 0.4% (4/1,002) of unclassified hereditary syndrome (21). Clinicopathological and treatment data of CRC patients were collected from patient medical records. The present study evaluated 21 variables. The seventh edition of the American Joint Committee on Cancer (AJCC) was used for tumor staging. The Institutional Ethics Committee approved the study (protocol number: 600/2012-CAAE: 02468812.30000.5437).

Genetic Ancestry Determination
DNA samples were recovered from formalin-fixed paraffin-embedded (FFPE) tissue of tumor specimens obtained from surgical or endoscopic procedures. The DNA was isolated using the DNA Micro kit (Qiagen), according to the method previously established by our group (22).
The ancestry of the patients was determined using ancestry informative markers (AIMs) as previously reported (14, 23-25). Briefly, 46 small insertion-deletion (INDEL) polymorphisms were ascertained to maximize the divergence between four major human population groups: Amerindian (AME), European (EUR), African (AFR), and East Asian (ASN). These markers were selected for their high allele frequency divergence between different ancestral or geographically distant populations, including more than 1,000 individuals from 40 reference populations from the Human Genome Diversity Project (HGDP)-Centre d'Etude du Polymorphisme Humain (CEPH), plus individuals from Angola, Portugal, Taiwan, and indigenous Brazilian populations, which allows the ancestral proportions to be established in highly admixed individuals and populations, such as the Brazilian one (14). Moreover, they were assembled in a simple multiplex reaction following a short amplicon strategy, adequate for challenging samples such as FFPE (15,26,27). The primer sequences and PCR conditions were as described by Giolo et al. (14).
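The selection criterion described above, high allele frequency divergence between reference populations, can be illustrated with a minimal sketch. It assumes the simplest divergence measure, the absolute difference in allele frequency between two populations; the source does not state which measure was used to build the panel. The frequencies and the function names `pairwise_delta` and `most_divergent_pair` are hypothetical.

```python
from itertools import combinations

# Hypothetical frequencies of one INDEL allele in the four reference groups.
# These numbers are invented for illustration and are not from the panel.
example_freqs = {"AFR": 0.90, "EUR": 0.10, "ASN": 0.15, "AME": 0.20}


def pairwise_delta(freqs):
    """Absolute allele frequency difference for every pair of populations."""
    return {
        (pop_a, pop_b): abs(freqs[pop_a] - freqs[pop_b])
        for pop_a, pop_b in combinations(freqs, 2)
    }


def most_divergent_pair(freqs):
    """Return the population pair with the largest difference, and that difference."""
    deltas = pairwise_delta(freqs)
    pair = max(deltas, key=deltas.get)
    return pair, deltas[pair]


if __name__ == "__main__":
    pair, delta = most_divergent_pair(example_freqs)
    print(pair, round(delta, 2))
```

Under this measure, a marker with a large difference for a given pair of populations is informative for separating those two ancestries; a panel combines markers that are informative for different pairs.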
After DNA extraction, and multiplex PCR with 46 primers, the amplified products were further subjected to capillary electrophoresis and fragment analysis on an ABI 3500 Genetic Analyzer (Applied Biosystems) according to the manufacturer's instructions. These 46 INDELs are used mainly to estimate ancestry proportions in admixed populations and assess the structure of those populations. Two observers independently analyzed the electropherograms, and the genotypes were automatically assigned with GeneMapper Software v4.1 (Applied Biosystems).
The ancestry ratios were evaluated using the Structure Software v2.3.4 (23,24,28,29), considering the four main population groups, AME, EUR, AFR, and ASN, as possible contributors to the current Brazilian genetic composition. Briefly, the data available for the HGDP-CEPH panel were used as a reference for the ancestral populations, and a supervised analysis was performed to estimate the ancestry proportions of the individuals involved in the study. The Structure runs, considering K = 4, consisted of 100,000 burn-in steps followed by 100,000 Markov Chain Monte Carlo iterations. The option "Use population information to test for migrants" was used with the admixture model, considering allele frequencies
The overall survival (OS) and the cancer-specific survival (CSS) rates were obtained using the Kaplan-Meier method. Survival times were measured in months. Survival was defined as the period from diagnosis to the date of death or the date on which information was last obtained. For the analysis, the event of interest was death from any cause for OS and death related to cancer for CSS. Cases that were alive were censored for OS, and cases that were alive or dead from other causes were censored for CSS. Such information was obtained through direct consultation of the death certificate or medical records. The median follow-up of our sample was 62.0 months. The log-rank test was used to compare survival curves, and results were considered significant when p < 0.05.
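The censoring rules for OS and CSS can be made concrete with a minimal Kaplan-Meier sketch in pure Python. It is an illustration only and not the SPSS procedure used in the study; the follow-up times, the status labels, and the function names `kaplan_meier` and `survival_at` are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (time, survival) at each event time.

    times:  follow-up in months for each patient
    events: True if the event of interest occurred, False if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every patient leaves the risk set at their time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def survival_at(curve, month):
    """Estimated survival probability at a given month (step function)."""
    prob = 1.0
    for t, s in curve:
        if t > month:
            break
        prob = s
    return prob


# Hypothetical follow-up data (months) and vital status at last contact.
months = [6, 12, 12, 30, 60, 62]
status = ["cancer_death", "other_death", "alive",
          "cancer_death", "alive", "alive"]

# OS: death from any cause is the event; patients alive are censored.
os_events = [s != "alive" for s in status]
# CSS: only cancer death is the event; alive or other-cause deaths are censored.
css_events = [s == "cancer_death" for s in status]

os_curve = kaplan_meier(months, os_events)
css_curve = kaplan_meier(months, css_events)
```

The same patient can therefore count as an event for OS and as a censored observation for CSS, which is why the two curves differ even though they are computed from the same follow-up times.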
Multiple confirmatory models were used to check whether each genetic ancestry component (AFR, EUR, ASN, AME) estimated by the AIMs panel was related to the prognosis of CRC. Multivariable Cox proportional hazards regression models were used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) for the variables with p < 0.20 in univariate analyses, adjusted for treatment period and the genetic ancestry components estimated by the AIMs panel. Fisher's exact test was used for association analysis.
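Two steps of this modeling strategy lend themselves to a short sketch: screening variables by their univariate p-value, and converting a fitted Cox coefficient into an HR with a 95% CI. The sketch assumes the usual Wald interval, exp(beta ± 1.96 × SE); the source does not state how the intervals were computed. The variable names, p-values, and coefficient are hypothetical.

```python
from math import exp, log


def screen_variables(univariate_p, threshold=0.20):
    """Keep the variables whose univariate p-value is below the threshold."""
    return [name for name, p in univariate_p.items() if p < threshold]


def hazard_ratio_ci(beta, se, z=1.96):
    """Hazard ratio and Wald confidence limits from a Cox coefficient.

    beta: estimated log hazard ratio
    se:   standard error of beta
    z:    normal quantile (1.96 for a 95% interval)
    """
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


# Hypothetical univariate p-values, for illustration only.
p_values = {"clinical_stage": 0.001, "gender": 0.15, "tumor_location": 0.45}
candidates = screen_variables(p_values)

# Hypothetical coefficient: a log hazard ratio of ln(2) with SE 0.1.
hr, lower, upper = hazard_ratio_ci(log(2), 0.1)
```

An interval that excludes 1 corresponds to p < 0.05 for that coefficient under the same normal approximation.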
For tabulation and statistical analysis, the IBM SPSS Statistics 21.0 software for Windows (IBM Corporation, Route 100, Somers, NY 10589) was used. The level of statistical significance was set at 0.05 for all analyses.

Clinicopathological Features
The present study included 1,002 cases, and the main clinicopathological features are summarized in Table 2. A detailed description of therapeutic regimens is shown in Supplementary Table 2. There were more men than women in

Genetic Ancestry
The present study also aimed to evaluate the genetic ancestry of the patients, which was performed in 934/1,002 (93.2%) of the cases. In a small subset of cases (n = 68), the genetic ancestry could not be evaluated due to low quantity and poor-quality DNA. We observed a great admixture in genetic composition, with the following averages of ancestral proportions: AFR 12.7% (SD = 15.7%), EUR 74.2% (SD = 20.6%), ASN 6.5% (SD = 11.3%), and AME 6.6% (SD = 7.1%) (Figure 1). The average of each genetic ancestry component according to the Brazilian state of origin is plotted in Figure 2. The ancestry proportions were further categorically defined as low, intermediate, and high based on tercile distribution (Table 1 and Supplementary Table 1). We further investigated the association of genetic ancestry with patients' clinicopathological characteristics (Table 3). We observed significant associations between the AFR component and younger age at diagnosis (p = 0.013), Brazilian region of origin (p < 0.001), and recurrence of the disease (p = 0.034). For the EUR component, we found significant associations with the region of origin (p < 0.001), adenocarcinoma (p = 0.023), higher histological grade (p = 0.040), and presence of synchronous tumors (p = 0.012). For the AME component, a significant association with the mucinous histological type (p = 0.033) was observed.
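The tercile-based categorization can be sketched as follows. The proportions are hypothetical, and the cut points come from Python's default quantile interpolation, which may differ slightly from the SPSS method used in the study.

```python
from statistics import mean, quantiles, stdev


def tercile_labels(proportions):
    """Label each proportion as low, intermediate, or high by tercile."""
    lower_cut, upper_cut = quantiles(proportions, n=3)
    labels = []
    for p in proportions:
        if p <= lower_cut:
            labels.append("low")
        elif p <= upper_cut:
            labels.append("intermediate")
        else:
            labels.append("high")
    return labels


# Hypothetical African ancestry proportions for nine patients.
afr = [0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.30, 0.45, 0.60]

labels = tercile_labels(afr)
summary = (mean(afr), stdev(afr))  # sample mean and standard deviation
```

Because the cut points are derived from the cohort itself, each category holds roughly one third of the patients for every ancestry component, regardless of how skewed that component's distribution is.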

Survival Analysis
An initial univariate analysis of survival was performed, including 1,002 individuals: 489 events occurred in OS, and 422 events occurred in CSS. The probability of patients living for more than 5 years was 58.2% for OS and 62.3% for CSS (Table 4). Several significant associations were observed between OS and CSS and patients' features, including gender, clinical stage, histological type, histological grade, lymphovascular invasion, perineural invasion, presence of recurrence, treatment period, neoadjuvant chemotherapy, adjuvant chemotherapy, and radiotherapy (Table 3 and Supplementary Figure 2). On the univariate survival analyses (OS and CSS), the genetic ancestry categorically defined as low, intermediate, and high based on terciles was not associated with CRC survival (Table 4).
The multivariate analysis for CSS adjusted by treatment period and genetic ancestry components showed that clinical stage, lymphovascular invasion, and the presence of recurrence were associated with an increased relative risk of death from cancer (p < 0.05), whereas adjuvant chemotherapy was associated with a lower risk of death (Table 5). These results are explained by the different therapeutic approaches used in distinct clinical stages (Supplementary Tables 3, 4).

DISCUSSION
CRC is one of the most common neoplasms in men and women worldwide (3,30,31). Although its incidence is declining in the US and other Western countries (32), in others, including Brazil, the number of cases is still increasing, and it remains a major public health problem. In this study, we intended to characterize the genetic ancestry of an extensive series of 1,002 CRC patients admitted at the Barretos Cancer Hospital. Knowing that the Brazilian population is ethnically one of the most heterogeneous in the world (14,18), with an essential contribution from the main ethnicities that formed the background of our population, we also intended to correlate the genetically measured ancestry components (EUR, AFR, ASN, and AME) with the different clinicopathological factors and to assess their prognostic role.
There was a slight male predominance, with a male-to-female ratio of 1.08. In all regions of the world, despite the similarities between genders, the rates are higher for males, as in the American population (male-to-female ratio of 1.3) (2,33), in Europe (1), and in Asia (34). Other studies report a higher incidence of colon cancer among women (4,35).
The main studies divide the samples into three age categories: below 50, between 50 and 75, and above 75 years old. The age of 50 years is critical to differentiate between hereditary and sporadic CRC cases. This age limit has been used in the Amsterdam criteria (36,37) and also to recommend screening colonoscopy for people at average risk for CRC (38,39). Although it has been reported that 21 to 33% of patients are older than 75 years [Surveillance, Epidemiology, and End Results (SEER)] (40), they may account for more than 40% and are underrepresented in clinical studies. These clinical studies use an age of up to 75 years as an inclusion limit for treatment (41-44), mainly due to comorbidities. Therefore, we adopted 75 years or older as the upper age category (45).
The mean age at diagnosis in our population was 57.7 years (SD = 13.8), below the American age of 68 years (31) and the European age of 72 years (45). The predominant age group in our population was between 50 and 75 years old (60.5%), similar to that in the SEER (31) data. Our population had a high proportion of patients younger than 50 years old (28.9%), higher than the 20% reported in studies including North American populations (31) and Asian patients (3-14%) (46). This finding may be due to the inclusion criteria and to the potential presence of some hereditary cases in the present analysis. In the present study, patients with a known and genetically confirmed familial history of Lynch or APC represented <4% of cases (21); however, we cannot rule out the existence of other hereditary cases in the cohort. Since 1992, the incidence in cases under 50 has increased by 1.5% per year (3,45), especially from 20 to 34 years. According to the American College of Gastroenterology (39), colorectal cancer screening begins at age 50, except for those of African origin, for whom it is recommended to start at age 45 (47). Moreover, some studies even raise the question of starting at 40 years of age (48). In concordance with these findings, we observed that a higher African proportion in Brazilian CRC patients was associated with a younger age of disease onset.
The importance of primary tumor location, which is associated with distinct clinicopathological features and with differences in prognosis, has been widely discussed. For this reason, we categorized the included cases into right colon, left colon, and rectum (49-53). In our population, 25% of the tumors were in the right colon. This percentage is within the range reported in other studies, from 22.7 to 39% (49). In contrast to those studies, however, we did not find that laterality was associated with disease outcome.
Another critical variable is the TNM staging. In our study, the majority of cases were stage II (37.6%), followed by III (33.2%) and IV (16.7%). The percentage of stage IV at diagnosis is in agreement with several regions of the world (31,45,54).
Another goal of our study was to evaluate the main prognostic factors in our CRC patients. To this end, we estimated the OS and CSS and correlated them with the different variables collected and selected in the multivariate analysis. The median follow-up of our sample was 62.0 months, very similar to that of the SEER, which was 65.2 months (31). In the study of OS and CSS, we interrogated whether the variables selected in the multivariate analysis would be influenced by other variables such as the treatment period and the genetic ancestry components. Therefore, a multivariate analysis was performed after adjustment for both variables, namely, ancestry and treatment period (patients treated from 2000 to 2009 and from 2010 to 2014, when molecular targeted drugs, such as cetuximab, were introduced by the Department of Oncology of the Barretos Cancer Hospital).
The multivariate analysis for OS and CSS adjusted by genetic ancestry showed that the clinical stage, lymphovascular invasion, and recurrence of the disease were associated with an increased relative risk of death from cancer. In contrast, adjuvant chemotherapy was associated with a better outcome, as expected.
About one third of our patients had lymphovascular invasion. The association of lymphovascular dissemination with adverse outcomes (55, 56) is well described, and it is also a known determinant of therapeutic decisions, especially in stage II (3).
The ancestry of individuals is important because of its association with specific pathologies and with immunological and therapeutic responses, yet in the vast majority of studies, it is not evaluated (57,58). Currently, with the availability of molecular tools for genetic studies, self-declaration and/or family origin can no longer be a proxy/authentication of the ancestral origin of an individual or population, especially in regions with a high degree of population admixture such as Brazil (57). However, in a large number of studies, skin color alone is used to assert the ethnic origin of CRC patients (7-9, 12, 59). A large number of studies, based on self-declaration, suggest that patients with black skin color have a higher incidence of CRC and lower survival (7,12). However, it is unclear whether the ancestral component alone would influence survival or whether there are other confounding factors, such as less referral to screening methods (60), a higher number of diagnosed comorbidities (7), lack of access to treatment services (61), or low educational/economic level, which could explain this finding (11,60).
There are several ways to analyze the genetic composition of a particular population, and the selection criteria of genetic markers may diverge between studies, originating different values of the ethnic groups (62) for the Brazilian population. In our study, we analyzed four major ethnic compositions (EUR, AFR, ASN, and AME) using AIMs according to previous studies (13,14,18,63). The present study was retrospective, dependent on the collection of information in medical records, and there was no mention of self-declaration of skin color. Therefore, our data collection instrument did not contemplate this aspect. If we had this information, we could have carried out a cross-referencing of information between the genotype and the phenotype to try to evaluate the fidelity of the latter.
When ancestry was measured genetically, we did not evaluate the data individually, but rather the four ancestral components that form the demographic base of Brazil. Our study did not intend to assess the causal relationship between genetic ancestry and CRC, as already done in other studies (64), but rather to correlate the ancestry components with the various clinicopathological characteristics of the patients. As expected, the predominant ancestral component was the EUR one, with an average of 74%, followed by the AFR with 13%, and by the ASN and AME with 7% each, agreeing with previous studies of the Brazilian population (13,63,65). In agreement with other studies (62), a predominance of the European ancestral component in the Southeast and South regions was observed (63). Some differences were observed in our study; for example, the African component was concentrated more strongly in the North region, unlike other studies based on mitochondrial DNA (mtDNA) rather than autosomal AIM-INDELs, where this occurred more in the Northeast region (13,17,62,63). The contribution of Asian ancestry in the Northeast region, 5%, is very close to that of the regions known to be colonized by Asians, but this may perhaps be explained by the small sample size representing this region in our study and/or by the proximity of the gene pools of Amerindians and East Asians, considering the modern history of these human groups (14). The high SDs identified in our study show how admixed the Brazilian population is.
When we evaluated the individual components separately, we found that the European ancestral component was significantly associated with the absence of synchronous tumors. The African component was associated with younger patients, in agreement with other international studies (7,8,10). The Amerindian component predominated in the Northern region which correlates with other studies and is corroborated by the IBGE (Brazilian Institute of Geography and Statistics) self-declaration assessment (13,18,63,66). Interestingly, we observed an association of the Amerindian component with mucinous histological type.
In our study, there was no correlation between the different ancestry proportions and patient survival. However, some North American studies reported an association of African ancestry based on self-declaration with tumors located in the right colon (67) and that these would be associated with more aggressive histopathological behavior, which would lead to worse survival (7,11,12).
Finally, despite the important findings, this study has some limitations, such as its retrospective nature, based on the analysis of medical records, which often do not have complete and accurate information. The AIM panel could also arguably be larger. Nonetheless, the employed AIM indel set harbors a sufficient number of markers sparsely distributed throughout the genome and is simply analyzed in a multiplexed short-amplicon strategy, which are desirable characteristics considering the challenging nature of the source tumor samples included in our study. Despite the large number of patients and their diverse geographic origin, the cohort does not represent all Brazilian states or the full ethnic diversity of the Brazilian population, so further studies are warranted to extend our findings.

CONCLUSION
This pioneering work determined the genetic ancestry profile of more than 1,000 Brazilian patients diagnosed with CRC from a single oncology reference center. We described the main clinicopathological features of the population and observed that patients with a high African proportion develop cancer at a younger age. The present study can contribute to drawing a nationwide portrait of Brazilian CRC patients and may help in the design of management strategies for these patients.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Barretos Cancer Hospital Ethics Committee.
Written informed consent was not provided because this was a retrospective study.

AUTHOR CONTRIBUTIONS
RD participated in study design and collection of the data, analyzed the data, and prepared the manuscript. GB participated in the collection of the data, carried out the ancestry experiments, and analyzed the data. AC participated in the collection of the data, analyzed the data, and participated in the preparation of the manuscript. CS-N participated in the collection of the data, re-evaluation of the diagnosis, and the ancestry experiments. RP participated in the ancestry experiments and analyzed the data. MO participated in the statistical analysis and analysis of the results. DG participated in study design and analysis of the results. RR participated in study design, coordination, and analysis of the results, and prepared the manuscript. All authors read and approved the final manuscript.