Breast cancer subtype and clinical characteristics in women from Peru

Introduction: Breast cancer is a heterogeneous disease, and the distribution of the different subtypes varies by race/ethnic category in the United States and by country. Established breast cancer-associated factors impact subtype-specific risk; however, the studies that established these associations included limited or no representation of Latin American diversity. To address this gap in knowledge, we report a description of demographic, reproductive, and lifestyle breast cancer-associated factors by age at diagnosis and disease subtype for The Peruvian Genetics and Genomics of Breast Cancer (PEGEN-BC) study.
Methods: The PEGEN-BC study is a hospital-based breast cancer cohort that includes 1943 patients diagnosed at the Instituto Nacional de Enfermedades Neoplásicas in Lima, Peru. Demographic and reproductive information, as well as lifestyle exposures, were collected with a questionnaire. Clinical data, including tumor Hormone Receptor (HR) status and Human Epidermal Growth Factor Receptor 2 (HER2) status, were abstracted from electronic medical records. Differences in proportions and mean values were tested using Chi-squared and one-way ANOVA tests, respectively. Multinomial logistic regression models were used for multivariate association analyses.
Results: The distribution of subtypes was 52% HR+HER2-, 19% HR+HER2+, 16% HR-HER2-, and 13% HR-HER2+. Indigenous American (IA) genetic ancestry was higher, and height was lower, among individuals with the HR-HER2+ subtype (80% IA vs. 76% overall, p = 0.007; 152 cm vs. 153 cm overall, p = 0.032, respectively). In multivariate models, IA ancestry was associated with the HR-HER2+ subtype (OR = 1.38, 95% CI = 1.06-1.79, p = 0.017), and parous women showed increased risk for HR-HER2+ (OR = 2.7, 95% CI = 1.5-4.8, p < 0.001) and HR-HER2- tumors (OR = 2.4, 95% CI = 1.5-4.0, p < 0.001) compared to nulliparous women. Multiple patient and tumor characteristics differed by age at diagnosis (<50 vs. ≥50), including ancestry, region of residence, family history, height, BMI, breastfeeding, parity, and stage at diagnosis (p < 0.02 for all variables).
Discussion: The characteristics of the PEGEN-BC study participants do not suggest heterogeneity by tumor subtype except for IA genetic ancestry proportion, which has been previously reported. Differences by age at diagnosis were apparent and concordant with what is known about pre- and post-menopausal-specific disease risk factors. Additional studies in Peru should be developed to further understand the main contributors to the specific age of onset and molecular disease subtypes in this population and to develop population-appropriate predictive models for prevention.


Introduction
Globally, breast cancer is the most commonly diagnosed cancer and the leading cause of cancer death in women (1,2). Breast cancer risk and mortality vary based on several risk factors. Age, race/ethnicity category, family history, genetics, lifestyle, anthropometric, reproductive, and hormonal factors have been associated with the risk of developing breast cancer (3-5). In addition, tumor subtype, socioeconomic status, education level, and access to care have been shown to impact mortality after diagnosis (6,7). Analyses stratified by race/ethnicity category have shown that despite sharing risk factors for developing breast cancer, disease risk, clinical characteristics, and risk of mortality differ between populations (6,8-10). For example, U.S. Hispanics/Latinas (H/Ls) are less likely to develop breast cancer than non-Hispanic White (NHW) and African American women (11). However, after diagnosis, H/L women are at higher risk of mortality compared with NHW women (12).
The use of gene expression profiles for molecular classification of breast cancer tumors (i.e., PAM50) has identified three main intrinsic subtypes: Luminal (A and B), HER2-enriched, and Basal-like (13,14). A combination of immunohistochemical markers for estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) is routinely used in the clinic to classify tumors into these subtypes and to provide relevant information for individualized therapeutic decision making. Hormone receptor (HR) positive tumors, defined by ER and/or PR expression, are classified as HR+HER2− and HR+HER2+, based on the HER2 expression status, and are overrepresented among luminal intrinsic subtypes. HR−HER2+ and HR−HER2− are overrepresented among HER2-enriched and basal-like subtypes, respectively. Besides chemotherapy, patients with an HR+ disease diagnosis can benefit from endocrine therapy, such as tamoxifen or aromatase inhibitors (15), whereas patients with HER2+ tumors can be treated with anti-HER2 therapy (mainly trastuzumab and pertuzumab) (16). For the HR−HER2− subtype, treatment options are limited. Currently, these patients receive systemic therapy, although targeted therapies, such as PARP and immune checkpoint inhibitors, are being evaluated in clinical trials and approved for BRCA1 and BRCA2 mutation carriers (17).
Multiple studies have suggested heterogeneity in the association between established breast cancer risk factors and tumor subtype. Family history of breast cancer in a first-degree relative is associated with increased breast cancer risk (3,18,19), and specific patterns of cancer family history increase the risk of particular tumor subtypes (20,21). For example, having one first-degree relative with a history of breast cancer was shown to be associated with increased risk of HR+ subtypes, whereas having two or more was associated with increased risk of HR− disease (20,21). However, some studies have failed to confirm these findings (3,22-24). Among reproductive factors, early menarche and late menopause increase the risk of developing breast cancer (3,20,25-27), with no evidence of heterogeneity by tumor subtype (3,20,26,27). Parity is associated with reduced risk of HR+ disease (3,19,20,27-33) and increased odds of developing HR− subtypes (3,24,27,31,33-35) in populations of European and African origins. Some studies have reported that older age at first full-term pregnancy was associated with increased risk of HR+ disease (27,28,30). Longer breastfeeding history is associated with reduced breast cancer risk, with lower odds of developing HR− tumors (19,20,25-28,30-34,36). Among African Americans, prolonged lactation is associated with reduced risk of HR−, but not HR+, disease, with an increased risk of HR− disease among parous women who have not breastfed (34,37). This observation has also been described among NHW women (32). Reports on lifestyle factors, such as alcohol intake and smoking history, have shown heterogeneity by tumor subtype, with a stronger association with HR+HER2− subtypes (3,38).
The effects of some of the abovementioned factors differ between pre- and post-menopausal women. Although the evidence remains controversial, high BMI (obesity) appears to be protective against breast cancer in premenopausal women and, conversely, to increase risk in postmenopausal women (39,40), especially for HR+ subtypes (41-43). Other factors known to affect breast cancer risk in the same direction in both groups can differ in the magnitude of their effect by menopausal status, such as alcohol intake (44), physical activity (45,46), and breastfeeding (47).
Previous studies have assessed the association of breast cancer risk with numerous structural, social, environmental, and genetic factors (4,48-50); however, these studies are primarily composed of individuals of European origin. Few breast cancer studies describe patient characteristics in Latin America (26,51-54), a region characterized by cultural and genetic heterogeneity (55-57). For example, Indigenous American genetic ancestry estimates vary across different Latin American countries, ranging between ~5% in Puerto Rico and ~80% in Peru and Bolivia (56-58). Previous studies have identified that the degree of Indigenous American genetic ancestry may modify the magnitude and direction of association with currently known breast cancer risk variants among H/L women (59) and is associated with differential lifestyle risk factors (60). Latin American cohorts with high proportions of Indigenous American ancestry are underrepresented in breast cancer research (61).
The Peruvian Genetics and Genomics of Breast Cancer Study (PEGEN-BC) is a hospital-based cohort including patients from the Instituto Nacional de Enfermedades Neoplásicas (INEN) in Lima, Peru. We have previously described the distribution of demographic, anthropometric, reproductive, lifestyle, and clinical factors for 1,312 breast cancer participants, with an emphasis on the distribution by breast tumor subtypes (62). Moreover, we reported that increasing Indigenous American ancestry is associated with higher odds of developing the HR−HER2+ subtype (62). The current report aims to provide a more complete and updated description of these variables by tumor subtype and age at diagnosis, including a total of 1,943 breast cancer patients, highlighting potential heterogeneity in the latter categories.

Study participants
The Peruvian Genetics and Genomics of Breast Cancer Study (PEGEN-BC) is a hospital-based cohort study. As of April 2022, we have recruited 1,943 participants from the INEN in Lima, Peru. Women were invited to participate if they had a diagnosis of invasive breast cancer in 2010 or later and were between 21 and 79 years of age when diagnosed. A blood sample was drawn by a certified phlebotomist at the INEN central laboratory. The present report includes analyses with a subset of 1,796 patients with available genetic ancestry estimates (63). This study was approved by the INEN and the University of California Davis Institutional Review Boards. All individuals provided written informed consent to participate.

Data collection
Each PEGEN-BC participant completed a standardized survey administered by a trained research coordinator at INEN. The survey includes questions regarding anthropometric (weight and height), demographic (place of birth and residence), lifestyle (alcohol intake and smoking history), and reproductive (menopause status, age at first pregnancy, number of full-term pregnancies, and breastfeeding history) variables, and family history of breast cancer. Weight and height were assessed by trained nurses/professionals at INEN at the time of diagnosis. Body mass index (BMI) was calculated as weight (kilograms) divided by height (meters) squared and categorized as underweight (BMI < 18.5 kg/m²), normal (18.5 ≤ BMI < 25 kg/m²), overweight (25 ≤ BMI < 30 kg/m²), and obese (BMI ≥ 30 kg/m²). Alcohol use was assessed as the self-reported frequency of glasses of alcohol consumed per day and categorized as < 1 glass/day, > 1 glass/day, and non-drinker (never). Smoking status was classified into "ever" (current and former) and "never." If there was a history of familial breast cancer, the relative (e.g., mother, sister, or aunt) was indicated to determine cases with breast cancer family history in a first-degree relative. Clinical variables, including ER, PR, HER2, lymph node status, tumor grade, and clinical stage, were extracted from electronic records.
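The BMI calculation and categorization described above can be illustrated with a short sketch. It is not the study's code: the function names and the example participant (60 kg, 153 cm) are hypothetical, and only the formula and the category cut-offs are taken from the text.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def bmi_category(value: float) -> str:
    """Map a BMI value (kg/m^2) to the four categories used in the text."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"


# Hypothetical participant (illustrative values only, not study data).
example = bmi(60.0, 153.0)
print(round(example, 1), bmi_category(example))
```

Treating each lower bound as inclusive keeps the four categories mutually exclusive and exhaustive, so every participant with a recorded weight and height falls into exactly one group.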
Genetic ancestry estimates for 1,796 PEGEN-BC participants were available from a previous study (63). Briefly, genome-wide genotype data obtained with the Affymetrix Precision Medicine Array were pruned using PLINK v. 1.9 (64) [window size = 50, number of variants = 5, variance inflation factor threshold = 2] and merged with data from four reference populations from the 1000 Genomes project (65): Admixed Americans (Peru, Colombia, Mexico, Puerto Rico), Europeans (Americans with Northern and Western European Ancestry, Italy, Spain, Finland, Scotland), East Asians (China, Japan, Vietnam), and African populations (Nigeria, Kenya, Gambia, Sierra Leone). Individual continental, global genetic ancestry was estimated using ADMIXTURE (66) (unsupervised, k = 4), including 122,605 independent variants. The PEGEN-BC study includes a large proportion of patients with > 98% Indigenous American ancestry, as previously reported (62), and therefore provides a source of nonadmixed reference samples for this component.
Tumor tissues were obtained from core biopsies or freshly resected, pre-treatment invasive breast cancers that were formalin-fixed and paraffin-embedded following standard protocols at INEN. Tumor subtypes were defined using immunohistochemistry (IHC) markers by a certified pathologist at INEN. HR positivity was defined as 1% or more cells showing ER and/or PR staining. HER2 positivity was defined as 3+ staining by IHC or by gene amplification detected by fluorescence in situ hybridization following a 2+ (borderline) IHC result. These markers were used to classify tumors as HR+HER2−, HR+HER2+, HR−HER2+, and HR−HER2−. Two independent pathologists from the University of California San Francisco reviewed the IHC slides at INEN for a subset of 52 patients. The concordance rate was 100% for ER, 87% for PR, and 85% for HER2. Most of the discordant calls for HER2 were scored as "negative" or 1+ at INEN and 2+ by the independent pathologists. Immunohistochemical subtype classification was not available for 141 samples (7%).
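The classification rules above can be written as a small decision function. This is a sketch rather than the pathology workflow used at INEN: the function and argument names are hypothetical, IHC scores of 0 and 1+ are assumed to be HER2-negative, and a 2+ result without an in situ hybridization result is assumed to leave the subtype unclassified.

```python
from typing import Optional


def hr_positive(er_pct: float, pr_pct: float) -> bool:
    """HR+ if at least 1% of cells stain for ER and/or PR."""
    return er_pct >= 1.0 or pr_pct >= 1.0


def her2_positive(ihc_score: int, fish_amplified: Optional[bool] = None) -> Optional[bool]:
    """HER2+ if IHC is 3+, or if IHC is 2+ and FISH shows amplification.

    Returns None when IHC is 2+ and no FISH result is available
    (an assumption of this sketch, not a rule stated in the text).
    """
    if ihc_score == 3:
        return True
    if ihc_score == 2:
        return fish_amplified
    return False  # assumption: scores 0 and 1+ are treated as negative


def subtype(er_pct: float, pr_pct: float, ihc_score: int,
            fish_amplified: Optional[bool] = None) -> Optional[str]:
    """Return one of the four subtype labels, or None if unclassifiable."""
    her2 = her2_positive(ihc_score, fish_amplified)
    if her2 is None:
        return None
    hr_label = "HR+" if hr_positive(er_pct, pr_pct) else "HR-"
    her2_label = "HER2+" if her2 else "HER2-"
    return hr_label + her2_label  # ASCII hyphen used in labels for simplicity


# Hypothetical tumors (illustrative values only).
print(subtype(er_pct=0, pr_pct=0, ihc_score=3))
print(subtype(er_pct=50, pr_pct=0, ihc_score=2, fish_amplified=False))
```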

Statistical analysis
We performed descriptive analyses of available demographic, anthropometric, reproductive, and clinical characteristics by breast cancer subtype. Differences in characteristics between tumor subtypes were tested using one-way ANOVA for normally distributed continuous variables and Chi-squared tests for categorical variables. Age at first full-term pregnancy presented a non-normal distribution; therefore, it was log2-transformed. The correlation between genetic ancestry and continuous and categorical variables was assessed using Pearson's correlation coefficient and the point-biserial correlation coefficient, respectively. Multinomial logistic regression models were used to calculate odds ratios (ORs) and 95% confidence intervals (CIs) for the association of multiple variables and subtype-specific breast cancer. East Asian and African ancestry proportions were not included in multivariable models due to the low contribution of these components and high correlation with the Indigenous

Demographics, anthropometrics, and lifestyle factors in the PEGEN-BC study by tumor subtype
The most common breast cancer subtype among PEGEN-BC study participants was HR+HER2− (52.4%), followed by HR+HER2+ (18.7%), HR−HER2− (16.0%), and HR−HER2+ (12.9%) (Table 1). The average age at diagnosis was 49.8 years (SD = 11), and differences by tumor subtype were not statistically significant (p = 0.087). PEGEN-BC study patients included individuals born in the three main biogeographic regions of Peru (Figure 1): the Coastal (55.5%), Mountainous (36.4%), and Amazonian (7.5%) regions. Less than 1% of the patients were born in another country (mainly Venezuela). These groups did not show statistically significant differences in their distribution by tumor subtype (Table 1). Most patients resided in the Coastal region (7%), and differences in the proportion of patients who resided in each biogeographic area by tumor subtype category were not statistically significant (Table 1).
Estimates of individual continental genetic ancestry were available for 1,796 patients. Average Indigenous American ancestry among patients was 76.5%, followed by 18.0% European, 4.2% African, and 1.4% East Asian (Table 1). Furthermore, 92% of PEGEN-BC study participants had > 50% Indigenous American ancestry, 25% had at least 90%, and 12% had at least 95% (Figure 2A). Seven patients (0.4%) had more than 50% East Asian ancestry, and eight (0.4%) had more than 50% African ancestry. Principal components analysis showed that the PEGEN-BC patients defined the Indigenous American cluster along principal component (PC) 1 when compared against 1000 Genomes Project reference populations (Figures 2B, C), reflecting the high degree of Indigenous American genetic ancestry that characterizes this cohort.
We found that the average Indigenous American ancestry proportion of participants was different across tumor subtypes. Individuals diagnosed with HR−HER2+ tumors showed the highest average proportion of Indigenous American ancestry (79.5%, SD = 15) (Table 1).

FIGURE 1
Biogeographical regions of Peru. Red star shows the location of INEN. This figure was created using the ggmap, maps, and mapdata R packages.
*Immunohistochemical subtype classification was not available for 141 samples (7%). **Estimates of individual continental ancestry were unavailable for 147 patients (7.6%). ***Category not included in the Chi-square test due to small sample size. "Missing" categories were excluded from tests.
The average height of patients was 153.3 cm (SD = 6.6), with lower average height among patients diagnosed with HR−HER2+ tumors compared with all other subtypes (152.1 vs. ~153.6 cm, p = 0.032). There were no statistically significant differences in weight or BMI by tumor subtype, with a large overall proportion of patients being overweight (40.1%) (Table 1).
Most PEGEN-BC patients (68.7%) reported low levels of alcohol consumption (< 1 glass/day), whereas 7.4% reported consuming more than one glass per day. Moreover, 27.9% of participants reported being a current or past smoker. There was no statistically significant association between alcohol consumption, smoking history, and tumor subtype (Table 1).
Demographic, anthropometric, and lifestyle variables that did not show statistically significant differences by tumor subtypes did not show significant differences by HR status either (Supplementary Table S1).

Reproductive variables by tumor subtype
The average age at menarche among PEGEN-BC patients was 12.9 years (SD = 1.7), the average age at first full-term pregnancy was 23.2 years (SD = 5.7), and the average number of full-term pregnancies was 2.42 (SD = 1.8). Most study participants reported having had at least one child (83.5%), and 80% of parous women had at least two children (Table 2). The frequency of parous women and the number of births differed by tumor subtype, being higher among HR− tumors (p < 0.001) (Table 2).
Breastfeeding was a common practice among parous women (96.3%), and we did not observe differences by tumor subtype category in the proportion of women who breastfed their children (Table 2).
More than 85% of women reported being menopausal at recruitment. Patients with HR+HER2− tumors were more likely to report being menopausal than patients with other tumor subtypes (p = 0.016). However, since many of these patients had induced menopause due to treatment, we did not consider this variable in subsequent multivariate analyses and stratified by age at diagnosis instead.
All these variables remained significant in analyses stratified by HR status (Supplementary Table S2).

Clinical characteristics by tumor subtype
Overall, approximately 8% of PEGEN-BC study patients reported a family history of breast cancer in a first-degree relative (Table 3). Differences in breast cancer family history by breast cancer subtype were not statistically significant. More than 90% of patients were diagnosed with Grade 2 or 3 tumors (Table 3). Patients with HR+HER2− tumors were more likely to be diagnosed with Grade 1 or 2 disease, whereas those with HR−HER2+ and HR−HER2− tumors were more likely to have high-grade disease (Table 3). Most PEGEN-BC participants were diagnosed with stage II or III disease, with a larger number of stage I and II diagnoses among HR+HER2− patients than among those with other subtypes (Table 3). Concordant with the distribution of tumor stage, we observed a high proportion of positive lymph node status among patients overall (64.3%), with a statistically significantly higher proportion of lymph node positivity among patients with HR−HER2+ tumors compared with those with other disease subtypes (78.2% vs. ~67%) (Table 3). The distribution of these variables by HR status is shown in Supplementary Table S2.

Distribution of patient characteristics by age at diagnosis
We compared the distribution of anthropometric, demographic, reproductive, clinical, and lifestyle risk variables between patients diagnosed before the age of 50 years (N = 981) and at 50 years or older (N = 955). Compared with patients diagnosed at 50 years or older, younger patients had higher average Indigenous American ancestry (78.6% vs. 74.3%, p < 0.001), were more likely to reside in the Mountainous region (17.3% vs. 12.8%, p = 0.015), were 1.4 cm taller (p < 0.001), and had a lower prevalence of obesity (25.4% vs. 30.0%, p = 0.036) (Table 4). Additionally, a higher proportion of older patients had more than three children compared with the younger group (31% vs. 13%, p < 0.001), and a larger proportion of younger patients reported breastfeeding their children (98% vs. 95%, p = 0.001) (Table 5). Regarding clinical characteristics, younger patients less often reported a family history of breast cancer in a first-degree relative (6.5% vs. 9.5%, p = 0.02) and presented with more advanced disease (44% diagnosed at stage III compared with 42%, p = 0.017) (Table 5). We did not observe statistically significant differences in subtype distribution between the two age categories.
Additional stratified analyses comparing demographic, anthropometric, reproductive, and clinical factors by tumor subtype in the two age groups are included as Supplementary Materials (Supplementary Tables S3 and S4). Because additional stratification reduced the number of observations per category, these results should be interpreted with caution.
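Comparisons of proportions between two groups, such as those reported above by age at diagnosis, rely on the Chi-squared test. The sketch below computes the Pearson Chi-squared statistic for a 2x2 table without continuity correction, using only the standard library. The function name is hypothetical and the counts are invented for illustration; they are not PEGEN-BC data. For one degree of freedom the upper-tail probability equals erfc(sqrt(statistic / 2)), which avoids any dependence on a statistics package.

```python
import math


def chi_square_2x2(table):
    """Pearson Chi-squared test (no continuity correction) for a 2x2 table.

    table: [[a, b], [c, d]] of observed counts.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # Survival function of the Chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows are two age groups, columns are
# "characteristic present" and "characteristic absent".
stat, p = chi_square_2x2([[60, 940], [90, 910]])
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```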

Correlation between Indigenous American genetic ancestry and other patient and tumor characteristics
We assessed the correlation between Indigenous American ancestry and patient and tumor characteristics to better understand the observed patterns in ancestry distribution and those factors by tumor subtype in the PEGEN-BC study. We observed an inverse correlation between Indigenous American ancestry and age at diagnosis (r = −0.15, p < 0.001), weight (r = −0.11, p < 0.001), height (r = −0.25, p < 0.001), age at first full-term pregnancy (r = −0.08, p = 0.002), family history of breast cancer in a first-degree relative (r = −0.12, p < 0.001), smoking history (r = −0.11, p < 0.001), HR+ status (r = −0.06, p = 0.012) and a positive correlation with age at menarche (r = 0.06, p = 0.017) and HER2+ status (r = 0.053, p = 0.029).
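For binary characteristics such as HR+ status or smoking history, the point-biserial coefficient is numerically identical to Pearson's r computed with the binary variable coded as 0/1. The sketch below shows both calculations side by side. The function names are hypothetical and the six data points are invented for illustration; they are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(binary, values):
    """Point-biserial correlation between a 0/1 variable and a continuous one.

    Uses r_pb = (M1 - M0) / s * sqrt(p * q), where s is the population
    standard deviation of all values and p is the proportion coded 1.
    """
    n = len(values)
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    mean_all = sum(values) / n
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    p = len(group1) / n
    return (mean1 - mean0) / sd * math.sqrt(p * (1.0 - p))


# Hypothetical ancestry proportions and a hypothetical binary trait.
ancestry = [0.95, 0.80, 0.60, 0.90, 0.70, 0.85]
trait = [0, 0, 1, 0, 1, 1]
print(point_biserial(trait, ancestry), pearson_r(trait, ancestry))
```

A negative coefficient means that individuals coded 1 for the trait have, on average, lower ancestry proportions than those coded 0.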

Multivariable analyses testing the association between demographic, lifestyle factors, and breast cancer subtype
Variables that showed statistically significant associations at the 10% level with tumor subtype in the univariate analyses (Tables 1-3)  were included in a multivariate model, using HR+HER2− as reference ( Family history of breast cancer in a first-degree relative was not included as a covariate in the multivariate model because the number of patients that reported family history of breast cancer in a first-degree relative was relatively small and rendered unstable estimates when included. We tested models excluding patients with a family history of breast cancer, and results were similar to those using the full dataset (Table 6). Indigenous American ancestry, region of residence, height, BMI, breastfeeding history, number of full-term pregnancies, and family history of breast cancer in a first-degree relative showed statistically significant associations at the 10% level with age at diagnosis categories. These variables were included in a multivariate model using age at diagnosis < 50 as reference (Table 7). We found that increasing Indigenous American ancestry and increasing height were associated with reduced odds of being diagnosed at 50 years or older (OR = 0.63, 95% CI = 0.53-0.75, p < 0.001 and OR = 0.96, 95% CI = 0.95-0.98, p < 0.001, respectively). Patients that resided in the Mountainous region had reduced odds of being diagnosed at 50 years of age or older compared with those in the Coastal region (OR = 0.63, 95% CI = 056-0.9, p = 0.004). Breastfeeding was associated with lower odds of being diagnosed at 50 years of age or older (OR = 0.35, 95% CI = 0.2-0.7, p = 0.001). Compared with nulliparous women, giving birth to at least one child increased the odds of being diagnosed at an older age (OR = 1.55, 95% CI = 0.2-0.7, p < 0.001). Increasing BMI was no longer associated with age at diagnosis (Table 7).
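Odds ratios and Wald 95% confidence intervals from logistic models are obtained by exponentiating the coefficient and the coefficient plus or minus 1.96 standard errors. The sketch below shows that relationship and its inverse, recovering an approximate standard error from a reported interval. The function names are hypothetical, and the models in this study were not necessarily summarized this way. The worked example uses the estimate reported in the abstract for Indigenous American ancestry and the HR-HER2+ subtype (OR = 1.38, 95% CI = 1.06-1.79, p = 0.017).

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def odds_ratio_ci(beta: float, se: float, z: float = Z_95):
    """Return (OR, lower, upper) from a log-odds coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Approximate SE of the log odds ratio implied by a reported Wald CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided p-value for the Wald z statistic beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


# Values reported in the abstract; rounding in the published numbers means
# the recovered quantities are approximate.
beta = math.log(1.38)
se = se_from_ci(1.06, 1.79)
print(odds_ratio_ci(beta, se))
print(wald_p_value(beta, se))
```

Because published estimates are rounded, the recovered p-value should be expected to agree with the reported one only to about two decimal places.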

Discussion
In the present report, we aimed to provide a more complete description of the distribution of anthropometric, demographic, clinical, and known breast cancer-associated risk factors among Peruvian women that are part of The Peruvian Genetics and Genomics of Breast Cancer Study (PEGEN-BC). This work constitutes an update of a previously reported study, including a larger number of recruited patients and extending analyses to describe the distribution of patient characteristics not only by tumor subtype but also by age at diagnosis (62).
Being a hospital-based cohort, the PEGEN-BC study included a large proportion of women who resided in the Coastal region, where the INEN main hospital is located (Figure 1). Despite this bias in terms of residential representation, when looking at place of birth, the proportion of the cohort's patients from the Coastal region followed closely that of the Peruvian population (58.0% Peru vs. 55.5% of cohort patients). The study has an overrepresentation of patients born in the Mountainous region (28.1% Peru vs. 36.4% of cohort patients) (68) and an underrepresentation of patients born in the Amazonian region (13.9% Peru vs. 7.5% of cohort patients) (68). The proportion of patients within each geographical region is consistent with what has been reported in two studies describing mortality of breast cancer (69) and incidence of triple-negative breast cancer tumors in Peruvian women (70).
A large proportion of patients were overweight/obese (67%), and the prevalence of exposure to alcohol and tobacco was higher than what has been previously reported for Peruvian women (71,72). The average Indigenous American ancestry among the PEGEN-BC patients is 76.5%, which is higher than the average ancestry proportion of women in other breast cancer studies, including those of Latin American women and U.S. Latinas (12,51,60,73-89). In addition, the average height in our cohort was consistent with what has been reported in the literature for the Peruvian population (90) and with the known inverse correlation with Indigenous American ancestry (91). Overall, some reproductive variables showed a similar trend to what has been reported, including a similar age at menarche (92) and a high breastfeeding rate (93). The number of full-term pregnancies reported here (average of 2.8 children) was closer to what has been observed in rural areas of Peru (2.5) than in urban areas (1.4) (94).
The distribution of tumor subtypes is similar to what has been previously described in other Latin American countries (95), with differences being partially explained by the inclusion of KI-67 expression and tumor grade for subtype classification (95), as indicated by the 2013 St. Gallen consensus (96). This classification criterion was not used in this report since KI-67 was not available for more than 20% of patients, and parameters for subtype determination based on this marker tend to be unstable across populations and studies (97). A study describing patient and tumor characteristics from Peruvian breast cancer patients at INEN diagnosed between 2000 and 2002 (80) (PEGEN-BC patients were recruited if diagnosed in 2010 or later) reported a lower proportion of HR+ tumors compared with PEGEN-BC (62.5% vs. 71.1%). This difference is likely to be explained by the higher positivity percentage cutoff value for HR+ status used in the previous report (10%, compared with 1% in PEGEN-BC), increasing the proportion of HR+ tumors in our cohort. Other characteristics, such as age at diagnosis and stage, presented similar distribution to the PEGEN-BC study cohort.
We found statistically significant differences by tumor subtype for Indigenous American genetic ancestry and height. In addition, we observed suggestive associations for age at diagnosis, family history of breast cancer in a first-degree relative and tobacco exposure. Differences were mostly driven by the HR−HER2+ subtype. Among patients with HR−HER2+ disease, we observed that the average height was lower compared with patients diagnosed with other tumor subtypes and was less likely to report smoking or a positive family history of breast cancer in a first-degree relative. Even though subtype-specific associations have been reported for these variables in other populations (38, [98][99][100][101], results in the Peruvian cohort showed that of all the above variables Indigenous American ancestry proportion was the only one that was differentially distributed by tumor subtype in multivariable models. We did not find statistically significant differences for age at menarche by tumor subtype. Some studies have shown consistent associations between age at menarche and reduced risk of HR +HER2− breast cancer (3,19,20). One multicenter study did not find subtype-specific associations (27), consistent with our study. The PRECAMA Study, a Latin American population-based casecontrol study of premenopausal breast cancer, reported reduced odds for HR− tumors among women who were > 12 years old at menarche, compared with those younger at menarche (26,51). In the current study, we did not find a statistically significant difference in average age at menarche by tumor subtype despite the observed correlation between the former and Indigenous American ancestry proportion.
We observed a higher frequency of parous women diagnosed with HR− subtypes compared with HR+. Parity (ever vs. never) has been associated with a higher risk of HR−HER2− subtypes, especially among women of African origin (33)(34)(35). Higher number of fullterm pregnancies has been associated with reduced breast cancer risk (19,31), with lower odds of developing HR+ tumors (3,19,20,(24)(25)(26)(27)(29)(30)(31)(32)(33)(34)(35). We found significant differences in number of births by subtype, being higher among HR− subtypes compared with HR+ (2.7 compared with 2.3, respectively). Results suggested a larger proportion of women with > 3 children among those with HR− disease subtypes. This observation was consistent with studies in African American women reporting a higher number of reported fullterm pregnancies among women with HR− disease (33). Studies that have tested the association between age at first full-term pregnancy and subtype-specific risk have shown a decreased risk of developing HR+HER2− tumors with unclear associations for other subtypes (25,27,31). In African American cohorts, limited breastfeeding among parous women is associated with an increased risk for HR−HER2− subtypes (34). The current study does not include detailed pregnancy and lactation history for the patients. As a result, we could not assess the association between time to breastfeeding cessation and cumulative time of breastfeeding and HR− subtypes.
There were statistically significant differences in the prevalence of demographic, anthropometric, and reproductive factors by age at diagnosis categories. The multivariate analysis showed that these variables are independently associated with age at diagnosis. Moreover, the differences in BMI by age at diagnosis were concordant with what is known about pre-and post-menopausalspecific disease risk factors (39)(40)(41)(42)(43). It must be considered that the observed differences in parity and height by age at diagnosis could be due to the correlation between age and the former (i.e., number of children and height are positively correlated with age) and not to an association between those variables and preversus postmenopausal disease.
The observed association between tumor subtype and Indigenous American ancestry could be due to a multiplicity of factors on which we might not have collected information in the PEGEN-BC study. For example, the study did not obtain information on the level of education or socioeconomic status of participants; both variables were previously shown to be associated with Indigenous American ancestry among U.S. Latinas and Mexican women (76,102,103). Socioeconomic status can also impact screening, which in turn can affect tumor subtype distribution and mortality rates. Reports showed that less than 20% of Peruvian women 40-59 years of age have had a mammography, with vast differences according to socioeconomic status, educational level, health insurance, and region of residence (104,105). Plan Esperanza, launched in 2012, has aimed to provide universal cancer screening and decentralize oncological health care across Peru, focusing on underserved communities (106).
The PEGEN-BC study had some additional limitations. First, since menopause can be induced by treatment, most of the PEGEN-BC participants were postmenopausal at the time of the interview (86%). Therefore, we did not perform stratification by menopausal status and instead used age at diagnosis (< 50 vs. >= 50) to differentiate early-onset from late-onset disease, as has been widely done in epidemiological studies (107,108). Even though menopausal status and age at diagnosis are highly correlated, studies have shown that age at diagnosis is a driver of breast cancer heterogeneity, acting as a confounder in analyses stratified by menopausal status (109). For this reason, the use of age as a proxy for menopausal status should be interpreted with caution. The second limitation concerns the relatively low variability of some of the assessed factors among PEGEN-BC study participants. For example, the assessment of the associations of breastfeeding and number of births with tumor subtype was hampered by the low prevalence of women without children and of women with children who did not breastfeed them. Additionally, we described the distribution of multiple factors across tumor subtypes, which provides evidence of heterogeneity; however, future studies with a case-control design should further explore subtype-specific breast cancer risk. Finally, average East Asian and African genetic ancestry components showed differences by subtype in the univariate analyses. However, since ancestry estimates are correlated, and the proportions of East Asian and African genetic ancestries were too low to provide reliable estimates, we focused the current description on Indigenous American ancestry, which is the dominant component in Peruvians.
In summary, results confirmed the previously reported higher average Indigenous American ancestry among patients with HR−HER2+ breast cancer in this larger sample of PEGEN-BC study participants. Moreover, differences in tumor subtype by age at diagnosis were apparent and concordant with what is known about pre- and post-menopausal-specific disease-associated risk factors. Larger studies are needed to understand the consistently observed association between ancestry, age of onset, and disease subtypes, considering the contribution of screening and treatment, to develop population-appropriate predictive models and targeted outreach and prevention campaigns.

Data availability statement
All data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the University of California Davis Institutional Review Boards and the Instituto Nacional de Enfermedades Neoplásicas (INEN). The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.