Risk of Second Primary Cancers Among Long-Term Survivors of Breast Cancer

Purpose: The current study explored the risk of developing second primary cancers (SPCs) among long-term early-stage breast cancer survivors and identified risk factors to build an externally validated clinical prediction model. Methods: The cumulative incidence of SPCs was calculated by the Gray method among survivors of early-stage initial primary breast cancer (IPBC). Comparisons of treatment-related risk by selected organ sites were performed. A nomogram was established to estimate the individual risk of developing SPCs based on the multivariate Fine and Gray risk model. Decision curve analysis (DCA) was used to evaluate the clinical usefulness of the model. Results: The cumulative incidence of developing SPCs after early-stage IPBC was 7.43% at 10 years, 14.41% at 15 years, and 20.08% at 20 years. Radiotherapy was associated with elevated risks of any SPCs and with elevated risks of lung cancer (subdistribution hazard ratio [SHR]: 1.109; P = 0.045), breast cancer (SHR: 1.389; P < 0.001), and acute myeloid leukemia (AML) (SHR: 1.298; P = 0.045). Chemotherapy was significantly associated with a decreased risk of any SPCs, with decreased risks of lung (SHR: 0.895; P = 0.015) and breast cancers (SHR: 0.891; P < 0.001), as well as elevated risks of other leukemias (SHR: 1.408; P = 0.002). Hormone receptor (HR)-positive status was associated with decreased risks of any SPCs; with decreased risks of breast (SHR: 0.842; P < 0.001) and ovarian cancers (SHR: 0.483; P < 0.001); and with elevated risks of urinary tract cancers (SHR: 1.214; P = 0.029). Conclusion: We found that the cumulative incidence of developing SPCs increased over time and did not plateau. Risk factors for developing SPCs identified by our study were not consistent with those of previous studies. The prediction model can help identify individuals at higher risk of SPCs.


INTRODUCTION
Breast cancer is the leading cancer among women worldwide (1). Advances in early systematic screening, effective treatments, and supportive care have increased the proportion of breast cancer survivors (2). For some survivors, these survival benefits have been diluted by the late and long-term effects of the initial cancer and its therapy, including second primary cancers (SPCs) (2). It is difficult to find an exact estimate of how frequently SPCs occur or of the likelihood that initial primary breast cancer (IPBC) survivors will develop one (4). Risk stratification by age and race has been extensively explored, demonstrating that survivors of premenopausal age at initial diagnosis and black women had an elevated risk of developing SPCs (2). Each of these previously used methods has inherent limitations when attempting to ascribe causation, especially when several risk factors are involved (5). Therefore, the patterns of SPC development are still poorly understood. Clinicopathological factors have also been proposed to explain the elevated risks. Only a few studies have estimated the effect of initial treatment on the development of SPCs (2,6). The results of these studies were inconsistent, and prediction models for developing SPCs were not provided for survivors.
The purpose of the current research is to estimate the cumulative incidence of SPCs and to examine risk factors for developing SPCs in long-term early-stage breast cancer survivors in the presence of competing risks. Furthermore, we built an externally validated competing risks nomogram to help identify patients at increased risk of developing SPCs.

Inclusion and Exclusion Criteria
Only female breast cancer patients in the National Cancer Institute Surveillance, Epidemiology, and End Results (SEER) registry with histologically confirmed early-stage (stage I-III) disease who survived for 5 years or more were retrospectively reviewed for the period from 1990 to 2010. In total, 250,764 eligible female patients aged 20-80 years with complete clinicopathological information were included. The inclusion and exclusion criteria are shown in the flow chart (Supplemental Figure 1). The follow-up time for SPCs for each patient began 5 years after the IPBC diagnosis and ended at the diagnosis of an SPC, death from IPBC, death from other causes, or the end of follow-up (December 2017).

Variable Declaration
Age was regrouped into four subgroups (20-40, 41-60, 61-70, and 71-80 years). Race was regrouped as white, black, or other. Marital status was regrouped as married, single, or divorced. Hormone receptor (HR) status was stratified as HR positive (estrogen receptor (ER) or progesterone receptor (PR) positive) or HR negative (both ER and PR negative). Histology was classified as invasive ductal carcinoma (IDC), invasive lobular carcinoma (ILC), mixed (mix of IDC and ILC), or other. Surgery was regrouped as breast conserving surgery (BCS) (including partial mastectomy, lumpectomy excisional biopsy, and segmental mastectomy) or mastectomy (including total mastectomy, modified radical mastectomy, radical mastectomy, and extended radical mastectomy). Topography and morphology codes from the International Classification of Diseases for Oncology (ICD-O) were used to explore organ site-specific risk.

Study Design and Methods
The cumulative incidence of SPCs was calculated using the Gray method within a competing risks framework: death from IPBC or from other causes, whichever occurred first, was regarded as the competing event (7). The Kaplan-Meier method was used to estimate the difference in overall survival (OS) between survivors with and without SPCs.
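To make the competing risks idea concrete, the following is a minimal sketch of a nonparametric cumulative incidence estimator of the kind that underlies such analyses. It is not the analysis code of this study, which was run in R; the function name, the event coding (0 = censored, 1 = SPC, 2 = competing death), and the toy data are assumptions for illustration only.

```python
from collections import Counter


def cumulative_incidence(times, events, cause=1):
    """Nonparametric cumulative incidence for one cause under competing risks.

    times  : follow-up time for each subject
    events : 0 = censored, 1 = event of interest (e.g. SPC),
             2 = competing event (e.g. death before any SPC)
    Returns a list of (time, cumulative incidence) pairs, one per
    distinct time at which any event (of either type) occurred.
    """
    n_at_risk = len(times)

    # Tally event codes at each distinct follow-up time (handles ties).
    by_time = {}
    for t, e in zip(times, events):
        by_time.setdefault(t, Counter())[e] += 1

    event_free = 1.0  # probability of no event of any type just before t
    cif = 0.0
    curve = []
    for t in sorted(by_time):
        counts = by_time[t]
        d_cause = counts.get(cause, 0)
        d_any = sum(c for code, c in counts.items() if code != 0)
        if d_any > 0:
            # Increment = P(event-free so far) * cause-specific hazard at t.
            cif += event_free * d_cause / n_at_risk
            event_free *= 1.0 - d_any / n_at_risk
            curve.append((t, cif))
        # Everyone observed at t (events and censored) leaves the risk set.
        n_at_risk -= sum(counts.values())
    return curve


# Toy data (hypothetical): five survivors, years of follow-up after the 5-year landmark.
toy_curve = cumulative_incidence([1, 2, 3, 4, 5], [1, 2, 0, 1, 0])
```

On this toy input the estimate at the last event time is 0.5, whereas treating the competing death as censoring and taking one minus the Kaplan-Meier estimate would give 0.6. The gap illustrates why competing deaths are handled explicitly rather than censored.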
We randomly divided the entire cohort into a development cohort (75%) and a validation cohort (25%) for development and validation of the competing risks nomogram. Standardized mean differences (SMDs) were used to assess distributional differences in the baseline variables between the development and validation cohorts. SMD values > 0.1 imply a potential difference between the development and validation cohorts (8).
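As an illustration of the balance check, the sketch below computes SMDs for a continuous and a binary baseline variable using the usual pooled-standard-deviation definitions; a commonly used rule of thumb treats an absolute SMD above 0.1 as a sign of imbalance. The function names and example proportions are hypothetical and are not taken from the study data.

```python
import math


def smd_continuous(group_a, group_b):
    """Standardized mean difference for a continuous variable.

    Uses the difference in means divided by the square root of the
    average of the two sample variances.
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (len(group_a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (len(group_b) - 1)
    pooled_sd = math.sqrt((var_a + var_b) / 2)
    diff = mean_a - mean_b
    if pooled_sd == 0:
        # Degenerate case: no spread in either group.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd


def smd_binary(prop_a, prop_b):
    """Standardized mean difference for a binary variable given two proportions."""
    pooled_sd = math.sqrt((prop_a * (1 - prop_a) + prop_b * (1 - prop_b)) / 2)
    diff = prop_a - prop_b
    if pooled_sd == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd


# Hypothetical example: 45.0% vs. 45.4% of patients received radiotherapy.
radiotherapy_smd = smd_binary(0.450, 0.454)
is_balanced = abs(radiotherapy_smd) <= 0.1
```

Unlike a P value, the SMD does not shrink or grow with sample size alone, which makes it a convenient balance measure for cohorts as large as the ones split here.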

Variable Selection
Forward and backward stepwise methods were used to select the predictive variables from the development cohort for the prediction model based on the Akaike information criterion (AIC) (9). To further reduce the final model, a multivariate Fine and Gray competing risks regression model was used with a backward selection algorithm that excluded variables with P > 0.05. Furthermore, based on all of the selected features, the independent effects of initial cancer treatment (chemotherapy and radiotherapy) and HR status on the risk of developing SPCs in selected organ sites were examined using the multivariable competing risks model (7).

Validation of the Prediction Model
We assessed the calibration for risks of developing SPCs by comparing the observed risks based on the Gray method with the mean predicted risks from the prediction model. Likewise, an external validation was performed in the validation cohort. The C-index was also used to quantify the discrimination ability of the prediction model.

Risk Stratification Ability
Decision curve analysis (DCA) in the validation cohort was used to examine the clinical utility and net benefit of the competing risks model for developing SPCs. DCA is a suitable method for evaluating alternative diagnostic and prognostic strategies and has advantages over other commonly used measures and techniques (10). We divided the survivors into three subgroups by the 25th and 75th percentiles of the nomogram-based estimated SPC risk score. We then calculated the cumulative incidence using the Gray method for each subgroup and compared it across the different risk subgroups (7).
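The quantity plotted in a decision curve is the net benefit at a given risk threshold. The sketch below shows the standard net benefit calculation for a simple binary outcome with complete follow-up; this is a simplifying assumption, since the analysis described here concerns 15-year risk under censoring and competing risks, which needs a time-to-event version of the same idea. Function names and the toy data are hypothetical.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of intervening on everyone with predicted risk >= threshold.

    predicted_risk : model-predicted probabilities
    outcome        : 1 if the event occurred, 0 otherwise (complete follow-up assumed)
    threshold      : risk threshold probability, strictly between 0 and 1
    """
    n = len(outcome)
    true_pos = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Net benefit of intervening on every patient regardless of predicted risk."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (hypothetical). Treating no one always has a net benefit of 0.
risks = [0.05, 0.10, 0.20, 0.30]
events = [0, 0, 1, 1]
model_nb = net_benefit(risks, events, threshold=0.15)
treat_all_nb = net_benefit_treat_all(events, threshold=0.15)
```

A model is considered useful at a threshold when its net benefit exceeds both the treat-all value and zero, which is the comparison a decision curve displays across the whole range of thresholds.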
All statistical analyses were conducted using R software (https://www.r-project.org/). The significance level was set at P < 0.05.
We randomly divided the entire cohort into two parts: a development cohort (188,073 patients) and a validation cohort (62,691 patients). Baseline characteristics, such as age at initial diagnosis, race, and treatment-related factors, were similarly distributed in the development and validation cohorts (Supplemental Table 1).

Comparisons of Treatment and HR Status Related Risk by Organ Sites
Furthermore, the effects of initial cancer treatment (chemotherapy and radiotherapy) and HR status on the SPC risk in selected organ sites were estimated based on the multivariable Fine and Gray risk model. The results are shown in forest plots (Figures 2 and 3). After adjusting for age at initial IPBC diagnosis, race, histology, IPBC stage, radiotherapy, and chemotherapy, patients with HR-positive status had a decreased risk of any SPCs, decreased risks of second breast (SHR = 0.842; 95% CI: 0.807-0.879; P < 0.001) and ovarian cancers (SHR = 0.483; 95% CI: 0.415-0.563; P < 0.001), and an elevated risk of urinary tract cancer (SHR = 1.214; 95% CI: 1.020-1.444; P = 0.029). These results are shown in a forest plot (Figure 4). The risks of developing SPCs by selected organ site are summarized in Table 3.

Establishment and Validation of the Competing Risks Nomogram
The nomogram established from the multivariable Fine and Gray model shows the relative importance of each independent variable: age was the most important predictor of developing SPCs, followed by IPBC stage, radiotherapy, race, HR status, histology, and chemotherapy (Figure 5). The validated C-index of this prediction model in the development cohort was 0.59 (95% CI: 0.56-0.61). The C-index in the validation cohort was 0.58 (95% CI: 0.55-0.62). Calibration plots for internal (development cohort) and external (validation cohort) validation of the prediction nomogram are shown in Supplemental Figure 2.
Point assignments and risk scores in the nomogram are summarized in Supplemental Table 2.

Risk Stratification: Variation of SPC Risks Based on the Prediction Model
The cumulative incidence of developing SPCs across the risk subgroups defined by the nomogram-predicted risk score showed a wide stratification of SPC risk at 15 years, from 12.01% in the lowest-risk group (below the 25th percentile) to 17.42% in the highest-risk group (above the 75th percentile). The difference was statistically significant according to the Gray test (P < 0.001), demonstrating good discrimination between the low- and high-risk subgroups (Supplemental Figure 3). In the validation cohort, decision curve analysis showed that using the 15-year SPC risk from the competing risks nomogram to inform clinical decisions gave a higher net benefit than the treat-all or treat-none strategies across a wide range of thresholds between 0.01 and 0.24 (Figure 6).
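The grouping step behind this stratification can be sketched as follows: cut the predicted risk scores at their 25th and 75th percentiles and label each survivor accordingly. This is an illustrative sketch only; the function name, the handling of scores that fall exactly on a cut point, and the example scores are assumptions, not details reported by the study.

```python
import statistics


def stratify_by_risk(risk_scores):
    """Label each score 'low', 'intermediate', or 'high'.

    Cut points are the 25th and 75th percentiles of the scores themselves.
    Scores equal to a cut point are placed in the intermediate group
    (an assumption made for this sketch).
    """
    q1, _median, q3 = statistics.quantiles(risk_scores, n=4, method="inclusive")
    groups = []
    for score in risk_scores:
        if score < q1:
            groups.append("low")
        elif score > q3:
            groups.append("high")
        else:
            groups.append("intermediate")
    return groups


# Hypothetical nomogram-predicted 15-year risks for eight survivors.
scores = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.25]
labels = stratify_by_risk(scores)
```

The cumulative incidence would then be estimated separately within each label and compared across groups, as described for Supplemental Figure 3.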

DISCUSSION
In the present study, we calculated the cumulative incidence of SPCs among survivors of early-stage IPBC in the presence of competing events, evaluated risk factors for developing SPCs based on the multivariate Fine and Gray model, and built and externally validated a clinical prediction model. Our study supports and expands on previous studies demonstrating an elevated standardized incidence ratio (SIR) for SPCs following an IPBC, especially among elderly, early-stage, HR-negative, and irradiated survivors, compared with the general population.
To our knowledge, this is the first available nomogram for developing SPCs in IPBC survivors in the presence of competing events, and it may be helpful in individual risk estimation, patient follow-up, and counseling. In the DCA, using the nomogram to inform clinical decisions was better than the strategies of treat all or treat none across a wide range of thresholds between 0.01 and 0.24, which indicates the clinical utility of our risk prediction model. Previous studies demonstrated that young patients had a higher SIR than elderly patients (11). Inconsistent with those studies, our study found that elderly survivors have higher risks of developing SPCs. However, previous studies did not directly compare SPC rates between older and younger populations. To calculate age-adjusted standardized rates, one must first have the age-specific rates of disease for each of the populations to be compared. Those studies were based on SIR analysis, in which the SIR is obtained by dividing the observed number of cases of breast cancer by the "expected" number of cases (12). Additionally, a high SIR does not necessarily imply a high cancer burden, given that the expected incidence of second cancers may be low (13). Overall breast cancer incidence increases with age, so the difference between the observed and expected risks of developing SPCs in the elderly group will be lower (12). Moreover, young patients have a higher risk of mortality from IPBC, which prevents the development of SPCs (13,14). In addition, SIR analysis is not sufficient to ascribe causation when several risk factors are implicated (5).
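A short worked example may help show why a high SIR need not mean a high burden. The sketch below computes the SIR (observed divided by expected cases) alongside the absolute excess risk per 10,000 person-years for two groups. All counts are invented for illustration and are not drawn from this study or from the cited references.

```python
def standardized_incidence_ratio(observed, expected):
    """SIR: observed number of cases divided by the expected number."""
    return observed / expected


def absolute_excess_risk(observed, expected, person_years, per=10_000):
    """Excess cases (observed minus expected) per `per` person-years."""
    return (observed - expected) / person_years * per


# Hypothetical younger group: few expected cases, so the ratio is large.
young_sir = standardized_incidence_ratio(observed=30, expected=10)
young_aer = absolute_excess_risk(observed=30, expected=10, person_years=100_000)

# Hypothetical older group: many expected cases, so the ratio is modest.
old_sir = standardized_incidence_ratio(observed=300, expected=250)
old_aer = absolute_excess_risk(observed=300, expected=250, person_years=100_000)
```

In this invented example the younger group has the higher SIR (3.0 versus 1.2) but the lower absolute excess (2 versus 5 cases per 10,000 person-years), which is the pattern the paragraph above describes when it contrasts SIR-based findings with the higher cumulative incidence observed in elderly survivors.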
Few studies have explored the effect of the extent of the initial disease on the development of SPCs. We found that patients with higher IPBC stage had a decreased risk of SPCs, which is presumably attributable to a higher probability of mortality from IPBC before SPCs occur (15). Consistent with previous studies, our study found that patients with HR-positive breast cancer had a decreased SPC risk. Of note, 60-90% of germline BRCA1 mutation-associated breast cancers are HR negative, which may be a possible explanation for the increased risk of SPCs in HR-negative IPBC patients (16). Patients with BRCA1 and BRCA2 mutations had 4.5- and 3.4-fold elevated risks, respectively, of developing contralateral breast cancer (17). Previous studies found that endocrine therapy reduces second breast cancers by approximately 33% (18). We found that HR-positive status was associated with an increased risk of second urinary tract cancers, which may be explained by hormone use. A Dutch study also found that hormonal therapy and shared etiological risk factors were associated with elevated risks of developing second urinary tract cancers (13).
FIGURE 5 | Competing risks nomogram for predicting the 10-, 15-, and 20-year risk of developing second primary cancers. The competing risks nomogram provides a method to calculate the 10-, 15-, and 20-year probability of cumulative incidence (CI) of developing second primary cancers (SPCs) on the basis of a patient's combination of covariates. To use, locate the patient's age at initial diagnosis and draw a line straight up to the points axis to establish the score associated with that age. Repeat for the other six covariates (race, histology, stage, HR, chemotherapy, and radiotherapy). Add the scores of all covariates together and locate the total score on the total points axis. Draw a line straight down to the 10-, 15-, and 20-year SPC cumulative incidence axes to obtain the individual probabilities.
In the present study, we compared treatment-related SPC risks by selected organ sites. One study estimated that 9% of all SPCs and 25% of SPCs at irradiation-associated sites were attributable to radiation therapy (19). A meta-analysis demonstrated that breast cancer patients who received radiotherapy had an elevated overall risk of second non-breast cancer (20). In the present study, we found that patients who received radiotherapy had an elevated risk of any SPCs and elevated risks of lung cancer, breast cancer, and AML, which was consistent with the previous study. A study based on a SEER dataset examined the risk of secondary malignancies and concluded that the risk of SPCs was significantly higher for cases that received chemotherapy after adjusting for known confounders (21). A population-based study including 58,068 Dutch patients demonstrated that patients who received chemotherapy had decreased risks of developing second non-breast cancers, as well as colon and lung cancers (13). Our result was consistent with the Dutch study, finding that chemotherapy was associated with a modest protective effect against developing SPCs. Organ-specific analysis showed that patients who received chemotherapy had an elevated risk of leukemia (excluding AML). Given that the sensitivity of SEER chemotherapy data is only 68% (22), our results should be treated with caution and need to be further confirmed in other population-based datasets.
FIGURE 6 | Decision curve analysis for the competing risks nomogram for 15-year second primary cancer risks in the validation cohort. The X-axis is the risk threshold probability, which ranges from 0 to 1 (right truncated at 0.25), and the Y-axis is the calculated net benefit for a given threshold probability. The dashed lines depict the net clinical benefit of the competing risks-based selection strategy for intervention, whereas the gray and black curves display the net benefits of the alternative strategies of treating all patients (gray) vs. treating no patients (black) in the cohort.
A previous study also identified that black breast cancer survivors had a higher risk of developing SPCs (23). SPCs reflect not only the late effects of cancer and its treatment but also the influence of shared lifestyle, genetic susceptibility, environmental exposures, and gene-environment interactions (3). A Spanish cohort study demonstrated that smoking history, obesity, and high blood pressure were risk factors for SPCs (24). SEER does not provide this information, which may contribute to the low C-index observed in our prediction model. Despite the low C-index, our competing risks nomogram was able to stratify the cohort into subgroups with distinct risks of SPC development. SEER also does not provide information on treatment regimens. We recognize that the lack of treatment regimen data is an unavoidable limitation of our study.
The cumulative incidence of developing SPCs increased over time and did not plateau. There was a significant difference in OS between survivors with and without SPCs. Consistent with previous studies, our study found that HR-negative status, radiotherapy, and black race were significantly associated with increased risks of SPCs. In contrast, chemotherapy was associated with a modest protective effect. Inconsistent with previous reports, we found that elderly age was associated with an elevated risk of developing SPCs. For the first time, we found that lower IPBC stage was also associated with an elevated risk of developing SPCs. Furthermore, an externally validated clinical prediction model was established to help select high-risk patients.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

AUTHOR CONTRIBUTIONS
DL conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents, materials and analysis tools, prepared figures and/or tables, and authored or reviewed drafts of the paper. SW and XT conceived and designed the experiments, performed the experiments, analyzed the data. CZ, NZ, YC, and DX performed the experiments and authored or reviewed drafts of the paper. YY conceived and designed the experiments, performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

Supplemental Figure 2 | Internal (A: development cohort) and external (B: validation cohort) validation plots of the competing risks nomogram. The X-axis is average predicted probabilities of the competing risks nomogram. The Y-axis is the observed cumulative incidence probabilities for the respective cohort. Vertical lines are 95% CIs of the cumulative incidence. Dashed lines are the reference lines, which indicate where an ideal nomogram would lie.
Supplemental Figure 3 | Cumulative incidence of second primary cancers (SPCs) by different risk subgroups defined by the estimated nomogram-predicted risk score. The marginal cumulative incidence of SPCs was calculated, and the difference of the cumulative incidences across distinct risk subgroups was tested using the Gray method.
Supplemental Table 1 | Comparisons of patient characteristics of the study population in the development and validation cohorts.
Supplemental Table 2 | Point assignment and risk score in the nomogram.