A Novel Prognostic Scoring System Integrating Gene Expressions and Clinicopathological Characteristics to Predict Very Early Relapse in Node-Negative Estrogen Receptor-Positive/HER2-Negative Breast Cancer

Background: Despite low aggressiveness in tumor biology and high responsiveness to endocrine therapy, subgroups of patients with estrogen receptor-positive/HER2-negative (ER+/HER2-) breast cancer relapse early in the first two years after initiation of endocrine therapy, indicating potential endocrine resistance. Accordingly, we attempted to establish a scoring system to inform the first-2-year prognosis (F2P Score). Methods: Patients with node-negative ER+/HER2- breast cancer and complete gene expression data for a 21-gene panel were retrospectively retrieved from the Shanghai Jiao Tong University Breast Cancer Database (SJTU-BCDB). The F2P Score was established based on the clinical and genomic variables associated with first-2-year relapse after shrinkage correction and was validated using the bootstrap resampling method. Model performance was quantified by Harrell's concordance index (C-index) and the Bayesian information criterion (BIC). Results: The F2P Score was established by integrating the clinical (age and tumor size) and genomic (ESR1, PGR, BCL2, CD68, GSTM1, and BAG1) variables with a C-index of 0.71 and BIC of 397.46. Bootstrap C-index was 0.72 (95% CI, 0.62–0.81) and BIC was 396.75 (95% CI, 252.37–541.13). A higher score indicated an increased likelihood of a first-2-year relapse, when used as a continuous (HR, 2.94; 95% CI, 1.87–4.61) or categorical (HR, 3.68; 95% CI, 1.70–8.00) predictor in multivariate analysis. Both the continuous and categorical F2P Scores also remained prognostic for overall survival and other endpoints. No significant interaction was observed between the F2P Score and treatment subgroups. Additionally, the F2P Score outperformed the IHC4, clinical treatment score, and 21-gene test in predicting first-2-year relapse. Conclusion: The F2P Score reported herein, integrating the clinicopathological and genomic variables, may inform prognosis and endocrine responsiveness. After the benefits and risks have been considered, treatment escalation may be an alternative strategy for patients with a higher score.


INTRODUCTION
Estrogen receptor-positive and human epidermal growth factor receptor 2-negative (ER+/HER2-) breast cancer constitutes ∼70% of malignant breast neoplasms (1,2). Endocrine therapy is considered the therapeutic backbone for this subtype of breast cancer by counteracting estrogen-promoted tumor growth (1). Despite high endocrine responsiveness, there is a persistent risk of relapse in years 0-20 for ER+ breast cancer, and 5 to 10% of patients relapse early in the first two years after the initiation of endocrine therapy (3-5).
Early relapse during the first two years of endocrine therapy usually indicates high aggressiveness of tumor biology and potential resistance to endocrine therapy, which remains one of the leading causes of treatment failure (6,7). Some headway has been made concerning the underlying mechanisms of endocrine resistance, including mutations in the ligand-binding domain of ESR1, the downregulation of progesterone receptor (PR) by hyperactive crosstalk between ER and growth factor signaling pathways, and the imbalance between the non-apoptotic and pro-apoptotic functions of the BCL2 family (8-12). These studies reinforce the idea that molecular biomarkers alone cannot yield accurate predictions for endocrine sensitivity and the likelihood of early relapse. From a clinical perspective, it is of great importance to develop a prognostic approach for relapse in the first two years, since treatment escalation is required for patients classified as being at high risk of very early relapse, who are potentially endocrine-resistant.
To date, several multigene assays have been validated to estimate prognosis for the first five years (5, 13-16). Yet, these assays alone were reported to have inferior prognostic capability compared with their combination with conventional clinicopathological factors (17-19). Additionally, it remains unclear whether these genomic assays can identify patients at high risk of first-2-year relapse. To address the issue, we attempted to build a scoring system that integrated the clinicopathological factors and gene expressions derived from a 21-gene panel for assessing the first-2-year prognosis (F2P Score) and informing the endocrine responsiveness.

Patient Selection
Women with histologically confirmed invasive breast cancer from 2009 to 2016 were retrospectively selected from the Shanghai Jiao Tong University Breast Cancer Database (SJTU-BCDB). Patients were included based on the following criteria: (1) ER positivity determined by immunohistochemistry (IHC), with ≥1% immunoreactive tumor cell nuclei (20); (2) HER2 negativity, defined as an IHC score of 0/1+, or 2+ with a non-amplified HER2 gene on fluorescence in situ hybridization (HER2/CEP17 ratio <2.0 with average HER2 gene copy number <6.0 signals/cell, or average HER2 gene copy number <4.0 signals/cell regardless of the ratio) (21); (3) no lymph node involvement; (4) available reports of a 21-gene test. We excluded patients with incomplete clinicopathological characteristics or follow-up data, those diagnosed with de novo metastatic breast cancer, and those who had received neoadjuvant systemic therapy.
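The inclusion criteria above can be read as a simple decision rule. The following is a minimal sketch of that rule, not the database query actually used; the function names, argument names, and the handling of missing FISH results are assumptions introduced here for illustration.

```python
from typing import Optional


def is_her2_negative(ihc_score: str,
                     fish_ratio: Optional[float] = None,
                     fish_copy_number: Optional[float] = None) -> bool:
    """Apply the HER2-negativity criteria stated in the text.

    ihc_score: "0", "1+", "2+" or "3+".
    fish_ratio: HER2/CEP17 ratio (only needed for IHC 2+).
    fish_copy_number: average HER2 gene copy number in signals/cell.
    """
    if ihc_score in ("0", "1+"):
        return True
    if ihc_score == "2+":
        # Assumption: without a FISH copy number, negativity cannot be confirmed.
        if fish_copy_number is None:
            return False
        # Copy number < 4.0 signals/cell is negative regardless of the ratio.
        if fish_copy_number < 4.0:
            return True
        if fish_ratio is None:
            return False
        # Otherwise both the ratio and the copy number must be below threshold.
        return fish_ratio < 2.0 and fish_copy_number < 6.0
    return False  # IHC 3+ or an unrecognized score


def is_eligible(er_percent: float, ihc_score: str, positive_nodes: int,
                has_21_gene_report: bool,
                fish_ratio: Optional[float] = None,
                fish_copy_number: Optional[float] = None) -> bool:
    """Combine inclusion criteria (1) to (4); exclusion criteria are not modeled."""
    return (er_percent >= 1.0
            and is_her2_negative(ihc_score, fish_ratio, fish_copy_number)
            and positive_nodes == 0
            and has_21_gene_report)
```

The exclusion criteria (incomplete data, de novo metastatic disease, neoadjuvant therapy) would be applied as additional filters and are left out of this sketch.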

Variables Defining
Clinicopathological variables used in the following analyses included age, marital status, menopausal status, comorbidity score, histology, grade, tumor size, PR status, and Ki67. Of these variables, comorbidity scores were calculated based on the sum of a series of comorbid conditions (each was assigned a score of 1, 2, 3, or 6), and then categorized into 0, 1, and ≥2 (22,23). Clinical treatment score (CTS) and IHC4, proposed by Cuzick et al., were used in the procedure of model comparison (24). CTS was computed using age, tumor size, node status, grade, and use of anastrozole, while IHC4 was calculated based on ER, PR, HER2, and Ki67.
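As an illustration of the comorbidity scoring described above, the sketch below sums the weights of a patient's comorbid conditions and maps the total to the three categories. The function name and the input representation (a list of per-condition weights) are assumptions; the mapping of specific conditions to weights is given in the cited references and is not reproduced here.

```python
ALLOWED_WEIGHTS = {1, 2, 3, 6}  # per-condition scores stated in the text


def comorbidity_category(condition_weights):
    """Sum the per-condition weights and categorize the total as 0, 1, or >=2."""
    for weight in condition_weights:
        if weight not in ALLOWED_WEIGHTS:
            raise ValueError(f"unexpected comorbidity weight: {weight}")
    total = sum(condition_weights)
    if total == 0:
        return "0"
    if total == 1:
        return "1"
    return ">=2"
```

For example, a patient with no recorded comorbid conditions falls into category "0", and a patient with a single condition weighted 6 falls into category ">=2".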

Expression of Genes in the 21-Gene Panel
As was reported in our previous study, the expression of the 16 cancer-related genes was measured based on the 21-gene recurrence score assay (25). The tests were performed using formalin-fixed, paraffin-embedded tissue as previously described (5). First, hematoxylin and eosin-stained slides were reviewed by a pathologist to ensure sufficient tissue of invasive breast cancer, and then deparaffinization of the two 10 µm unstained sections was performed using xylene followed by ethanol. RNA extraction was performed using the RNeasy FFPE kit (QIAGEN, Hilden, Germany). Total RNA content was quantified, and the absence of DNA contamination was confirmed. After that, we conducted gene-specific reverse transcription followed by standardized quantitative reverse transcriptase-polymerase chain reactions (RT-PCR) in 96-well plates with the Applied Biosystems (Foster City, CA) 7500 Real-Time PCR system. The PCR cycling conditions were as follows: one cycle of 95 °C for 10 min, followed by 40 cycles of 95 °C for 20 s and 60 °C for 45 s. The expression of each gene was measured in triplicate and normalized relative to five reference genes. ΔCT was computed as the mean CT value of the reference genes minus the CT value of the targeted cancer-related gene. The recurrence score was derived from the reference-normalized expression measurement for the 16 cancer-related genes (5).
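The normalization step described above can be sketched as follows. This is an illustrative calculation only: it assumes that the triplicate CT values of each gene are averaged and that the five reference genes are weighted equally, neither of which is spelled out in the text. The function name and the reference gene labels are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct_replicates, reference_ct_by_gene):
    """Reference-normalized expression: mean reference CT minus target CT.

    target_ct_replicates: CT values of the target gene (e.g., a triplicate).
    reference_ct_by_gene: dict mapping each reference gene label to its CT
        replicates. Assumption: equal weighting of the reference genes.
    """
    target_ct = mean(target_ct_replicates)
    reference_ct = mean(mean(reps) for reps in reference_ct_by_gene.values())
    return reference_ct - target_ct


# Hypothetical CT values; "REF1".."REF5" are placeholder labels, not gene names.
references = {
    "REF1": [24.0, 24.0, 24.0],
    "REF2": [25.0, 25.0, 25.0],
    "REF3": [26.0, 26.0, 26.0],
    "REF4": [25.0, 25.0, 25.0],
    "REF5": [25.0, 25.0, 25.0],
}
example = delta_ct([28.0, 28.5, 27.5], references)
```

Because a lower CT corresponds to more template, a larger ΔCT under this definition corresponds to higher expression of the target gene relative to the reference genes.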

Statistical Analysis
The primary endpoint was 2-year invasive disease-free survival (IDFS). Secondary endpoints included 2-year distant disease-free survival (DDFS), distant relapse-free survival (DRFS, excluding death from any causes), and overall survival (OS). Detailed definitions were described by the STEEP system (26). A Cox proportional hazards model was developed to estimate the regression coefficients, hazard ratios (HR), and 95% confidence intervals (CI) for the clinical and genomic variables associated with first-2-year relapse. In this procedure, variables with a two-sided P < 0.1 were selected to establish the scoring system. To improve the predictive value and correct for overfitting, we estimated global shrinkage factors to penalize the regression coefficients of the clinicopathological and genomic variables, respectively (24,27). The F2P Score was then established from equations in which η denoted the shrinkage factors, β_i and β_i' referred to the corresponding regression coefficients of the clinical and genomic variables, v_i were the clinical variables (continuous or categorical), and ΔCT_i were computed as described in the section Expression of Genes in the 21-Gene Panel. Finally, the F2P Score was internally validated using the bootstrap resampling method with 1,000 resamples.
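For orientation, one general form of a shrinkage-corrected linear score that is consistent with the symbols defined above is sketched below. It is an assumed reconstruction, not the published formula: the subscripts "clin" and "gen" are labels introduced here to distinguish the two global shrinkage factors, and any rescaling, centering, or constant terms used by the authors are not shown.

```latex
\text{F2P Score}
  \;=\; \eta_{\text{clin}} \sum_{i} \beta_i \, v_i
  \;+\; \eta_{\text{gen}} \sum_{i} \beta_i' \, \Delta CT_i
```

Under this form, each shrinkage factor multiplies every coefficient in its block by the same value between 0 and 1, pulling the fitted effects toward zero to compensate for overfitting.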
The performance of the F2P Score was quantified and compared using Harrell's concordance index (C-index), the Bayesian information criterion (BIC), and the change in likelihood ratio χ² (ΔLR-χ²). The net reclassification index (NRI) was also adopted to assess the reclassification performance and improvement of the model (28). When the baseline and new models were nested, NRI > 0 indicated improved performance of the new model. The continuous relationship between the F2P Score and the log-hazard ratio of first-2-year relapse was presented by cubic smoothing spline approximation. To assess the performance of the scoring system as a categorical predictor, the incidence of first-2-year relapse was estimated using the Kaplan-Meier method and compared using the log-rank test, with the optimal cutoff point determined by X-tile (version 3.6.1; Yale University, New Haven, CT, USA).
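To make the C-index concrete, the following is a minimal pure-Python sketch of Harrell's concordance index for right-censored data. It is a simplified illustration rather than the implementation in the survival package: pairs with tied event times are skipped, tied risk scores count as half-concordant, and the function and argument names are chosen here for illustration.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher-risk patient fails first.

    times: follow-up times.
    events: 1/True if the event was observed, 0/False if censored.
    risk_scores: model scores, where a higher score means a higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on the patient with an observed event
        for j in range(n):
            # Comparable: patient i had the event strictly before j's follow-up ended.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which places the reported C-index of 0.71 between the two.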
In exploratory analyses, both landmark analyses with a landmark point at two years and tests for interaction between time (years 0-2 vs. 2-5) and clinical/genomic variables were performed to explore the time-dependent effect on relapse. A two-tailed P < 0.05 was considered statistically significant unless otherwise stated. The survival package (version 3.1-8) was used for the Kaplan-Meier method, Cox proportional hazards model, and landmark analysis; the shrink package (version 1.2.1) for the calculation of shrinkage factors; the boot package (version 1.3-24) for the bootstrap resampling method; and the nricens package (version 1.6) for the calculation of NRI. All statistical analyses were performed in R version 3.5.3 (www.r-project.org).
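The data preparation behind a landmark analysis at two years can be sketched as below. This is an assumed, simplified illustration of the general technique, not the authors' R code: the record layout (dictionaries with "time" in years and "event" as a boolean) and both function names are hypothetical.

```python
def early_period(records, cutoff_years=2.0):
    """Dataset for years 0-2: follow-up is administratively censored at the cutoff."""
    prepared = []
    for record in records:
        if record["time"] > cutoff_years:
            # Still under observation at the cutoff: censor there.
            prepared.append({**record, "time": cutoff_years, "event": False})
        else:
            prepared.append(dict(record))
    return prepared


def landmark_subset(records, landmark_years=2.0):
    """Dataset for the period after the landmark.

    Only patients still event-free and under follow-up beyond the landmark are
    kept, and their time origin is reset to the landmark.
    """
    prepared = []
    for record in records:
        if record["time"] <= landmark_years:
            continue  # event or censoring on or before the landmark
        prepared.append({**record, "time": record["time"] - landmark_years})
    return prepared
```

Fitting the same model separately to the two prepared datasets is one way to compare the effect of a variable in years 0-2 with its effect afterward; a formal test would use a time-by-covariate interaction, as described in the text.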

Baseline Characteristics
Detailed clinicopathological characteristics and the distribution of endpoint events are summarized in Table 1. A total of 1,156 patients were identified. Thirty of 66 (45.5%) IDFS events and 10 of 21 (47.6%) distant relapses occurred in the first two years, and the annual rates were 1.39 and 0.47%, respectively. Patients who relapsed in the first two years had a worse prognosis than those who did not (log-rank P < 0.001), with a 5-year OS of 70.6% (95% CI 55.1-90.5%) vs. 98.8% (95% CI 97.8-99.8%) (Figure S1).

Model Development, Comparison, and Validation
Although a P-value of 0.081 was observed, PR status was not selected for model development because of its correlation with PGR expression. To avoid collinearity and improve the predictive accuracy, PGR expression, which had a lower P-value of 0.015, was selected instead. The comorbidity score, a three-level variable, had an overall P-value of 0.150 and was therefore not selected either. Consequently, the F2P Score was established based on the combination of six genomic variables (ESR1, PGR, CD68, BAG1, BCL2, and GSTM1) and two clinicopathological variables (age and categorical tumor size) with the shrinkage factors of 0.314 and 0.888, respectively. The formula of the F2P Score was developed from these variables and shrinkage factors. The F2P Score had a C-index of 0.71 and a BIC of 397.46 (Table 3). When internally validated by the bootstrap resampling method, a stable performance was observed for the F2P Score, with a C-index of 0.72 (95% CI, 0.62-0.81) and BIC of 396.75 (95% CI, 252.37-541.13) (Table 3).
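The bootstrap summary reported above can be illustrated with a small sketch. The reported BIC interval (252.37 to 541.13) is symmetric about its estimate (396.75), which is consistent with, though it does not prove, a normal-approximation interval; the sketch therefore returns the mean of the bootstrap estimates plus or minus 1.96 standard deviations. This choice, the function name, and the generic `statistic` callable are assumptions, and the code is not the boot package workflow used in the study.

```python
import random
from statistics import mean, stdev


def bootstrap_normal_ci(data, statistic, n_resamples=1000, seed=0):
    """Resample with replacement and summarize a statistic.

    data: list of observations (e.g., one record per patient).
    statistic: callable that maps a resampled list to a number
        (e.g., a C-index or BIC computed on the resample).
    Returns (center, lower, upper) using a normal approximation.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    center = mean(estimates)
    half_width = 1.96 * stdev(estimates)
    return center, center - half_width, center + half_width
```

In a model-validation setting, `statistic` would refit or re-evaluate the model on each resample; a percentile interval (the 2.5th and 97.5th percentiles of the estimates) is a common alternative when the bootstrap distribution is skewed.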

Association Between F2P Score and First-2-Year Relapse
A continuously increasing association was observed between the F2P Score and the predicted risk of first-2-year relapse (Figure 2), with a significant interaction between the two periods of years 0-2 vs. 2-5 (interaction P = 0.003). A higher score was indicative of an increased likelihood of first-2-year relapse both before (HR, 2.80; 95% CI, 1.93-4.07; P < 0.001) and after (HR, 2.94; 95% CI, 1.87-4.61; P < 0.001) the adjustment for clinicopathological parameters (Table 4). Subgroup analysis revealed no substantial heterogeneity regarding the prognostic ability across the treatment subgroups (all interaction P > 0.05) (Figure 3).
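The adjusted hazard ratio above can be unpacked with a short calculation. The sketch below recovers the log-hazard coefficient and an approximate standard error from the reported HR and 95% CI, and shows how a per-unit HR scales with the score difference. It assumes a Wald-type interval on the log scale and a log-linear effect of the score across its range; the function names are introduced here for illustration.

```python
import math


def log_hr_and_se(hr, ci_low, ci_high, z=1.96):
    """Recover beta = ln(HR) and its standard error from a Wald-type 95% CI."""
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def hr_for_difference(hr_per_unit, units):
    """HR implied for a score difference of `units`, assuming log-linearity."""
    return hr_per_unit ** units


# Adjusted estimate reported in the text: HR 2.94 (95% CI, 1.87-4.61).
beta, se = log_hr_and_se(2.94, 1.87, 4.61)
wald_z = beta / se
hr_two_units = hr_for_difference(2.94, 2)
```

Under these assumptions, the Wald statistic is well above the value of about 3.29 that corresponds to a two-sided P of 0.001, which agrees with the reported P < 0.001.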
As to other endpoints, the F2P Score also remained prognostic for 2-year DDFS, DRFS, and even OS when used as a continuous or categorical predictor (Tables 4, 5, Figures 4B-D).

DISCUSSION
Our study focused on very early relapse during the first two years after the initiation of endocrine therapy. We established the F2P Score, a novel prognostic approach that estimates the risk of first-2-year relapse in node-negative ER+/HER2- breast cancer by integrating both clinicopathological and genomic factors. Each one-unit increase in the F2P Score was associated with an ∼3-fold higher risk of first-2-year relapse in the current study, indicating an increased potential of endocrine resistance. The prognostic value was also demonstrated across treatment subgroups, for example, in patients treated with tamoxifen or aromatase inhibitors and in those treated with or without chemotherapy. Likewise, the F2P Score may also quantify the likelihood of first-2-year relapse when employed as a categorical predictor.
Of special note is the fact that the continuous or categorical F2P Score may also estimate 2-year OS, since about half of the deaths occurred without relapse (4). It also correlates significantly with the other endpoints (DDFS and DRFS) when used as a continuous or categorical predictor.
Consistent with earlier findings, our study found that increased expression of estrogen-related genes (ESR1, PGR, and BCL2) and individual genes (GSTM1 and BAG1) correlates with a reduced likelihood of very early relapse, although most previous studies have focused on years 0-5 (16,29). Three genes of the scoring system came from the estrogen module of the 21-gene panel, supporting the idea that the F2P Score might indicate endocrine responsiveness. To date, numerous basic studies have demonstrated that these estrogen-related genes play pivotal roles in the development of endocrine resistance (8-12). Zong et al. have also reported a higher rate of first-2-year relapse in patients with ER+/PR-/HER2- breast cancer when compared to those with ER+/PR+/HER2- tumors, suggesting that PR status could be adopted as a stable and reliable indicator of endocrine resistance in routine clinical practice (30). As to CD68, we found that it played different roles in years 0-2 and 2-5, which was observed for PGR and BAG1 as well. We hypothesize that this time-dependent effect could be related to different polarization patterns of tumor-associated macrophages because CD68 is recognized as a pan-macrophage biomarker (31,32). However, the 21-gene panel contains a small number of genes, and thus other important immunity-related genes could not be included in the present analysis. The specific mechanism, and whether the time-dependent effect is attributable to regulation of the immune system, remains unclear, and further investigation is required.
Despite the potential prognostic value of these genes, it seems unlikely that molecular biomarkers alone can predict prognosis accurately, and thus, we combined gene expressions with clinicopathological characteristics. In clinical practice, age and tumor size are routinely adopted as indicators of relapse as well as to inform treatment decisions, and their combination with gene expression can improve the prognostic performance (5, 17-19). Pan et al. reported that patients with T2-stage tumors were at higher risk of distant recurrence when compared to those with T1-stage ones during adjuvant endocrine therapy (3). Consistently, the current study found that tumor size was also of strong predictive value for very early (first-2-year) relapse. Additionally, Dowsett et al. revealed that the integration of clinicopathological factors and PAM50 ROR could substantially enhance the prognostic ability of either clinicopathological or genomic approaches alone (17). Likewise, earlier studies also demonstrated improved performance when comparing the 21-gene RS to its combination with clinicopathological variables (17,19). Consistent with these existing data, the F2P Score, which integrated age, tumor size, and gene expressions, outperformed both gene-only models and several other prognostic tools in predicting first-2-year relapse. Another reason for the superior performance may be that most prognostic tools were developed to estimate the risk of 5- or 10-year distant relapse rather than the first-2-year relapse examined here. Very early relapse during the first two years of adjuvant endocrine therapy is regarded as an indicator of primary endocrine resistance (7). Accordingly, patients with a higher F2P Score might present low responsiveness to endocrine therapy and a relatively unfavorable prognosis. Indeed, worse OS was observed for patients who experienced first-2-year relapse in our study.
These results are of particular clinical relevance, and the F2P Score may inform decision making concerning the escalation of systemic therapy to improve the prognosis. In our study, patients receiving chemotherapy, aromatase inhibitors, or ovarian suppression presented a numerically lower HR, suggesting that the association between the F2P Score and first-2-year relapse differs to a certain extent among patients with diverse treatment options. Consequently, patients with different F2P Scores may benefit from different systemic treatment approaches and the F2P Score may facilitate the treatment decision, but a cautious interpretation is required since there was no significant interaction. A wide confidence interval and limited events among a small number of cases using ovarian suppression were also observed. Further studies exploring the treatment strategy for patients with various F2P Scores are therefore required.
Our work has several limitations. First, this is a retrospective study and thus, the F2P Score should be rigorously validated in prospective trials. Second, although it is greatly needed, external validation was not performed in our study since the data presented herein came from a single institute; instead, we performed internal validation based on the bootstrap resampling method. Third, it is likely that some key genes might not have been detected due to false negatives resulting from a limited sample size. To address this issue, we included the genomic variables with P < 0.1. Fourth, the gene expression data in our study were from a 21-gene panel rather than a sequencing dataset, thus limiting the candidate genes selected for model development. Further investigation is required to establish a prognostic model based on a large microarray or sequencing dataset. However, it is also economical and convenient that patients who receive a 21-gene test can also obtain an endocrine-sensitivity score. Last, a cutoff point of 2.5 for the F2P Score was determined. However, it is only for reference, since gene expressions are tested on various platforms and with various protocols in different institutes, and standardization of the optimal cutoff point is required in a prospective setting.
In conclusion, the F2P Score reported herein, taking into account both the clinicopathological and genomic factors, may inform the prognosis and endocrine responsiveness in ER+/HER2- breast cancer. A higher F2P Score may indicate an increased likelihood of first-2-year relapse and therefore suggest a potential resistance to endocrine therapy. Accordingly, after the potential benefits and risks have been considered, treatment escalation may be an alternative strategy for patients with a high F2P Score.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: http://bcdb.mdt.team:8080/, available on request from the corresponding author.

ETHICS STATEMENT
The current study was approved by the independent Ethical Committee of Ruijin Hospital. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
CL, JW, and LZ: study concept and design. CL, JW, LL, and LZ: data analysis and interpretation. CL: visualization. LZ: final approval and funding acquisition. All authors: data acquisition and manuscript preparation.

FUNDING
This study was funded by the National Natural Science Foundation of China (No. 81572581) and Science and Technology Commission of Shanghai Municipality (No. 16411966900). The funders had no role in the design of the study and collection, analysis, interpretation of data, or in writing the manuscript.