Risk Factors, Prognostic Factors, and Nomogram for Distant Metastasis in Breast Cancer Patients Without Lymph Node Metastasis

Background Lymph node-negative (N0) breast cancer can coexist with distant metastasis (DM), and clinicians may consequently underestimate the risk of relapse and undertreat this subpopulation. Methods The clinicopathological characteristics of N0 breast cancer patients in the Surveillance, Epidemiology, and End Results (SEER) database who were diagnosed between January 2010 and December 2015 were retrospectively reviewed. Multivariate logistic and Cox analyses were used to identify independent risk factors for DM and prognostic factors for 1-, 3-, and 5-year cancer-specific survival (CSS) in this subpopulation. Results Seven factors, namely age (<40 years), tumor size (>10 mm), race (Black), location (central), grade (poor differentiation), histology (invasive lobular carcinoma), and subtype (luminal B and Her-2 enriched), were associated with DM, and the area under the curve (AUC) was 0.776 (95% CI: 0.763–0.790). Moreover, T1-3N0M1 patients with age >60 years at diagnosis, Black race, triple-negative breast cancer subtype, no surgery performed, and multiple DMs had worse 1-, 3-, and 5-year CSS. The AUCs for 1-, 3-, and 5-year CSS were 0.772, 0.741, and 0.762 in the training cohort and 0.725, 0.695, and 0.699 in the validation cohort, respectively. Conclusion The clinicopathological characteristics associated with the risk of DM, and with the prognosis of female breast cancer patients who have DM but no lymph node metastasis, were identified. A novel nomogram for predicting 1-, 3-, and 5-year CSS in T1-3N0M1 patients was established and internally validated, which could help clinicians identify high-risk patients who may warrant relatively aggressive management.


INTRODUCTION
Breast cancer is currently the most frequent malignancy and one of the leading causes of cancer death in the United States (estimated 279,100 new cases and 42,690 deaths) (1) and mainland China (estimated 304,000 new cases and 70,000 deaths) (2). Although the long-term survival of patients with breast cancer has increased significantly in recent years with the application of targeted therapy (3), endocrine therapy (4), and even immunotherapy (5,6), distant metastasis (DM), the most common form of recurrence and the main cause (approximately 90%) of death, can reverse this favorable outcome (7,8). Historically, the "Halsted" hypothesis held that breast cancer metastasis proceeded in a mechanical and orderly sequence: enlargement of the primary focus, invasion of the regional lymph nodes, and then metastasis to distant organs via the bloodstream. However, subsequent studies of the biology of breast cancer metastasis have shown that DM is a non-random process in which circulating tumor cells (CTCs) seed specific distant tissues, suggesting that metastasis does not require passage through the lymphatic system and that tumor cells can invade distant organs directly via the bloodstream. Consequently, CTC analysis has become a novel tool for predicting the prognosis of breast cancer patients and could provide better treatment guidance for clinicians (9,10).
Indeed, as a key component of tumor stage classification, the status of the regional lymph nodes plays an important role in predicting biological aggressiveness and the propensity to spread in patients with breast cancer (11,12). Some scholars believe that regional nodal disease may precede metastatic dissemination (11). Therefore, after surgery, patients with negative lymph node status tend to have a favorable outcome, and only a small fraction of them need adjuvant therapy during postoperative follow-up (11). Additionally, in the recent literature, patients with negative lymph node status have frequently served as the "control group" when scholars explored the risk factors of DM (13-16), and they were more likely to be assigned to the low-risk group. However, it cannot be ignored that a considerable proportion of patients are found to have DM despite negative lymph node status (17). Clinicians may therefore underestimate the risk of relapse in lymph-node-negative (N0) patients, and insufficient adjuvant therapy and management might increase that risk in N0 patients with multiple risk factors. It is thus equally important to identify the independent risk factors of DM in this particular subpopulation, which would not only help oncologists tailor treatment strategies to patients but also encourage researchers to investigate the underlying molecular mechanisms of breast cancer metastasis. Although some scholars have evaluated DM in lymph node-negative primary breast cancer through gene expression profiles and the integration of proliferation and immunity (17,18), whether the clinical pattern differs between DM and non-DM patients without lymph node involvement remains unclear.
In the present study, we aimed to identify the potential clinicopathological risk factors for DM in N0 primary breast cancer, which would fill the gap in identifying high-risk subgroups. We also evaluated cancer-specific survival (CSS) in this subpopulation and developed a novel predictive model to provide quantitative predictions of outcome for N0 patients with DM. More aggressive treatment modalities and active surveillance may be justified in high-risk subgroups of patients.

Data Source
This is an observational retrospective cohort study. The data were extracted from the Surveillance, Epidemiology, and End Results (SEER) program, a large population-based research program derived from 18 cancer registries that covers approximately 28% of the U.S. population and various ethnic groups. Medical record collection and analysis were performed by two researchers working independently to reduce selection bias. The reporting of this study followed the guidelines of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement (19).
Patients who met the following criteria were included: (1) female patients with histologically confirmed invasive breast cancer; (2) aged between 18 and 79 years at diagnosis; (3) pathologically confirmed negative lymph node status; (4) diagnosed between 2010 and 2015; (5) histological type of infiltrating ductal carcinoma (IDL), infiltrating lobular carcinoma (ILC), or infiltrating ductal mixed lobular carcinoma (IDLC). Patients with a T4 primary tumor (invasion to the chest wall/skin and inflammatory carcinoma), no regional nodes examined, one or more coexisting cancers, loss to follow-up, or incomplete medical records were excluded during patient selection (Figure 1).
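The inclusion and exclusion criteria above can be expressed as a simple record filter. The following is a minimal sketch only: the field names (`sex`, `n_stage`, `nodes_examined`, and so on) are hypothetical and do not correspond to actual SEER variable names, and the authors' own selection procedure is not described at this level of detail.

```python
# Hypothetical field names; real SEER exports use different variable names.
ALLOWED_HISTOLOGY = {"IDL", "ILC", "IDLC"}  # abbreviations as used in the text


def is_eligible(record):
    """Return True if a patient record meets the stated selection criteria."""
    included = (
        record["sex"] == "female"
        and 18 <= record["age"] <= 79
        and record["n_stage"] == "N0"
        and 2010 <= record["year_of_diagnosis"] <= 2015
        and record["histology"] in ALLOWED_HISTOLOGY
    )
    excluded = (
        record["t_stage"] == "T4"
        or record["nodes_examined"] == 0
        or record["other_cancers"] > 0
        or record["lost_to_follow_up"]
        or record["incomplete_record"]
    )
    return included and not excluded


# Illustrative record (invented values, not real patient data).
example = {
    "sex": "female", "age": 61, "n_stage": "N0", "year_of_diagnosis": 2012,
    "histology": "ILC", "t_stage": "T2", "nodes_examined": 3,
    "other_cancers": 0, "lost_to_follow_up": False, "incomplete_record": False,
}
print(is_eligible(example))
```

Keeping the inclusion and exclusion conditions in separate expressions mirrors the way the criteria are stated and makes each one easy to audit against Figure 1.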

Variable Evaluation and Definition
According to the sample size requirement for a multivariate linear regression equation, the sample size in the present study should be at least 10 times the number of independent variables in the equation. After excluding ineligible cases, 79,746 female patients with invasive breast cancer were enrolled in this study, which satisfied this requirement. These patients were used to explore the risk factors for DM in N0 breast cancer. In addition, for predicting 1-, 3-, and 5-year CSS, the N0 patients with DM diagnosed between 2010 and 2015 were randomly divided into a training group and a validation group at a ratio of 7:3 using R.
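The 7:3 random allocation can be illustrated as follows. This is a sketch in Python rather than the R code used in the study, and the function name, the seed, and the use of integer patient identifiers are assumptions made for illustration. With the 1,069 N0 patients with DM reported in the Results, a 70% split yields groups of 748 and 321, matching the reported cohort sizes.

```python
import random


def split_train_validation(patient_ids, train_fraction=0.7, seed=2015):
    """Randomly split patient identifiers into training and validation groups.

    The seed is an arbitrary illustrative choice; it only makes the split
    reproducible and is not taken from the study.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # floor of 70% of the cohort
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers 0..1068 stand in for the 1,069 N0 patients with DM.
train, validation = split_train_validation(range(1069))
print(len(train), len(validation))
```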

Statistical Analysis
The primary endpoints of this study were DM and the 1-, 3-, and 5-year CSS probabilities. Univariate and multivariate logistic analyses were used to identify potential independent clinical risk factors for DM in lymph node-negative patients, and univariate and multivariate Cox regression analyses were performed to identify the prognostic factors of CSS in patients with DM. The analyses were conducted with IBM SPSS (version 25.0). A two-tailed P-value threshold of 0.05 was used as the criterion for variable deletion when performing backward stepwise selection. The nomogram, calibration curve, and Kaplan-Meier analysis were constructed and plotted from the results of the multivariate Cox regression analysis using the "survival," "rms," "survminer," and "foreign" packages of the R software (R Foundation, Vienna, Austria, version 3.5.2, http://www.r-project.org). Harrell's C-index was calculated to assess the discrimination performance of the nomogram.
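For readers unfamiliar with Harrell's C-index, the sketch below shows how it can be computed from follow-up times, event indicators, and model risk scores. It is a plain-Python illustration under simplifying assumptions (pairs with tied survival times are skipped, and tied risk scores count as half-concordant); it is not the routine from the R packages used in the study, and the toy data are invented.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i had the event and a strictly
    shorter follow-up time than patient j. The pair is concordant when the
    patient with the shorter survival has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: months of follow-up, event flag (1 = cancer-specific death), risk.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
print(harrell_c_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for interpreting a C-index reported on this scale.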

RESULTS
Clinicopathological Characteristics of Patients With Negative Lymph Node Status
Between 2010 and 2015, a total of 79,746 female patients with invasive breast cancer were enrolled in this study, with a median age of 61 years (range: 20-79 years) at diagnosis and a median follow-up time of 51 months (range: 0-95 months). Among these N0 patients, 1,069 cases (1.34%) were identified with coexisting DM, of which 748 cases were in the training cohort and 321 cases were in the validation cohort (Table 1). Specifically, the most frequent metastasis site was bone, which accounted for 327 cases (43.72%) and 150 cases (46.73%) of the DM patients in the training and validation cohorts, respectively. Notably, 385 (36.01%) patients had multiple DMs, and 70.63% (755/1,069 cases) of patients with DM did not receive surgery for the primary tumor.
FIGURE 1 | The patient selection process. T4, invasion to the chest wall/skin and inflammatory carcinoma; DM, distant metastasis.

Univariate and Multivariate Logistic Analyses of the Risk Factors of DM
To investigate the potential clinical factors associated with the risk of DM in female breast cancer with negative lymph node status,

Predictive Nomogram Construction and Validation
Based on the multivariate Cox regression analysis, five variables (age at diagnosis, race, surgery performed, distant metastasis site, and tumor subtype) were selected for constructing the nomogram for predicting the 1-, 3-, and 5-year CSS in patients with negative lymph node status but DM at presentation (Figure 3). Each factor corresponded to a score on the points scale, and the total points could be calculated by adding up the specific values for an individual patient (the specific value of each variable is listed in Table 4). The C-index of the nomogram reached 0.694, which represented relatively favorable discrimination. In the training cohort, the AUCs of the 1-, 3-, and 5-year CSS ROC curves were 0.772, 0.741, and 0.762, with cutoff values of 185, 191, and 151, respectively, which indicated a satisfactory predictive ability (Figures 4A-C). Moreover, the established nomogram was validated in an internal validation cohort of 321 cases. The results in the validation cohort also showed good discrimination, with an AUC of 0.725 for predicting the 1-year CSS (Figure 4D), 0.695 for the 3-year CSS (Figure 4E), and 0.699 for the 5-year CSS (Figure 4F). In addition, to examine the discrimination of the proposed nomogram, the patients in the training set were categorized into four groups based on the total points obtained from the nomogram. The Kaplan-Meier (KM) curves showed good discrimination in identifying the high-risk population (Supplementary Figure S3). To further evaluate the accuracy of the nomogram, calibration curves for the probability of CSS were plotted and showed high agreement between the 1-, 3-, and 5-year predictions of the nomogram and the observed outcomes (Figure 5).
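The way a nomogram score is obtained, by looking up the points for each factor level and summing them, can be sketched as below. All point values and category labels in this example are hypothetical placeholders chosen for illustration; the actual values are those given in Table 4 and Figure 3, which are not reproduced here. The only property carried over from the text is that the five variables are summed into a total.

```python
# HYPOTHETICAL point values and category labels, for illustration only.
# The real per-variable points are reported in Table 4 of the article.
HYPOTHETICAL_POINTS = {
    "age": {"<=60": 0, ">60": 30},
    "race": {"non-Black": 0, "Black": 25},
    "surgery": {"yes": 0, "no": 45},
    "dm_site": {"single": 0, "multiple": 40},
    "subtype": {"non-TNBC": 0, "TNBC": 100},
}


def total_points(patient, points_table=HYPOTHETICAL_POINTS):
    """Sum the points assigned to each of the patient's factor levels."""
    total = 0
    for variable, levels in points_table.items():
        level = patient[variable]
        if level not in levels:
            raise KeyError(f"unknown level {level!r} for variable {variable!r}")
        total += levels[level]
    return total


# Invented example patient: elderly, non-Black, no surgery, multiple DMs, TNBC.
example_patient = {
    "age": ">60", "race": "non-Black", "surgery": "no",
    "dm_site": "multiple", "subtype": "TNBC",
}
print(total_points(example_patient))
```

In practice the total would then be read against the nomogram's survival axes to obtain the predicted 1-, 3-, and 5-year CSS probabilities.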

DISCUSSION
Breast cancer has become the most frequent malignancy among women worldwide (1,2,20). While the overall survival (OS) rate of breast cancer patients has improved with early detection and multiple treatment modalities, patients diagnosed with DM at presentation still have a worse prognosis. In recent decades, great advances have been made in understanding and detecting breast cancer metastasis. Breast cancer is no longer regarded as a locoregional disease but as a systemic disease with an inherent tendency to metastasize (8). There is no doubt that regional lymph node involvement is one of the important predictive factors of DM in breast cancer, and some scholars have even suggested and validated that regional node metastasis could precede metastatic dissemination (11). Notably, however, a considerable number of N0 patients were observed to have de novo DM. With the wide application of circulating tumor cell (CTC) analysis, many scholars have recognized that DM may be triggered by hematogenous spread of CTCs rather than by lymphatic or direct intracavitary spread, and thus possibly occurs by a different mechanism. For this reason, breast cancer patients with distant organ invasion but without regional lymph node metastasis are suitable subjects for exploring the underlying mechanisms.
However, only a few previous studies have addressed the risk factors and prognosis of N0 patients (17,18,21). Herein, we explored whether N0 patients with and without DM differed significantly, and we also evaluated the prognosis of those with DM. In this study, the incidence of DM in N0 patients was about 1.34% (1,069/79,746). Several clinicopathological factors, including age at diagnosis, tumor size, race, tumor location, differentiation grade, histology, and subtype, were significantly associated with DM. Younger patients (especially <40 years) had nearly twice the risk of DM compared with elderly patients, in accordance with the report by Sabiani and colleagues (22). Consistent with previous studies evaluating the risk factors of DM in patients with invasive breast cancer, patients with a tumor size >10 mm, ILC, estrogen receptor (ER)- and progesterone receptor (PR)-positive as well as Her-2-positive subtypes, and Black race (Supplementary Figure S2) had a higher risk of DM (12,14,21). Regarding tumor location, previous work has established that location is significantly associated with regional lymph node metastasis, especially when the tumor originates from the nipple, the central portion, or overlapping regions of the breast (23-26). We took this a step further and found that nipple and central tumor locations carried a higher risk of DM (p<0.0001) in N0 women. Although we identified seven independent risk factors associated with DM in N0 patients, further studies are needed to clarify the underlying molecular mechanisms of this complex process.
Notably, some researchers have explored the risk factors of DM and the prognosis of patients with DM at presentation (7,13,15,16,27). For instance, Rosa Mendoza determined that tumor stage, primary tumor size, and lymph node involvement were the major predictors of DM in adult breast cancer (14). The Black race and Her-2-enriched subtype were also identified as risk factors of DM in recent studies (28-30). In the present study, we explored the prognostic factors of 1-, 3-, and 5-year CSS among 748 N0 patients with DM. Although young N0 women were more likely than elderly women to have DM, they had better long-term outcomes than the elderly. In contrast, Sabiani and colleagues concluded that patients at a young age (<35 years) had lower estimated disease-free survival (DFS) and OS rates (22). These discrepancies might be due to differences in sample size and patient inclusion criteria. For example, all patients included in their study were under 50 years old, while the patients in ours were between 20 and 79 years old and had negative lymph node status. Additionally, we identified five independent risk factors for poor CSS in N0 patients with DM. Moreover, surgical treatment of the primary focus is regarded as palliative surgery for patients with DM, and whether these patients benefit from it remains controversial (32-35). One meta-analysis of two randomized controlled trials reached no definitive conclusion about the role of surgery in breast cancer patients with DM at presentation (35). With further exploration, some studies, including the present study, found that locoregional surgery could improve the CSS and OS outcomes of metastatic breast cancer (15,32,34,36). Nevertheless, many questions about the timing, type, and extent of the surgical procedures remain to be addressed in future work (33).
Notably, in contrast to previous work evaluating the prognostic factors for patients with DM, primary tumor size (p=0.123) and differentiation grade (p=0.101) were not significantly associated with CSS in the N0 population. In a similar study population, Yu and colleagues (29) determined that the relationship between tumor size and DM was non-linear in N0 patients. They consequently proposed that the biological features of the primary tumor, rather than metastatic ability accumulated during tumor evolution, likely determine the potential for distant dissemination, which points to the indolent biological characteristics of the tumor. Our results support this hypothesis, but it needs further evaluation.
To present the prognostic factors we identified more intuitively for clinical use, a nomogram was subsequently plotted. Notably, the breast cancer subtype accounted for a major part of the scoring system. In line with similar nomograms for evaluating the prognosis of breast cancer (37,38), the TNBC subtype yielded the highest score. Clinicians could thereby estimate the 1-, 3-, and 5-year CSS probability for an individual patient. Compared with other recent works evaluating the 3- and 5-year CSS in women with breast cancer and bone metastasis, the C-index of the present nomogram was 0.694, which was higher than that of Liu (0.660) (16) and close to the C-indexes of the nomograms developed by Wang (0.705) (15) and Zhao (0.723) (37), supporting the promising discrimination of our model. To evaluate the accuracy of the nomogram, the internal validation cohort was subsequently used. As expected, the AUCs of the 1-, 3-, and 5-year CSS ROC curves in the validation cohort reached 0.725, 0.695, and 0.699, respectively, which further supported the utility of our model for assessing long-term CSS in this subpopulation. Besides, compared with the study of Wang and colleagues (17)
However, this study has some limitations that should be addressed in future work. First, this is a retrospective study in which selection bias inevitably exists. Second, while the SEER database covers approximately 28% of the U.S. population, some significant confounding prognostic factors that have been shown to be related to worse survival in patients with breast cancer, including but not limited to the Ki-67 index (39), BRCA1- and BRCA2-related mutations (40,41), and a high 21-Gene Recurrence Score (21-GRS) (42), are unavailable in the SEER database. Third, further information about the adjuvant management of these patients was not reported in the present study because these data are limited in the SEER database. Future work should fill this gap to provide robust clinical evidence. In addition, with technical advances in multidisciplinary management, the CSS of patients with breast cancer is likely to increase in the future, which could influence the predictive ability of the model. Lastly, another weakness of this study is the lack of an external validation cohort, which limits the reliability and clinical applicability of the nomogram. Thus, external validation cohorts from multiple centers and countries are needed to further evaluate the feasibility of our nomogram.

CONCLUSION
In summary, this study identified the potential clinicopathological risk factors for DM in N0 patients and the prognostic factors in patients with DM at presentation. N0 patients with younger age at diagnosis, larger tumor size, central tumor location, Black race, poorer differentiation grade, ILC, and luminal B subtype had the highest risk of DM; recognizing these factors could help clinicians avoid underestimating the risk of DM and subsequent undertreatment in N0 patients. In turn, DM patients with older age at diagnosis, TNBC subtype, and multiple metastasis sites had the worst prognosis. The novel validated nomogram could help clinicians better stratify patients at high risk of cancer-specific death who may warrant relatively aggressive treatment and management. External validation is still needed to further strengthen our findings.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
Ethical approval was waived by the local Ethics Committee of Chongqing Medical University in view of the retrospective nature of the study and because all procedures performed were part of routine care.