Nomogram for Predicting Lymph Node Involvement in Triple-Negative Breast Cancer

Background: Lymph node metastasis in triple-negative breast cancer (TNBC) is a key determinant of treatment strategy. This study aimed to build a nomogram that predicts lymph node metastasis in patients with TNBC.
Materials and Methods: A total of 28,966 TNBC patients diagnosed from 2010 to 2017 in the Surveillance, Epidemiology, and End Results (SEER) database were enrolled and randomized 1:1 into training and validation sets. Univariate and multivariate logistic regression analyses were applied to identify the predictive factors that make up the nomogram. Receiver operating characteristic curves were used to assess the performance of the nomogram.
Results: Multivariate logistic regression analyses revealed that age, race, tumor size, tumor primary site, and pathological grade were independent predictive factors of lymph node status. Integrating these independent predictive factors, a nomogram for predicting lymph node status was developed and then validated in the validation set. The areas under the receiver operating characteristic curves of the nomogram in the training and validation sets were 0.684 and 0.689, respectively, showing satisfactory performance.
Conclusion: We constructed a nomogram to predict lymph node status in TNBC patients. After further validation in additional large cohorts, the nomogram is expected to predict lymph node status more reliably, providing more information for staging and treatment and enabling tailored treatment in TNBC patients.


INTRODUCTION
Breast cancer, the most common malignant tumor in women, is a heterogeneous disease. Triple-negative breast cancer (TNBC) is one of the subtypes described in recent years; it does not express estrogen receptor (ER), progesterone receptor (PR), or human epidermal growth factor receptor 2 (HER2). It shows a variety of biological, clinicopathological, and molecular characteristics, responds very differently to treatment, and has divergent prognoses (1,2). Despite its low incidence, accounting for about 10 to 20% of all breast cancer cases, TNBC shows strong invasiveness, high malignancy, and short relapse-free survival, which makes early diagnosis and accurate staging vital (3). Compared with other subtypes, patients with TNBC are more likely to show lymph node metastasis at the initial diagnosis (4).
Studies have shown that lymph node status is crucial for prognosis prediction and treatment decisions in TNBC (5-7). At present, sentinel lymph node biopsy (SLNB) and axillary lymph node dissection (ALND), followed by pathological diagnosis, are the methods commonly used to evaluate lymph node status in TNBC. The false negative rate of SLNB is 5-10%, which may result in improper patient management. Sufficient ALND can effectively reduce the risk of TNBC metastasis, but may cause chronic side effects such as numbness, stiffness in the upper body, and lymphedema. Moreover, extra-axillary lymph node metastasis also occurs (8), implying that SLNB or ALND might not be sufficient for the diagnosis of lymph node metastasis in TNBC. Therefore, it is helpful to classify TNBC cases preoperatively based on clinicopathologic factors, which could support individualized surgical treatment and reduce overall mortality and morbidity in TNBC.
Clinical researchers and clinicians have long sought to predict lymph node (LN) status. Several studies have developed models for LN status prediction, but most are based on a limited number of cases (9). Tan et al. constructed an immune-related genes (IRGS)-based nomogram to accurately estimate the preoperative ALN status of 214 operable TNBC cases (10). Despite its strong performance, a gene-based model may be difficult to adopt widely in clinical practice. Therefore, this study aimed to develop a risk nomogram based on clinical data to determine lymph node metastasis, which could help to identify TNBC patients with positive lymph nodes more quickly.

MATERIALS AND METHODS
Patients
We extracted the data of 28,966 triple-negative breast cancer patients registered between January 1, 2010 and December 31, 2017 from the SEER program. HER2 status was not recorded in SEER's breast cancer cohort before 2010, so the large number of patients diagnosed before that date could not be included. Analysis cohorts were identified according to the following criteria: unilateral, invasive carcinoma of the breast (ICD-O-3 8500); diagnosis confirmed by positive histology and not by autopsy or a death certificate, as the first and only primary tumor; adjusted AJCC stage I-III; known tumor size; histological grade I-III; known regional lymph node status; ER, PR, and HER2 negative. Patients with Paget's disease or younger than 18 years old were excluded. The patients were randomized 1:1 to the training and validation sets for the construction and verification of the nomogram, respectively. The following information was collected and transformed into categorical variables: age, race, gender, laterality, grade, location, histological type, and T stage.
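The 1:1 randomization described above can be illustrated with a minimal Python sketch. The paper does not state how the split was implemented (the analysis itself was run in R), so the function name `split_one_to_one`, the seed value, and the use of integer patient IDs are assumptions made only for illustration.

```python
import random

def split_one_to_one(patient_ids, seed=2021):
    """Shuffle patient IDs and split them into two halves (training, validation).

    The seed is a hypothetical value; fixing it only makes the split reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

# Hypothetical IDs 0..28965 standing in for the 28,966 SEER records.
training, validation = split_one_to_one(range(28966))
print(len(training), len(validation))  # 14483 14483
```

With an even cohort size, both sets receive exactly half of the patients and no patient appears in both.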

Construction and Validation of the Nomogram
Lymph node status was determined according to the Regional Nodes Positive term. We first screened the clinicopathological characteristics related to lymph node status, and found that the statistically significant variables were age, race, grade, location, histological type, and T stage (P<0.05). All these variables were analyzed by univariate logistic regression analysis, and those that were significant (P<0.05) were then assessed by multivariate logistic regression analysis. The significant independent predictors identified in this way were used to construct the nomogram. Odds ratios (ORs) and 95% confidence intervals (CIs) were also calculated. Nomogram performance was quantified with respect to calibration and discrimination. Calibration was assessed graphically by plotting observed against predicted probabilities, together with the Hosmer-Lemeshow goodness-of-fit test (11). Internal validation of performance was carried out by bootstrapping (1,000 replications). According to the nomogram, total points for all patients were determined with the "nomogramFormula" package in R. Discrimination (the ability of a nomogram to separate patients with different lymph node statuses) was quantified by the area under the receiver operating characteristic (ROC) curve (AUC). A larger AUC (range 0.5-1.0) reflected a more accurate prediction.
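As a rough illustration of what the AUC measures, it can be computed as the probability that a randomly chosen node-positive patient receives a higher score than a randomly chosen node-negative patient, with ties counted as one half. The sketch below is not the pROC implementation used in the study; the function name `auc` and the toy scores and labels are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: predicted probabilities and observed lymph node status.
toy_scores = [0.10, 0.35, 0.40, 0.80, 0.65, 0.20]
toy_labels = [0, 1, 0, 1, 1, 0]
toy_auc = auc(toy_scores, toy_labels)  # 8 of 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a toy example but not for a cohort of this size, where a rank-based computation would be preferable.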
Finally, the best cut-off value was determined by Youden's index, and the training and validation cohorts were each divided into two subgroups according to this value. The correlation between the nomogram and the risk of lymph node metastasis was estimated by univariate logistic regression analysis.
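Youden's index is J = sensitivity + specificity - 1, and the cut-off is the score at which J is largest. The following sketch shows one way to search for it, assuming that patients with total points above the cut-off are classified as high risk; the function name `youden_cutoff` and the toy data are hypothetical and do not reproduce the study's values.

```python
def youden_cutoff(points, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    Patients with points > cutoff are called positive; points <= cutoff, negative.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(points)):
        tp = sum(1 for p, y in zip(points, labels) if p > cutoff and y == 1)
        tn = sum(1 for p, y in zip(points, labels) if p <= cutoff and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical total points and lymph node status for eight patients.
toy_points = [40, 55, 70, 82, 90, 110, 120, 60]
toy_status = [0, 0, 0, 0, 1, 1, 1, 1]
best = youden_cutoff(toy_points, toy_status)  # (82, 0.75) for this toy data
```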

Statistical Analyses
The chi-square test was performed to evaluate the associations of lymph node status with the relevant variables. Fisher's exact test was carried out if necessary. Two-sided P<0.05 was considered statistically significant unless otherwise stated. All statistical analyses were performed using STATA (version 14.1) and R (version 3.6.1). The R packages caret, rms, pROC, ggplot2, parallel, and nomogramFormula were applied.
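For a 2x2 table, the Pearson chi-square statistic and its two-sided P value can be written in a few lines. This is a sketch without continuity correction, not the STATA or R routine used in the study; the function name `chi_square_2x2` and the counts are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and P value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail of chi-square equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: rows = two patient groups, columns = node-positive / node-negative.
chi2, p_value = chi_square_2x2(30, 70, 20, 80)
print(round(chi2, 3), round(p_value, 3))
```

When expected cell counts are small, Fisher's exact test is the appropriate alternative, as noted in the text.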

RESULTS
Patient Characteristics
A total of 28,966 patients were enrolled in this study, of whom 8,710 (30.07%) were lymph node positive (Table 1). The demographic and clinicopathologic characteristics related to lymph node status included age, race, grade, location, histological type, and T stage. Younger patients (age<60) had a higher rate of lymph node involvement (32.43%) than older ones (age≥60, 26.98%) (P < 0.001). As for race, 33.79% of black patients had positive lymph nodes versus 29.00% of white patients and 30.00% of others (P < 0.001). The positive rate of lymph nodes was higher in patients with grade III cancer than in those with grade II and grade I cancer (31.33% vs. 25.88% and 12.20%, respectively; P < 0.001). Patients with a primary tumor located in the axillary tail of the breast were the most likely to have positive lymph nodes (46.26%), while cases primarily located in the central portion of the breast ranked second

Independent Predictors in Training Set
According to univariate logistic regression analysis, age, race, location, grade, histologic type, and T stage were significantly associated with the positive rate of lymph nodes (Table S1). These factors were included in multivariate logistic regression analysis (Table 2). The results showed that grade was not an independent predictor (P=0.421), whereas the other factors were statistically significant, independent predictors of lymph node status (P<0.05).

Construction and Validation of the Nomogram
We established a nomogram based on the significant, independent predictors determined by multivariate analysis (Figure 1), namely age, race, location, histological type, and T stage. By adding up the scores of all the variables, the probability that a specific patient has positive lymph nodes can be predicted. In the nomogram, younger black patients with T4, IDC/ILC tumors located in the axillary tail had the highest scores, whereas elderly white patients with T1 tumors of non-IDC, non-ILC histology had a lower risk of lymph node metastasis. The predicted probabilities of positive lymph nodes ranged from 0.05 to 0.8. To test the performance of the new nomogram, internal validation with 1,000 bootstrap resamples was carried out in the training set and summarized in a calibration plot (Figure 2). The calibration curve indicated good calibration of the nomogram. The effectiveness of the nomogram for predicting lymph node status was further evaluated using ROC curves for the training (Figure 3A) and validation (Figure 3B) sets. In the training set, the AUC was 0.684 (95% CI: 0.675-0.693), similar to the AUC observed in the validation set (0.689, 95% CI: 0.679-0.698). These results indicated that the nomogram is a useful predictor of lymph node status in TNBC.
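The step from summed points to a predicted probability can be sketched as follows. A common convention, assumed here, is to rescale the logistic regression coefficients so that the predictor with the widest effect spans 0 to 100 points; the total points then map linearly back to the log-odds. The coefficients, intercept, variable names, and function names below are hypothetical and are not the fitted values of this study.

```python
import math

# HYPOTHETICAL log-odds coefficients relative to each reference level (not the paper's estimates).
COEFS = {
    "age": {">=60": 0.0, "<60": 0.25},
    "t_stage": {"T1": 0.0, "T2": 0.9, "T3": 1.6, "T4": 2.0},
}
INTERCEPT = -1.8  # hypothetical

def points_per_unit(coefs):
    """Points per unit of log-odds, so the widest predictor spans exactly 100 points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    return 100.0 / widest

def total_points(patient, coefs):
    """Sum of points over all predictors for one patient."""
    scale = points_per_unit(coefs)
    return sum((coefs[v][patient[v]] - min(coefs[v].values())) * scale for v in coefs)

def predicted_probability(patient, coefs, intercept):
    """Map total points back to the linear predictor, then apply the logistic function."""
    scale = points_per_unit(coefs)
    baseline = intercept + sum(min(lv.values()) for lv in coefs.values())
    linear_predictor = baseline + total_points(patient, coefs) / scale
    return 1.0 / (1.0 + math.exp(-linear_predictor))

example = {"age": "<60", "t_stage": "T4"}
print(total_points(example, COEFS))  # 112.5
print(predicted_probability(example, COEFS, INTERCEPT))
```

Because the points are only a rescaling of the coefficients, the nomogram gives the same probability as evaluating the logistic model directly.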

Risk Stratification by the Nomogram
The cut-off value of total scores for predicting lymph node status was determined by Youden's index in the training set. Both the training and validation sets were subdivided into low score groups (total points ≤ 82) and high score groups (total points > 82). After applying the cut-off value to the training set, univariate analysis found a significant difference in the probability of lymph node metastasis between the high and low score groups (OR=3.24, 95% CI: 3.03-3.49; P<0.001), consistent with the results obtained in the validation set (OR=3.30, 95% CI: 3.07-3.56; P<0.001; Table 3).
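For a single binary predictor such as high versus low score group, the odds ratio from univariate logistic regression equals the cross-product ratio of the 2x2 table, and a Wald 95% CI follows from the standard error of the log odds ratio. The sketch below uses hypothetical counts, since the group sizes are reported in Table 3 rather than in the text; the function name `odds_ratio_ci` is likewise an assumption.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI for a 2x2 table.

    a = high score, node positive   b = high score, node negative
    c = low score, node positive    d = low score, node negative
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, chosen only to show the calculation.
odds_ratio, lower, upper = odds_ratio_ci(60, 40, 30, 70)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```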

DISCUSSION
In this study, the risk factors associated with lymph node metastasis in triple-negative breast cancer were determined, and a predictive model was developed by logistic regression and presented as a nomogram. Univariate logistic regression analysis showed that age, race, T stage, primary site, grade, and histological subtype were related to lymph node status. Multivariate logistic regression confirmed all of these variables except grade as independent predictors of lymph node status. The risk of lymph node metastasis increased significantly with T stage, as previously reported (12). Young patients had higher odds of developing lymph node metastasis than older ones. Patients with the axillary tail as the primary site were more likely to have metastatic lymph nodes, indicating that the primary site of the tumor is important in predicting lymph node metastasis. It was also confirmed that the ILC pathological type is more prone to lymph node metastasis.
Previous studies have constructed nomograms to predict both sentinel and non-sentinel lymph node metastases in breast cancer, and these performed well in cohorts at different institutions (13-15). Several well-designed nomograms have been accepted worldwide, with some adopted by clinicians (16-20). For example, Hwang et al. incorporated sentinel lymph node metastasis size into a nomogram that accurately predicts the likelihood of having additional axillary metastasis (16). Nevertheless, these models show only limited performance in triple-negative breast cancer. For predicting non-sentinel lymph node metastasis in TNBC, some of these widely used nomograms perform little better than chance, with AUCs around 0.55.
It is noteworthy that such nomograms still work well in ER-positive patients at the same institution (21). This phenomenon can be partly attributed to the fact that triple-negative breast cancer is not a single subtype but a general concept covering a group of diseases that vary in biological behavior and differ greatly from other subtypes (22). To address this, the cohort used to build a model should be large enough to include an adequate number of cases of each "subtype" of TNBC. SEER, a nationwide program covering nearly a quarter of the US population, is well suited to building such a model.
Apart from the large cohort used as the data source for the nomogram, this model has other advantages. First, our research used the clinical information of TNBC patients to predict lymph node metastasis. Existing studies (10,13) assessed TNBC at the genetic level, using IRGS to predict lymph node metastasis, and also obtained good results. However, clinical information is more intuitive and easier to use for decision-making in the clinic. Second, compared with the long and complicated formulas of Cox and logistic predictive models, nomograms, composed of several simple scaled parallel lines, provide reliable prognostic information that is specific to a given patient.
This study also had some limitations, imposed by the data and the nature of the analysis. We were unable to obtain further information from the SEER database, including invasion of lymphatic or blood vessels, multifocality, and molecular biomarkers, which, if included, could improve the sensitivity and specificity of the present nomogram. In addition, as this was a retrospective study, selection or information bias could hardly be avoided. The cohort in this study was drawn mainly from the American population, and whether the results are applicable to other populations remains to be determined.

CONCLUSION
In summary, a nomogram for predicting lymph node metastasis in TNBC patients was developed. Evaluating lymph node metastasis remains a major concern in the treatment and staging of breast cancer. The present findings reveal the features of lymph node metastasis in TNBC and provide a reference for future treatment decisions that take neoadjuvant chemotherapy and sentinel lymph node biopsy into consideration, which may eventually help to optimize clinical diagnosis and treatment.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. These data can be found here: Surveillance, Epidemiology, and End Results (SEER) database (https://seer.cancer.gov/).

AUTHOR CONTRIBUTIONS
XC: conception of the work, data collection, data analysis and interpretation, drafting the article, critical revision of the article, and final approval of the version to be published. HZ and JH: conception of the work, critical revision of the article, and final approval of the version to be published. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by grants from the Training Plan of Excellent Talents of The First People's Hospital of Shangqiu (SQFPH2019). The funder had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.