A nomogram for predicting the risk of male breast cancer for overall survival

Background Male breast cancer (MBC) is a rare disease, accounting for <1% of all male carcinomas. Because prospective data are lacking, current therapy for MBC is based on retrospective analyses or on information extrapolated from studies of female patients. We constructed a nomogram for predicting the overall survival (OS) of MBC patients and verified its feasibility using data from China.
Methods A predictive model was constructed using 1,224 MBC patients from the Surveillance, Epidemiology and End Results (SEER) registry between 2010 and 2015. The performance of the model was externally validated using 44 MBC patients from the Fujian Medical University Union Hospital between 2002 and 2021. The independent prognostic factors were selected by univariate and multivariate Cox regression analyses. The nomogram was constructed to predict individual survival outcomes for MBC patients. The discriminative power, calibration, and clinical effectiveness of the nomogram were evaluated by the receiver operating characteristic (ROC) curve, the calibration curve, and decision curve analysis (DCA).
Results A total of 1,224 male breast cancer patients were included in the training cohort and 44 in the validation cohort. T status (p<0.001), age at diagnosis (p<0.001), histologic grade (p=0.008), M status (p<0.001), ER status (p=0.001), HER2 status (p=0.019), and chemotherapy (p=0.015) were independently associated with OS. The performance of the model was evaluated and validated using ROC curves on the training and validation datasets. In the training cohort, the nomogram-predicted AUC value was 0.786 for 3-year OS and 0.767 for 5-year OS. In the validation cohort, the nomogram-predicted AUC value was 0.893 for 3-year OS and 0.895 for 5-year OS. Decision curve analysis demonstrated that the nomogram provided greater net benefit than the AJCC stage.
Conclusions We developed a nomogram that predicts 3-year and 5-year survival in MBC patients. Validation using bootstrap sampling revealed good discrimination and calibration, suggesting that the nomogram may have clinical utility. The results remained reproducible in the validation cohort, which included Chinese data. The model was superior to the AJCC staging system, as shown in the decision curve analysis (DCA).


Introduction
Breast cancer is one of the most common malignancies in women worldwide. However, male breast cancer (MBC) is a rare disease, accounting for <1% of all male carcinomas (1). Because data on risk factors, prognostic value, and treatment options for MBC are lacking, the treatment patterns that clinicians recommend for male breast cancer are based on those for female breast cancer (2,3).
However, whether the management of female breast cancer (FBC) can be used as a reference for MBC is still controversial. Some studies have concluded that MBC and FBC are two completely different types with different biological behaviors and should be treated differently (4,5).
Therefore, a personalized prediction model is required for patients with male breast cancer. A nomogram is a simplified numerical model for statistical predictions that combines different independent factors (6-8). However, it is unclear whether a model built using the Surveillance, Epidemiology, and End Results (SEER) database is applicable to Chinese patients. Few articles have been published on this subject.
In our study, a nomogram model for predicting the overall survival (OS) of MBC patients was constructed from the SEER database. We then investigated whether the model was also applicable to the Chinese population.

Patient selection and data collection
Data were acquired from the open-access, authoritative database of the SEER Program. Launched in 1973 by the United States Centers for Disease Control and Prevention and National Cancer Institute, the SEER database includes information on patients with endocrine, respiratory, digestive system, and other tumors, and covers approximately 34.6% of the population in the United States. The training cohort data used in this study came from a public, anonymous database and did not require ethics committee approval or informed consent. Use of the validation cohort data was approved by the ethics committee of Fujian Medical University Union Hospital, and informed consent was obtained.
Training cohort data of MBC patients from 2010 to 2015 in the SEER database were extracted and screened with SEER*Stat version 8.3.5 software. Validation cohort data from 2002 to 2021 in Fujian Medical University Union Hospital were extracted. Inclusion criteria were 1) pathologically diagnosed patients with breast cancer, based on the malignant behavior of International Classification of Diseases (ICD)-O-3 SEER site/histology validation code 8500/3, 2) male, and 3) complete clinicopathological and follow-up data. Exclusion criteria were 1) unknown important date, 2) history of other types of cancer, 3) less than 1 month of survival, and 4) diagnosis dependent on biopsy/autopsy. Cases were screened step by step according to the inclusion and exclusion criteria, and 1,224 MBC patients were finally included in the training cohort. A total of 44 patients were included in the validation cohort. The SEER-based analysis was not subject to review by the Institutional Review Board because it used de-identified, previously collected, and publicly available data. The flowchart of the male breast cancer selection is shown in Figure 1.
The clinicopathological information of patients in Fujian Medical University Union Hospital and the SEER database was compared, including age, marital status, radiotherapy, chemotherapy, surgery, stage, grade, estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor 2 (HER2) status, and subtype, as well as survival status and survival time. Data from 1,224 patients extracted from the SEER database were used as the training cohort to analyze the independent factors influencing MBC prognosis and to establish a prediction model. The model was then validated using the data of 44 patients from Fujian Medical University Union Hospital as the validation cohort.

Statistical analysis
Demographic and clinical characteristics were summarized using descriptive statistics. Categorical variables were reported as whole numbers and proportions, and continuous variables were reported as medians with standard deviation (SD). Pearson's χ² test and Fisher's exact test were used for categorical variables, and the Mann-Whitney U test was used for rank variables to compare the baseline characteristics of the training cohort and the validation cohort. The Kaplan-Meier method was used to describe the OS curve, and the log-rank test was used to evaluate the survival differences between subgroups of each variable. The cutoff ages for male breast cancer were determined by the X-tile procedure to be 64 and 80 years (Figure 2). Patients were divided into three groups for further analysis (age ≤ 64, 65-80, and >80 years). Candidate variables were screened by univariate Cox analysis, and variables with p < 0.1 in univariate analysis were included in the multivariate Cox proportional hazards model. The above statistical analyses were performed with IBM SPSS Statistics 26.
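The Kaplan-Meier estimate and the three age groups described above can be illustrated with a minimal sketch. This is not the SPSS workflow used in the study; the function names and the toy follow-up data are hypothetical and serve only to show how the product-limit estimate is built up at each observed death.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time (product-limit estimate).

    times  -- follow-up time of each patient (e.g. months)
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # multiply by the conditional probability of surviving past t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # remove deaths and censored patients at t
    return curve

def age_group(age):
    """Map age at diagnosis to the three groups defined by the cutoffs 64 and 80."""
    if age <= 64:
        return "<=64"
    if age <= 80:
        return "65-80"
    return ">80"

# Hypothetical toy data, NOT taken from SEER or the validation cohort.
toy_times = [5, 8, 8, 12, 20, 20, 31]
toy_events = [1, 1, 0, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

In practice one such curve would be computed per subgroup (for example per age group) and the curves compared with the log-rank test.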
The prediction performance of the nomogram was internally validated by bootstrap resampling with 1,000 resamples. The discrimination of the model was evaluated by the concordance index (C-index), the receiver operating characteristic (ROC) curve, and the area under the curve (AUC), and calibration was assessed by plotting calibration curves to check that the predicted probabilities agreed with the observed outcomes. Decision curve analysis (DCA) was used to compare the clinical net benefit of the nomogram with that of American Joint Committee on Cancer (AJCC) staging. The significance level was α = 0.05 (two-tailed). The above statistical analyses were performed with R 4.1.0 software.
Figure 2. X-tile analysis of optimal cutoffs for age. (A) X-tile plot of the age training cohort. (B) Cutoffs are highlighted with histograms of the entire cohort.
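The C-index and bootstrap resampling named in the statistical analysis can be sketched as follows. This is an illustrative simplification, not the R procedure used in the study: it resamples patients with a fixed risk score rather than refitting the model in each resample, it skips pairs tied in time, and the function names and toy data are hypothetical.

```python
import random

def c_index(times, events, risks):
    """Harrell's C-index: share of usable pairs in which the patient who
    died earlier also has the higher predicted risk (risk ties count 0.5)."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is usable only if the earlier time is an observed death
        for j in range(n):
            if times[i] < times[j]:  # pairs tied in time are skipped for simplicity
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / usable if usable else float("nan")

def bootstrap_c_index(times, events, risks, n_boot=1000, seed=0):
    """Percentile bootstrap interval (2.5%, 97.5%) for the C-index of a fixed score."""
    rng = random.Random(seed)
    n = len(times)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample patients with replacement
        c = c_index([times[k] for k in idx],
                    [events[k] for k in idx],
                    [risks[k] for k in idx])
        if c == c:  # NaN means the resample contained no usable pair
            values.append(c)
    values.sort()
    m = len(values)
    return values[int(0.025 * m)], values[min(m - 1, int(0.975 * m))]

# Hypothetical toy data and risk scores, NOT taken from the study.
toy_times = [5, 8, 8, 12, 20, 20, 31]
toy_events = [1, 1, 0, 1, 0, 1, 0]
toy_risks = [0.9, 0.4, 0.7, 0.6, 0.3, 0.5, 0.1]
c_apparent = c_index(toy_times, toy_events, toy_risks)
ci_low, ci_high = bootstrap_c_index(toy_times, toy_events, toy_risks, n_boot=200)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking; a full internal validation would additionally refit the model on each resample to estimate optimism.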

Patient characteristics
Figure 2 (C). Different prognoses determined by cutoffs are shown with Kaplan-Meier plots (age ≤ 64 years = blue, age 65-80 years = gray, and age >80 years = magenta).
Figure 1. The flowchart of the selection of male breast cancer patients in the SEER database. SEER, Surveillance, Epidemiology, and End Results.

Nomogram construction and validation
Multivariate-derived coefficients were used to develop a novel nomogram to predict male breast cancer 3-year overall survival and 5-year overall survival (Figure 4).
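How multivariate coefficients become nomogram points can be shown with a small sketch. It assumes a common convention in which the variable with the widest range of coefficients spans 0 to 100 points and every other variable is scaled proportionally. The coefficients below are hypothetical placeholders, not the values estimated in this study, and the function names are illustrative.

```python
def nomogram_points(coefs):
    """Convert Cox coefficients into nomogram points.

    coefs -- {variable: {level: log hazard ratio relative to the reference level}}
    The lowest-risk level of each variable scores 0, and the variable with
    the widest coefficient range reaches 100 points.
    """
    spans = {v: max(lv.values()) - min(lv.values()) for v, lv in coefs.items()}
    widest = max(spans.values())
    points = {}
    for v, levels in coefs.items():
        base = min(levels.values())
        points[v] = {lvl: 100.0 * (b - base) / widest for lvl, b in levels.items()}
    return points

def total_points(points, patient):
    """Sum the points of one patient, given {variable: level}."""
    return sum(points[v][lvl] for v, lvl in patient.items())

# Hypothetical coefficients for illustration only (NOT the study's estimates).
toy_coefs = {
    "age": {"<=64": 0.0, "65-80": 0.5, ">80": 1.25},
    "M status": {"M0": 0.0, "M1": 1.0},
    "ER status": {"positive": 0.0, "negative": 0.75},
}
toy_points = nomogram_points(toy_coefs)
toy_total = total_points(
    toy_points, {"age": "65-80", "M status": "M1", "ER status": "positive"}
)
```

On a printed nomogram, the total is then read against the 3-year and 5-year OS axes, which map total points to predicted survival probability through the baseline survival of the fitted Cox model.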
According to the results, the nomogram contains age, histologic grade, T status, M status, ER status, HER2 status, receipt of chemotherapy, and surgery type. The nomogram illustrates that ER status made the largest contribution compared with the other clinical features. The calibration curve of the nomogram showed high consistency between the predicted and observed survival probabilities in both the training and validation cohorts (Figure 5). A perfectly calibrated model is indicated by the dashed line, and in all cases the predicted probabilities fit the observed probabilities well. The ROC curves of the 3-year OS nomogram and 5-year OS nomogram for both the training and validation cohorts are shown in Figure 6. In Figure 6A, the 3-year OS AUC value was 0.786 in the training cohort and 0.893 in the validation cohort. In Figure 6B, the 5-year OS AUC value was 0.767 in the training cohort and 0.895 in the validation cohort. DCA curves showed that the nomogram could better predict the 3- and 5-year OS, as it added more clinical benefit than AJCC staging for all threshold probabilities in the training cohort (Figure 7).
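The net benefit that a decision curve plots against the threshold probability can be sketched as below. This is a simplified binary-outcome version that ignores censoring (a survival DCA would use Kaplan-Meier-adjusted event rates), and the function names and toy data are hypothetical rather than taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - fp / n * weight

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical outcomes (1 = died within the horizon) and predicted risks.
toy_y = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
toy_p = [0.9, 0.8, 0.6, 0.7, 0.3, 0.2, 0.1, 0.4, 0.35, 0.05]
nb_model = net_benefit(toy_y, toy_p, 0.5)
nb_all = net_benefit_treat_all(toy_y, 0.5)
```

A decision curve repeats this calculation over a range of thresholds for each model (here, the nomogram and AJCC staging) and for the treat-all and treat-none (net benefit 0) strategies; the model with the higher curve offers more clinical benefit at that threshold.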

Discussion
Breast cancer has become the most common malignancy in women worldwide, but breast cancer in men is still very rare. Due to its rarity, many clinical decisions have been informed by practice in female patients (9). However, MBC is considered to be a disease with characteristics distinct from those of FBC (5, 10). Meanwhile, an analysis from the National Cancer Database showed that overall survival rates for MBC remained lower than for FBC after adjusting for age, race, clinical, and treatment factors (11). Therefore, the clinical characteristics and overall survival of MBC need to be further investigated. From the baseline characteristics of MBC, the median age at the time of diagnosis was 65.35 ± 12.24 years, similar to a previous study (12). The majority of patients presented with grade I or grade II disease (62.4%), ER-positive tumors (97.3%), and no distant metastasis (93.5%), compared with previous studies of female patients (13,14).
Traditionally, AJCC staging is the most widely used tool for assessing prognosis. It reflects the objective tumor load and metastasis status. However, tumor prognosis is determined by multiple biological and clinical factors. Current National Comprehensive Cancer Network (NCCN) and American Society of Clinical Oncology (ASCO) guidelines also recommend the use of ER, PR, HER2, and Ki-67 status as important prognostic factors in medical decision making. In addition to T, N, M, ER, PR, and HER2 status, in our Cox analysis, age, histologic grade, and receipt of surgery and chemotherapy were also associated with OS. Therefore, it is necessary to establish a more comprehensive model to predict OS in MBC. Previous research attempted to apply predictive models for FBC to male breast cancer patients (15), but the predictive factors were found not to be the same, possibly due to differences in the biological determinants of male and female breast cancer. An independent predictive model based on data from male breast cancer is therefore needed.
In our study, in addition to surgery type, age, T status, M status, and histological grade, the expression status of ER and HER2, as well as the use of chemotherapy, also played important roles in the prognosis of MBC. It is noteworthy that N status was significant in our univariate analysis but lost its significance in the multivariate analysis. This finding deviates from previous research results (16,17). The lack of significance of N stage in the multivariate analysis may be due to the small number of male breast cancer cases included in our study.
It is worth noting that radiotherapy was not associated with improved OS in MBC (p = 0.476). In previous studies of FBC, radiotherapy did reduce local relapse in breast cancer patients, but whether radiotherapy improves OS remains controversial (18,19). There are still few relevant studies of MBC. According to Kaplan-Meier survival analysis, there was no statistically significant difference in survival between male breast cancer patients who underwent total mastectomy and those who underwent partial mastectomy. This is consistent with previous research (20), suggesting that the type of surgical procedure may not significantly affect survival outcomes in male breast cancer. However, adjuvant radiotherapy after partial breast resection may have mitigated potential survival differences. Further research with larger sample sizes and control of confounding factors is needed for confirmation.
China has the highest number of breast cancer cases, accounting for approximately 18.4% of global breast cancer cases (21). In our study, the median age at diagnosis in China differed from that in the United States: it was almost 7 years earlier in China. This gap is smaller than in previous studies of FBC (22). Our results also showed other differences in MBC features in the Chinese cohort, such as a higher proportion of T1 status patients, a higher proportion of grade I and II patients, a lower ER-positive proportion, and a lower proportion of patients receiving radiotherapy. There were also differences in follow-up duration and basic patient characteristics between the training and validation cohorts. Nevertheless, the ROC curves indicate that the model achieved good validation performance despite these baseline differences. It cannot be ruled out, however, that the validation results were biased by the different baselines. Therefore, further validation on multiple datasets is necessary.
The DCA curve indicates that this nomogram predicts better than AJCC staging. The model also showed a higher C-index and good agreement in the calibration plots. In addition, we validated it with single-center data from China. Although the validation cohort differed considerably from the training cohort, the model still performed well in external validation. As far as we know, this is the first study to build and verify a nomogram in MBC with the SEER database and single-center data from China.
Inevitably, our study has some limitations. First, this is a retrospective study, in which selection bias is inevitable. Second, some important prognostic factors were not available in the SEER database, including the Ki-67 index (23) and BRCA1- and BRCA2-related mutations (24,25). Third, because the validation data were derived from a single center, additional validation using data from multiple centers is needed to further assess the model's reliability and generalizability.

Conclusion
Male breast cancer has been neglected due to its rarity, resulting in fewer studies related to treatment and prognosis. In this study, we developed a clinical prognostic model that combines the prognostic characteristics of male breast cancer and validated it with Chinese male breast cancer data. The results suggest that the prediction model may be applicable to different ethnic groups.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.