Survival Analysis and Prediction Model for Pulmonary Sarcomatoid Carcinoma Based on SEER Database

Objective: This study aimed to investigate the incidence of pulmonary sarcomatoid carcinoma (PSC), to compare the clinical characteristics and overall survival (OS) of patients with PSC and those with other non-small-cell lung cancer (oNSCLC), to analyze the factors affecting the OS of patients with PSC, and to construct a nomogram prediction model.
Methods: Data of patients with PSC and those with oNSCLC diagnosed between 2004 and 2015 were collected from the Surveillance, Epidemiology, and End Results (SEER) database. The age-adjusted incidence of PSC was calculated. The characteristics of patients with PSC and those with oNSCLC were compared, and the patients were then matched 1:2 using propensity score matching (PSM) for further survival analysis. Patients with PSC were randomly divided into a training set and a testing set at a ratio of 7:3. The Cox proportional hazards model was used to identify the covariates associated with OS. Significant covariates were used to construct the nomogram, and the C-index was calculated to measure its discrimination ability. The accuracy of the nomogram was compared with that of the tumor-node-metastasis (TNM) clinical stage using the corresponding area under the curve.
Results: A total of 1049 patients with PSC were enrolled. The incidence of PSC decreased slowly from 0.120/100,000 in 2004 to 0.092/100,000 in 2015. Before PSM, 793 patients with PSC and 191356 patients with oNSCLC were identified; the proportions of male patients, younger patients (<65 years), grade IV tumors, and TNM clinical stage IV disease were higher in the PSC group. Patients with PSC had significantly poorer OS than those with oNSCLC. After PSM, PSC still had an extremely inferior prognosis. Age, sex, TNM clinical stage, chemotherapy, radiotherapy, and surgery were independent factors for OS. A nomogram was established based on these factors, and the C-indexes were 0.775 and 0.790 for the training and testing sets, respectively. Moreover, the nomogram provided a more comprehensive and accurate prediction than the TNM clinical stage.
Conclusions: The incidence of PSC decreased slowly. PSC had a significantly poorer prognosis than oNSCLC. The nomogram constructed in this study accurately predicted the prognosis of PSC and performed better than the TNM clinical stage.


INTRODUCTION
Pulmonary sarcomatoid carcinoma (PSC) is a rare subtype of non-small-cell lung cancer (NSCLC), accounting for only 0.1%-0.4% of lung cancers (1). PSC refers to poorly differentiated NSCLC containing sarcoma or sarcoma-like components, or carcinomas consisting of spindle cells and giant cells (2). According to the 2015 World Health Organization (WHO) classification of lung tumors and the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3), PSC is classified into five categories: pleomorphic carcinoma, spindle cell carcinoma, giant cell carcinoma, carcinosarcoma, and pulmonary blastoma (3). Previous studies showed that PSC was more aggressive than other non-small-cell lung cancer (oNSCLC) and had a worse prognosis (4-6).
At present, the growing number of studies on PSC are mainly case reports or retrospective analyses focusing on clinicopathological characteristics and prognostic factors (6,7). Despite numerous efforts to study the features of PSC, no large population-based study has specifically investigated its incidence. The Surveillance, Epidemiology, and End Results (SEER) database is a systematic population-based cancer database. Therefore, this study was conducted to provide an overview of PSC incidence based on SEER data.
In addition, only a few studies have examined the detailed distinctions between PSC and oNSCLC, and the available prediction models for PSC were constructed without external validation (8). Hence, this study also aimed to compare the clinicopathological characteristics and survival outcomes of PSC with those of oNSCLC, to explore the clinical features related to overall survival (OS) in PSC, and to construct and validate a nomogram prediction model.

Data Source
The patient data were obtained from the SEER

Study Design
The process and study design are presented in a flowchart (Figure 1). The age-adjusted incidence of PSC was calculated for patients with PSC. Then, patients with PSC and patients with oNSCLC were matched 1:2 using propensity score matching (PSM) for further survival analysis. Eligible patients with PSC were randomly divided into a training set and a testing set at a ratio of 7:3. A prognostic nomogram to predict 1-year survival in PSC was constructed from the training set and validated in both sets using the concordance index (C-index) and calibration curves. The total nomogram score of each patient was obtained, and the corresponding area under the curve (AUC) was calculated to compare the accuracy of the nomogram with that of the TNM clinical stage.
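As an illustration of the 7:3 random division described above, the following Python sketch partitions a list of patient identifiers. It is not the code used in this study (the analyses were run in R); the function name, the integer identifiers, and the seed are hypothetical.

```python
import random


def split_train_test(patient_ids, train_fraction=0.7, seed=2020):
    """Randomly partition patient IDs into a training and a testing set.

    The seed is a hypothetical value used only to make the sketch reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Round to the nearest whole patient so the split is as close to 7:3 as possible.
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 793 eligible patients with PSC, labelled 0..792 for illustration.
train_ids, test_ids = split_train_test(range(793))
print(len(train_ids), len(test_ids))
```

With 793 eligible patients, this rounding rule gives set sizes that agree with the 555 and 238 patients reported for the training and testing sets.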

Covariates
The study covariates included age at diagnosis, sex, race, year of diagnosis, grade, laterality, TNM clinical stage, chemotherapy, radiotherapy, surgery, survival months, and vital status.

Statistical Analysis
Covariates were presented as frequencies and percentages and compared using Pearson's chi-square test. The age-adjusted incidence was calculated with the Rate Session in the SEER*Stat software. PSM was used to balance baseline covariates between the PSC and oNSCLC groups. Survival analysis was performed using the Kaplan-Meier method and the log-rank test. In the PSC group, the Cox proportional hazards model was used to identify the covariates associated with OS and to calculate the hazard ratio (HR) with 95% confidence interval (CI). The results were displayed in a forest plot. Based on the Cox model, significant covariates (P < 0.05) were used to construct the nomogram, and the C-index was calculated to measure its discrimination ability. Statistical analyses were performed using IBM SPSS Statistics, version 23.0, and the "survival," "MatchIt," "createtableone," "love.plot," "rms," "nomogramEx," "nomogramFormula," and "survivalROC" packages in R version 4.0.0 (http://www.r-project.org/).
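To make the discrimination measure concrete, the sketch below computes Harrell's C-index by direct pair counting. It is a simplified illustration, not the routine from the R packages listed above: pairs with tied survival times are ignored, and the four patients and their risk scores are invented.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index by exhaustive pair counting.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    The pair is concordant when patient i also has the higher risk score;
    tied risk scores count as half. Tied times are skipped (simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: survival months, event flag (1 = death), model risk score.
times = [2, 5, 8, 12]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.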

Annual Incidence
A total of 1049 patients with PSC were included in the incidence analysis. The age-adjusted incidence of PSC was calculated based on the SEER 18 registries. Overall, the incidence of PSC decreased slowly from 0.120/100,000 in 2004 to 0.092/100,000 in 2015 (Figure 2A). The incidence in males was clearly higher than that in females (Figure 2B). Among pathological subgroups, the incidence of carcinosarcoma increased from 0.015/100,000 in 2004 to 0.028/100,000 in 2015, whereas the incidence of giant cell carcinoma and spindle cell carcinoma decreased from 0.038/100,000 in 2004 to 0.014/100,000 in 2015 and from 0.040/100,000 in 2004 to 0.020/100,000 in 2015, respectively. The incidence of pulmonary blastoma was stable (Figure 2C). In addition, the incidence of TNM clinical stage III disease decreased significantly from 0.028/100,000 in 2004 to 0.009/100,000 in 2015 (Figure 2D). Supplementary Table 1 shows the detailed incidence data of PSC.
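For readers unfamiliar with age adjustment, the following sketch shows direct standardization, the general technique behind an age-adjusted rate: age-specific crude rates are weighted by the age distribution of a standard population. The three age bands, case counts, populations, and weights below are invented for illustration and are not SEER data; the standard population used by SEER*Stat is not reproduced here.

```python
def age_adjusted_rate(cases, population, standard_weights, per=100_000):
    """Directly standardized rate per `per` persons.

    cases[k] and population[k] are the case count and person-years in age
    band k; standard_weights[k] is the share of the standard population in
    that band (weights are normalized here so they need not sum to 1).
    """
    total_weight = sum(standard_weights)
    rate = 0.0
    for c, p, w in zip(cases, population, standard_weights):
        rate += (c / p) * (w / total_weight)
    return rate * per


# Hypothetical three-band example (not SEER data).
cases = [2, 10, 30]
population = [4_000_000, 5_000_000, 3_000_000]
weights = [0.5, 0.3, 0.2]
adjusted = age_adjusted_rate(cases, population, weights)
print(round(adjusted, 3))
```

Because the weights are fixed across calendar years, changes in the adjusted rate reflect changes in age-specific incidence rather than population ageing.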

Patient Characteristics
Of the 1049 patients with PSC and 205866 patients with oNSCLC in the primary SEER database, patients who had unknown TNM clinical stage (PSC: N = 256; oNSCLC: N = 14510) were excluded. Thus, 793 patients with PSC and 191356 patients with oNSCLC were identified (Figure 1).

PSM for PSC and oNSCLC
The PSM method was used to balance all characteristics, including age, sex, race, year of diagnosis, grade, laterality, TNM clinical stage, chemotherapy, radiotherapy, and surgery, between the two groups.
After PSM, 777 patients with PSC and 1535 patients with oNSCLC remained, with the following clinicopathological characteristics. Most patients were men in both the PSC group (59.2%) and the oNSCLC group (59.2%, P = 1.000). Approximately 59% of patients were older than 65 years in both groups (P = 0.960). Grade IV tumors (13.9% vs 13.1%, P = 0.961) and TNM clinical stage IV disease (49.3% vs 49.6%, P = 0.997) were balanced between the two groups, and the other covariates, including race, year of diagnosis, laterality, chemotherapy, radiotherapy, and surgery, also showed no significant difference (Table 1 and Figure 3).
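The matching step can be pictured with the sketch below, which performs greedy 1:2 nearest-neighbor matching without replacement on precomputed propensity scores. This is an assumption-laden illustration rather than the procedure run in "MatchIt": the caliper value, the patient labels, and the scores are hypothetical, and estimating the propensity scores themselves (for example by logistic regression on the covariates listed above) is omitted.

```python
def match_nearest(treated, controls, ratio=2, caliper=0.05):
    """Greedy nearest-neighbor matching on the propensity score.

    treated, controls: dicts mapping patient ID -> propensity score.
    Each treated patient receives up to `ratio` controls whose score lies
    within `caliper`; a matched control cannot be reused. Treated patients
    with no control inside the caliper are left unmatched.
    """
    available = dict(controls)
    matches = {}
    for t_id, t_score in treated.items():
        chosen = []
        for _ in range(ratio):
            best_id, best_dist = None, None
            for c_id, c_score in available.items():
                dist = abs(c_score - t_score)
                if dist <= caliper and (best_dist is None or dist < best_dist):
                    best_id, best_dist = c_id, dist
            if best_id is None:
                break
            chosen.append(best_id)
            del available[best_id]
        if chosen:
            matches[t_id] = chosen
    return matches


# Hypothetical scores: P* are patients with PSC, C* are patients with oNSCLC.
psc = {"P1": 0.30, "P2": 0.62, "P3": 0.95}
onsclc = {"C1": 0.29, "C2": 0.33, "C3": 0.60, "C4": 0.70, "C5": 0.10}
matched = match_nearest(psc, onsclc)
print(matched)
```

Under a caliper, some patients receive fewer than two matches and some receive none, so the matched control group can be smaller than twice the matched treated group.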

Multivariate Analysis of PSC
A total of 793 patients were included in the PSC group. Patients with PSC were randomly assigned to the training set (N = 555) and the testing set (N = 238) at a ratio of 7:3. Supplementary Table 2 lists the baseline characteristics; no significant difference was found between the two sets.
As shown in Figure 5, in the training set, older age (≥65 years, HR 1.36, P = 0.003) was associated with a worse prognosis, whereas female sex (HR 0.73, P = 0.002) was associated with better OS. The TNM clinical stage (II vs I, HR 2.39, P < 0.001; III vs I, HR 2.70, P < 0.001; IV vs I, HR 5.02, P < 0.001) also affected OS. Patients who received chemotherapy (HR 0.46, P < 0.001) or radiotherapy (HR 0.75, P = 0.005) or underwent surgery (HR 0.40, P < 0.001) experienced superior survival compared with those who did not. Other characteristics (race, year of diagnosis, pathological type, grade, and laterality) were not statistically significant (P > 0.05) in the model.

Construction and Validation of the Nomogram
The Cox regression analysis demonstrated that age, sex, TNM clinical stage, chemotherapy, radiotherapy, and surgery were independent prognostic factors for OS. A nomogram was therefore established to predict 1-year survival based on these results (Figure 6A). The TNM clinical stage was the largest contributor to prognosis, followed by surgery and chemotherapy. Age, sex, and radiotherapy also had an impact on OS. Each level of each significant characteristic corresponded to a specific point value, and the total points of every patient were calculated. The probability of 1-year survival could then be estimated by locating the total points on the scale. Supplementary Table 3 shows the detailed score for each characteristic. The stability of the nomogram was validated using calibration plots in the training set (Figure 6B) and the testing set (Figure 6C). The calibration curves displayed high internal and external consistency with the actual observations for 1-year survival, and the C-indexes were 0.775 and 0.790 for the training and testing sets, respectively.
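The way a nomogram turns Cox coefficients into points can be sketched as follows. The example takes the hazard ratios quoted in the multivariate analysis, converts them to log hazard ratios, shifts each covariate so that its lowest-risk level scores 0, and rescales so that the covariate with the widest range spans 100 points. This is a common convention and an assumption here; the resulting values are illustrative and are not the scores in Supplementary Table 3. Converting total points to a 1-year survival probability requires the baseline survival, which is not reported in the text and is therefore omitted.

```python
import math

# Hazard ratios quoted in the text (training set); reference levels have HR 1.
HAZARD_RATIOS = {
    "age": {"<65": 1.0, ">=65": 1.36},
    "sex": {"male": 1.0, "female": 0.73},
    "stage": {"I": 1.0, "II": 2.39, "III": 2.70, "IV": 5.02},
    "chemotherapy": {"no": 1.0, "yes": 0.46},
    "radiotherapy": {"no": 1.0, "yes": 0.75},
    "surgery": {"no": 1.0, "yes": 0.40},
}


def nomogram_points(hazard_ratios):
    """Map each covariate level to points; higher points mean higher risk."""
    betas = {
        var: {level: math.log(hr) for level, hr in levels.items()}
        for var, levels in hazard_ratios.items()
    }
    # The covariate with the widest spread of log hazard defines the 0-100 axis.
    max_range = max(max(b.values()) - min(b.values()) for b in betas.values())
    points = {}
    for var, b in betas.items():
        lowest = min(b.values())
        points[var] = {
            level: 100 * (coef - lowest) / max_range for level, coef in b.items()
        }
    return points


def total_points(points, patient):
    """Sum the points for one patient described as {covariate: level}."""
    return sum(points[var][level] for var, level in patient.items())


points = nomogram_points(HAZARD_RATIOS)
# Hypothetical patient for illustration.
patient = {"age": ">=65", "sex": "male", "stage": "IV",
           "chemotherapy": "yes", "radiotherapy": "no", "surgery": "no"}
print(round(total_points(points, patient), 1))
```

Under this scaling, the ordering of the maximum points per covariate matches the ordering described above: TNM clinical stage first, then surgery, then chemotherapy.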

Comparison of the Predictive Accuracy for OS Between Nomogram and TNM Clinical Stage
In the training set, the AUCs of the nomogram and the TNM clinical stage were 0.867 and 0.813, respectively; in the testing set, the AUCs were 0.871 and 0.762, respectively. As shown in Figure 7, the nomogram had better prediction accuracy for the 1-year survival probability than the TNM clinical stage.
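The comparison above rests on the AUC, which can be read as the probability that a randomly chosen patient who died within 1 year has a higher score than a randomly chosen patient who survived. The sketch below computes it by pair counting for two hypothetical scores. It treats 1-year death as a simple binary label and ignores censoring, which the time-dependent method in "survivalROC" accounts for; all values are invented.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 = died within 1 year, 0 = alive at 1 year.
    Tied scores between a positive and a negative count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical patients: outcome, nomogram total points, and TNM stage (1-4).
died_within_1y = [1, 1, 1, 0, 0, 0]
nomogram_score = [180, 150, 90, 120, 60, 30]
tnm_stage = [4, 3, 3, 4, 1, 3]

auc_nomogram = auc(nomogram_score, died_within_1y)
auc_stage = auc(tnm_stage, died_within_1y)
print(round(auc_nomogram, 3), round(auc_stage, 3))
```

A score with only four possible values, such as stage, produces many ties, which is one reason a finer-grained score can discriminate better.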

DISCUSSION
This large retrospective study showed that PSC was a rare cancer, accounting for less than 0.5% of oNSCLC, as described in previous studies (4,5). Notably, the incidence of PSC decreased slowly. Among pathological subgroups, the incidence of carcinosarcoma increased significantly. PSC occurred more frequently in older (≥65 years) and male individuals (9). In the current study, PSC behaved highly aggressively, with a significantly higher proportion of poorly differentiated tumors (grade IV: 14.9% vs 1.5%, P < 0.001) than oNSCLC, and up to 49.1% of patients were in stage IV at diagnosis.
PSC is difficult to diagnose, leading to a poor prognosis (1,4). In small-sample studies on PSC, the median OS (mOS) was 5-8.5 months (10-14). In previous SEER-based studies, mOS was 6.0 months for patients with all stages and 3.0 months for patients with advanced disease (4,6). In this study, mOS was 5.0 months for PSC and 12.0 months for oNSCLC (P < 0.001) before PSM. After significantly different characteristics were balanced between PSC and oNSCLC, PSC was still found to have a significantly poorer clinical outcome than oNSCLC (mOS: 5.0 vs 10.0 months, P < 0.001).
The multivariate Cox proportional hazards model revealed that elderly patients, male patients, and patients with advanced clinical stage had a worse prognosis (2,15), whereas receiving chemotherapy, radiotherapy, or surgery could prolong OS. At present, the standard treatment of PSC is controversial (5,9). Surgery in early-stage PSC has been demonstrated to provide the optimal OS benefit, but the risk of recurrence is high, and adjuvant chemotherapy should be considered (2,5,6,15). Previous studies indicated that adjuvant chemotherapy after surgery was effective in improving survival outcomes (15,16), and large population-based studies also revealed the benefit of chemotherapy (7). On the contrary, in a study on 69 patients with PSC, Liang et al. reported that adjuvant chemotherapy could not improve OS (5). PSC was a chemorefractory cancer in previous studies (9,11,17). Patients with PSC who received first-line platinum-based chemotherapy did not experience significant benefits (mOS: 7.0 months vs 5.3 months, P = 0.096) (18). Few studies have examined the effect of radiotherapy on the prognosis of PSC, and a small-sample study reported that patients who received radiotherapy had a worse mOS (5.0 vs 6.0 months, P < 0.001) (6,19). However, a retrospective study based on SEER showed that radiotherapy improved survival in patients with stage I-III PSC (8). In the present multivariate analysis, both chemotherapy (HR 0.46, P < 0.001) and radiotherapy (HR 0.75, P = 0.005) were protective factors and improved survival. So far, the data showing that chemotherapy or radiotherapy prolongs survival are insufficient, and further prospective research is required.
In this study, six features, including age, sex, TNM clinical stage, chemotherapy, radiotherapy, and surgery, were used to construct a nomogram prognostic model. The total score was calculated from the quantitative score of each feature and was used to predict the 1-year survival rate. The C-indexes of this model were 0.775 and 0.790 in internal and external validation, respectively, indicating good agreement between predicted and actual 1-year survival. Next, the nomogram model was compared with the conventional TNM clinical stage and had a higher AUC for 1-year survival in both the training set and the testing set, indicating a more comprehensive and accurate prediction. This model could be used to individualize prognostic assessment and might serve as an effective tool for making treatment-related decisions (37,38).
This study had several limitations. First, the variables included were restricted; some important characteristics related to prognosis were not available, such as smoking status, performance status score, gene mutation detection by next-generation sequencing, MET expression detection, and PD-L1 immunohistochemistry assay. Second, treatment information was limited and did not include targeted therapy, immunotherapy, or other treatments. The database contained only the status of surgery, chemotherapy, and radiotherapy, and some of these were unknown. Third, the model was constructed from retrospective data and requires further confirmation.
In conclusion, the incidence of PSC decreased slowly, and among pathological subgroups, the incidence of carcinosarcoma increased. PSC had a significantly poorer prognosis than oNSCLC. The nomogram constructed in this study accurately predicted the prognosis of PSC and performed better than the TNM clinical stage. This model is expected to help pathologists and oncologists in designing clinical strategies.
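The median OS values discussed above are read from Kaplan-Meier curves. As a reminder of how that estimate is obtained, the following sketch computes a Kaplan-Meier curve and reports the first time at which estimated survival falls to 50% or below. The eight follow-up times and event flags are hypothetical and unrelated to the study cohort.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    events[i] is 1 for an observed death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all observations that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival is 50% or less (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up in months; 0 marks a censored patient.
months = [1, 3, 3, 5, 6, 8, 10, 12]
died = [1, 1, 0, 1, 1, 0, 1, 1]
km_curve = kaplan_meier(months, died)
print(median_survival(km_curve))
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why the median can differ from the median of the observed death times alone.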

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. These data can be found in the SEER 18 registries (www.seer.cancer.gov) of the National Cancer Institute, accessed using the SEER*Stat software (SEER*Stat 8.3.6).

AUTHOR CONTRIBUTIONS
JS, MC, and QY designed the study. MC, QY, and ZX contributed to data selection and assembly. BL, FL, and YY analysed data. MC and QY were involved in drafting the manuscript. JS critically revised the manuscript. All authors contributed to the article and approved the submitted version.