Young-Onset Early Colorectal Cancer Had Similar Relative Survival to but Better Overall Survival Than Conventional Early Colorectal Cancer: A Large Population-Based Study

Background: There existed limited evidence about prognosis of young-onset early colorectal cancer (ECRC). In the present study, we aimed to compare prognosis between patients with young-onset ECRCs and patients with conventional ECRCs. Method: Patients with surgically resected, histologically diagnosed ECRCs were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database. Young-onset ECRC was defined as ECRC occurring in patients aged <50 years. Five-years relative survival was calculated at the time of diagnosed year and linear regression was performed to analyze the association between 5-years relative survival and age. The multivariate Cox regression, multivariate competing risk model, and propensity score matching (PSM) and univariate analysis weighted by the inverse probability of treatment weight (IPTW) were used to compare overall survival (OS) between young-onset ECRCs and conventional ECRCs. Results: A total of 51,197 ECRCs were retrieved from SEER database, including 4,634 young-onset ECRCs and 46,563 conventional ECRCs. Five-years relative survival was found to be moderately associated with different age groups (R = −0.725, P = 0.0034). Patients with young-onset ECRCs (96.7%) had similar 5-years relative survival compared with conventional ECRCs (96.3%). However, multivariate Cox regression [HR (hazard ratio), 0.18; 95% CI: 0.16–0.20; P < 0.001] showed better OS in young-onset ECRCs. After PSM, we still found favored prognosis for young-onset ECRCs under univariate Cox regression (HR, 0.18; 95% CI: 0.16–0.21; P < 0.001). Similar results could also be found in the univariate Cox regression weighted by IPTW (HR, 0.17; 95% CI: 0.17–0.18; P < 0.001). Conclusions: Patients with young-onset ECRCs had similar relative survival but better OS compared with conventional ECRCs.


INTRODUCTION
According to the newest global cancer statistics, colorectal cancer (CRC) is considered the second most common cause of cancerrelated death worldwide (1). Although it is well-established that the incidence of CRC increases with age, with ∼90% CRCs occurring in individuals over 50 years old (2), recent studies showed increased incidence trend of CRC in young-onset groups (3)(4)(5). This increased incidence of CRC in young individuals has been observed due to effective screening for CRC in individuals under 50 years old (3)(4)(5).
Limited evidence has shown unique etiology and biology of young-onset CRCs (6,7). Young-onset CRC presents at more advanced stage and with more aggressive clinicopathological characteristics (3)(4)(5). Meanwhile, young-onset CRCs have shown higher rate of NRAS and PTEN mutations and less rate of BRAF mutation (8). Due to the abovementioned difference in etiology and biology for young-onset CRCs, multiple studies have explored whether more attention is needed for management of young-onset CRCs with controversy results (9)(10)(11)(12)(13)(14)(15). Some studies showed favored survival in young-onset CRCs even with more advanced stage and more aggressive behavior (2,9,10,14), while some studies found equivalent survival or even poorer survival for young-onset CRCs (10,11,15).
Early CRC (ECRC) is defined as CRC confined in the mucosa or submucosa regardless of lymph node metastasis (LNM). With advanced development of endoscopic technology, some cases of ECRCs without LNM could be managed with endoscopic resection. Although controversial conclusions about survival of young-onset CRCs were demonstrated, it was demonstrated that young-onset CRCs at early stage may have equivalent survival or even better survival than conventional CRCs (9,10,13,16), indicating that young-onset ECRCs could be managed the same way as conventional ECRCs. However, no study to date have ever been performed to comprehensively analyze whether young-onset ECRCs really have better survival than conventional ECRCs in large population diagnosed ECRCs.
In this study, we would analyze whether there existed difference in cause-specific survival (CSS) and overall survival (OS) between young-onset ECRCs and conventional ECRCs in a large population. Relative survival is also used to adjust for changes in survival in population. Meanwhile, additional propensity score matching (PSM) and the inverse probability of treatment weight (IPTW) would be used to adjust for potential confounding factors.

Patients
In this study, using private Surveillance, Epidemiology, and End Results (SEER) ID (zhangqw), we included patients with ECRCs from the SEER database. Since the SEER database is a publicly available database, no informed consent from patients, or institutional review was required for this study.
Since most guidelines recommend CRC screening begin in average-risk population aged ≥50 years old (17)(18)(19) and it is wellestablished in the literature that 90% of CRCs occurred in patients over 50 years old (2), we defined ECRC occurring in patients aged <50 years old as young-onset ECRC and ECRC occurring in patients aged ≥50 years old as conventional ECRC. Considering there existed a different definition for young-onset CRC with cutoff age of 50 years (9,12,13) or 45 years (11,14) or 40 years (10,20), we also did sensitivity analyses in population using 45 and 40 years as cutoff age for young-onset CRC.

Variables and Outcomes
In the present study, ECRCs were classified as young-onset ECRC and conventional ECRC with cutoff age of 50 as definition. Race of patients was recorded as white, black, or others (mainly including American Indian, Asian, and Pacific Islander). Sex was recorded as male or female. Location of primary tumor was mainly classified into three different sites: colon of left side, colon of right side, and rectum. Colon of right side consisted of appendix, cecum, ascending colon, and hepatic flexure, and colon of left side consisted of transverse colon, splenic flexure, descending colon, and sigmoid colon. Rectum consisted of rectum and rectosigmoid junction. In this study, grade I (welldifferentiated) and grade II (moderately differentiated) were recoded as low grade and grade III (poorly differentiated) and grade IV (undifferentiated) were recoded as high grade. With respect to the tumor size, CRCs were classified into four groups: ≤2, ≤3, ≤5, and >5 cm. According to the invasion depth of ECRCs, they were coded as mucosa and submucosa. Since the number of adequate retrieved lymph nodes as 12 was recommended for staging of CRC and associated with survival of CRC, we also divided number of retrieved lymph nodes of ECRCs into two groups: 1-11 and no <12.
Survival time was defined as the time from diagnosis to the date of death or last contact or November 2016. Relative survival and OS were the primary outcome. Briefly, relative survival was defined as the ratio of observed survival rate of cancer patients to the expected survival rate of the matched general population and OS was defined as death regardless of causes. CSS was the second primary outcome that defined as death due to ECRC. When using competing risk model, death was classified into two groups: death due to ECRCs and death not related to ECRCs. Patients who were still alive were censored at the date of last contact.

Statistical Analysis
For descriptive statistics, the absolute number with proportion for categorical variable, mean, and standard deviation for continuous variable with Gaussian distribution, and median with interquartile range (IQR) for continuous variable with nonnormally distribution were used, respectively. The chi-square test for categorical variable, Student t-test for continuous variable with Gaussian distribution, and the non-parametric Kruskal-Wallis rank sum test for continuous variable with non-normally distributed data or ordinal categorical variable were used for comparisons among different patient groups, respectively.
The CSS was estimated by Kaplan-Meier curves, where patients who had death not related to ECRC or were still alive were considered censored. Log-rank test was used to analyze the differences between them. For competing risk model, the outcome of interest was defined as ECRC-specific death and death not related to ECRC was considered as a competing risk. The patients who were still alive were censored. Cumulative incidence function for ECRC-specific death was conducted, considering death not related to ECRC as a competing risk death. Nelson-Aalen cumulative risk curves of cumulative incidence of ECRC-specific death were also performed and Gray's test was applied.
The expected survival rate for the general population was obtained from SEER and the 5-years relative survival was estimated by using Ederer II method. Meanwhile, linear regression was performed to determine the correlation between 5-years relative survival and different age groups (21).
For extracted patients, each unknown value was imputed by a separate model based on the fully conditional specification (22). Multivariate Cox regression was performed to explore potential risk factors for poor OS or CSS and results were presented with hazard ratios (HRs) and 95% CIs. Meanwhile, Fine and Gray's competing risk regression was also used to determine the potential risk factors associated with ECRC-specific death, with results presented by sub-distribution hazard ratios (SHRs) and 95% CIs.
To adjust for potential imbalance between young-onset ECRCs and conventional ECRCs, we used PSM to get a new database with matched ECRCs for analysis. PSM was performed as follows: first, propensity scores were calculated for every patient with ECRCs using CRC type (young-onset ECRC or conventional ECRC) as the outcome in the multivariate logistic regression model and all potential confounding factors in Table 1 as covariates. Second, we matched young-onset ECRCs with conventional ECRCs using 1:1 matching criterion with a caliper of 0.05. Third, we calculated standard difference (SD) of all clinical characteristics between young-onset ECRCs and conventional ECRCs with a cutoff of SD as 0.1 indicating wellbalanced, indicating that these clinical characteristics between young-onset ECRCs and conventional ECRCs were wellbalanced. Fourth, we performed the Kaplan-Meier survival analyses with log-rank test and Nelson-Aalen cumulative risk curves with Gray's test to compare OS or CSS between youngonset ECRCs with conventional ECRCs. Besides, we also calculated HR or SHR of ECRC type on OS or CSS using univariate Cox regression model or univariate Gray's competing risk regression model.
IPTW is a way that allows one to obtain unbiased estimates of average treatment effects based on propensity score. Compared with PSM, IPTW can conduct data statistics without loss of sample size. Based on propensity score, we calculated the IPTW for each patient (23). SD of all clinical characteristics between young-onset ECRCs and conventional ECRCs were calculated after weighting the full cohort by IPTW with <0.1 indicating well-balanced. The unmatched univariable analysis based on the Kaplan-Meier estimator of IPTW was firstly performed. Meanwhile, univariate Cox regression model for CSS or OS or univariate Gray's competing risk regression model for CSS weighted by IPTW was performed to compare survival between young-onset ECRCs with conventional ECRCs.
Sensitivity analyses were also performed. We did sensitivity analyses in the unmatched population using 45 and 40 years as cutoff age for young-onset CRC since different cutoff ages were used to define young-onset CRC. We also did sensitivity analyses in the unmatched population after excluding patients older than 70 or 60 years old. To examine whether multiple imputation was proper in this study, we also compared survival between young-onset ECRCs and conventional ECRCs in the unmatched population without multiple imputation for unknown values. In the matched population using PSM, we did 1:2 or 1:3 matching or 1:1 matching in the cohort without multiple imputation for unknown values to test stability of our results.
All statistical analyses were carried out by using R statistical software (version 3.5.0) and two-sided P < 0.05 were considered statistically significant.

Clinical Characteristics of Patients With ECRCs
According to the inclusion and exclusion criteria (Figure 1) Results also demonstrated that patients with young-onset ECRCs had more lymph nodes examination than conventional ECRCs. The detailed clinical characteristics of young-onset ECRCs and conventional ECRCs were described in Supplementary Table 1 as young-onset ECRCs were more likely to occur in the left side and rectum, with higher LNM and deeper invasion depth compared with conventional ECRCs.

Comparison of 5-Years Relative Survival Among Young-Onset ECRCs and Conventional ECRCs
We calculated the 5-years relative survival rate of different age groups of ECRCs with a 5-years interval and linear regression  Figure 2 and Table 2).

Comparison of Survival Among Young-Onset ECRCs and Conventional ECRCs in the Unmatched Cohort
To compare survival among young-onset ECRCs and conventional ECRCs, we firstly compared 5-years CSS rate, 10-years survival rate, and CSS curves of two types. We found that patients with young-onset ECRCs had a 5-years survival rate of 97.4% and a 10-years survival rate of 95.1%, which was higher than patients with conventional ECRCs (Figure 2A, P < 0.001). The 5-years and 10-years survival rate of patients with conventional ECRCs were 94.8 and 91.6%, respectively. The rest of the CSS rate for different time was depicted in Supplementary Table 2.
We also calculated and compared OS rate among patients with young-onset ECRCs and patients with conventional ECRCs. As is shown in Supplementary Figure 1A and Supplementary Table 3, patients with young-onset ECRCs had higher 5 and 10-years OS rate than conventional ECRCs. The same results (Supplementary Figure 3A) could be obtained in the comparison of cumulative probability of cancer-specific death using competing risk model.
Unknown values of confounding factors were interpolated using multiple imputation and clinical pathological baseline is shown in Table 1. Then, the multivariate Cox regression model was performed to investigate whether young-onset ECRCs had better CSS than conventional ECRCs after adjusting potential risk factors for CSS. As is shown in Figure 3, results showed that patients with young-onset ECRCs had better CSS than conventional ECRCs in multivariate Cox regression model (HR: 0.45, 95% CI: 0.38-0.52, P < 0.001). We also compared OS and CSS under competing risk model in the multivariate models and results showed that young-onset ECRCs still had better OS (Supplementary Figure 4, HR: 0.18, 95% CI: 0.16-0.20, P < 0.001) and lower cumulative probability of cancerspecific death (Supplementary Figure 5, SHR: 0.51, 95% CI: 0.44-0.60, P < 0.001) than conventional ECRCs. Furthermore, we divided ECRCs into three subtypes: left side only, right side only, and rectum only. The multivariate Cox regression was performed, respectively, to explore whether the location of ECRCs could have a different effect on CSS between these two groups. The results also demonstrated that youngonset ECRCs had better CSS than conventional ECRCs in left side type (Supplementary Table 4, HR: 0.373, 95% CI: 0.285-0.487, P < 0.001), right side type (Supplementary Table 5 HR: 0.488, 95% CI: 0.337-0.707, P < 0.001) and rectum type (Supplementary Table 6 HR: 0.504, 95% CI: 0.403-0.632, P < 0.001).
To investigate whether different age cutoffs defined for youngonset ECRCs could influence CSS among these two groups, we selected 45 and 40 years old for cutoff, respectively, for sensitivity analysis. The multivariate Cox regression model (Figure 4)  (Supplementary Figures 6, 7).

Comparison of Survival Among Young-Onset ECRCs and Conventional ECRCs in the Matched Cohort
Using PSM, we matched 4,634 young-onset ECRC patients with 4,634 conventional ECRC patients. SD calculated before and after matching showed that all confounding factors were balanced after matching ( Table 1 and Supplementary Figure 8). Supplementary Tables 2, 3, patients with youngonset ECRCs had higher CSS (Figure 2B   The univariate model of 1:2 PSM, 1:3 PSM in the cohorts, or 1:1 PSM in the cohort without multiple imputation for unknown values still identified favored CSS, OS, or CSS under competing risk model in the young-onset ECRCs (Figure 4 and Supplementary Figures 6, 7). Figure 2C displays the results of IPTW-adjusted analysis of CSS between patients with young-onset ECRCs and patients with conventional ECRCs. Young-onset ECRCs had higher CSS (5 years: 97.5%, 10 years: 92.2%) than conventional ECRCs (Supplementary Table 2 Table 3 and Supplementary Figures 1, 3).

Comparison of Survival Among Young-Onset ECRCs and Conventional ECRCs Using IPTW Adjusted Analysis
The univariate Cox regression weighted by IPTW showed similar results that young-onset ECRCs favored better CSS (Figure 4, HR: 0.43, 95% CI: 0.41-0.46, P < 0.001) than conventional ECRCs. Similar results could be obtained using univariate Cox regression weighted by IPTW with OS and CSS under competing risk model as outcomes, respectively (Supplementary Figures 5, 7).

DISCUSSION
To the best of our knowledge, the present study was the first study to compare survival of young-onset ECRCs with conventional ECRCs comprehensively. Results of the multivariate Cox regression model showed that patients with young-onset ECRCs had better OS and better CSS than conventional ECRCs, which was consistent with results from univariate analysis after PSM or univariate Cox regression weighted by IPTW. However, patients with young-onset ECRCs and conventional ECRCs showed similar 5-years relative survival.
In this study, young-onset ECRC was defined as ECRC occurring in patients aged <50 years. It should be acknowledged that definition of young-onset CRC varied in different studies with some studies using a cutoff age of 50 years (9,12,13) for definition or 45 years (11,14) or 40 years (10,20) for definition. Though the age cutoff used in the present study was somewhat arbitrary, the cutoff of 50 was selected based on the fact that American, European, and Asian guidelines all recommended CRC screening begin in average-risk population aged ≥50 years old (17)(18)(19). Consistent with reports from most studies (2, 9-11, 13-16, 20), even diagnosed at early stage, young-onset CRC was more likely to demonstrate higher rate of LNM, submucosal invasion, and location in left-side colon or rectum (Table 1), which proved our rationality of definition for young-onset CRC to some extent. Besides, we also did sensitivity analyses by using different cutoff ages for definition for young-onset CRC and results were also consistent with our primary results.
Some studies also compared prognosis of young-onset CRCs with that of conventional CRCs in subgroups of T1 stage CRCs (9,10,13,16). Studies found lower 5-years CSS (10, 13), OS (16), and disease-free survival (13) for young-onset T1 CRCs but no statistical significance were found in these studies. Only one study analyzed OS in localized CRC including T1 CRCs with results indicating that young-onset T1 CRCs may have better survival than conventional CRCs still without significant difference (9). Therefore, to explore whether young-onset T1 CRCs had comparable prognosis than conventional T1 CRCs, we performed this comprehensive analysis using multivariate regression model, univariate regression model after PSM, and univariate regression model weighted by IPTW. However, different from results found in previous studies, we found that In our study, 5-years relative survival rate of young-onset ECRCs and conventional ECRCs was 96.7%, which did not show significant difference compared with conventional ECRCs of 96.3%. Patients with conventional ECRCs had 5-years CSS of 94.8%, which was consistent with results reported in previous studies (10,13). However, patients with young-onset ECRCs had 5-years CSS of 97.4%, which was a little higher than results in previous studies (10,13). For OS, patients with young-onset ECRCs had 5-years OS of 95.7%, which was higher than results (93.3%) in a previous study (16). Five-years OS was 82.2% for conventional ECRCs, which was lower than the reported 5-years OS of 94.9% (16). Some reasons may explain our results. First, OS and CSS could not eliminate the effect of frailty of patients caused by age so that the results contradict the relative survival rate. Second, ECRC was defined as CRC confined to the mucosa or submucosa regardless of LNM. Therefore, we also included mucosal CRC (Tis) in our study, which was excluded for analysis in other studies (9,10,13), causing different results from other studies. Third, only colon or rectal cancers were included in all studies (10,13,16), but all CRCs were included for analysis in the present study, which may also cause different results.
Some limitations should be discussed in the present study. First, values of analyzed confounding factors missed for some cases, which may cause some bias in the present study. However, multiple imputation was performed to impute missed values for cases before we performed the multivariate regression model. Sensitivity analysis in the cohort without multiple imputation also showed similar conclusion to the main findings, which indicated that multiple imputation technology in the present study was reasonable. Second, due to the limited information provided in the SEER database, other outcomes including cancer recurrence and progress-free survival could be analyzed in the present study. However, cancer recurrence and progress-free survival were correlated with CSS or OS. Results of CSS or OS indicated that young-onset ECRCs may also have better cancer recurrence or progress-free survival than conventional ECRCs. Besides, the competing risk model, which takes death not related to ECRCs into consideration, was also used to assess and compare prognosis of youngonset ECRCs and conventional ECRCs. Third, some risk factors such as lymphatic vessel involvement (LVI) were missed in the present study. Actually, our aim was to compare survival between young-onset ECRCs and conventional ECRCs and LVI was a risk factor for LNM. To some extent, LNM was correlated with LVI and may not be both included in the final multivariate model.
In summary, although patients with young-onset ECRCs had higher risk of LNM, they favored better survival outcomes compared with conventional ECRCs.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://seer.cancer.gov/.

AUTHOR CONTRIBUTIONS
X-BL and J-NC: study concept and design. J-NC, Q-WZ, and Y-BP: acquisition of data. J-NC, X-TZ, and Y-BP: analysis and interpretation of data. J-NC, Q-WZ, and X-TZ: drafting of the manuscript. X-BL: critical revision of the manuscript for important and obtained funding. J-NC and Q-WW: statistical analysis.