Radiomics Signature Facilitates Organ-Saving Strategy in Patients With Esophageal Squamous Cell Cancer Receiving Neoadjuvant Chemoradiotherapy

After neoadjuvant chemoradiotherapy (NCRT) for locally advanced esophageal squamous cell cancer (ESCC), roughly 40% of patients achieve a pathologic complete response (pCR). These patients may benefit from an organ-saving strategy if pCR could be correctly identified before esophagectomy. A reliable approach to predicting pathologic response would allow future studies to investigate individualized treatment plans.
Method: All eligible patients treated in our center from June 2012 to June 2019 were retrospectively collected. Radiomics features extracted from pre- and post-NCRT CT images were selected by univariate logistic regression and LASSO regression. A radiomics signature (RS) developed from the selected features was combined with clinical variables to construct an RS+clinical model by multivariate logistic regression, which was internally validated by bootstrapping. The performance and clinical usefulness of the RS+clinical model were assessed by receiver operating characteristic (ROC) curves and decision curve analysis, respectively.
Results: Of the 121 eligible patients, 51 (42.1%) achieved pCR after NCRT. Eighteen radiomics features were selected and incorporated into the RS. The RS+clinical model predicted pCR better than the clinical model (corrected area under the ROC curve, 0.84 vs. 0.70). At a probability threshold of 60% (i.e., the patient would opt for observation if his probability of pCR was >60%), the RS+clinical model could avoid a net 13% of surgeries, equivalent to implementing an organ-saving strategy in 31.37% of the 51 true-pCR cases.
Conclusion: A model built with CT radiomics features and clinical variables shows potential for predicting pCR after NCRT; it provides significant clinical benefit in identifying patients suitable for individualized organ-saving treatment plans.


INTRODUCTION
Neoadjuvant chemoradiotherapy (NCRT) followed by esophagectomy has significantly improved the survival of patients with resectable locally advanced esophageal cancer compared with surgery alone and has been established as the standard treatment (1, 2). Although the response to NCRT varies among patients, the pathologic complete response (pCR) rate can be as high as 43.2% in esophageal squamous cell carcinoma (ESCC) and 27% in esophageal adenocarcinoma (1-4). For patients who achieve pCR after NCRT, individualized organ-saving strategies such as active surveillance or definitive chemoradiation have recently been explored as alternatives to surgery, given the relatively high postoperative complication rate (~65%) and mortality rate (~4-10%, depending on the center) (5,6), as well as the decline in health-related quality of life after esophagectomy (7)(8)(9)(10). However, pCR can only be confirmed by histologic assessment of surgical specimens. A reliable means independent of surgical specimen evaluation is required to identify the complete responders who could potentially be spared surgery. Currently recommended approaches for assessing NCRT response include pathologic evaluation of endoscopic biopsies and 18F-FDG PET, which usually involves setting a cutoff value for SUV reduction to discriminate between pCR and non-pCR patients. However, these approaches are not accurate enough to identify pCR patients; some non-pCR patients might therefore be falsely diagnosed as complete responders and inappropriately selected to omit surgery (11). So far, no biological or radiological marker has been used to guide comprehensive esophagus-preserving treatment in locally advanced esophageal cancer.
Radiomics is the high-throughput extraction of large numbers of image features (density, gray-level heterogeneity, shape, etc.) from radiographic images, and these features show promise in revealing the underlying proteogenomic and phenotypic information of solid tumors (12). Whereas histopathologic analysis of biopsy specimens might fail to represent the whole tumor because of spatial heterogeneity, radiomics can profile this heterogeneity and serves as a bridge between tumor genomics and phenotypes. Some radiomics features have been shown to correspond to gene expression profiles and are useful in predicting cancer prognosis and therapeutic response (13). Radiomics features extracted from 18F-FDG PET images combined with clinical information were reported to have reasonable discriminatory accuracy in predicting pCR in post-NCRT esophageal tumors, with an AUC (area under the receiver operating characteristic curve) of 0.81 (14). However, that investigation was performed mainly on tumors of the gastroesophageal junction or esophageal adenocarcinoma, and its conclusion cannot be extended to ESCC, the histologic type that predominates in Asian countries. Therefore, we aimed to develop a CT radiomics-based model to predict tumor response to NCRT in ESCC and to assess its value in organ-saving decision making.

Patients
This retrospective study was approved by the institutional review board of Shanghai Chest Hospital; the requirement for informed consent was waived. Consecutive patients with stage T2-4aN+/-M0 esophageal cancer who received NCRT followed by esophagectomy at Shanghai Chest Hospital from June 2012 to June 2019 were identified from the hospital database. Patients were eligible for inclusion only if they (i) had histopathologically confirmed ESCC and (ii) had contrast-enhanced CT scans within 3 weeks before NCRT and 3-8 weeks after NCRT. Patients were excluded if (i) chemoradiation was performed outside Shanghai Chest Hospital and the treatment details were missing; (ii) the delivered radiation dose was less than 40 Gy or more than 50.4 Gy; or (iii) surgery was performed less than 4 weeks or more than 10 weeks after NCRT, indicating urgent and salvage resections, respectively (2, 3).
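The inclusion and exclusion windows above can be expressed as one screening function. This is a minimal sketch: the `Patient` record and all of its field names are hypothetical (the hospital database schema is not described), and boundary values are treated as eligible, on the reading that only values strictly below or above the stated limits are excluded.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    # Hypothetical record; field names are illustrative, not the study's schema.
    histology: str                       # e.g. "ESCC"
    outside_ncrt_missing_details: bool   # NCRT given elsewhere AND details missing
    weeks_ct_before_ncrt: float          # baseline contrast CT -> start of NCRT
    weeks_ct_after_ncrt: float           # end of NCRT -> restaging contrast CT
    dose_gy: float                       # delivered radiation dose
    weeks_ncrt_to_surgery: float         # end of NCRT -> esophagectomy

def is_eligible(p: Patient) -> bool:
    """Apply the Patients-section criteria; boundary values are kept (assumption)."""
    inclusion = (
        p.histology == "ESCC"
        and 0 <= p.weeks_ct_before_ncrt <= 3        # CT within 3 weeks before NCRT
        and 3 <= p.weeks_ct_after_ncrt <= 8         # CT 3-8 weeks after NCRT
    )
    exclusion = (
        p.outside_ncrt_missing_details
        or not (40.0 <= p.dose_gy <= 50.4)          # <40 Gy or >50.4 Gy excluded
        or not (4 <= p.weeks_ncrt_to_surgery <= 10) # urgent (<4 w) or salvage (>10 w)
    )
    return inclusion and not exclusion

example = Patient("ESCC", False, 1.5, 5.0, 50.4, 7.0)
print(is_eligible(example))  # True: 50.4 Gy sits exactly on the upper dose limit
```

Writing the criteria as explicit closed intervals makes the boundary interpretation visible and easy to change if the original protocol treated the limits differently.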

Histopathological Assessment
Surgically resected specimens were assessed histopathologically by an experienced pathologist and reviewed by a second pathologist specializing in thoracic cancer. Pathologic complete response (pCR) was defined as the absence of microscopically viable cancer cells in the primary tumor, as opposed to any grade of residual carcinoma (non-pCR). Lymph node metastasis was not considered in the response evaluation because radiomics analysis is unreliable on small lesions; thus, only the primary tumor was included in the image analysis (3).

Clinical Variable Collection
Demographic information and radiologic findings from CT, EUS (endoscopic ultrasound), and esophagogram were collected as clinical variables. Clinical T stage and lymph node status (N+/N-) were evaluated by EUS and CT in a complementary manner. dThickness% was calculated as the reduction in maximum tumor thickness after NCRT divided by the baseline maximum tumor thickness on pre-NCRT CT. Tumor adventitia type was evaluated on CT and classified as smooth or not smooth (tumor outer membrane coarse or nodular) (15). The gross type of esophageal cancer on esophagogram was classified into four types according to the Japan Esophageal Society: type 1, protruding type; type 2, ulcerative and localized type; type 3, ulcerative and infiltrative type; type 4, diffusely infiltrative type (16,17). Pre-Dmin and post-Dmin refer to the minimum esophageal diameter on esophagogram before and after NCRT, respectively. dDmin% was defined as the increase in minimum esophageal diameter on esophagogram after NCRT divided by pre-Dmin. Differences in clinical variables between the pCR and non-pCR cohorts were analyzed using the chi-squared test or Student t-test, and only the significant clinical variables were selected for further analysis.
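The two derived variables and a univariate comparison of a binary variable can be sketched in a few lines of Python. The function names are hypothetical and the example counts are invented for illustration (they are not the study's Table 1 data). The chi-squared test shown is Pearson's test without continuity correction; R's `chisq.test` applies Yates's correction to 2x2 tables by default, and the text does not say which variant was used.

```python
import math

def d_thickness_pct(pre_mm: float, post_mm: float) -> float:
    """dThickness%: reduction in maximum tumor thickness after NCRT divided by
    the baseline (pre-NCRT) maximum thickness, expressed as a percentage."""
    return 100.0 * (pre_mm - post_mm) / pre_mm

def d_dmin_pct(pre_dmin_mm: float, post_dmin_mm: float) -> float:
    """dDmin%: increase in minimum esophageal diameter on esophagogram after
    NCRT divided by pre-Dmin, expressed as a percentage."""
    return 100.0 * (post_dmin_mm - pre_dmin_mm) / pre_dmin_mm

def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for the table
        [[a, b],    rows: e.g. smooth / not smooth adventitia
         [c, d]]    cols: pCR / non-pCR
    With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)), so no
    statistics library is needed. Returns (statistic, p_value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical example values (not study data):
print(d_thickness_pct(15.0, 9.0))   # 40.0
stat, p = chi2_2x2(12, 8, 6, 14)    # invented counts
print(round(stat, 2), round(p, 4))
```

Continuous variables such as dThickness% would instead be compared with a two-sample t-test, for which a statistics package (for example R's `t.test`) is the practical choice.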

Delineation of Regions of Interest
Contrast-enhanced chest CT images were acquired on a variety of CT scanners according to standard clinical scanning protocols (120 kV/140 kV, 140-300 mA, slice thickness of 5 mm). All images were reconstructed with the standard reconstruction kernel. The regions of interest (ROIs) were manually delineated on the Pinnacle 9.1 system (Philips, Fitchburg, WI) by two expert radiation oncologists, with reference to complementary materials such as 18F-FDG PET/CT, barium esophagogram, and esophagoscopy reports. The pre-NCRT ROI was contoured on the pre-NCRT CT images to cover only the primary esophageal tumor. The post-NCRT CT images of each patient were then registered with the corresponding pre-NCRT images, and the pre-NCRT ROI contour was projected onto the post-NCRT images. The post-NCRT ROI was then manually adjusted from the projected pre-NCRT ROI to account for circumferential tumor shrinkage after treatment, keeping the craniocaudal length unchanged.

Radiomics Feature Extraction
Radiomics features were extracted using IBEX, an open-infrastructure software platform for quantitative image analysis (18). A total of 135 radiomics features were extracted from each of the pre-NCRT and post-NCRT CT images, including 18 shape- and size-based features, 52 first-order statistical features, and 65 second-order features (Supplementary Material 1).
For each of these radiomics features, a d-NCRT feature was calculated by subtracting the pre-NCRT value from the corresponding post-NCRT value, producing 135 d-NCRT features. In total, 405 features (135 pre-NCRT, 135 post-NCRT, and 135 d-NCRT) were therefore available for each patient.
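The bookkeeping that turns two sets of 135 features into 405 per-patient features can be sketched as follows. This is an illustrative sketch: the `pre_`/`post_`/`d_` prefixes and the dummy feature names are hypothetical, and IBEX itself is not called here.

```python
def build_feature_vector(pre: dict[str, float], post: dict[str, float]) -> dict[str, float]:
    """Combine one patient's pre-NCRT, post-NCRT and d-NCRT radiomics features.

    `pre` and `post` map feature name -> value for the same feature list
    (135 IBEX features each in this study). The d-NCRT value is
    post-NCRT minus pre-NCRT, so the result has 3 * len(pre) entries."""
    if pre.keys() != post.keys():
        raise ValueError("pre- and post-NCRT feature sets must have the same names")
    combined: dict[str, float] = {}
    for name in pre:
        combined[f"pre_{name}"] = pre[name]
        combined[f"post_{name}"] = post[name]
        combined[f"d_{name}"] = post[name] - pre[name]   # d-NCRT feature
    return combined

# Hypothetical feature names f0..f134 with dummy values:
pre = {f"f{i}": float(i) for i in range(135)}
post = {f"f{i}": float(i) * 0.5 for i in range(135)}
features = build_feature_vector(pre, post)
print(len(features))        # 405
print(features["d_f10"])    # -5.0
```

Checking that both dictionaries share the same keys guards against silently subtracting mismatched features when one extraction run fails for a single ROI.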

Feature Reproducibility Evaluation
To assess the inter-observer reproducibility of radiomics features, the pre-NCRT CT images of the first 10 consecutive patients were each contoured by two additional experienced thoracic radiation oncologists in a blinded fashion. The intraclass correlation coefficient (ICC) was calculated to rank feature robustness and was interpreted as follows: 0.81 to 1.00, almost perfect agreement; 0.61 to 0.80, substantial agreement; 0.41 to 0.60, moderate agreement; 0.21 to 0.40, fair agreement; 0 to 0.20, poor or no agreement. Feature stability was also validated in a test-retest setting using the RIDER dataset from The Cancer Imaging Archive (TCIA), which contains two CT scans acquired 15 min apart for each of 31 NSCLC patients. Test-retest repeatability was evaluated by the concordance correlation coefficient (CCC). Radiomics features with an ICC above 0.4 in the inter-observer test and a CCC above 0.75 in the test-retest analysis were selected for further analysis (19,20).
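The two robustness filters can be sketched in plain Python. The text does not state which ICC form was used; this sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement, in the Shrout and Fleiss notation) and Lin's CCC computed with population (1/n) moments. Function names are hypothetical; the 0.4 and 0.75 thresholds come from the text.

```python
from statistics import fmean

def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1) (assumed form). ratings[i][j] is one feature's value for
    patient i computed from observer j's contour."""
    n, k = len(ratings), len(ratings[0])
    grand = fmean(v for row in ratings for v in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(ratings[i][j] for i in range(n)) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between observers
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def lin_ccc(x: list[float], y: list[float]) -> float:
    """Lin's concordance correlation coefficient between test and retest values."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

def is_robust(icc: float, ccc: float, icc_min: float = 0.4, ccc_min: float = 0.75) -> bool:
    """Keep a feature only if it passes both reproducibility thresholds."""
    return icc > icc_min and ccc > ccc_min

# Toy data: a constant offset between observers lowers absolute-agreement ICC.
print(icc_2_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [5.0, 6.0]]))
print(lin_ccc([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```

Both statistics penalize systematic shifts, not just scatter, which is why a feature that is perfectly correlated but offset between scans can still fail the CCC threshold.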

Radiomics Feature Selection
Radiomics feature selection was performed in two steps. The robust features retained after reproducibility analysis were first tested by univariate logistic regression with a p-value cutoff of 0.157, which follows from Wilks' theorem and the Akaike Information Criterion requiring χ² > 2df, where df is the degrees of freedom (14). The significant features were then entered into a regularized multivariate logistic regression with the least absolute shrinkage and selection operator (LASSO) penalty, which shrinks the regression coefficient estimates and excludes variables by forcing some coefficients to exactly 0. This shrinkage prevents overfitting due to collinearity of the covariates or high dimensionality (21). A radiomics signature (RS) was constructed as a linear combination of the selected radiomics features weighted by their LASSO regression coefficients. Student t-test was performed to evaluate the mean difference in RS between the pCR and non-pCR cohorts.
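The 0.157 cutoff and the LASSO step can both be illustrated in plain Python. The cutoff is P(χ²₁ > 2), which for one degree of freedom equals erfc(1). For the LASSO step, the study does not state its software, how the penalty λ was chosen (for example by cross-validation), or whether the RS includes an intercept; this sketch therefore fits an L1-penalized logistic regression by proximal gradient descent (ISTA) with λ supplied by hand, assumes standardized features, and makes the intercept optional. In practice an established solver such as R's glmnet would normally be used.

```python
import math

# AIC adds a coefficient when the likelihood-ratio statistic exceeds 2 * df;
# under Wilks' theorem (df = 1) that corresponds to P(chi2_1 > 2) = erfc(1).
AIC_P_CUTOFF = math.erfc(1.0)  # about 0.157

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def _soft_threshold(v: float, t: float) -> float:
    # Proximal operator of t*|v|: shrinks toward 0 and sets small values to exactly 0.
    return math.copysign(max(abs(v) - t, 0.0), v)

def lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression via ISTA. Minimizes the mean negative
    log-likelihood + lam * sum(|beta_j|); the intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = _sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n
        beta = [_soft_threshold(b - lr * gj / n, lr * lam) for b, gj in zip(beta, g)]
    return b0, beta

def radiomics_signature(x, beta, b0=0.0):
    """RS: selected features weighted by their LASSO coefficients (intercept optional)."""
    return b0 + sum(b * v for b, v in zip(beta, x))

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
X = [[-1.0, 0.5], [-1.0, -0.5], [1.0, 0.5], [1.0, -0.5]]
y = [0, 0, 1, 1]
b0, beta = lasso_logistic(X, y, lam=0.05)
print(round(AIC_P_CUTOFF, 3), beta)  # noise coefficient is driven to exactly 0
```

The soft-threshold step is what sets uninformative coefficients to exactly zero, which is the feature-selection behavior described in the paragraph above.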

Model Development and Statistical Analysis
Two multivariate logistic regression models were constructed to predict pCR: one with clinical variables alone (clinical model) and one that added the radiomics signature (RS+clinical model). The flowchart of the model development process is provided in the Supplementary Materials.
The goodness-of-fit of each model was assessed by Nagelkerke R², the Akaike Information Criterion (AIC), and the Brier score. Lower AIC values and Brier scores indicate better fit; for a binary outcome, the Brier score ranges from 0 for a perfect model to 0.25 for a non-informative model at 50% outcome incidence (22). Conversely, a higher Nagelkerke R² indicates better fit. Model calibration was visualized with a calibration plot. The discriminative ability of the models was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC).
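The four performance measures can be computed directly from observed outcomes and predicted probabilities. This is a stdlib-only sketch with hypothetical function names; in the study these measures were computed in R. AUC uses the Mann-Whitney formulation (probability that a random pCR case scores higher than a random non-pCR case, ties counting one half), and Nagelkerke R² rescales the Cox-Snell R² relative to an intercept-only model.

```python
import math

def brier_score(y, p):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def log_likelihood(y, p, eps=1e-12):
    # Probabilities are clipped to avoid log(0).
    return sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
               for yi, pi in zip(y, p))

def aic(y, p, n_params):
    """AIC = 2k - 2 ln L, where k counts estimated coefficients incl. the intercept."""
    return 2 * n_params - 2 * log_likelihood(y, p)

def nagelkerke_r2(y, p):
    """Cox-Snell R^2 divided by its maximum attainable value."""
    n = len(y)
    prevalence = sum(y) / n
    ll0 = log_likelihood(y, [prevalence] * n)   # intercept-only model
    ll1 = log_likelihood(y, p)
    cox_snell = 1 - math.exp(2 * (ll0 - ll1) / n)
    return cox_snell / (1 - math.exp(2 * ll0 / n))

def auc(y, score):
    """Mann-Whitney AUC; ties between a pCR and a non-pCR case count 1/2."""
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

y = [1, 0, 1, 0]
print(brier_score(y, [0.5] * 4))             # 0.25: non-informative model
print(auc(y, [0.9, 0.1, 0.8, 0.2]))          # 1.0: perfect ranking
print(nagelkerke_r2(y, [0.9, 0.1, 0.9, 0.1]))
```

The toy call reproduces the 0.25 reference value quoted above: predicting 0.5 for every patient when half the outcomes are positive gives a Brier score of exactly 0.25.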
Because traditional accuracy metrics such as AUC have limited value in telling whether an intervention should be performed for an individual patient, decision curve analysis was carried out to investigate the clinical usefulness of the prediction models by quantifying the net benefit, which is calculated as (23,24):
$$\text{Net benefit} = \frac{TP}{n} - \frac{FP}{n} \times \frac{P_t}{1 - P_t}$$
where TP and FP are the true-positive count (i.e., true pCR) and the false-positive count (i.e., false pCR), n is the total number of patients, and P_t is the threshold probability. The threshold probability is defined as the minimum probability of pCR above which a patient would opt for observation rather than surgery (a higher predicted probability indicates a greater chance of pCR). Finally, a nomogram incorporating the selected clinical variables and RS was generated for clinical reference.
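A decision curve is the standard net benefit, NB = TP/n − (FP/n) × P_t/(1 − P_t), evaluated over a grid of thresholds alongside the two reference strategies. In this minimal sketch (hypothetical function names), "positive" means predicted pCR, i.e., observation instead of surgery, and a patient is classified positive when the predicted probability is strictly above P_t, following the wording "above which" in the text. The all-surgery strategy spares no one, so its net benefit is 0 at every threshold.

```python
def net_benefit(y, p, pt):
    """Net benefit of sparing surgery when predicted P(pCR) > pt."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi > pt and yi == 1)   # true pCR spared
    fp = sum(1 for yi, pi in zip(y, p) if pi > pt and yi == 0)   # residual tumor missed
    return tp / n - fp / n * pt / (1 - pt)

def decision_curve(y, p, thresholds):
    """Net benefit of the model and of the two treat-everyone-alike strategies."""
    prevalence = sum(y) / len(y)
    return [
        {
            "pt": pt,
            "model": net_benefit(y, p, pt),
            # Observation for everyone: TP = all pCR, FP = all non-pCR.
            "spare_all": prevalence - (1 - prevalence) * pt / (1 - pt),
            # Esophagectomy for everyone: no one is spared, no one is missed.
            "surgery_all": 0.0,
        }
        for pt in thresholds
    ]

# Toy data (not study data): two pCR and two non-pCR patients.
y = [1, 1, 0, 0]
p = [0.9, 0.7, 0.65, 0.2]
for row in decision_curve(y, p, [0.25, 0.6]):
    print(row)
```

Plotting the "model", "spare_all", and "surgery_all" columns against "pt" gives curves of the kind shown in a decision curve figure; the study's curves were additionally bootstrap-corrected.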
To prevent overestimation of the final model performance, internal validation by bootstrap resampling with 2,000 replicates was performed to correct for optimism.
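The text does not detail the correction procedure; a common choice, assumed here, is Harrell's optimism bootstrap: refit the model on each bootstrap sample, measure how much better it performs on that sample than on the original data, and subtract the average of this optimism from the apparent performance. The `fit_direction` "model" below is a hypothetical toy that only learns the sign of a single feature; in practice `fit` would refit the full logistic model, including any selection steps.

```python
import random

def auc(y, score):
    """Mann-Whitney AUC; ties count 1/2."""
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def optimism_corrected(metric, fit, X, y, n_boot=2000, seed=0):
    """Returns (apparent, mean_optimism, corrected) performance.
    fit(X, y) -> predict function; metric(y, scores) -> performance."""
    rng = random.Random(seed)
    n = len(y)
    model = fit(X, y)
    apparent = metric(y, [model(x) for x in X])
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:          # AUC is undefined without both classes; redraw
            continue
        Xb = [X[i] for i in idx]
        mb = fit(Xb, yb)              # refit on the bootstrap sample
        perf_boot = metric(yb, [mb(x) for x in Xb])   # on its own bootstrap sample
        perf_orig = metric(y, [mb(x) for x in X])     # on the original sample
        optimism.append(perf_boot - perf_orig)
    mean_opt = sum(optimism) / len(optimism)
    return apparent, mean_opt, apparent - mean_opt

def fit_direction(X, y):
    """Hypothetical toy model: scores one feature, flipping its sign if pCR
    cases have the lower mean in the training data."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda x: sign * x

X = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(optimism_corrected(auc, fit_direction, X, y, n_boot=200))  # (1.0, 0.0, 1.0)
```

A perfectly separable toy sample has no optimism, so the corrected AUC equals the apparent one; with a flexible model on 121 patients the mean optimism is positive, which is why the corrected AUC (0.84) is lower than the apparent AUC (0.87).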
Statistical analysis was performed with R (version 3.6.1), and a p-value less than 0.05 was considered significant unless stated otherwise.

Patient Characteristics and Clinical Variable Selection
A total of 121 patients with ESCC were included in the study, with a mean age of 60.9 ± 6.8 years; 88.4% were male and 11.6% female. The clinical characteristics are shown in Table 1.
As shown in Table 1, older patients and those with a smooth tumor adventitia type on CT were more likely to respond to NCRT. Both post-thickness and dThickness% were significantly associated with pCR (t-test p-values, 0.004), indicating that better post-NCRT tumor regression was correlated with a higher chance of pCR. Marked collinearity was found between these two variables (Pearson correlation coefficient, 0.92), and dThickness% was selected over post-thickness because of its greater significance in univariate logistic regression (p-value, 0.005 vs. 0.059). Furthermore, a larger post-Dmin on esophagogram, indicating better restoration of esophageal dilatation after NCRT, was significantly associated with pCR. As a result, four significant clinical variables (age, tumor adventitia type, dThickness%, and post-Dmin on esophagogram) were selected to enter the prediction model.
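The rule used to resolve the post-thickness/dThickness% collinearity (keep the more significant member of a highly correlated pair) can be generalized into a greedy pruning step. This is a sketch under stated assumptions: the text reports only one correlated pair and gives no correlation threshold, so `r_max=0.8` is a hypothetical choice, the feature values are invented, and the age p-value is made up for illustration (0.005 and 0.059 are the univariate logistic p-values quoted above).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def prune_collinear(features, p_values, r_max=0.8):
    """Visit candidates from most to least significant; drop any whose |r| with
    an already kept variable exceeds r_max (hypothetical threshold)."""
    kept = []
    for name in sorted(features, key=lambda k: p_values[k]):
        if all(abs(pearson_r(features[name], features[k])) <= r_max for k in kept):
            kept.append(name)
    return kept

# Invented values for five patients (not study data).
features = {
    "d_thickness_pct": [1.0, 2.0, 3.0, 4.0, 5.0],
    "post_thickness": [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
    "age": [2.0, 1.0, 3.0, 1.0, 2.0],               # uncorrelated with d_thickness_pct
}
p_values = {"d_thickness_pct": 0.005, "post_thickness": 0.059, "age": 0.02}
print(prune_collinear(features, p_values))   # ['d_thickness_pct', 'age']
```

Ordering by p-value before pruning encodes the same preference stated in the text: when two variables carry the same information, the one with stronger univariate evidence is retained.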

Model Development and Model Performance
The RS+clinical model also demonstrated better discriminative performance than the clinical model (AUC, 0.87 vs. 0.73), and this advantage persisted after internal validation (corrected AUC, 0.84 vs. 0.70; Figure 3).

Clinical Benefit and Nomogram
The net benefits of the two models are presented in Figure 4. Here, net benefit is interpreted as the benefit of preserving the esophagus in pCR patients (true positives) who are correctly identified by the prediction model as candidates to be spared surgery, minus the harm of residual tumor in non-pCR patients (false positives) who are wrongly judged by the model to be candidates for omitting surgery. The horizontal solid line represents the decision to perform esophagectomy on all patients regardless of their response to NCRT; it serves as a reference for visualizing the benefit of treatment decisions based on the different models. With the RS+clinical model, a higher net benefit than with the clinical model could be achieved at threshold probabilities above 25%.
For example, at the 60% threshold (i.e., the patient would opt for observation if his probability of pCR was >60%), the net benefit was 0% for the all-surgery scheme, 2.23% for the clinical model, and 13% for the RS+clinical model. In other words, if treatment decisions were based on the RS+clinical model, a net benefit of 13% would be equivalent to correctly avoiding surgery (adopting an organ-saving strategy) in 13 of every 100 patients with no false-pCR predictions, a considerable gain compared with assuming that all patients have residual cancer and performing surgery on all of them. Overall, 37 of the 121 patients (30.58%) could have been spared surgery by the RS+clinical model, while only 7 of the 70 non-pCR patients (10%) would have been misdiagnosed. To provide clinicians with a quantitative tool to predict the individual probability of pCR, we built a nomogram based on the RS+clinical model (Figure 5).

DISCUSSION
We developed a prediction model for pCR after NCRT in ESCC using a CT-based radiomics signature and clinical variables. The model was internally validated and presented as a nomogram, and it showed satisfactory performance in guiding clinical decision making.
Establishing a non-surgical approach to evaluating tumor response to NCRT is crucial for making individualized treatment plans for locally advanced esophageal cancer. Esophagectomy is an effective intervention but carries a high postoperative complication rate of roughly 65%, a high postoperative mortality rate of 4%-10%, and decreased health-related quality of life, especially physical function, which does not recover to pre-esophagectomy levels (6,9,10,25). Patients who respond adequately to NCRT, especially ESCC patients, of whom up to 43.2% achieve pCR, might be able to avoid surgery and preserve the esophagus (4).
In recent years, non-invasive radiomics analysis has proven effective in predicting tumor treatment response and patient survival. The underlying rationale is that tumor genetic heterogeneity will be converted to histopathological … comprehensive 18F-FDG PET texture features with a corrected c-index of 0.77 but failed to find an incremental value in decision curve analysis. However, these studies focused primarily on esophageal adenocarcinoma, whose tumor biology and response to NCRT differ considerably from those of ESCC (pCR rate, 27% vs. 43.2%) (3,4). The existing CT-based radiomics studies predicting NCRT response in ESCC had small sample sizes, ranging from 49 to 94, were mostly unbalanced with regard to the ratio of pCR to non-pCR cases, and produced relatively low model performance (AUC of 0.54 to 0.79) (26)(27)(28). The study by Hu et al. (29) demonstrated the feasibility of using CT radiomics to predict the treatment response of esophageal squamous cell cancer after chemoradiotherapy, but did not include traditional clinical and imaging data in the model. In the present study, a prediction model for pCR was developed exclusively for ESCC, with a larger sample size (n=121) and promising discriminative performance when combining the radiomics signature with clinical variables (AUC=0.843).
Compared with PET-based radiomics models (3,14,30,31), CT-based radiomics models have increasingly demonstrated non-inferior performance in NCRT response prediction, not only in ESCC, as reported in the present study, but also in other tumor types such as rectal cancer (AUC=0.70) (32) and stage III non-small cell lung cancer (AUC=0.86) (33). Because CT is usually more accessible and affordable than PET for most cancer patients, it is reasonable to expect that CT-based radiomics models will play an important role in NCRT response prediction and help to further personalize treatment strategies in multiple cancers. We also anticipate that a model combining CT and PET radiomics could achieve more robust predictive performance, which we plan to investigate in the future.
FIGURE 4 | Decision curve analysis. Decision curves depicting the net benefit (y-axis) of the two models over a range of probability thresholds (i.e., the minimum probability of pCR above which a patient would opt for observation rather than surgery; x-axis). The yellow and blue solid lines represent making the same decision for all patients (i.e., sparing surgery for all patients or performing surgery for all patients, respectively). The net benefit was corrected by internal validation with a 2,000-replicate bootstrap.
In our study, four clinical variables showed significant association with pCR: tumor adventitia type, dThickness% on CT, post-Dmin on esophagogram, and age. The value of tumor thickness-derived parameters (percentage decrease, pre- or post-NCRT maximum tumor thickness, etc.) and of the tumor outer membrane type in predicting response to preoperative treatment has been investigated in previous studies (15,34), but the conclusions were inconsistent. According to the study by Chee et al., the minimum luminal width on esophagogram has only moderate effectiveness in evaluating tumor response to neoadjuvant treatment when used as a single predictive parameter (35). The limited usefulness of tumor thickness on CT and luminal width on esophagogram could be explained by the bulk of necrotic and fibrotic tissue after neoadjuvant treatment, which results in persistent abnormalities on imaging. Radiomics complements these traditional imaging parameters through its ability to detect heterogeneity within tissues, which makes it possible to improve model performance in tumor response prediction. Interestingly, age turned out to be associated with pCR status in our study, with an OR of 1.08 (1.02, 1.16), indicating that the odds of pCR increased by a factor of 1.08 for each additional year of age. A similar finding was reported by Vandendorpe et al. (32), who found that age had an OR of 1.05 (1.00-1.10) in a model evaluating the clinical downstaging of colorectal cancer after NCRT. The potential biological or socioeconomic causes behind this finding need further investigation.
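Because a logistic model is linear on the log-odds scale, a per-year odds ratio compounds multiplicatively over larger age differences. As an illustration based only on the reported point estimate, and assuming age enters the model as a single linear term, a 10-year age difference with all other variables held fixed corresponds to:

```latex
\mathrm{OR}_{1\,\text{yr}} = e^{\beta_{\text{age}}} = 1.08
\quad\Longrightarrow\quad
\beta_{\text{age}} = \ln 1.08 \approx 0.077,
\qquad
\mathrm{OR}_{10\,\text{yr}} = e^{10\,\beta_{\text{age}}} = 1.08^{10} \approx 2.16
```

The interval bounds would compound in the same way, so the uncertainty around a 10-year OR is considerably wider than around the per-year estimate.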
The RS+clinical model shows potential for stratifying patients by their response to NCRT, allowing the treatment plan to be tailored to the individual. Patients predicted to have residual cancer would proceed to esophagectomy. For those identified radiomically as potential pCR, surgery could be withheld and an organ-saving strategy adopted, such as boosting the radiotherapy dose to a definitive level or close surveillance (with salvage surgery if necessary) after chemoradiation. Decision curve analysis shows that, at a given threshold probability, using the RS+clinical model to evaluate treatment response provides more clinical benefit than both the clinical model-based strategy and the all-surgery scheme. At the 60% threshold, the RS+clinical model could avoid a net 13% of surgeries, i.e., the equivalent of 13 avoided surgeries per 100 patients with no missed residual cancer. In other words, correct pCR predictions by the RS+clinical model would lead to a net reduction of 16 avoidable surgeries among the 121 patients in our cohort, equivalent to implementing an organ-saving strategy in 31.37% of the 51 true-pCR cases. The threshold probability need not be fixed at 0.6 in clinical practice and can be adjusted according to each patient's willingness to omit surgery. When it is set higher than 0.6, the misdiagnosis rate will decline accordingly, so the patient takes on less risk of residual tumor, though fewer patients can benefit from organ-saving treatment. Therefore, a balance must be struck between gaining net benefit and reducing the misdiagnosis rate when determining the threshold probability.
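The figures quoted in this paragraph follow from the decision-curve definition NB = TP/n − (FP/n) × P_t/(1 − P_t). At P_t = 0.6 each false-pCR prediction is weighted 1.5 times as heavily as a true-pCR prediction. The rounded arithmetic below uses the reported corrected net benefit of 13% to recover the 16 avoided surgeries and the 31.37% figure; the P_t = 0.7 line is included only as an illustration of how a stricter threshold raises the penalty on each false pCR.

```latex
\frac{P_t}{1-P_t}\Big|_{P_t=0.6} = \frac{0.6}{0.4} = 1.5,
\qquad
0.13 \times 121 \approx 15.7 \approx 16 \ \text{net avoided surgeries},
\qquad
\frac{16}{51} \approx 31.37\%
\\[4pt]
\frac{P_t}{1-P_t}\Big|_{P_t=0.7} = \frac{0.7}{0.3} \approx 2.33
```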
When organ-saving strategies are implemented, boosting the radiation dose might reduce the potential risk of cancer recurrence in false-pCR patients, as supported by several studies indicating that definitive chemoradiotherapy and trimodality treatment (NCRT followed by surgery) lead to similar survival outcomes, but the former is associated with a significantly lower treatment-related mortality rate (0.8%-3.5% vs. 9.3%-12.8%) (7,36,37). Previous studies have also indicated that close surveillance with salvage esophagectomy when necessary is feasible. For example, Markar et al. (38) retrospectively analyzed 848 patients undergoing planned surgery after NCRT or salvage surgery after definitive chemoradiotherapy and found no significant difference in long-term survival, as well as comparable short-term outcomes, in selected patients at experienced centers. The ongoing prospective SANO and ESOSTRATE trials are investigating whether active surveillance with surgery as needed after NCRT leads to survival non-inferior to that of standard esophagectomy (8,39). If so, patients with an adequate response to NCRT identified by prediction models such as the one presented here could receive organ-saving treatment as a standard of care.
Several limitations apply to our study. First, this was a retrospective study with a relatively small cohort, in which splitting the data into training and testing sets might introduce bias; therefore, performance was corrected by bootstrap internal validation. Nevertheless, our study can be regarded as an exploratory effort that provides a theoretical foundation for future external validation on a larger scale. Second, previous studies included histopathologic grading of endoscopic biopsies in the clinical variable analysis (3), but pre-NCRT biopsy specimens were available for fewer than one third of the patients in our cohort (most of them taken outside our institution), so histopathologic grading was not included in our study. Third, PET parameters were not included in this retrospective study because only a small proportion of patients underwent pre- or post-NCRT PET/CT scans; however, we believe that adding PET will improve the performance of the predictive model, which we will explore in the future.

CONCLUSION
We proposed a practical CT radiomics-based model with satisfactory performance for discriminating post-NCRT pCR patients from non-pCR patients. The clinical benefits introduced by the model may facilitate individualized organ-preservation strategies for ESCC patients who respond adequately to NCRT.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Shanghai Chest Hospital Institutional Review Board. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.