Response Prediction to Concurrent Chemoradiotherapy in Esophageal Squamous Cell Carcinoma Using Delta-Radiomics Based on Sequential Whole-Tumor ADC Map

Purpose The purpose of this study was to investigate the association between the radiomics features (RFs) extracted from a whole-tumor ADC map during the early treatment course and response to concurrent chemoradiotherapy (cCRT) in patients with esophageal squamous cell carcinoma (ESCC). Methods Patients with ESCC who received concurrent chemoradiotherapy were enrolled in two hospitals. Whole-tumor ADC values and RFs were extracted from sequential ADC maps before treatment, after the 5th radiation, and after the 10th radiation, and the changes of ADC values and RFs were calculated as the relative difference between different time points. RFs were selected and further imported to a support vector machine classifier for building a radiomics signature. Radiomics signatures were obtained from both RFs extracted from pretreatment images and three sets of delta-RFs. Prediction models for different responders based on clinical characteristics and radiomics signatures were built up with logistic regression. Results Patients (n=76) from hospital 1 were randomly assigned to training (n=53) and internal testing set (n=23) in a ratio of 7 to 3. In addition, to further test the performance of the model, data from another institute (n=17) were assigned to the external testing set. Neither ADC values nor delta-ADC values were correlated with treatment response in the three sets. It showed a predictive effect to treatment response that the AUC values of the radiomics signature built from delta-RFs over the first 2 weeks were 0.824, 0.744, and 0.742 in the training, the internal testing, and the external testing set, respectively. Compared with the evaluated response, the performance of response prediction in the internal testing set was acceptable (p = 0.048). Conclusions The ADC map-based delta-RFs during the early course of treatment were effective to predict the response to cCRT in patients with ESCC.


INTRODUCTION
Worldwide, esophageal cancer ranks 7th in cancer incidence and 6th in mortality rate reported by the World Health Organization (1). Esophageal squamous cell carcinoma (ESCC) is one of the top ten characteristic tumors in China with a proportion over 90% in all esophageal cancer patients, while adenocarcinoma is dominant in the West (2). The 5-year survival rate for locally advanced ESCC is limited to 15-30%, and the prognosis varies from patients even with the same treatment (3,4). Though concurrent chemoradiotherapy (cCRT) has become the prevalent treatment for local advanced ESCC, individual differences for therapeutic sensitivity still exist (5). It is challenging to accurately predict the sensitivity to cCRT and distinguish the discrepancy of ESCC to assist clinical decisions.
Response Evaluation Criteria in Solid Tumors (RECIST, version1.1) is the most widely used tool for assessing solid tumor response to nonsurgical treatment. It relies on imaging techniques, such as X-ray and computed tomography (CT), providing little information about the molecule, cell, histopathology, and biology (6). Compared to these imaging modalities, magnetic resonance imaging (MRI) is able to precisely depict the histopathological layers of the esophageal wall in an ex vivo evaluation (7). Notably, diffusion-weighted imaging (DWI) with physiological information reflecting water diffusion properties of tissues is potentially practical to monitor tumor response. The apparent diffusion coefficient (ADC) and the change of ADC (delta-ADC/DADC) generated from DW-MRI showed a correlation with tumor response to treatment (8)(9)(10). Nevertheless, the utilization of the ADC value as an imaging biomarker is still controversial (11)(12)(13).
With the development of computerized image processing technology during the past decades, quantitative imaging analysis is becoming more and more popular in radiology (14). Radiomics, which extracts numeric radiologic data from medical imaging, quantitatively describes the shape, intensity, texture, and other features of target structures (15). It has been reported that various radiomics features (RFs) are associated with tumor genes, pathology, and outcome of treatment in different tumors (16)(17)(18)(19)(20). Furthermore, the change of RFs between pretreatment images and images of other time points during or after treatment, referred to as delta-radiomics, has been investigated in lung cancer and rectal cancer as promising prognostic factors (21,22). Theoretically, delta-RFs (DRFs) reflect much detailed information of changes induced by chemotherapy, radiation, and immunotherapy throughout treatment rather than just at pretreatment.
It is assumed that radiomics analysis of the ADC map is helpful to predict early treatment response to cCRT of ESCC patients. Aiming to predict response to concurrent CRT in patients with ESCC, this study developed radiomics models based on RFs and DRFs during early treatment. In addition, the prediction value of ADC and DADC values were also analyzed.

Study Population
All protocols of this prospective and observational study were approved by the Institutional Review Board (Project ID: 201404005). Patients from Shandong Cancer Hospital and Institute were randomly assigned to the training set and the internal testing set in a ratio of 7:3 to ensure an adequate sample size. The external testing set was enrolled in Anyang Tumor Hospital.
All methods were carried out following relevant guidelines and regulations. All patients provided written informed consent before enrollment. Inclusion criteria were as follows: (a) ESCC histologically proven by endoscopic biopsy; (b) no contraindications to MR examination; (c) clinical stage T3 or T4 defined on endoscopic ultrasonography, diagnostic CT scans, or 18 fluorodeoxyglucose positron emission tomography/ computed tomography ( 18 F-FDG PET/CT) scan; (d) no prior anticancer treatment; (e) no distant metastasis except lymph nodes; (f) no other primary tumor. Clinical staging was based on the tumor-node-metastasis classification of malignant tumors (UICC, Version 8th). All patients were treated with concurrent CRT. The daily fractional dose of radiotherapy was 1.8-2.0 Gy, administered 5 days a week, and the total dose was 50.4-66.0 Gy with 6-15 MV X-rays performed by intensity-modulated radiation therapy. Before the early treatment response evaluated, all patients accepted two cycles of chemotherapy concurrently during radiotherapy every 3 weeks. One of the following proposals of chemotherapy would be adopted: 1) an intravenous injection of paclitaxel (150 mg/m 2 ) on day 1 and cisplatin (75 mg/m 2 ) on days 1-3; and 2) S-1 administered orally twice daily at 80 mg/m 2 for 2 weeks and cisplatin (75 mg/m 2 ) on days 1-3.

Response Assessment
The response was assessed according to RECIST 1.1 (6). Upper gastrointestinal endoscopy, chest and abdomen CT, or wholebody 18 F-FDG PET/CT was performed 2-3 months after treatment finished to compare with the pretreatment images. The definition of chemoradiotherapy sensitivity was as follows: the sensitive group comprised patients with complete response (CR) and partial response (PR); the resistant group consisted of patients with stable disease (SD) and progression disease (PD).
The ADC map was generated automatically at the MR workspace (Philips Medical Systems Extended MR workspace, Netherlands) from DWI, which was overlayed and averaged in three directions with the b-value of 0 and 600 s/mm 2 .

ADC Value and Extraction of Original RFs
The location of the primary tumor was firstly identified on DWI (b = 600 s/mm 2 ) as areas of high signal ( Figure 2). Although it was not outlined on DWI, it assisted in accurately finding the corresponding location of the tumor in the ADC map. Regions of interest (ROIs) were manually drawn on the ADC map via a segmentation editor in the 3D Slicer software (version 4.10.0, http://www.slicer.org) (23). The definition of ROI was the region of the tumor that was of low signal in the ADC map but excluding the lumen. Every slice containing the tumor was outlined. These ROIs constituted a volume of interest (VOI) with the entire tumor visualized on the ADC map as showed in Figure 2. The mean tumor ADC value was calculated by averaging the measured ADC values of the VOI. For wholetumor radiomics feature, the Slicer Radiomics, which was based on PyRadiomics program (Revision 2.2.0) in the 3D Slicer software, automatically extracted whole-tumor RFs from the three-dimensional reconstructed VOI (24). In our practice, resampled voxel size, LoG kernel sizes, and bin width were respectively set to 1, 1, and 25. In total, 851 RFs were extracted including 18 first-order statistics features, 14 shape-based (2D and 3D) features, 24 gray-level co-occurrence matrix (GLCM) features, 16 gray-level run length matrix (GLRLM) features, 16 gray-level size zone matrix (GLSZM) features, 5 neighboring gray tone difference matrix (NGTDM) features, 14 gray-level dependence matrix (GLDM) features, and 744 wavelet-based features. Detailed classification of the extracted RFs was summarized in Supplementary Table 1. In addition, the RFs mentioned in our study were consistent with the definition of the image biomarker standardization initiative (25).

RF Selection
Selection procedures were performed on the pretreatment images with three steps.
Because DWI is susceptible to field inhomogeneities, large amounts of magnetization artifacts in the chest area, and patient movement, two consecutive DWI acquisitions of the same patient may give images with slightly different imaging characteristics, which may affect radiomics analysis. Thus, the first is to evaluate the repeatability of the radiomics features between two ADC maps in a short time by the intraclass correlation coefficient (ICC) method. Five extra patients with ESCC were enrolled before the first treatment in Shandong Cancer Hospital and Institute (Supplementary Table 2) who were not included in the training and internal testing sets. The interval between the two consecutive DWI scans was 5 minutes (Supplementary Figure 1). VOIs were delineated in the ADC map by an experienced radiologist (WL, with 20 years of experience in MRI), then the VOI was duplicated to the other ADC map from the same patient. The reliability coefficient of RFs was supposed to be higher than 0.75.
Secondly, to select RFs with high interobserver reproducibility, VOIs were delineated by two experienced radiologists (WL, appointed as reader 1; NS, appointed as reader 2 with 11 years of experience in MRI) who were blind to clinical information, treatment plan, and response about patients. Bland-Altman analysis was used to test the interobserver reproducibility of the RFs from the VOI delineated by different radiologists. The differences between the same parameters extracted from two readers' delineation were plotted against their average and reported as a percentage. The lower and upper reproducibility limits were calculated as ± 1.96 standard deviations. In our case, the mean and standard deviation of the differences were supposed to be less than 5% and 10%, respectively, as stated in related work (26,27).
Thirdly, to reduce the number of features and select the most significant features correlated with treatment response, the minimum redundancy maximum relevance (mRMR) algorithm was used to identify and rank the top 30 features. Finally, the selected RFs were applied in the training, internal, and external testing sets from the delineation of reader 1.

Change of the Mean of ADC Values and RFs
The changes of the whole-tumor ADC value between images of different time points were respectively calculated as follows: where DADC 1st week , DADC 2nd week , and DADC 2 weeks denoted the change of the ADC value within the 1st week, the 2nd week, and the first two weeks, respectively. Similarly, ADC pretreatment , ADC 5f , and ADC 10f denoted the ADC value of images acquired  before treatment, after the 5th radiation, and after the 10th radiation, respectively. The changes of the selected RFs were defined as the relative difference between each measurement as follows: where DRF 1st week , DRF 2nd week , and DRF 2 weeks denoted the change of RFs within the 1st week, 2nd week, and first two weeks, and RF pretreatment , RF 5f , and RF 10f denoted the RFs in images acquired before treatment, after the 5th radiation, and after the 10th radiation, respectively.

Construction of Prediction Model Based on Radiomics Signature
As shown in Figure 1B, RF pretreatment , DRF 1st week , DRF 2nd week , and DRF 2 weeks were used to train a support vector machine (SVM) classifier in the training set, and then a radiomics signature was built by the trained classifier and evaluated in the internal and external testing sets. The cost-based SVM classifier, which used the radial basis function kernel and a cost coefficient of 0.001, was trained by RFs with the 10-fold cross-validation. The SVM classifier transformed the feature space to a high-dimensional space where a separating hyperplane maximized the margin between classes. For each patient, the SVM classifier generated a radiomics signature for evaluating predictions of treatment response. The receiver operating characteristic (ROC) curves of obtained radiomics signatures in the training, internal, and external testing sets were analyzed and then tested by the DeLong test. The chisquare test was used to compare the predicted responses, which were corresponding to radiomics signatures in the internal testing with the signatures' ground truth, i.e., treatment response evaluated by RECIST1.1. Afterward, the eligible radiomics signatures together with clinical characteristics were imported to a multivariate logistic regression-based prediction model to predict the sensitivity to treatment. Furthermore, the performance of radiomics signatures, the ADC value, clinical characteristics, and the regression model was assessed by the ROC curve, respectively. The area under the curve (AUC) was calculated, and the optimal Youden's index was determined.

Statistical Analysis
Statistical analysis was performed using the R software (version 3.6.1, http://www.R-project.org). The ICC method, Bland-Altman method, mRMR approach, SVM classifier, logistic regression analysis, and ROC curve analysis were implemented based on the R packages "irr", "BlandAltmanLeh", "mRMRe", "e1071", and "pROC." Chi-square test or t-test was used to examine the correlations between clinical features and treatment response. The differences in ADC and DADC values were compared between the sensitive group and the resistant group by Mann-Whitney U test. P-value < 0.05 was considered as an indicator of the statistically significant difference.

Clinical Characteristics
In Shandong Cancer Hospital and Institute, eighty-five consecutive patients with ESCC were enrolled. Nine patients were excluded, given that the DWI of 6 patients were deformed or had high noise and 3 patients' radiotherapy was interrupted.

Association of Sequential ADC Value and Treatment Response
The mean values of ADC pretreatment , ADC 5f , and ADC 10f of all patients are shown in Table 2. All of the ADC values and relative changes during the first two weeks showed no significant difference between the sensitive and resistant groups in the training and internal testing sets. In the external testing set, ADC in the 5 th radiation showed association with treatment response (p = 0.048). We noticed that the ADC value was gradually higher than before treatment as the treatment progressed.

Generation and Validation of the Radiomics Signature
A total of 560 RFs (Supplementary Table 5) were selected by the ICC method from two consecutive whole-tumor ADC map acquisitions in the same patient. Afterward, 224 RFs (Supplementary Table 6) were selected from pretreatment images in the training set via the Bland-Altman method, and 30 RFs (Supplementary Table 7) were finally chosen out according to mRMR. Most of these RFs were calculated to measure the local homogeneity of the image. To predict good and bad responder, four radiomic signatures based on an SVM classifier were respectively built with RF pretreatment , DRF 1st week , DRF 2nd week , and DRF 2 weeks . The effectiveness of these 4 radiomics signatures to classify the sensitive group versus the resistant group is shown in Figure 3. Only the radiomics signature based on DRF 2 weeks , denoted as R-Signature 2 weeks , discriminated treatment response with an AUC value higher than 0.5 in the training set [AUC = 0.824, 95% confidence interval (CI): 0.679-0.968], the internal testing set (AUC = 0.744, 95% CI: 0.465-1.0), and the external testing set (AUC = 0.742, 95% CI: 0.478-0.919). No difference was found between the training and internal testing sets according to the results of the DeLong test (p = 0.580), nor between the training and external testing sets (p = 0.526). Compared with the evaluated response, the performance of response prediction in internal testing was acceptable (p = 0.048). The chi-square test of prediction results between R-Signature 2 weeks in the training set and the internal testing set achieved a p-value of 1.000, which implied that the distribution of R-Signature 2 weeks between the two sets was not statistically significantly different.

Combining Model Analysis
As elaborated in Supplementary Table 8, in the training set, the outputs of univariate logistics regression showed that tumor location and R-Signature 2 weeks were separately associated with treatment response. We performed the tumor location and Rsignature 2 weeks to the prediction model of treatment response by multivariate logistics regression (Supplementary Table 9) in the three sets. R-Signature 2 weeks was identified as an independent factor predicting treatment response in multivariate analysis in  the three sets. Figure 4 shows the results of ROC analyses on the tumor location, R-Signature 2 weeks , and the regression model in the prediction of treatment response in the training, internal, and external testing sets. Table 3 shows the effectiveness of our proposed regression model in discriminating the sensitive and resistant groups.

DISCUSSION
In this study, a radiomics signature was developed that incorporated the change of RFs based on the whole-tumor ADC map for predicting treatment response to concurrent CRT in patients with ESCC. Compared with pretreatment, our study showed that the radiomics signature based on DRF 2 weeks was able to predict the treatment response for evaluations in the following 2-3 months. DWI-MRI was widely and routinely performed in kinds of tumors for diagnosis and treatment assessment. As designed in our study, ADC values were obtained at pretreatment, after the 5th radiation, and after the 10th radiation. However, the mean value of pretreatment ADC was not significantly different between sensitive and resistant groups, which was consistent with the studies of Kozumi in patients with ESCC undergoing concurrent CRT (11). In fact, results of prediction of treatment response using ADC or DADC were not consistent among previous studies, or even the opposite (8)(9)(10)12). In our study, ADC 5f in the external testing showed association with treatment response, which was different from those in the training and internal testing sets. Since the diffusion movement of water molecules in the tissue was closely related to the distribution of the intratumoral structure, the ADC value tended to be affected by many factors, such as cell density, nuclear area, nuclearcytoplasmic ratio, tumor angiogenesis, and proteins (28)(29)(30). In addition, tissue edema commonly occurred during CRT because of changes in blood perfusion, cell death, and blocked lymphatic drainage induced by radiation and cytotoxicity, which further caused the change of the ADC value (31). We observed that the ADC value increased in both the sensitive group and the resistance group, but only one set of positive results was obtained in the external testing set. Thus, ADC pretreatment and the changes during the first ten radiations were limited in predicting the treatment response.
The mean value of ADC and DADC reflected the average level of the Brownian motion of water molecules and a rough change in the whole tumor, while radiomics offered more heterogeneity information involving the above molecular and pathologic characteristics in different regions. The rise of radiomics using quantitative image features of tumors provided an opportunity for the development of predictive biomarkers (32). Over the years, although several studies on the prediction performance of radiomics analysis to treatment response had been reported in patients with esophageal cancer, the predictive radiomics features were hard to reproduce since there were thousands of RFs and diverse selection criteria. For example, in the 18 F-FDG PET, different radiomics features were found with discriminatory capability in predicting response to CRT in patients with esophageal cancer (33)(34)(35). Many researchers are committed to the standardization and unification of radiomics due to its virtue of convenience to use (36,37). In our study, the radiomics signature was developed from 30 radiomics features from pretreatment images, and these radiomics features were selected with high interobserver reproducibility, high relevance, and low redundancy. Nevertheless, it was inadequate to predict the response merely using RFs from pretreatment images as what the single-shot image modeling did.
In our study, the selected features were extracted from a fourgray-level matrix, which was the ADC value matrix in the ADC map. Sixteen selected feature names were explained representing homogeneity or heterogeneity. The interpretation of other features was based on their calculation, including the distribution pattern of gray and the neighborhood intensity of the image. Their changes are recorded by DRFs. We proposed to train an SVM classifier by the RFs extracted from the images of four time points, including RF pretreatment , DRF 1st week , DRF 2nd week , and DRF 2 weeks . R-Signature pretreatment showed no correlation with treatment response. Instead, containing the change of whole-tumor heterogeneity over treatment, the  radiomics signature based on DRF 2 weeks predicted the treatment response with an accuracy of 0.887, 0.826, and 0.765 in the three sets, respectively. In this study, the trend of ADC change during the initial two weeks showed an increase with no correlation with treatment response. This increase was thought to reflect the reduction in cell membrane integrity and changes in tumor cell numbers due to cytotoxic drugs and radiation, resulting in less restriction on the Brownion motion of water molecules (12). As the treatment progressed, tumor cells gradually decomposed and were absorbed, and the blood vessels and fibers in the microenvironment changed together. During this time, the changes of internal structures were not synchronized due to heterogeneity including different cell-cycle, oxygen status, and proliferative potential (38,39). Our study was designed to record meaningful changes as early as possible (40). By screening radiomics features, effective predictions can be made through statistical methods. The results showed that the larger the time span, the more significant the changes obtained in radiomics features. Delta-radiomics has previously been used for the prediction of response to tumor treatment with ADC maps and CT (41,42). As radiation and reaction to cytotoxic drugs accumulated, early changes from the tumor interior were recorded and quantified by a ratio that reduced the variability of the data. Moreover, Nasief et al. predicted the treatment response for CRT of pancreatic cancer by combining CT-based delta-radiomics and increasing CA19-9 (43). In our study, esophageal tumor location was associated with treatment response in the training and external testing sets, while few studies had reported that the location of primary tumors was associated with the sensitivity of CRT in ESCC. One explanation was that biases existed in the clinical characteristic distribution in different responders with treatment due to the small sample size. Such as the result in the internal testing set, although the AUC changed, there was no increase in diagnostic performance. The other one was that the uneven dose distribution reduced the treatment effect since cervical esophagus had physiological curvature affected by the peripheral anatomical structure (44). The association of tumor locations and treatment response was to be further investigated in a larger cohort.
Our study had several limitations. Firstly, although an external testing set was included, the cohort of our study was still small. A multicenter study with a larger patient cohort is required. Secondly, though several selected procedures were used, esophageal tumors were manually delineated, which introduced an uncertain level of observer dependency. Thirdly, compared with RECIST 1.1 in the definitive cCRT, the pathological evaluation of neoadjuvant CRT was more convincing, and a systematic control comparison was supposed to be studied in the future.
In conclusion, we developed a radiomics signature-based model that incorporated the early change of RFs based on the sequential whole-tumor ADC map to predict treatment response to cCRT in patients with ESCC. This model provided a solution to quantify intratumoral changes and utilized them to guide treatment planning at an early date.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board of Shandong Cancer Hospital and Institute. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.