Prognostic Value of Texture Analysis Based on Pretreatment DWI-Weighted MRI for Esophageal Squamous Cell Carcinoma Patients Treated With Concurrent Chemo-Radiotherapy

Purpose: The purpose of the research was to assess the prognostic value of three-dimensional (3D) texture features based on diffusion-weighted magnetic resonance imaging (DWI) for esophageal squamous cell carcinoma (ESCC) patients undergoing concurrent chemo-radiotherapy (CRT). Methods: We prospectively enrolled 82 patients with ESCC into a cohort study. Two DWI sequences (b = 0 and b = 600 s/mm2) were acquired along with axial T2WI and T1WI before CRT. Two groups of features were examined: (1) clinical and demographic features (e.g., TNM stage, age and sex) and (2) changes in spatial texture characteristics of the apparent diffusion coefficient (ADC), which characterizes gray intensity changes in tumor areas, spatial pattern and distribution, and related changes caused by CRT. Reproducible feature sets without redundancy were statistically filtered and validated. The prognostic values associated with overall survival (OS) for each parameter were studied using Kaplan-Meier and Cox regression models for univariate and multivariate analyses, respectively. Results: Both univariate and multivariate Cox model analyses showed that the energy of intensity histogram texture (IHIST_energy), radiation dose, mean of the contrast in distance 1 of 26 directions (m_contrast_1), extreme difference of the homogeneity in distance 2 of 26 directions (Diff_homogeneity_2), mean of the inverse variance in distance 2 of 26 directions (m_lnversevariance_2), high-intensity small zone emphasis (HISE), and low-intensity large zone emphasis (LILE) were significantly associated with survival. The results showed that 6 texture parameters extracted from the ADC images before treatment could distinguish among high-, medium-, and low-risk groups (log-rank χ2 = 9.7; P = 0.00773). The biased C-index value was 0.715 (95% CI: 0.708 to 0.732) based on bootstrapping validation. Conclusions: The ADC 3D texture feature can be used as a useful biomarker to predict the survival of ESCC patients undergoing CRT. Combining ADC 3D texture features with conventional prognostic factors can generate reliable survival prediction models.


INTRODUCTION
Esophageal squamous cell carcinoma (ESCC) is a disease with increasing incidence, and the diagnosis still carries a poor prognosis despite advances in therapy (1). Currently, chemo-radiotherapy (CRT) is the standard treatment for locally advanced unresectable ESCC. Due to tumor heterogeneity, these patients usually do not have the same response to a specific therapy. Thus, many patients may receive therapy that provides no benefit to them. Recently, a major research focus has been on how to provide individualized therapy. Individualized therapy requires the development of biomarkers to predict treatment prognosis and outcome. Imaging biomarkers, particularly those based on functional imaging techniques that can characterize biological effects at the cellular level, offer great potential to improve individualized therapy (2).
Diffusion-weighted magnetic resonance imaging (DWI) is a powerful MR functional imaging sequence sensitive to water diffusion (3) that can detect morphological changes in tumors at the molecular or cellular level. DWI has been studied for its potential to evaluate the treatment response to CRT for several types of cancers, including rectal cancer (4,5). The quantitative apparent diffusion coefficient (ADC) map is obtained from two different b values to remove the T2 "shine-through" effects. This can allow quantitative assessment for a treatment response. The ADC can also be used to characterize hypercellularity, distinguish cystic lesion and solid regions, and monitor the change in cellularity within the tumor over time (6). A recent study found that the ADC map can be used to qualitatively assess the tumor area and detect metastatic lymph nodes (LNs) in esophageal cancer (EC) (7).
Recently, the application of texture analysis (TA) in tumor diagnostics has caught the attention of clinical researchers. Texture is an important feature of images that has been used in qualitative and quantitative classification and analysis of materials in industry and medicine. Medical applications of TA provide a quantitative means to analyze and characterize the properties of tumor tissues and their physiological and pathological stages (8). Previous studies have reported that texture analysis can predict the treatment response and predict patient survival (9)(10)(11). It was reported that 3D texture analysis can be more useful than 2D analysis in characterizing intratumor heterogeneity (12). In the study of ESCC, because the esophagus is a tubular organ, 3D TA is expected to provide richer spatial heterogeneity information than slice samples.
The objective of this study was to prospectively investigate the prognostic value of 3D DWI features in ESCC patients treated with CRT. By studying different types of global and regional 3D features, we evaluate their potential prognostic value in correlation with patient survival.

Clinical Characteristics of the Patients
Eighty-two patients with newly diagnosed ESCC treated with CRT between 2010 and 2014 were initially enrolled in this prospective study. The inclusion criteria for the study included the following: (1)

MRI Acquisition
MR imaging in expiration breath-hold was performed before starting the treatment for tumor staging. The following imaging protocols were used:  (13), showing that the tumor lengths measured using a DWI scan of b = 600 s/mm 2 were close to the real tumor lengths based on a surgical specimen with a high concordance with pathology. As shown in Figure 1, the DWI scan with b = 600 s/mm 2 has a good image quality relative to the other DWI scans with b = 800 s/mm 2 or 1,000 s/mm 2 . Motion artifacts were minimized by acquiring all images with breath hold in the expiration phase. Because the tumor volumes could change slightly at the breath hold, two DWI scans with b = 0 and b = 600 were acquired in one cycle, improving the estimation of signal decay. This was expected to provide sufficient imaging information to describe tumor heterogeneity.

Image Preprocessing
An image preprocessing procedure was performed that included tumor segmentation and intensity normalization. The MRI data set in the DICOM format was imported into MATLAB (The Math Works Inc., Natick, MA). An in-house-developed radiomics image analysis program implemented in MATLAB was used for TA (available for share, https://pan.baidu.com/ s/1Tl_PsXrQj-OBJt-1cNjaZQ). The method used in this study is described in the Data Supplement. To perform reliable measurements, as suggested by Collewet et al. (14), the MRI data were kept in the raw data form, and voxels within the tumor region with intensities outside the range µ ± 3δ were excluded in subsequent texture computations. Voxel intensity values were typically resampled in four discrete values (16, 32, 64, or 128): where "Range" is the discrete values chosen (16, 32, 64, or 128), I (x)is the intensity of the original image, and is the set of pixels in the delineated area. The use of different resampling schemes was tested. As discussed in the Data Supplement, 32 discrete values for renormalization produced the most reliable results.

Tumor Delineation Using MRI Data
The tumor was delineated based on abnormal regions from T2weighted imaging (T2WI), DWI and ADC maps. Axial ADC maps were generated using an Extended Siemens MR Workspace workstation. The lesions showed relatively higher signals on DWI maps and lower signals on ADC maps. Pre-CRT MRI was first evaluated using the combination of corresponding T2WI and ADC images and matching between DWI and ADC to correctly position the regions of interest (ROIs) in the primary tumor. Axial T2 images in the same plane were referenced by the observers due to a lack of anatomic details because of the low signal-to-noise ratio (SNR) on DWI or ADC. Therefore, the registration accuracy between ADC/DWI and T1WI/T2WI was important for the process, which relies on careful registration by the alignment of local bone structures between ADC and T2WI images. The ROI was drawn along the border of the low signal of the tumor on the b = 600 mm/s 2 ADC images to cover the entire tumor area of each selected slice, avoiding regions with distortions or artifacts by verifying the lesion boundaries on T2WI. Delineation of the lesions was performed independently by two observers. Manual delineations were performed using MIM software (MIMvista Corp, Cleveland, OH). Independent samples t-test or the Kruskal-Wallis H test, where appropriate, was used to assess the differences between the features generated by reader 1 and those by reader 2, as well as between the twicegenerated features by reader 1. Inter-and intra-class correlation coefficients (ICCs) were used to evaluate the intra-and interobserver agreements of the contour agreement and feature extraction. ICC values >0.80 indicated good agreement. A freehand ROI was drawn along the border of the low signal of the tumor on the b = 600 images to cover the entire tumor area of each selected slice, by referencing T2WI to verify the lesion boundaries and ensure inclusion of the entire tumor area (Figure 2).

Texture Analysis
From ADC images, four subset features were extracted ( Table 2) to characterize tumor heterogeneity at global and regional levels using first-order and higher order statistics. These parameters were used to predict the patient response to CRT and survival. Global information was described by intensity histogram parameters (IHIST), including variance, mean, energy, entropy, skewness, and kurtosis, while regional information was characterized by intensity size-zone variability features (ISZFs). Regional heterogeneity information included intensity variability in the size and tumor zones [see Tixier et al. (15) for the mathematical definition of the regional heterogeneity formula used in this study]. Local heterogeneity information was derived using the co-occurrence of the gray-level co-occurrence matrix (GLCM) and gray-level gradient co-occurrence (GLGCM). Twenty-six gray-level co-occurrence matrices were computed in the direction of the 26 uniform distributions on the sphere from each voxel data area. GLGCM was acquired in the original ADC image, and the corresponding gradient image of the ADC image and 15 Haralick features (16) were extracted from GLGCM. To Similarly, 60 gray gradient features were extracted from GLGCM. One hundred twenty-eight local heterogeneity features of co-occurrence matrices characterized variations in the intensity between consecutive voxels. For texture reporting with GLCM, the notation convention "method"_"feature"_"number" was used; for example, the mean of the contrast in distance 2 of 26 directions would be identified by m_contrast_2 and the extreme difference of cluster shade in distance 1 of 26 directions would be identified by Diff_clustershade_1. The GLGCM features were identified by GLGCM_"feature"_"distance"; the small gradient emphasis in distance 4 would be identified by GLGCM_Small gradient emphasis_4. The histogram-related features would be abbreviated by IHIST_"feature"; for instance, the histogram energy feature would be identified by IHIST_energy. The detailed feature information is shown in Table 2. The algorithms for texture feature extraction are described in the Data Supplement.

Feature Selection Methods
In this study, 229 features in four categories were selected. Notably, not all the features required evaluation because many features would be irrelevant or redundant. Therefore, the number of features tested must be reduced by feature extraction. The three major reasons to perform feature reduction are as follows: (1) to reduce the training time; (2) to improve the robustness; and (3) to enhance the reliability.
To assess texture feature reproducibility, Fried DV's method was used to perform test-retest scans from 10 independent patients (17). The results are shown in the Data Supplement. Reproducibility of the characteristic parameters is an important characteristic in repeated experiments. In this study, a concordance correlation coefficient (CCC) value >0.9 was considered to guarantee reproducibility. Another consideration was the use of a defined "dynamic range" (DR) metric to select highly differentiated features. Similar to CCC, DR ≥ 0.9 indicates that this feature has a large dynamic range (18). The R 2 of simple regression was equal to the square of the Pearson correlation coefficient. Values close to 1 indicate that the data points were close to the fitted line. These features were grouped by R 2 between them. We recursively repeated the process to cover all features. We also calculated R 2 between the remaining features to quantify the dependencies. Using the above methods, 38 features were chosen in the penalized model with highly reproducibility and a dynamic range.
To avoid an inadequate sample size to train and test, the "leave one out" cross validation method was used to test the model stability. Using many features, it was difficult to predict which parameters would be useful to indicate patient treatment responses and survival. Therefore, it was necessary to reduce the number of features to improve the predictability and reliability for analysis. The least absolute shrinkage and selection operator (LASSO) method was used to select the most useful predictive features from the primary data set.
The abovementioned features and clinically relevant features were entered into a penalized multivariate Cox proportional hazards model (Adaptive Elastic Net Cox model) that simultaneously performs covariate selection in addition to model development. The Adaptive Elastic Net method for the Cox model has a grouping effect (19,20). By minimizing the opposite number of the Cox model first, and then adding the appropriate penalty, the Elastic Net estimator for the Cox model was obtained:

Statistical Analysis
Statistical analysis was performed using SPSS19.

Overall Therapeutic Response and Survival
After the completion of CRT, an overall therapeutic response (TE) was estimated according to the RECIST 1.1 standard (21). Thirty-six patients (50%) were determined to have a complete response (CR), and 36 (50%) patients had a partial response (PR). The overall effective response rate was 100.0%. All patients were followed up for over 1 year, and 27 patients (37.5%) were followed over 2 years. The median follow-up time was 16.5 months. The 1 and 2 year OS rates for all patients were estimated at 72.2 and 34.7%, respectively. According to the overall treatment response (CR, PR), the 1 and 2 year survival rates of CR patients were 86.1 and 38.9%, respectively, and those of PR patients were 58.3 and 30.1%, respectively. Significant differences were found between the two groups (log-rank test; χ 2 = 4.153, P = 0.042).

Prognostic Value of ADC Radiomics Data
The possible association of ADC map features with survival was explored by Kaplan-Meier survival analysis. No significant correlation was found between any ADC value measurement (ADC mean , ADC up , ADC down , ADC min , ADC max ) in ESCC patients undergoing CRT (P = 0.224, 0.534, 0.549, 0.328, 0.369). The results of the log-rank analysis of conventional prognostic factors for OS in univariate analysis are given in Table 3.
Age, sex, tumor site, TNM stage, and treatment type were not significant prognostic factors according to the results of univariate analysis. In univariate analysis, the GTV (Gross Tumor Volume size), pathology lesion length, therapeutic effect and radiation dose were significant prognostic factors. Univariate analysis of image texture showed that the IHIST_energy, m_contrast_1, m_Cluster shade_2, Diff_Clusetr Tendency_2, Diff_homogeneity_2, m_lnversevariance_2, Small gradient emphasis_1, GLGCM_small gradient emphasis, highintensity small zone emphasis (HISE) and low-intensity large

Feature Selection
Thirty-eight texture features were reduced to 6 nonzero coefficients in the LASSO model with potential predictors based on 72 patients in the primary cohort. The detailed results used in this study were reported in the Data Supplement.
To further define the predictive values of ADC, multivariate Cox regression model analysis was performed using adjusted clinical factors. Table 4 lists the multivariate analysis results. IHIST_energy, the energy of intensity histogram texture; m_contrast_1, the mean of contrast in distance 1 of 26 directions; Diff_homogeneity_2, the extreme difference of homogeneity in distance 2 of 26 directions; m_Inverencevariance_2, the mean of inverse variance in distance 2 of 26 directions; HISE, high intensity small zone emphasis; LILE, low intensity large zone emphasis.

Validation of Model Performance
The study used the "hdnom" package to assess the model performance by time-dependent AUC using the "leave one out" cross-validation method. We validated the Adaptive Elastic Net multivariate Cox model performance every 6 months. Figure 3 shows the mean, median, maximum, minimum, and 25 and 75% quartiles of time-dependent AUC at each time point across all fold predictions. The median and mean values could be considered the bias-corrected estimation of the model performance. The "leave one out" validation could ensure robust results. The figure shows that the median and mean values at each evaluation time point were relatively close. The results showed that the model had a relative high AUC value at each time point. The study used resampling methods of "leave one out" cross validation for internal model calibration. We split the samples into three risk groups according to the adaptive Elastic Net multivariate Cox model. The model calibration results (median of the predicted survival probability; median of the observed survival probability by the Kaplan-Meier method with 95% CI) are shown in Figure 4. The C-index for the prediction model was 0.720 (95% CI: 0.713 to 0.731) for the primary cohort, which was confirmed to be 0.715 (95% CI: 0.708 to 0.732) via bootstrapping validation. We used the Kaplan-Meier survival curve and values in the risk table to further analyze the survival differences among different risk groups. Here, we plotted the Kaplan-Meier survival curve and assessed the amount of risk in three risk groups from 1 to 3 years (Figure 5; χ 2 = 9.7, Log-rank P = 0.00773).

DISCUSSION
DWI is a powerful MR sequence that provides unique information related to tumor cellularity and the integrity of the cellular membrane. The technique can be applied widely to detect and characterize tumors and to monitor the treatment response (6). The ADC map can be acquired by two DWIs (e.g., b values of 0 and 600 mm/s 2 ) using an MR workstation. The ADC map is independent of the magnetic field strength and can overcome the effects of T2 shine-through, thus allowing more meaningful comparison of the results. We also performed experiments using 800 mm/s 2 and 1,000 mm/s 2 (Figure 1). However, the results were not reliable, the stability of the parameters was not high, and the repeatability was not good. The possible reasons for the above situation may be that a higher b value will introduce much more noise in chest tumors. Recent investigations have demonstrated that the pretreatment ADC value may be applied as a biomarker to predict and detect early the treatment response in ESCC, but the results remain controversial. Koyama et al. (22) reported that tumors with a lower pretreatment ADC value and a higher signal intensity at DWI responded better to treatment. Koh et al. (6) discussed the mechanism of this phenomenon and showed that tumors with a high pretreatment ADC value were likely to be more necrotic than those with a low ADC value. Necrotic tumor tissues are frequently hypoxic, acidotic and poorly perfused, leading to diminished sensitivity to CRT. However, not all studies support this hypothesis. Aoyagi et al. studied 80 patients with advanced EC and found that tumors with a higher pretreatment ADC value responded better to treatment (23). They also performed a further study and found that the pretreatment ADC value was not significantly different between the responder and  non-responder groups (24). Wang et al. also found no direct correlation between the pretreatment ADC value and treatment response in EC (25). The reasons for the controversy could be that simple ADC values only show limited information (one dimensional information) that only reflect variability (high or low), not including geometric distribution. Texture features can overcome the above defects, having the potential to show and quantify pixels or the voxel geometric distribution. With the development of imaging analysis, much evidence has suggested that TA can aid clinicians in cancer diagnosis (26), staging (27), prognoses (28), and response assessments (15). In our study of 82 patients with the diagnosis of ESCC, interestingly, we showed that the pretreatment DWI texture features can provide useful prognostic information for ESCC patients. Finally, previous studies were mostly focused on limited tumor areas, such as contouring ROIs in the largest section, rather than the global tumor volume. In our study, to compensate for the 2D texture feature defects (12,29,30), 3D texture parameters were chosen to evaluate the prediction potentials.
We first analyzed the intensity histogram features with highly reflected distribution of ADC values. The other texture features mainly focused on the local and regional scales, which were used to analyze the interrelationship between pairs of voxels and arrangement of voxels. From microscopy, the order of voxels reflected local non-uniformities. Our analysis showed that IHIST_energy, m_contrast_1, Diff_homogeneity_2, m_Inversevariance_2, HISE, and LILE have strong and independent associations with the OS rates. The IHIST_energy measures the homogeneity of gray distribution. The higher value depicts more homogeneity than the lower one. m_Contrast_1 is a measure of the contrast or amount of local variation present in the ADC. The tumor usually has a large amount of local variations present in the image compared with the normal part. The other parameters (Diff_homogeneity_2, m_Inversevariance_2) were a measure of homogeneity of the image. This represents the change in the tumor gray level and reflects the aggregation of tumor cells on the macro level. The HISE measures the joint distribution of small zones and high gray-level values. The LILE has opposite characteristic to HISE and measures the joint distribution of large zones and high graylevel values. These features represent spatial ADC variability in esophageal tumors, explaining why these ADC texture features are better prognostic factors than simple global ADC values.
The OS variation in ESCC patients treated with CRT is highly related to tumor heterogeneity due to its intra-tumor spatial variation in the cellularity, angiogenesis, extravascular, extracellular matrix, and areas of necrosis. The high tumor heterogeneity was shown to have a poorer prognosis and treatment resistance (31). Ganeshan et al. (32) found that tumor heterogeneity in EC could be reflected by TA. Our study showed that six 3D texture parameters extracted from ADC maps can distinguish among the high-, median-, and low-risk group (Logrank χ 2 = 9.7; P = 0.00773). The idea behind this performance is that texture parameters can reflect the movement of water molecules and tumor heterogeneity. This may become a major mechanism to explain why texture parameters can accurately associate with the OS of ESCC.
To ensure the model's stability, the test-retest method was used to test the selected feature stability in the feature selection step. In the model validation step, the "leave one out" cross validation for both model validation and model calibration was used. Compared with the general sampling test (splitting their data into test and validation sets), the LASSO regularization scheme was used to prevent over fitting (33).
This study has an important clinical significance. Uncertainties remain in the treatment of ESCC, including the scope of the radiotherapy target area, the dose of radiation therapy, the consolidation chemotherapy maintenance period, the assessment of the clinical effect and so on. The cause of the above uncertainty remains a lack of effective means to describe ESCC heterogeneity. The texture features combined with conventional prognostic factors may present a more accurate predictive tool. Our research showed that texture features can be used to evaluate the prognosis of ESCC after CRT at the early phase. However, our study is limited by several factors, the most important of which is the prospective nature of the assessment using a relatively small group of patients. It is necessary to expand the sample size for further study to clearly explore the relationship between the global ADC value and OS. Another limitation of this study is that the tumor regions of interest were drawn manually; inter-and intra-observer variation could be reduced if automated methods were used in the future, particularly for multicenter studies.

CONCLUSIONS
Based on the ADC images, the texture parameters extracted by computer semi-automatic extraction are related to ESCC patient survival. This study confirms that the combination of ADC textures (histogram feature, GLCM feature, and ISZF feature) and conventional prognostic factors (radiation dose) can be used to generate robust models to predict OS. Future work needs to further verify the practical value of related parameters in clinical application.

DATA AVAILABILITY STATEMENT
The datasets for this manuscript are not publicly available because involving patient confidential information. Requests to access the datasets should be directed to zhenjli@seu.edu.cn.

ETHICS STATEMENT
Registration number and name of registry: 20091124-2. Project: Prognostic Value of Pretreatment Diffusion Weighted Magnetic Resonance Imaging based Texture in Concurrent Chemoradiotherapy of Esophageal Squamous Cell Cancer. Institute: Shandong Cancer Hospital Affiliated to Shandong University, Department of Radiation Oncology, the Fourth Hospital of Hebei Medical University. Research Summary: In this retrospective cohort study, we aims to include a total 100 patients with esophageal squamous cell carcinoma in Chinese population. MR imaging examination was performed before starting the treatment for tumor staging. The imaging protocol included: (1) 2D Fast Low-Angle Shot imaging (FLASH)/T1-weighted and DWI was obtained in axial planes. Privacy Highlights: patients privacy. Comments of Institutional Review Board: After carefully examination of the relative information, including researcher qualifications, research programme, CRF, and informed consents, the study was approved by the Institutional Review Board.