Establishing ADC-Based Histogram and Texture Features for Early Treatment-Induced Changes in Head and Neck Squamous Cell Carcinoma

The purpose of this study was to assess baseline variability in histogram and texture features derived from apparent diffusion coefficient (ADC) maps from diffusion-weighted MRI (DW-MRI) examinations and to identify early treatment-induced changes to these features in patients with head and neck squamous cell carcinoma (HNSCC) undergoing definitive chemoradiation. Patients with American Joint Committee on Cancer Stage III–IV (7th edition) HNSCC were prospectively enrolled on an IRB-approved study to undergo two pre-treatment baseline DW-MRI examinations, performed 1 week apart, and a third early intra-treatment DW-MRI examination during the second week of chemoradiation. Forty texture and six histogram features were derived from ADC maps. Repeatability of the features from the baseline ADC maps was assessed with the intra-class correlation coefficient (ICC). A Wilcoxon signed-rank test compared average baseline and early treatment feature changes. Data from nine patients were used for this study. Comparison of the two baseline ADC maps yielded 11 features with an ICC ≥ 0.80, indicating that these features had excellent repeatability: Run Gray-Level Non-Uniformity, Coarseness, Long Zone High Gray-Level, Variance (Histogram Feature), Cluster Shade, Long Zone, Variance (Texture Feature), Run Length Non-Uniformity, Correlation, Cluster Tendency, and ADC Median. The Wilcoxon signed-rank test resulted in four features with significantly different early treatment-induced changes compared to the baseline values: Run Gray-Level Non-Uniformity (p = 0.005), Run Length Non-Uniformity (p = 0.005), Coarseness (p = 0.006), and Variance (Histogram) (p = 0.006). The feasibility of histogram and texture analysis as a potential biomarker is dependent on the baseline variability of each metric, which disqualifies many features.


INTRODUCTION
Recent publications have summarized the current state of radiomics research utilizing functional imaging in head and neck cancer for tumor segmentation, prognostic and predictive response biomarkers, and monitoring of normal tissue sequelae (1,2). While 18F-fluorodeoxyglucose positron emission tomography (18FDG-PET) can provide information on tumor metabolic activity before, during, and after definitive radiation or chemoradiation (CRT), its sensitivity to inflammation and poor spatial resolution pose significant limitations for imaging during therapy (3).
Diffusion-weighted magnetic resonance imaging (DW-MRI) has garnered much interest in the past decade as an imaging biomarker for cancer treatment response with the potential to detect early treatment-induced changes (4–9). DW-MRI-derived apparent diffusion coefficient (ADC) maps express water molecular motion, which tends to be relatively low in tumors due to their higher cellular density than normal tissue. CRT has been shown to increase water molecular motion in tumors undergoing a favorable treatment response (10). Most DW-MRI studies have investigated first-order histogram features, such as ADC change of tumors between pre-treatment and post-treatment examinations or pre-, intra-, and post-treatment examinations (7,11–13). Vandecaveye et al. determined that percentage change in ADC from 3 weeks post-CRT relative to baseline examination was significantly correlated with patient outcome, allowing for early assessment of treatment response (12). In a similar study by Kim et al. that also included early intra-treatment examinations (1 week after treatment began), the change in ADC between baseline and intra-treatment examinations had the highest sensitivity for differentiating complete and partial responders (11). These studies have shown the viability of using DW-MRI to detect response during or shortly after completion of treatment by utilizing first-order histogram imaging features. Beyond the use of first-order ADC histogram features such as mean ADC, recent studies have evaluated the use of second-order texture features for response prediction (1,2,14,15).
An important aspect of interpreting these early treatment-induced changes is understanding the underlying baseline repeatability of the features derived from the ADC maps (5). The National Cancer Institute-sponsored consensus conference on DW-MRI identified baseline repeatability as an important parameter needing evaluation (5). To date, our group has been the only one to examine baseline mean ADC variability for HNSCC (16). The repeatability coefficient for baseline nodal ADC was 15%, and the authors emphasized the importance of accounting for inherent baseline variability when assessing treatment-induced changes during CRT. These studies suggest that texture features can be used to quantify treatment-induced changes, which may correlate with histologic changes in MR studies (7).
The primary purpose of this study was to establish baseline repeatability of histogram and texture features from baseline pre-treatment DW-MRI-derived ADC maps. A secondary goal was identification of a subset of histogram and texture features that significantly changed between baseline and early treatment DW-MRI-derived ADC maps. We hypothesized that features with a high baseline repeatability would be candidates for further evaluation as features that could predict early treatment-induced changes.

MATERIALS AND METHODS
Patient Cohort and Clinical Protocol
Patients in this IRB-approved retrospective study were adults undergoing concurrent CRT for head and neck squamous cell carcinoma (AJCC stages II-IVB, 7th edition). Patients received intensity-modulated radiation therapy (IMRT) to a total dose of 70 Gy at 2 Gy/fraction (6 MV photons, between 5 and 11 fields) with concurrent cisplatin and targeted therapy with bevacizumab and erlotinib; they represented a subset of the overall trial population. The outcome of this trial has been reported elsewhere (clinicaltrials.gov: NCT00140556) (17). Written informed consent was obtained from all patients.
The purpose of the imaging protocol was to investigate the application of DW-MRI for tumor physiologic assessment of patients undergoing concurrent chemoradiation for locally advanced head and neck cancer (16–18). The goal of the present analysis was to evaluate imaging texture features for prognostic potential. Each patient underwent two baseline DW-MRI examinations (baseline 1 and baseline 2), 1 week apart, to establish intrinsic baseline variability, and a third examination (early treatment) performed in the second week of CRT. ADC maps generated from the DW-MRI data were analyzed for the nodal disease.

MR Imaging Protocol
MRI data were acquired on a 1.5-T system (Signa EXCITE, GE Healthcare [software versions 14x and 15x], Fairfield, CT) with a bird-cage quadrature head coil and the head and neck immobilization designed during the CT simulation.

Image Processing and ROI Selection
For the retrospective analysis, ADC maps were rigidly registered to their temporally corresponding CT examination (baseline 1, baseline 2, or early intra-treatment) using image registration software (Velocity AI, Velocity Medical Solutions). Nodal volumes were contoured by experienced radiation oncologists (DB, DY, YM). Following registration, each ADC map was resampled onto the CT grid, so that the resampled ADC maps had the same resolution as the CT.

Histogram and Texture Features
Six histogram and 40 texture features were calculated on all resampled ADC maps for the nodal ROIs. These features are summarized in Table 1.
Texture analysis was accomplished using in-house software (19) to calculate four mathematical matrices representing the spatial distribution of voxels in an image: the Gray-Level Co-occurrence Matrix (GLCM), Gray-Level Run Length Matrix (GLRLM), Gray-Level Size Zone Matrix (GLSZM), and Neighborhood Gray-Level Difference Matrix (NGLDM) (20–24). Texture analysis requires discrete values in order to tabulate the gray levels into matrices. Further, voxel values must be resampled to a common number of gray levels across patient image sets. Resampling to 64 gray levels (6-bit depth) was used in this study. Thirteen angles were used to calculate the texture features. GLCM features quantify coarseness vs. smoothness within an image by calculating how often pairs of voxels having specific values with a specific spatial relationship occur within an image. The GLRLM is similar to the GLCM but tabulates runs of consecutive voxels with the same gray level rather than pairs of voxels. Therefore, the GLRLM, while still considering coarseness, focuses on the lengths of runs of similar high- and low-intensity gray levels within an image. The GLSZM identifies zones (connected groups) of voxels with the same gray level, as opposed to runs along a direction, and tabulates how often zones of each gray level and size occur. The GLSZM is useful for identifying the homogeneity of an image. NGLDM features are used to investigate the variation found between texture zones in an image.
Texture features representing characteristics of the image are derived from these texture matrices: 13 local features from the GLCM, 11 regional features from the GLRLM, 11 regional features from the GLSZM, and 5 local features from the NGLDM.
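To make the discretization and run-length tabulation concrete, the following is a minimal sketch, not the in-house software used in this study. It assumes a linear rescaling of ROI values onto 64 gray levels, a 2D image, and a single run direction (left to right), whereas the study used 13 directions; the function names are hypothetical. The two non-uniformity features follow the common definition in which the squared marginal sums of the run-length matrix are divided by the total number of runs.

```python
def quantize(values, n_levels=64):
    """Linearly map ROI values onto integer gray levels 0..n_levels-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    scale = n_levels / (hi - lo)
    # The maximum value would land on n_levels, so clamp it to the top level.
    return [min(int((v - lo) * scale), n_levels - 1) for v in values]


def run_length_matrix(rows):
    """Count runs along one direction (left to right) in a 2D list of gray levels.

    Returns a dict mapping (gray_level, run_length) -> number of runs.
    """
    runs = {}
    for row in rows:
        start = 0
        for i in range(1, len(row) + 1):
            # A run ends at the end of the row or when the gray level changes.
            if i == len(row) or row[i] != row[start]:
                key = (row[start], i - start)
                runs[key] = runs.get(key, 0) + 1
                start = i
    return runs


def glrlm_non_uniformity(runs):
    """Return (gray-level non-uniformity, run-length non-uniformity)."""
    n_runs = sum(runs.values())
    by_level, by_length = {}, {}
    for (level, length), count in runs.items():
        by_level[level] = by_level.get(level, 0) + count
        by_length[length] = by_length.get(length, 0) + count
    gln = sum(c * c for c in by_level.values()) / n_runs
    rln = sum(c * c for c in by_length.values()) / n_runs
    return gln, rln


# Toy image with three gray levels (illustrative values, not study data).
toy = [[0, 0, 1, 1], [0, 0, 0, 1], [2, 2, 2, 2]]
gln, rln = glrlm_non_uniformity(run_length_matrix(toy))
print(gln, rln)  # 1.8 1.4
```

In the toy image there are five runs: the gray-level totals are 2, 2, and 1 runs, and the run-length totals are 1, 2, 1, and 1 runs, which gives the two printed values.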

Statistical Methods
To identify features with high repeatability between baseline studies (baseline 1 and baseline 2), we compared all histogram and texture features using the intra-class correlation coefficient (ICC). Variance between baselines reflects inherent imprecision of the instrument used for measuring as well as any intrinsic baseline variation. An ICC cutoff value of at least 0.8 was chosen, indicating that the metric had excellent repeatability (25).
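The text does not state which form of the ICC was used, so the sketch below is only an illustration under an assumed choice: the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1) in the Shrout and Fleiss notation. Rows are patients and columns are the repeated examinations (here, baseline 1 and baseline 2); the function name is hypothetical.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    data: list of rows, one per subject; each row holds k repeated measurements.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical feature values for three patients at baseline 1 and baseline 2.
print(icc_2_1([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))  # 1.0 (perfect agreement)
```

A feature would then be retained when the returned value is at least 0.8. Other ICC forms (for example, one-way or consistency-based) can give different values on the same data, so the form should be matched to the one actually reported.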
For features with high repeatability (ICC ≥ 0.80), the Wilcoxon signed-rank test was used to test whether the difference between the mean of the two baseline values and the early treatment value for a given metric was nonzero, indicating a treatment-induced change. A Holm-Bonferroni correction was applied to control the familywise error rate (26). An adjusted p-value was thus calculated for each metric. All statistical analyses were performed using SAS 9.4 (SAS Institute, Inc., Cary, NC).
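The two steps described above can be sketched as follows. This is not the SAS procedure used in the study; it is a minimal illustration that assumes an exact two-sided signed-rank test obtained by enumerating all sign patterns (practical only for small samples such as nine pairs), drops zero differences, and gives tied absolute differences their mean rank. All function names are hypothetical.

```python
from itertools import product


def average_ranks(values):
    """Rank values ascending (1-based), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_exact(baseline, treatment):
    """Two-sided exact Wilcoxon signed-rank p-value for small paired samples."""
    diffs = [t - b for b, t in zip(baseline, treatment) if t != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_obs = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2  # mean of the statistic under the null hypothesis
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= abs(w_obs - center) - 1e-12:
            extreme += 1
    return extreme / 2 ** len(ranks)


def holm_adjust(p_values):
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the rank-th smallest p-value by (m - rank), keep monotone.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[idx]))
        adjusted[idx] = running_max
    return adjusted
```

For each retained feature, `baseline` would hold the per-patient mean of the two baseline values and `treatment` the early treatment values; the resulting raw p-values across features are then passed together to `holm_adjust`. In this exact formulation, nine pairs that all change in the same direction give a raw two-sided p-value of 2/512.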

RESULTS
Patient Cohort and Clinical Protocol
Nine patients were evaluated in this study, corresponding to patient numbers 1, 2, 3, 4, 5, 7, 8, 12, and 14 in Table 1 by Hoang et al. (16). Seven patients were excluded from this analysis: three with missing/incomplete CT data (patient numbers 6, 9, and 11), three with image registration problems (patient numbers 10, 13, and 16), and one excluded during statistical analysis as an outlier (patient number 15). Of these nine patients, all except one (number 7) were complete responders to treatment and did not recur subsequently. Table 2 lists the nodal volume sizes for the patients evaluated in this study.
The results of the ICC for baseline histogram and texture features are shown in Table 3. Eleven features showed an ICC ≥ 0.8, indicating that these features had excellent repeatability: Run Gray-Level Non-Uniformity, Coarseness, Long Zone High Gray-Level, Variance (Histogram Feature), Cluster Shade, Long Zone, Variance (Texture Feature), Run Length Non-Uniformity, Correlation, Cluster Tendency, and ADC Median.

Baseline to Early Treatment Histogram and Texture Feature Significance
The results of the Wilcoxon signed-rank test for the paired data comparing average baseline to early treatment features are shown in Table 4. After application of the Holm-Bonferroni method to adjust the p-values, four features remained statistically significant: Run Gray-Level Non-Uniformity (p = 0.005), Run Length Non-Uniformity (p = 0.005), Coarseness (p = 0.006), and Variance (Histogram) (p = 0.006). Average baseline and early intra-treatment values for these four features for all nine patients are shown in Figure 1. For all nine patients, Run Gray-Level Non-Uniformity, Run Length Non-Uniformity, and Variance (Histogram) decreased between baseline and early treatment ADC maps. Coarseness increased for all nine patients between baseline and early intra-treatment ADC maps.

DISCUSSION
To our knowledge, this is the first study to investigate baseline repeatability of first-order histogram and second-order texture features in ADC maps derived from baseline DW-MRI examinations in patients with head and neck squamous cell carcinoma. It also analyzed which of these features changed significantly early in the course of treatment relative to their baseline values. Of the 46 histogram and texture features, four both displayed excellent repeatability (ICC ≥ 0.8) and changed significantly early in treatment: one first-order histogram feature (Variance) and three second-order texture features, two from the GLRLM (Run Gray-Level Non-Uniformity and Run Length Non-Uniformity) and one from the NGLDM (Coarseness).
Mean ADC increased from baseline to early treatment, consistent with decreased cellular density associated with tumor cell death (Figure 2). Variance decreased between baseline and early treatment, indicating a narrower distribution of the ADC values, i.e., more voxels having similar values than they had at baseline (Figure 1D). Coupled with the increase in mean ADC values, this indicates that the ADC values were more homogeneous in the early intra-treatment examination.
Run Gray-Level Non-Uniformity measures the variability of gray-level intensity values in the ADC map. A lower Run Gray-Level Non-Uniformity thus indicates that neighboring voxels tend to have similar ADC intensity values, i.e., a more homogeneous diffusion coefficient of water outside the cell walls. For all patients in this study, the Run Gray-Level Non-Uniformity decreased between the baseline and early intra-treatment examinations (Figure 1A). Run Length Non-Uniformity measures the similarity of run lengths, where a lower value indicates more homogeneity among the run lengths in the image, i.e., more runs of voxels of the same length. For the nine patients, the decrease in Run Length Non-Uniformity from baseline to early intra-treatment indicates more runs of voxels of the same length (Figure 1B).
Coarseness is a measure of the average difference between the center voxel and its neighbor voxels and is an indication of the spatial frequency. A higher value indicates a lower spatial frequency and thus a locally more uniform texture. For the nine patients, the coarseness increased, i.e., the ADC maps displayed more uniform values across the ROIs (Figure 1C).
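As an illustration of why a locally more uniform map yields a higher Coarseness, the following is a minimal sketch of a neighborhood-difference coarseness measure. It is not the in-house implementation: it assumes a 2D image of integer gray levels, an 8-connected neighborhood clipped at the image border, and the common reciprocal form in which the probability-weighted sum of absolute differences between each voxel and the mean of its neighbors is inverted. The function name and the small `eps` guard are hypothetical.

```python
def coarseness(image, eps=1e-12):
    """Neighborhood-difference coarseness of a 2D list of integer gray levels."""
    n_rows, n_cols = len(image), len(image[0])
    s = {}       # gray level -> summed |level - mean of neighbors|
    counts = {}  # gray level -> number of voxels that have neighbors
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors = [
                image[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c)
            ]
            if not neighbors:
                continue
            level = image[r][c]
            mean_nb = sum(neighbors) / len(neighbors)
            s[level] = s.get(level, 0.0) + abs(level - mean_nb)
            counts[level] = counts.get(level, 0) + 1
    total = sum(counts.values())
    # Weight each level's summed difference by how often the level occurs.
    weighted = sum(counts[g] / total * s[g] for g in s)
    return 1.0 / (eps + weighted)


# Illustrative 3x3 patches (not study data): smooth vs. rapidly varying.
smooth = [[5, 5, 5], [5, 5, 6], [5, 6, 6]]
checker = [[0, 9, 0], [9, 0, 9], [0, 9, 0]]
print(coarseness(smooth) > coarseness(checker))  # True
```

The smooth patch has small differences between each voxel and its neighborhood mean, so the weighted sum is small and its reciprocal is large, matching the interpretation that higher Coarseness reflects lower spatial frequency.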
Interpreting the biological meaning of higher-order features such as texture features is not straightforward because these features usually cannot be explained by physiological models. However, the summarized texture feature results indicate that characteristics of the ADC map changed to make the values more homogeneous.
Limitations of this study include that it was a small, single-institution pilot study. Future research should include larger, multi-institutional data sets. Further, since the ADC measurements were obtained by a single radiologist, inter-observer and intra-observer variability could not be assessed. Notably, texture features typically are not robust to dynamic range and matrix size; consistency in applied methodology in both imaging technique and analysis is key, especially when validating results across institutions (27,28). Image acquisition parameters that can affect the robustness of the resulting texture features include selection of b-value, repetition time, echo time, inhomogeneity, and receiver bandwidth (29). Multiple studies have shown that the range and number of b-values used to acquire DW-MRI can affect the resulting ADC maps. It is known that certain b-values can cause image distortions due to the chemical shift artifact (30). However, to date there is no study that has assessed the effect of b-value choice on the resulting texture features. b-value selection for this study was per our clinical acquisition protocol. The use of the same acquisition protocol for both baseline and early-treatment DW-MRI acquisitions on the same scanner negated the potential effect of varying b-values on the resulting texture features. Thus, clinical implementation requires that all DW-MRI scans be performed on the same scanner with the same protocol to remove any scanner-dependent variability in the resulting histogram and texture features (31). Other contributions to uncertainty in the baseline 1 and baseline 2 images can arise from patient-specific, physiological, and anatomical variations. A study on healthy subjects by Kolff-Gart et al. indicated that between-visit ADC values acquired 1 month apart varied the least when the subject was scanned on the same scanner with the same parameters (32).
Future work will include a larger patient database to investigate the value of changes in Run Gray-Level Non-Uniformity, Run Length Non-Uniformity, Coarseness, and Variance (Histogram) between baseline and early intra-treatment ADC maps in predicting early treatment response and non-response to CRT for patients with HNSCC.
In conclusion, lack of baseline repeatability disqualifies many histogram and texture features from DW-MRI-derived ADC maps for quantification of early treatment-induced changes in HNSCC patients. Furthermore, a subset of these ADC histogram and texture features shows significant changes after 1 week of CRT and warrants further study to provide quantitative assessment of early treatment changes for HNSCC using DW-MRI.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Duke University Health System Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.