Predicting the Local Response of Metastatic Brain Tumor to Gamma Knife Radiosurgery by Radiomics With a Machine Learning Method

Purpose The current study proposed a model to predict the response of brain metastases (BMs) treated by Gamma Knife radiosurgery (GKRS) using a machine learning (ML) method with radiomics features. The model can be used as a decision tool by clinicians for the most desirable treatment outcome.
Methods and Materials Using MR image data taken by a FLASH (3D fast, low-angle shot) scanning protocol with gadolinium (Gd) contrast-enhanced T1-weighting, the local response (LR) of 157 metastatic brain tumors was categorized into two groups (Group I: responder and Group II: non-responder). We performed a radiomics analysis of those tumors, resulting in more than 700 features. To build a machine learning model, we first used the least absolute shrinkage and selection operator (LASSO) regression to reduce the radiomics features to the minimum number useful for the prediction. Then, a prediction model was constructed by using a neural network (NN) classifier with 10 hidden layers and rectified linear unit activation. The training model was evaluated with five-fold cross-validation. For the final evaluation, the NN model was applied to a set of data not used for model creation. The accuracy, the sensitivity, and the area under the receiver operating characteristic curve (AUC) of the LR prediction model were analyzed. The performance of the ML model was compared with a visual evaluation method, in which the LR of tumors was predicted by examining the image enhancement pattern of the tumor on MR images.
Results By the LASSO analysis of the training data, we found seven radiomics features useful for the classification. The accuracy and sensitivity of the visual evaluation method were 44 and 54%. In contrast, the accuracy and sensitivity of the proposed NN model were 78 and 87%, and the AUC was 0.87.
Conclusions The proposed NN model using the radiomics features can help physicians to gain a more realistic expectation of the treatment outcome than the traditional method.


INTRODUCTION
Approximately 5 to 40% of cancer patients are diagnosed with a metastatic brain tumor during their treatment. Furthermore, patients have brain metastases (BMs) ten times more often than primary malignant tumors of the brain (1,2). Consequently, BM is the most common brain tumor treated by radiation therapy. Whole-brain radiation therapy (WBRT) and stereotactic radiosurgery (SRS) are regularly offered to manage BMs. The techniques are effective with improved local control of tumors and more prolonged survival of patients (3). The RTOG-9508 study compared the treatment responses of WBRT alone, SRS alone, and WBRT plus SRS for the BMs (4). For the WBRT alone or WBRT plus SRS, the total prescribed dose of WBRT was 37.5 Gy with 2 to 5 Gy per fraction. For the SRS treatment, the prescribed dose was assigned from an earlier dose-escalation RTOG radiosurgery trial (90-05) (5). The mean survival time did not differ much among the three techniques. The local control rate at three months after WBRT plus SRS or WBRT alone ranged from 71 to 82%, indicating about 20 to 30% failure rate. Hence, a predictive capability of the radiation therapy outcome of BMs may provide a decision tool to clinicians for the effective management of patient care with the most desirable treatment outcome. If the local failure is predicted for radiotherapy, the treatment plan can be modified to improve the local control by, for example, increasing the dose.
There are several prognostic tools or prognostic indices, specially developed for the radiation therapy of BMs such as the RTOG Recursive Partitioning Analysis (RPA), the Score Index for Radiosurgery (SIR), the Basic Score for Brain Metastases (BSBM), and the Graded Prognostic Assessment (GPA) (6). These indices are proven to have clinical value for predicting the treatment outcome. The addition of more detailed clinical information to the pretreatment characteristics used by the existing prognostic indices might improve the predictive performance. Such new information includes the biological data (i.e., biological markers and genomics) specific to the patient (7) and the quantitative imaging data obtained by radiomics (8)(9)(10).
Radiomics analyzes the medical image quantitatively to explore features unique to a patient (11). It has been used for classifying patients and evaluating their risk to customize oncological treatments (12,13). Some researchers used radiomics to find the correlation between radiomics signatures and radiation treatment outcome (14)(15)(16). Zhou et al. tried to predict survival after chemotherapy of glioblastoma patients using several imaging features based on MR image (17). Ryu et al. performed a prognostic prediction using features obtained from functional images (18). Other studies have combined radiomics with genomics to associate radiomics features with gene mutations that are clinically proven to predict therapy response (19). A recent study reported that radiomics features could potentially be used as surrogate biomarkers for predicting tumor prognosis following Gamma Knife radiosurgery (GKRS) (20).
Goodman et al. categorized brain tumor images into three groups: homogeneous, heterogeneous, or ring-enhancing (21). They found that these enhancement patterns are significant prognostic factors in the response of brain metastases after radiosurgery. A drawback of their approach is the subjective nature of the classification technique. Visual classification into one of the three patterns is often neither possible nor accurate because real images do not display clear ring-like features or completely uniform pixel colors throughout the tumor.
In the current study, therefore, we applied radiomics and a machine learning (ML) technique to create a method that is more reliable and accurate than visual evaluation for predicting the treatment outcome, in particular the local response of the tumor to radiation therapy. Specifically, we built a model to predict the response of metastatic tumors treated with GKRS.

Patients
Previously, we analyzed the treatment outcome of 88 patients with either renal cell or melanoma cancer as the primary disease, who underwent GKRS at the University of Minnesota from 2005 to 2012 for their BMs (22). For the current study, we selected a subset of the patients, 45 melanoma patients with a total of 115 tumors, for model building. Furthermore, we obtained the new data of nine melanoma patients with a total of 42 tumors from the database of GKRS patients treated from 2013 to 2017 for the final evaluation of the model. The characteristics of the patients and their tumors are presented in Table 1.

Image Acquisition
All patients were scanned with a 1.5T MRI (Siemens Syngo MR) scanner. The total scanning time was about 15 min for the whole brain scan. We used the Siemens 12 channel head matrix coils. The scanning protocol was a FLASH (3D fast, low-angle shot) with gadolinium (Gd)-contrast enhanced T1-weighting. The scan parameters are shown in Table 2.

Treatment
We treated patients with the Leksell Gamma Knife Model 4C (Elekta AB, Stockholm, Sweden). The prescription dose was decided based on tumor size according to the RTOG 90-05 trial protocol (5). The prescription isodose level varied from 40 to  Table 1 shows the number of tumors in these three-volume ranges.

Follow-Up
Patients after GKRS were followed at 3-month intervals with MRI performed at each visit. The time from the first SRS to the last follow-up imaging study or death was defined as patient follow-up duration.

Treatment Response Evaluation
To evaluate the local response (LR) of the tumors to the treatment, we measured the maximum lengths of a tumor in three orthogonal directions using pretreatment and follow-up MRI images. Tumor volumes were calculated with the ellipsoid volume formula. The LR status was determined from the latest follow-up imaging study available at the time of the data collection. The median follow-up length was 7.6 months. The status of each tumor was evaluated based on modified RECIST criteria (23). A tumor was defined as progressive disease (PD) if its volume on follow-up MRI increased by more than 20% relative to pretreatment MRI. Lesions whose volume increased by less than 20% or decreased by less than 30% relative to pretreatment were considered stable disease (SD). A tumor whose size fell by more than 30% but that was still visible on the follow-up MRI was categorized as a partial response (PR). Any lesion that disappeared on the MRI was considered a complete response (CR). Only cases managed conservatively during the follow-up period were included in the analysis. To enhance the predictive performance, we classified the LR into two groups as follows: response group (CR + PR) and non-response group (SD + PD). The LR data of the patients are presented separately for model building and model evaluation datasets in Table 1.
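The volume calculation and response categories described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the three measured lengths are treated as full diameters (so the ellipsoid volume is π/6 times their product), and changes of exactly 20% or 30%, which the text does not assign to a category, are placed in SD as an assumption.

```python
import math


def ellipsoid_volume(a_mm, b_mm, c_mm):
    """Ellipsoid volume (mm^3) from three orthogonal maximum diameters (mm)."""
    return math.pi / 6.0 * a_mm * b_mm * c_mm


def classify_response(pre_volume, post_volume):
    """Return CR, PR, SD, or PD from pretreatment and follow-up volumes."""
    if post_volume == 0:
        return "CR"  # lesion no longer visible on follow-up MRI
    change = (post_volume - pre_volume) / pre_volume
    if change > 0.20:
        return "PD"  # volume increased by more than 20%
    if change < -0.30:
        return "PR"  # volume decreased by more than 30% but still visible
    return "SD"  # assumption: boundary values fall into stable disease


def response_group(status):
    """Collapse the four categories into the two groups used for modeling."""
    return "responder" if status in ("CR", "PR") else "non-responder"


if __name__ == "__main__":
    pre = ellipsoid_volume(10.0, 8.0, 6.0)
    post = ellipsoid_volume(7.0, 6.0, 5.0)
    status = classify_response(pre, post)
    print(status, response_group(status))
```

With this diameter convention, a sphere of 4 mm diameter has a volume of about 33.5 mm³.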
For the patients in the current study, we neither performed additional imaging studies to delineate necrotic areas nor took tissue samples for a histopathological examination. Instead, to minimize the volume measurement error due to necrosis, we examined the available T1-weighted Gd-contrast enhanced MRI to identify necrosis by the existence of edema around the enhanced lesion or clear hemorrhage inside the lesion, or by checking the patient's neurologic symptoms. We did not see these indications among the patients and tumors used for the current study. Thus, our tumor volumes might contain necrosis or hemorrhage inside the volume unless it was clearly present outside of the tumor.

Radiomics Analysis
The process of the radiomics analysis is shown in Figure 1. The pixel values of the MRI data were rescaled by using the RescaleSlope and RescaleIntercept tags from the DICOM header as follows: rescaled value = RescaleSlope × stored pixel value + RescaleIntercept. Before calculating radiomics features, we applied the median smoothing filter to the rescaled image data. All treatment planning MRI images were analyzed to extract textural features from the gross tumor volumes (GTVs), which radiation oncologists had manually contoured for the radiosurgery treatment planning. The feature extraction was performed using IBEX software (24). Tumors smaller than 4 mm in diameter, or 33.5 mm³ in volume, were excluded from further study because of the limited number of pixels available for the texture analysis. We used the following six different  (18). The resulting 740 features were considered in this study. When there was an option of 2.5D or 3D analysis for texture calculations, we selected 2.5D. The least absolute shrinkage and selection operator (LASSO) regression was performed in the MATLAB program (MathWorks, Natick, MA, USA) to select the suitable features for the prediction. The LASSO regression performs feature selection during model construction by penalizing the respective regression coefficients. As this penalty is increased, more regression coefficients shrink to zero, resulting in a more regularized model. The most significant predictive features were selected from among all the candidate features for the subsequent training session to build an ML-based prediction model.  Table 1. Tumors were randomly partitioned into a training set (55% of tumors), a validation set (15% of tumors), and a testing set (30% of tumors). The predictive model for the classification was created with the training set and the validation set. The performance of the predictive model was evaluated with the testing set by calculating the accuracy and sensitivity of the prediction.
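To make the shrinkage behavior of the LASSO penalty concrete, the following sketch implements a small LASSO solver by coordinate descent with soft thresholding. It is an illustration under stated assumptions, not the MATLAB routine used in the study: it uses a squared-error loss rather than a binomial one, it expects feature columns that are already centered, and the six-tumor data set and all function names are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within +/- penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimize (1/2n) * ||y - mean(y) - X b||^2 + penalty * ||b||_1.

    X is a list of rows; its columns are assumed to be centered already.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    residual = [yi - y_mean for yi in y]  # equals y - X b while b is all zeros
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_beta = soft_threshold(rho, penalty) / col_sq[j]
            step = new_beta - beta[j]
            for i in range(n):
                residual[i] -= X[i][j] * step
            beta[j] = new_beta
    return beta


# Toy data (hypothetical): six tumors, three centered features, binary labels.
X = [[-1, 1, 1], [-1, -1, 0], [-1, 0, -1], [1, 0, -1], [1, -1, 0], [1, 1, 1]]
y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    for penalty in (0.1, 0.6):
        beta = lasso_coordinate_descent(X, y, penalty)
        selected = [j for j, b in enumerate(beta) if b != 0.0]
        print("penalty", penalty, "selected feature indices", selected)
```

In this toy example only the first feature tracks the labels, so a small penalty keeps that feature alone, while a penalty larger than its covariance with the labels removes every feature.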
The training-validation-testing processes were repeated five times for the five-fold cross-validation. Then, the model whose accuracy was closest to the average accuracy of the five-fold cross-validation was selected for the final evaluation. We performed the final assessment with the data in the model evaluation dataset (42 tumors of nine patients), as shown in Table 1. The predictive performance of the models was assessed using the area under the receiver operating characteristic (ROC) curve (AUC), as well as the accuracy and sensitivity.
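The partitioning and model selection steps can be sketched as follows. This is a hedged illustration only: the function names, the random seeds, and the five accuracy values are hypothetical, and the rounding of the 55%/15%/30% split to whole tumors is an assumption, since the text does not state how fractional counts were handled.

```python
import random


def partition_tumors(n_tumors, seed, fractions=(0.55, 0.15, 0.30)):
    """Randomly split tumor indices into training, validation, and testing sets."""
    indices = list(range(n_tumors))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_tumors)
    n_val = round(fractions[1] * n_tumors)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder, about 30% of tumors
    return train, val, test


def closest_to_mean(accuracies):
    """Index of the model whose accuracy is nearest the mean accuracy."""
    mean = sum(accuracies) / len(accuracies)
    return min(range(len(accuracies)), key=lambda k: abs(accuracies[k] - mean))


if __name__ == "__main__":
    # Five repetitions with different random partitions of the 115 tumors.
    for repetition in range(5):
        train, val, test = partition_tumors(115, seed=repetition)
        print(repetition, len(train), len(val), len(test))

    # Hypothetical accuracies of five trained models, for illustration only.
    toy_accuracies = [0.74, 0.83, 0.80, 0.78, 0.85]
    print("selected model index:", closest_to_mean(toy_accuracies))
```

Choosing the model nearest the mean, rather than the best one, avoids reporting a final evaluation for a model that happened to benefit from a favorable partition.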

Visual Evaluation
Goodman et al. classified the lesion characteristics into homogeneous, heterogeneous, or ring-enhancing by the pattern of enhancement (20). Uniform enhancement of the entire lesion was defined as homogeneous. A lesion with any areas of nonhomogeneous enhancement was defined as heterogeneous. Additionally, a lesion with a rim or ring of contrast enhancement surrounding a central non-enhancing area of low signal intensity was identified as ring-enhancing. In the current study, an experienced radiation oncologist classified the tumors into these three patterns (homogeneous, heterogeneous, or ring-enhancing) by visually inspecting the MR images. The treatment outcome was then predicted from the assigned pattern. Nieder et al. showed that the important prognostic factors for complete remission were small volume and absence of necrosis (25). Based on this well-accepted knowledge (20,25), we assigned the homogeneous tumors to the response group (group I) and the tumors with heterogeneous or ring enhancement to the non-response group (group II). We compared the visual evaluation method and the ML method using the data in the model building dataset (115 tumors of 42 patients).

RESULTS
First, a total of 740 radiomics features were extracted from the BM MRI images. Then, the number was reduced to seven features by using the LASSO regression method. Figure 3 shows the binomial deviance (a) and the coefficients (b) as a function of the tuning penalization parameter λ for the LASSO regression. As λ increased, only a few coefficients of the 740 features remained non-zero, leaving only the features important for an accurate model. The selected features were 45-7ClusterShade, 225-7ClusterShade, 45-7InformationMeasureCorr-1, 225-7InformationMeasureCorr-1, 90-4InformationMeasureCorr-2, 225-7Energy, and 315-5Energy. Table 3 shows the performance of the NN models. Five models were generated in the five-fold cross-validation step. Those models were evaluated with the training and testing datasets separately. The average accuracy of the five models with the training data was 0.80. The model closest to the average accuracy was model 3. Hence, the final evaluation was performed with model 3. The accuracy and sensitivity of the final model were 0.78 and 0.87 with the model evaluation dataset. Figure 4 shows the performance of the classifier according to the ROC metrics for the training and testing datasets. The AUC score was 0.89 for the training data and 0.82 for the testing data in the model training section. When we applied the selected model to the final evaluation of 42 tumors in the model evaluation dataset, we obtained an AUC score of 0.87. Table 4 compares the visual evaluation method and the NN prediction model by accuracy and sensitivity. The former method was applied to the 115 tumors used for the NN model training. The latter was applied to the testing data in the model training section, and the values in the table are the averages of the five models. The results showed that the NN model was superior to the visual evaluation in both accuracy and sensitivity.
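For readers who want to reproduce the reported metrics, the following sketch computes accuracy, sensitivity, and AUC from a list of true labels and classifier scores. It rests on assumptions the text does not state: the responder group is coded as the positive class (1), predictions are obtained by thresholding the score at 0.5, and the AUC is computed as the fraction of positive-negative pairs that the score ranks correctly. The labels and scores are toy values, not study data.

```python
def accuracy_and_sensitivity(y_true, y_pred):
    """Accuracy over all cases and sensitivity (true-positive rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)
    return accuracy, sensitivity


def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy labels and scores (hypothetical), with 1 = responder as an assumption.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

if __name__ == "__main__":
    acc, sens = accuracy_and_sensitivity(y_true, y_pred)
    print("accuracy", acc, "sensitivity", sens, "AUC", roc_auc(y_true, scores))
```

Unlike accuracy and sensitivity, the AUC does not depend on the chosen threshold, which is why the two kinds of metric can rank models differently.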

DISCUSSION
Goodman et al. reported that the pattern of tumor images seen on the Gd-contrast enhanced T1-weighted MR images is valuable for predicting the response of a tumor to radiosurgery (20). The current study used radiomics features extracted from radiotherapy planning MRI (Gd-contrast enhanced T1-weighted) to predict the local response (LR) by a machine learning (ML) method with a neural network (NN) classifier. We compared the predictive performance of the NN model and the visual evaluation method. The new method using the radiomics features yielded a higher prediction accuracy (80%) than the visual approach. Thus, an ML method such as NN would be useful for predicting the response of the BMs to GKRS. The LASSO regression analysis identified seven radiomics features useful for the classification among the 740 features initially included in the radiomics analysis. The selection of these features can be understood from their mathematical meaning. Cluster shade is a measure of the skewness of the matrix and is believed to gauge the perceptual concepts of uniformity. It may be correlated with lesion characteristics that are heterogeneous or ring-enhancing. Informational Measure of Correlation-1 and -2 assess the correlation between the probability distributions using mutual information, which quantifies the complexity of the texture. Energy is a measure of the magnitude of voxel values in the image. The current study revealed that these were useful features for predicting the response of BMs to GKRS.
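As an illustration of how such texture features are defined, the sketch below computes cluster shade and energy from a gray-level co-occurrence matrix (GLCM). It is an assumption that the selected features are GLCM-based, suggested by the direction and offset prefixes in their names (for example, 45-7); the code follows the common textbook definitions and is not taken from IBEX. The function names and the tiny test image are hypothetical.

```python
def glcm(image, dx, dy, levels):
    """Symmetric, normalized co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    if total == 0.0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def glcm_energy(p):
    """Sum of squared GLCM entries; equals 1 for a perfectly uniform texture."""
    return sum(v * v for row in p for v in row)


def cluster_shade(p):
    """Third moment of (i + j) about its mean; nonzero when the GLCM is skewed."""
    n = len(p)
    mu_i = sum(i * p[i][j] for i in range(n) for j in range(n))
    mu_j = sum(j * p[i][j] for i in range(n) for j in range(n))
    return sum(
        (i + j - mu_i - mu_j) ** 3 * p[i][j] for i in range(n) for j in range(n)
    )


if __name__ == "__main__":
    # One-row toy image with two gray levels: mostly dark with one bright pixel.
    toy_image = [[0, 0, 0, 1]]
    p = glcm(toy_image, dx=1, dy=0, levels=2)
    print("energy", glcm_energy(p), "cluster shade", cluster_shade(p))
```

A completely uniform region gives an energy of 1 and a cluster shade of 0, whereas an asymmetric mix of dark and bright neighbors gives a nonzero cluster shade, which is consistent with the link to heterogeneous or ring-enhancing lesions suggested above.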
The prediction of the LR of BMs to SRS has important practical implications for patients and clinicians, and our prediction model could be useful in clinics. Although the current study created the prediction model of the LR for radiosurgery, the same approach can in principle be applied to other treatment methods.
In this study, we compared our predictive model with the visual method to demonstrate the predictive performance of the current approach. Goodman et al. (20) tried to identify the necrosis inside the tumor by classifying the tumor into three groups based on the enhancement pattern. However, there is no reliable technique to quantify the amount of necrosis by visual examination alone. Hence, the visual classification method suffers from a large uncertainty, and we expect a large variation among observers in distinguishing the three image patterns. We cannot exclude the possibility that some observers perform better than our method, but we expect the overall performance of our method to be better than that of the visual method. Our method does not reduce the image pattern to only three types; it uses more image information for decision making than the visual approach. Furthermore, the visual method is applied to only one transverse image, since classifying three-dimensional data into three patterns is time-consuming and almost impossible. As a result, our method should be more accurate for outcome prediction.
There are several recent studies, in which radiomics features were used for more accurate distinction of necrosis from tumor progression and early detection of adverse radiation events (ARE) after radiotherapy of brain tumors (26)(27)(28)(29). We used only the GTV (Gd contrast enhanced area) for radiomics analysis in the current study. Suppose we extend the region-of-interest (ROI) by including the volume surrounding the Gd-contrast enhanced area or add other types of imaging data such as PET, for example. In that case, we might be able to predict brain injuries after GKRS. Such a study is interesting and can be undertaken in the future.
There are five limitations to the current study. First, the LR of BMs depends on the prescribed dose. For our GKRS treatment, we prescribed the dose based on the tumor size following the RTOG 90-05 protocol (5). Hence, the LR can be affected not only by the radiomics features but also by the prescribed dose. Second, other clinical factors are statistically significant, but we did not consider them in the current study. To improve the prediction performance, therefore, the radiomics features can be combined with the standard biomarkers. Third, the present study used radiomics features extracted only from radiotherapy planning MRI scans (Gd-contrast enhanced T1-weighted), but the prediction accuracy may improve by utilizing images taken by other imaging modalities. For example, Wu et al. combined the radiomics features of CT and FDG-PET for predicting distant metastasis in early-stage non-small cell lung cancer after stereotactic body radiation therapy (30). Ordering imaging studies beyond the standard ones requires additional funding and a special protocol, but it may be an important step toward more accurate predictions. Fourth, the current predictive model was built using only the metastatic brain tumors of patients with melanoma as the primary, mainly due to the availability of the treatment follow-up data. Lastly, only one experienced radiation oncologist classified the tumors into three MR image patterns for the visual evaluation. For a fair comparison of the ML-based method with the visual evaluation method, we need to recruit more experts to study the effects of inter-observer variation on the outcome prediction.
To overcome the first three limitations, we plan to improve the prediction model by adding radiomics features of other MR imaging protocols, dosimetric parameters such as the prescribed dose, and standard biomarkers. Extending the model to BMs with different primary cancer types is straightforward as long as the necessary data for model training are available. A more versatile prediction model can be created by including data from multiple institutions and from patients with other types of brain metastases. Inter-observer uncertainty in visual evaluation is a serious problem. However, we believe that the prediction model proposed in the current study reduces this uncertainty.

CONCLUSION
The proposed NN model using the radiomics features of the tumor image was more accurate than the visual evaluation method using the image pattern information in predicting the local response of brain metastases to GKRS. The method can therefore help physicians form a more accurate expectation of the treatment outcome than the traditional method.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by IRB 0801M23942 (University of Minnesota). The research consent requirement was waived because of the retrospective nature of the study.

AUTHOR CONTRIBUTIONS
YW conceived and designed the study. XT did radiomics analyses. CL and YW contributed to the data collection and analysis. DK made substantial contributions to the applications of machine learning techniques. DK and YW prepared the manuscript. All authors contributed to the article and approved the submitted version.