CT-Based Radiomics Signature With Machine Learning Predicts MYCN Amplification in Pediatric Abdominal Neuroblastoma

Purpose MYCN amplification plays a critical role in defining the high-risk subgroup of patients with neuroblastoma. We aimed to develop and validate CT-based machine learning models for predicting MYCN amplification in pediatric abdominal neuroblastoma. Methods A total of 172 patients with MYCN-amplified (n = 47) and non-amplified (n = 125) tumors were enrolled. The cohort was divided into training and testing groups by stratified random sampling. Clinicopathological parameters and radiographic features were selected to construct the clinical predictive model. Regions of interest (ROIs) were segmented on three-phase CT images to extract first-, second- and higher-order radiomics features. Intraclass correlation coefficients (ICCs), maximum relevance minimum redundancy (mRMR) and least absolute shrinkage and selection operator (LASSO) methods were used for dimensionality reduction. The features selected in the training group were used to establish radiomics models using Logistic regression, Support Vector Machine (SVM), Bayes and Random Forest classifiers. The performance of the four radiomics models was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) and compared with the DeLong test. A nomogram incorporating clinicopathological parameters, radiographic features and the radiomics signature was developed through multivariate logistic regression. Finally, the predictive performance of the clinical model, radiomics models and nomogram was evaluated in both the training and testing groups. Results A total of 1,218 radiomics features were extracted from the ROI on each phase of the three-phase CT images (3,654 in total), and 14 optimal features, comprising one original first-order feature, eight wavelet-transformed features and five LoG-transformed features, were selected to construct the radiomics models. In the training group, the AUCs of the Logistic, SVM, Bayes and Random Forest models were 0.940, 0.940, 0.780 and 0.927, respectively, and the corresponding AUCs in the testing group were 0.909, 0.909, 0.729 and 0.851, respectively.
There was no significant difference among the Logistic, SVM and Random Forest models, all of which outperformed the Bayes model (p < 0.005). The predictive performance of the Logistic radiomics model based on three-phase images was similar to that of the nomogram, and both outperformed the clinical model and the radiomics model based on the venous phase alone. Conclusion The CT-based radiomics signature is able to predict MYCN amplification of pediatric abdominal NB with high accuracy using SVM, Logistic and Random Forest classifiers, whereas the Bayes classifier yields lower predictive performance. When combined with clinical and qualitative radiographic features, the clinics-radiomics nomogram can improve the performance of predicting MYCN amplification.


INTRODUCTION
Neuroblastoma (NB), which originates from neural crest tissue along the sympathetic chain, is one of the most common solid malignancies in children (1). NB can arise in various anatomical compartments (i.e., neck, chest, abdomen or pelvis) but most frequently arises in the abdomen (adrenal gland or extra-adrenal retroperitoneum), which accounts for 73% of all cases (2). Because NB is a heterogeneous tumor, the clinical outcome of abdominal NB ranges from spontaneous regression to extensive systemic metastasis (3). For pediatric patients with advanced-stage abdominal NB, the long-term survival rate is less than 50% despite intensive treatment (4). Risk stratification is therefore vital for choosing the optimal therapy for each patient in the era of precision medicine (5). International groups have made various attempts to identify factors for risk stratification and to define a subpopulation with poor clinical outcome (6); all of them highlight the significance of MYCN amplification status for defining the high-risk group and consider all patients with MYCN-amplified tumors prone to relapse (2). Clinically, MYCN oncogene amplification is significantly correlated with an aggressive phenotype (7). Detecting MYCN amplification status is therefore critical to risk-stratify patients. However, traditional biopsy is invasive and may cause various complications (8). Moreover, MYCN testing is not available in many institutions because of limited access to genetic testing methods (9). A noninvasive alternative for characterizing MYCN amplification status is therefore needed.
In recent years, the increasing application of radiomics to solid tumors has given rise to radiogenomics. The core of radiogenomics is to identify and predict the expression of clinically significant molecular biomarkers by analyzing high-dimensional quantitative features extracted from tumor regions of interest (ROIs) in radiographic images (9,10). Compared with histopathology and genetic testing, radiogenomics can not only avoid sampling bias and the possible complications of biopsy but is also expected to provide more comprehensive and accurate information for predicting biomarkers (9). To date, radiogenomics in pediatric tumors has mainly focused on MRI-based signatures of medulloblastoma, and CT-based radiogenomics has rarely been used (11,12). Although a recent study showed the potential of a CT-based signature for predicting MYCN amplification in NB and ganglioneuroblastoma (GNB) (8), its patient selection was problematic: nonabdominal NB and GNB were also enrolled, even though previous studies have shown that MYCN amplification rarely occurs in nonabdominal NB and GNB (13,14). Moreover, because of the heterogeneity of NB, ROIs delineated only on the few largest tumor slices cannot comprehensively reflect the biological characteristics of the tumor (15). By contrast, whole-tumor ROIs delineated on all slices have helped other radiomics studies reduce sampling bias and improve intra- and inter-observer consistency (16,17).
In the present study, we developed and validated CT-based radiomics models built with several machine learning methods for predicting MYCN amplification of abdominal NB in a cohort of pediatric patients. In addition, we constructed a clinical model based on clinicopathological parameters and radiographic features, and then added the radiomics signature to develop a radiomics-clinics model. The predictive performance of the clinical model, radiomics models and radiomics-clinics model was finally evaluated by the AUC and compared with the DeLong test.

MATERIALS AND METHODS
Patients and MYCN Amplification Characterization
The Ethics Committee of our hospital approved this single-center retrospective study and waived the requirement for informed patient consent. By searching the medical record management system and the radiology picture archiving and communication system (PACS) of our department consecutively from May 2012 to August 2020, we identified 172 patients with abdominal NB who met the inclusion and exclusion criteria, with MYCN-amplified (n = 47) and non-amplified (n = 125) tumors. Inclusion criteria were: (1) available abdominal contrast-enhanced CT of sufficient image quality, including non-enhanced, arterial and venous phases; (2) no radiotherapy, chemotherapy or surgical treatment before the first CT examination; (3) pathologically confirmed abdominal NB; (4) available MYCN status. Exclusion criteria were: (1) ganglioneuroma or GNB; (2) nonabdominal NB (e.g., neck, chest or pelvis); (3) abdominal NB without three-phase CT scans; (4) insufficient image quality; (5) unknown MYCN status; (6) abdominal NB with prior treatment. The detailed workflow of patient selection is shown in Figure 1.
The study cohort was divided into a training group and a testing group at a ratio of 7:3 by stratified random sampling. Clinicopathological parameters, including gender, age (months), histopathology, INSS stage, Shimada classification and urinary vanillylmandelic acid (VMA), were collected from medical records. According to the degree of differentiation, histopathological results were categorized into two groups: undifferentiated or poorly differentiated NB, and differentiated NB (8). The prognostic Shimada classification was defined as favorable histology (FH) or unfavorable histology (UFH) on the basis of age, degree of differentiation and the mitosis-karyorrhexis index (MKI) (18). MYCN gene copy number was detected by fluorescence in situ hybridization (FISH) in all specimens using a MYC-N/LAF double probe, and cases in which the number of MYCN signals exceeded 10-fold were considered MYCN-amplified (18). For each patient, the interval between MYCN status detection and contrast-enhanced CT was less than one month.
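The stratified 7:3 split can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: stratification is assumed to be by MYCN status, per-class test counts are assumed to be rounded down, and the helper name `stratified_split` and the seed are hypothetical; the text above does not specify any of these. With 47 amplified and 125 non-amplified cases, flooring gives 14 + 37 = 51 test cases and 121 training cases.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=42):
    """Split indices so each class keeps roughly the same train/test proportion.

    Hypothetical helper: per-class test counts are rounded down (floor).
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_test = int(len(idx) * test_fraction)  # floor of the class-wise share
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

# 1 = MYCN-amplified (n = 47), 0 = non-amplified (n = 125)
labels = [1] * 47 + [0] * 125
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 121 51
```

Splitting each class separately guarantees that the amplified/non-amplified ratio is nearly identical in both groups, which matters with an imbalanced outcome such as this one.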

CT Scanning
All CT scans were acquired during a single breath-hold in cooperative children or during quiet breathing in children unable to hold their breath; children who could not cooperate were sedated with oral 10% chloral hydrate (0.5 ml/kg body weight) before the examination. All abdominal three-phase CT scans, including the non-enhanced phase (NP), arterial phase (AP) and venous phase (VP), were performed on a Lightspeed VCT 64-slice spiral CT scanner (GE Healthcare, USA) or a Brilliance ICT 256-slice spiral CT scanner (Philips, Netherlands). The CT scanning parameters were: (1) tube voltage, 120 kV; (2) tube current, 200 mAs; (3) pitch, 0.984:1; (4) slice thickness, 5.0 mm; (5) slice interval, 5.0 mm; (6) reconstructed slice thickness, 1.25 mm. Nonionic iodinated contrast material (Omnipaque 300 mg I/ml or Visipaque 320 mg I/ml, GE Healthcare) was injected into a peripheral forearm vein at a dose of 2 ml/kg body weight with a power injector at a rate of 1-3 ml/s. AP and VP images were acquired 20-35 s and 60-70 s, respectively, after contrast administration.

Imaging Analysis
All CT examinations were transferred to a workstation for review and analysis. All images were first analyzed independently by two experienced pediatric radiologists blinded to MYCN status. Tumor features, including calcification (present or absent), infiltration across the midline (extension beyond the contralateral edge of the spine; present or absent) and necrosis (present or absent), were recorded (8). Disagreements were resolved by consensus.

Clinical Model Building
The candidate predictors compared between the MYCN-amplified and non-amplified groups were the clinicopathological parameters (gender, age, histopathology, INSS stage, Shimada classification and urinary VMA) and the radiographic features. Variables with p < 0.05 in univariate logistic regression were entered into multivariate analysis with stepwise selection, using the Akaike information criterion (AIC) and log-likelihood as stopping rules to select the most predictive clinical features.
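To make the AIC stopping rule concrete, the sketch below computes the Bernoulli log-likelihood of a fitted logistic model and AIC = 2k - 2 ln L, where k is the number of estimated parameters; in a stepwise search, adding or removing a variable is accepted only if it lowers the AIC. The outcomes and predicted probabilities are hypothetical toy values, not data from this study, and the function names are illustrative.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary outcomes y given predicted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))

def aic(y, p, n_params):
    """Akaike information criterion, AIC = 2k - 2 ln L; lower is better."""
    return 2 * n_params - 2 * log_likelihood(y, p)

# Hypothetical toy outcomes and fitted probabilities from two nested models
y = [1, 0, 1, 0]
p_small = [0.70, 0.30, 0.60, 0.40]  # e.g. intercept + 1 predictor (k = 2)
p_large = [0.75, 0.25, 0.65, 0.35]  # one extra predictor (k = 3)
print(round(aic(y, p_small, 2), 2), round(aic(y, p_large, 3), 2))  # 7.47 8.87
```

Here the larger model fits slightly better (higher log-likelihood), but its AIC is higher because the extra parameter costs 2 points, so a stepwise procedure would keep the smaller model.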

Image Preprocessing and Tumor Segmentation
Before tumor segmentation, images were resampled to isotropic 1 mm × 1 mm × 1 mm voxels with linear interpolation to normalize the geometry of the CT images. Whole-tumor ROIs were manually delineated in 3D on each of the three phases using the free open-source software ITK-SNAP (ver. 3.4.0) by a pediatric radiologist with 2 years of experience, and then reviewed and confirmed by another pediatric radiologist with 10 years of experience (Figure 2). The ROIs included areas of calcification and necrosis. Each tumor was segmented twice by reader 1 (2 weeks apart) and once by reader 2. Intra-observer intraclass correlation coefficients (ICCs) were calculated from the features extracted from the two ROIs delineated by reader 1.
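The isotropic resampling step can be sketched in NumPy as follows, assuming a (z, y, x) array with known voxel spacing. The 1.25 mm slice spacing matches the reconstructed slice thickness given above, while the 0.7 mm in-plane spacing and the function names are hypothetical; the text does not say which software performed this step, and real pipelines usually rely on a dedicated image library. This simplified version samples a new 1 mm grid starting at the first voxel and drops any partial voxel at the far edge.

```python
import numpy as np

def resample_axis(vol, axis, old_spacing, new_spacing=1.0):
    """Linearly interpolate one axis from old_spacing to new_spacing (both in mm)."""
    old_coords = np.arange(vol.shape[axis]) * old_spacing
    new_coords = np.arange(0.0, old_coords[-1] + 1e-9, new_spacing)
    return np.apply_along_axis(lambda line: np.interp(new_coords, old_coords, line), axis, vol)

def resample_isotropic(vol, spacing, new_spacing=1.0):
    """Resample a (z, y, x) volume to isotropic voxels.

    Applying 1D linear interpolation along each axis in turn is equivalent to
    trilinear interpolation on the grid.
    """
    out = np.asarray(vol, dtype=float)
    for axis, sp in enumerate(spacing):
        out = resample_axis(out, axis, sp, new_spacing)
    return out

# Hypothetical volume: 1.25 mm slices, 0.7 mm in-plane pixels
ct = np.random.default_rng(0).normal(40.0, 10.0, size=(8, 10, 10))
iso = resample_isotropic(ct, spacing=(1.25, 0.7, 0.7))
print(iso.shape)  # (9, 7, 7)
```

Because trilinear interpolation is exact for intensities that vary linearly in space, a linear ramp is a convenient sanity check for this kind of code.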

Radiomics Features Extraction and Selection
The images and corresponding ROIs were imported into the in-house software (Artificial Intelligence Kit, AK, Version V3.2.2.R, GE Healthcare), which was used for feature extraction. The radiomics features were classified into seven groups: first order, shape, gray-level co-occurrence matrix (GLCM), gray-level size-zone matrix (GLSZM), gray-level run-length matrix (GLRLM), neighborhood gray-tone difference matrix (NGTDM) and neighboring gray-level dependence matrix (GLDM). To enhance intricate patterns in the data that are invisible to the human eye, advanced filters were applied, including Laplacian of Gaussian (LoG; sigma, 2.0 and 3.0 mm) and wavelet decompositions with all possible combinations of high (H) or low (L) pass filters in each of the three dimensions (HHH, HHL, HLH, LHH, LLL, LLH, LHL, HLL). A total of 3,654 radiomics features per patient (1,218 per phase) were extracted from the ROIs on NP, AP and VP.
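To make the eight wavelet sub-band labels concrete, the sketch below performs a single-level 3D decomposition with the orthonormal Haar filter: each axis is split into a low-pass (L) and a high-pass (H) half, giving the eight combinations listed above. This is an illustration only; the wavelet family, decomposition level and (un)decimated scheme used by the AK software are not stated here (radiomics tools often use undecimated transforms and other wavelets such as Coiflets), and the function names are hypothetical.

```python
import itertools
import numpy as np

def haar_split(vol, axis):
    """One Haar step along `axis`: low-pass (L) and high-pass (H) halves.

    A trailing odd sample, if any, is dropped.
    """
    even = np.take(vol, np.arange(0, vol.shape[axis] - 1, 2), axis=axis)
    odd = np.take(vol, np.arange(1, vol.shape[axis], 2), axis=axis)
    return {"L": (even + odd) / np.sqrt(2.0), "H": (even - odd) / np.sqrt(2.0)}

def haar_3d(vol):
    """Single-level 3D Haar decomposition into eight sub-bands keyed 'LLL' ... 'HHH'.

    The i-th letter of each key refers to array axis i.
    """
    bands = {"": np.asarray(vol, dtype=float)}
    for axis in range(3):
        bands = {key + f: sub
                 for key, arr in bands.items()
                 for f, sub in haar_split(arr, axis).items()}
    return bands

vol = np.random.default_rng(1).normal(size=(16, 16, 16))
bands = haar_3d(vol)
expected = sorted("".join(c) for c in itertools.product("LH", repeat=3))
print(sorted(bands) == expected)  # True
```

Radiomics features are then computed on each sub-band image exactly as on the original image, which is why filtering multiplies the feature count per phase.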
Because many of the extracted high-dimensional features are redundant or uninformative, several methods were used for dimensionality reduction. First, intra-observer analysis was used to assess the reliability and reproducibility of the features, and features with ICCs higher than 0.80 were considered robust and retained. Then, two feature selection methods, maximum relevance minimum redundancy (mRMR) and least absolute shrinkage and selection operator (LASSO) regression, were applied to eliminate redundant and irrelevant features and to choose the optimal feature subset for constructing the radiomics models. Because the CT scans were performed on scanners from two manufacturers (Lightspeed VCT and Brilliance ICT), the performance of the radiomics features derived from each scanner was evaluated by ROC analysis and compared with the DeLong test. The rad-score was calculated as the sum of the selected features weighted by their coefficients.
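The ICC-based reproducibility filter can be sketched as follows. The sketch assumes the ICC(2,1) form of Shrout and Fleiss (two-way random effects, absolute agreement, single measurement); the ICC form used in the study is not stated, and `icc_2_1` and `reproducible_features` are hypothetical names. Each feature is scored on reader 1's two delineations and kept only if its ICC exceeds 0.80.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` is an (n_subjects, k_measurements) array.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between-subject
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between-measurement
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def reproducible_features(first, second, threshold=0.80):
    """Indices of features whose ICC between two delineations exceeds `threshold`."""
    first, second = np.asarray(first, dtype=float), np.asarray(second, dtype=float)
    return [j for j in range(first.shape[1])
            if icc_2_1(np.column_stack([first[:, j], second[:, j]])) > threshold]

# Hypothetical feature tables (patients x features) from two delineations
rng = np.random.default_rng(0)
t1 = rng.normal(size=(30, 3))
t2 = t1 + rng.normal(scale=[0.05, 0.3, 3.0], size=(30, 3))  # increasing disagreement
print(reproducible_features(t1, t2))
```

Because this form measures absolute agreement, a systematic offset between the two delineations lowers the ICC even when the values are perfectly correlated.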

Machine Learning
The features selected in the training group were used to establish three-phase radiomics models with Logistic regression, Support Vector Machine (SVM), Bayes and Random Forest classifiers. The performance of these models was then evaluated in both the training and testing groups by the area under the receiver operating characteristic (ROC) curve (AUC), and the DeLong test was used to compare the four machine learning models.
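The AUC and the paired DeLong comparison can be sketched in NumPy as follows. The AUC is computed as the Mann-Whitney probability that a randomly chosen MYCN-amplified case scores higher than a non-amplified one (ties count one half), and DeLong's method estimates the covariance of two AUCs measured on the same patients from per-case "structural components". This is an illustrative re-implementation with hypothetical function names and simulated scores, not the pROC code used in the study (in R, `pROC::roc.test(..., method = "delong")` performs this comparison).

```python
import math
import numpy as np

def _psi(pos, neg):
    """Kernel matrix: 1 if a positive case outscores a negative case, 0.5 for ties, else 0."""
    diff = pos[:, None] - neg[None, :]
    return (diff > 0) + 0.5 * (diff == 0)

def auc(scores, labels):
    """AUC as the Mann-Whitney probability P(score_pos > score_neg)."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    return _psi(scores[labels == 1], scores[labels == 0]).mean()

def delong_test(scores_a, scores_b, labels):
    """Paired DeLong test for two AUCs on the same cases; returns (auc_a, auc_b, z, p)."""
    labels = np.asarray(labels)
    v10, v01, aucs = [], [], []
    for scores in (scores_a, scores_b):
        scores = np.asarray(scores, dtype=float)
        psi = _psi(scores[labels == 1], scores[labels == 0])
        v10.append(psi.mean(axis=1))  # structural component of each positive case
        v01.append(psi.mean(axis=0))  # structural component of each negative case
        aucs.append(psi.mean())
    m, n = len(v10[0]), len(v01[0])
    cov = np.cov(np.array(v10)) / m + np.cov(np.array(v01)) / n
    var = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return aucs[0], aucs[1], z, p

# Hypothetical scores for 40 amplified (1) and 100 non-amplified (0) cases
rng = np.random.default_rng(0)
labels = np.repeat([1, 0], [40, 100])
model_a = labels + rng.normal(size=labels.size)
model_b = 0.3 * labels + rng.normal(size=labels.size)
auc_a, auc_b, z, p = delong_test(model_a, model_b, labels)
print(f"AUC {auc_a:.3f} vs {auc_b:.3f}, z = {z:.2f}, p = {p:.4f}")
```

The paired form matters here because all four classifiers are scored on the same patients, so their AUCs are correlated and an unpaired comparison would misstate the variance of the difference.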

Nomogram Building and Evaluating
Finally, the radiomics signature was combined with the statistically significant clinicopathological parameters and radiographic features identified by multivariate logistic regression in the training group to build the radiomics-clinics nomogram. The predictive performance of the clinical model, radiomics models and nomogram was evaluated by the AUC in both the training and testing groups, and the DeLong test was applied to compare the models.

Statistical Analysis
Statistical analysis was performed with IPM statistics (IPMs, version 2.4.0, GE Healthcare) and the R programming language (ver. 3.4.2, http://www.r-project.org). The chi-square test or Fisher's exact test was used for nominal variables, and the Mann-Whitney test was used for non-normally distributed continuous variables between the two groups. A two-tailed p < 0.05 indicated statistical significance. The "mRMRe" and "glmnet" packages were used for mRMR and LASSO, respectively. The "pROC" package was used to perform the DeLong test and plot the ROC curves of each model. The "rms" package was used to carry out machine learning and build the clinical-radiomics nomogram.
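For small 2x2 tables of nominal variables (for example, a radiographic sign versus MYCN status), Fisher's exact test can be sketched with the standard library alone. It sums the hypergeometric probabilities of all tables with the same row and column totals that are no more likely than the observed one, the same two-sided definition used by R's `fisher.test`. The function name and the counts are hypothetical, not study data.

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # P(top-left cell = x) given fixed row and column totals
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # small relative tolerance so tables tied in probability are counted
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

# Hypothetical counts: rows = sign present / absent, columns = amplified / non-amplified
print(round(fisher_exact_2x2([[3, 1], [1, 3]]), 4))  # 0.4857
```

The exact test is preferred over the chi-square test when expected cell counts are small, which is common for rare radiographic signs in a cohort of this size.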

RESULTS
Patient Characteristics and Clinical Model Building
According to the inclusion and exclusion criteria, 172 patients were included in the present study (47 with MYCN-amplified and 125 with non-amplified tumors). The patients were randomly divided into a training group (n = 121) and a testing group (n = 51) at a ratio of 7:3; their characteristics are detailed in Table 1. The meaningful

Feature Selection and Machine Learning
A total of 1,218 radiomics features were automatically extracted from each segmented ROI (NP, AP and VP). First, 734 features with ICCs higher than 0.80 in the intra-observer analysis were retained. Before further selection, abnormal or missing values among these 734 features were replaced by the median, and the features were standardized. mRMR and LASSO were then used to select the optimal features. After mRMR removed redundant and irrelevant features, 30 features from AP, NP and VP were retained. LASSO then identified the final 14 optimal features, comprising one first-order feature, eight wavelet-transformed features and five LoG-transformed features, which were used to construct the radiomics models. LASSO involves choosing the regularization parameter λ, which determines the number of retained features (Figure 3). Once the number of features was determined, the most predictive feature subset was chosen and the corresponding coefficients were calculated (Figure 4). The radiomics signatures derived from the two scanners are compared in Supplementary Figure 1 and Table 1; their performance differed. The rad-score was calculated as the sum of the selected features weighted by their coefficients, and the final rad-score formula is shown in Supplementary Figure 2.
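The preprocessing, LASSO and rad-score steps above can be sketched with cyclic coordinate descent, the algorithm family that glmnet is built on. For brevity this sketch minimizes the squared-error (Gaussian) LASSO objective; glmnet also fits a penalized logistic model (`family = "binomial"`), the natural choice for a binary endpoint such as MYCN status, but the family used in the study is not stated. All function names, the simulated data and λ = 0.1 are hypothetical; in practice λ would be chosen by cross-validation (for example with glmnet's `cv.glmnet`).

```python
import numpy as np

def impute_and_standardize(X):
    """Replace missing values (NaN) by the column median, then z-score each column."""
    X = np.array(X, dtype=float)
    med = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = med[cols]
    return (X - X.mean(axis=0)) / X.std(axis=0)

def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0): shrinks z and zeroes it if |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min (1/2n)||y - X b||^2 + lam * ||b||_1 (y centered, no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float) - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]  # partial residual without feature j
            beta[j] = soft_threshold(X[:, j] @ resid / n, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]  # add the updated contribution back
    return beta

# Hypothetical data: 5 standardized features, only features 0 and 3 are informative
rng = np.random.default_rng(0)
X = impute_and_standardize(rng.standard_normal((200, 5)))
y = X @ np.array([2.0, 0.0, 0.0, -1.5, 0.0]) + 0.1 * rng.standard_normal(200)
y -= y.mean()
beta = lasso_cd(X, y, lam=0.1)
rad_score = X @ beta  # coefficient-weighted sum of the retained features
print(np.round(beta, 2))
```

The uninformative features end up with coefficients of exactly zero, while the informative ones are shrunk by roughly λ but kept; the rad-score is then simply the linear combination of the surviving features.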
The ROC curves of the four machine learning models in the training and testing groups are shown in Figure 5. In the training group, the AUCs of the Logistic, SVM, Bayes and Random Forest models were 0.940, 0.940, 0.780 and 0.927, respectively, and the corresponding AUCs in the testing group were 0.909, 0.909, 0.729 and 0.851, respectively. The DeLong test was applied to compare the four models. There was no significant difference among the Logistic, SVM and Random Forest models (Logistic vs SVM: p = 0.99; Logistic vs Random Forest: p = 0.33; SVM vs Random Forest: p = 0.33), all of which outperformed the Bayes model (p < 0.005) (Table 3). The nomogram was constructed from Shimada classification, VMA, infiltration across the midline and calcification (Figure 6) and the calculated rad-score. The ROC analysis of the clinical model, radiomics model and nomogram is illustrated in Figure 7, and the models are compared in Table 4. The nomogram outperformed the clinical model alone, with the AUC increasing from 0.770 to 0.946 in the training group and from 0.917 to 0.977 in the testing group. The performance of the three-phase Logistic radiomics model was similar to that of the nomogram, and both outperformed the clinical model and the radiomics model based on the venous phase alone (Table 4).

DISCUSSION
MYCN amplification status plays a significant role in the risk classification of NB; MYCN-amplified NBs are usually classified into the high-risk group, whose patients require intensive treatment with surgery, radiotherapy and chemotherapy (19). Besides genetic testing, radiogenomics, which focuses on establishing correlations between imaging features and molecular biomarkers, is expected to provide an alternative way to characterize and predict the MYCN amplification status of neuroblastoma noninvasively and inexpensively (8,20). Previous radiogenomics studies have demonstrated its potential to predict gene mutations in solid tumors (11,12,20,21). Among adult tumors, CT-based radiogenomics has been widely studied in lung, kidney and liver neoplasms (22-24). However, there have been few reports on CT radiogenomics in pediatric tumors (8). In this study, clinicopathologic parameters (Shimada classification, VMA) and radiographic features (infiltration across the midline, calcification and radiomics features) were selected to develop predictive models for MYCN amplification in pediatric abdominal NB. Compared with the other CT radiogenomics study of pediatric NB and GNB (8), we enrolled only pediatric patients with abdominal NB, because MYCN amplification mostly occurs in abdominal NB (13,14). In addition, we delineated whole-tumor ROIs on all slices to improve intra- and inter-observer consistency. Beyond first-order and textural features, higher-order features transformed by wavelet and LoG filters were also extracted to further identify the optimal radiomics features correlating with MYCN amplification. Moreover, we compared the performance of radiomics models developed with four common machine learning methods.
In the present study, quantitative radiomics features derived from whole-tumor ROIs on three-phase CT images were extracted and selected using ICCs, mRMR and LASSO. mRMR (maximum relevance minimum redundancy) selects the features that are most relevant to the classification task but least redundant with each other. It is a mutual-information-based algorithm similar to the Maximum Dependency algorithm; however, unlike Maximum Dependency, which becomes impractical when the number of features is large, mRMR is especially suitable for high-dimensional data (25). After mRMR removed redundant and irrelevant features, a LASSO regression model was used to prevent overfitting of the selected radiomics features. The main advantage of LASSO is that it shrinks coefficients toward zero and sets those with small estimates exactly to zero, while variables with larger estimates are shrunk but retained, so that feature selection and coefficient estimation are performed simultaneously. Model complexity is controlled by the regularization parameter λ, which helps avoid overfitting. Moreover, LASSO coefficient estimates change continuously with λ, which makes the method suitable for model selection in high-dimensional data (26).
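The greedy mRMR search described above can be sketched as follows. For continuous variables this sketch estimates mutual information under a bivariate-Gaussian assumption, I = -1/2 ln(1 - rho^2), from the Pearson correlation rho (with a binary outcome such as MYCN status this becomes a point-biserial correlation), and scores each candidate by relevance minus mean redundancy with the features already chosen (the "difference" criterion). Both choices are assumptions for illustration; the function names and simulated data are hypothetical, and this is not the mRMRe package's implementation.

```python
import numpy as np

def gaussian_mi(a, b):
    """Mutual information of two variables under a bivariate-Gaussian assumption."""
    rho = np.clip(np.corrcoef(a, b)[0, 1], -0.999999, 0.999999)
    return -0.5 * np.log(1.0 - rho ** 2)

def mrmr(X, y, k):
    """Greedy mRMR: maximize relevance to y minus mean redundancy with selected features."""
    p = X.shape[1]
    relevance = np.array([gaussian_mi(X[:, j], y) for j in range(p)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        candidates = [j for j in range(p) if j not in selected]
        scores = [relevance[j] - np.mean([gaussian_mi(X[:, j], X[:, s]) for s in selected])
                  for j in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected

# Hypothetical data: f1 nearly duplicates f0; f2 carries independent information about y
rng = np.random.default_rng(1)
a, b = rng.standard_normal(500), rng.standard_normal(500)
y = 1.5 * a + b
f0 = a + 0.1 * rng.standard_normal(500)
f1 = f0 + 0.01 * rng.standard_normal(500)
f2 = b + 0.1 * rng.standard_normal(500)
f3 = rng.standard_normal(500)
X = np.column_stack([f0, f1, f2, f3])
selected = mrmr(X, y, k=2)
print(selected)
```

A relevance-only ranking would pick the near-duplicate pair f0 and f1, whereas mRMR keeps one of them and adds f2, which is the redundancy-removal behavior that motivates running mRMR before LASSO.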
Finally, 14 features from the three phases were identified as the most predictive feature subset for constructing the radiomics model: one original first-order feature, eight wavelet-transformed features and five LoG-transformed features. Higher-order features filtered by wavelet and LoG thus dominated the final selection (13 of 14 features), whereas only one original first-order feature and no original textural feature were retained. Chen et al. (16) investigated CT-based radiomics for differentiating pelvic rhabdomyosarcoma from yolk sac tumors in children; most of the 10 features selected for their radiomics model of each phase were wavelet-transformed features. Wavelet and LoG are both higher-order statistical methods that impose filter grids on the images and may reflect more information about the vascularity and spiculation of the lesion (27). The wavelet transform applies a matrix of linear or radial "waves" to the images, whereas LoG is mostly used to extract features from areas with a coarse textural pattern (27). We also evaluated the performance of radiomics features from the two scanners and found that the two signatures performed differently. One possible reason is that radiomics features depend on the scanner and its manufacturer; another is that the number of patients scanned on the Lightspeed VCT (GE Healthcare) was relatively small. Our results showed that radiomics models based on NP, AP and VP images can predict MYCN amplification in pediatric abdominal NB, although performance varied across machine learning methods. The AUCs of the Logistic, SVM, Bayes and Random Forest models were 0.940, 0.940, 0.780 and 0.927, respectively, in the training group, and 0.909, 0.909, 0.729 and 0.851, respectively, in the testing group.
The Logistic and SVM models achieved the best predictive performance, with identical AUCs. According to the DeLong test, there was no significant difference among the Logistic, SVM and Random Forest models, all of which outperformed the Bayes model. Previous studies have mostly used a single classifier to build radiomics models, and there is no consensus on the best-performing classifier. Deist et al. (28) compared the performance of different classifiers (decision tree, random forest, neural network, support vector machine, elastic net logistic regression and LogitBoost) in predicting radiotherapy outcomes and found that random forest and elastic net logistic regression performed better than the other classifiers. Machine learning classifiers can be used to identify the best combination of radiomics features, but different algorithms have different advantages and disadvantages (29). The machine learning method should therefore be chosen for its overall predictive performance in the specific clinical application.
In addition, we constructed a nomogram combining clinical parameters, imaging features and the rad-score. Shimada classification, VMA, infiltration across the midline and calcification were selected, together with the most predictive radiomics signature, to build the nomogram. In this study, UFH showed a significant correlation with MYCN amplification, supporting a previous study which found that MYCN-amplified NBs were mostly categorized as UFH (18). We also found that most NBs infiltrating across the midline were MYCN-amplified, consistent with the finding of Wu et al. (8), whereas calcification in NB was found to be related to MYCN amplification. The nomogram outperformed the clinical model alone, with the AUC increasing from 0.770 to 0.946 in the training group and from 0.917 to 0.977 in the testing group. Because the features used to construct the radiomics models were mostly derived from the NP and VP, we also developed a Logistic radiomics model based on the VP alone, and then compared the predictive performance of the nomogram, the VP-only Logistic radiomics model and the three-phase Logistic radiomics model. Although the nomogram did not significantly improve the prediction of MYCN amplification compared with the three-phase radiomics model, both outperformed the clinical model and the VP-only radiomics model. This suggests that radiomics features are useful for predicting MYCN amplification and that radiogenomics may become part of risk stratification for NB patients. Although our study showed that CT-based radiomics has the potential to predict MYCN amplification in pediatric abdominal NB, it has some limitations.
First, this was a retrospective study, which may introduce selection bias; in particular, some potentially valuable clinical indicators were unavailable, and these might have improved the performance of the clinics-radiomics nomogram. Second, only 172 patients were enrolled because MYCN status has been tested at our hospital only in recent years. Previous studies have shown that MYCN amplification occurs in about 20% of neuroblastomas. As a tertiary referral center, we have accumulated a certain number of MYCN-amplified cases over the past several years, and enrolling more cases will take time. Third, the CT scans were performed on scanners from two manufacturers, and scanner-dependent differences in the extracted features may have influenced the predictive performance of the radiomics models. Fourth, we chose only four common machine learning methods to build the radiomics models, and the performance of other classifiers remains to be evaluated.

CONCLUSIONS
In conclusion, the CT-based radiomics signature is able to predict MYCN amplification of pediatric abdominal NB with high accuracy using SVM, Logistic and Random Forest classifiers, whereas the Bayes classifier yields lower predictive performance. These three machine learning methods may therefore be the first choice for researchers constructing predictive models for MYCN amplification of abdominal NB. When combined with clinical and qualitative radiographic features, the clinics-radiomics nomogram can improve the performance of predicting MYCN amplification.
With the development of tumor molecular stratification, radiogenomics is expected to provide a promising method to characterize and predict molecular biomarkers noninvasively.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Ethics Committee of the Children's Hospital Affiliated with Chongqing Medical University. Written informed consent from the participants' legal guardian/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
XC, HW, and LH contributed to conception and design of the study. KH, HD, LZ, TZ, and WY organized the database. HL performed the statistical analysis. HW wrote the first draft of the manuscript. XC and LH reviewed the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.