Development and evaluation clinical-radiomics analysis based on T1-weighted imaging for diagnosing neonatal acute bilirubin encephalopathy

Purpose: To investigate the value of clinical-radiomics analysis based on T1-weighted imaging (T1WI) for predicting acute bilirubin encephalopathy (ABE) in neonates.
Methods: In this retrospective study, sixty-one neonates with clinically confirmed ABE and 50 healthy control neonates were recruited between October 2014 and March 2019. Two radiologists independently made visual diagnoses for all subjects based on T1WI. Eleven clinical and 216 radiomics features were obtained and analyzed. Seventy percent of samples were randomly selected as the training group and were used to establish a clinical-radiomics model to predict ABE; the remaining samples were used to validate the performance of the models. The discrimination performance was assessed by receiver operating characteristic (ROC) curve analysis.
Results: Seventy-eight neonates were selected for training (median age, 9 days; interquartile range, 7–20 days; 49 males) and 33 neonates for validation (median age, 10 days; interquartile range, 6–13 days; 24 males). Two clinical features and ten radiomics features were finally selected to construct the clinical-radiomics model. In the training group, the area under the ROC curve (AUC) was 0.90 (sensitivity: 0.814; specificity: 0.914); in the validation group, the AUC was 0.93 (sensitivity: 0.944; specificity: 0.800). The AUCs of the two radiologists' diagnoses and of the radiologists' final visual diagnosis based on T1WI were 0.57, 0.63, and 0.66, respectively. The discriminative performance of the clinical-radiomics model in the training and validation groups was higher than that of the radiologists' visual diagnosis (P < 0.001).
Conclusions: A combined clinical-radiomics model based on T1WI has the potential to predict ABE. The application of the nomogram could potentially provide a visualized and precise clinical support tool.

Introduction
Neonatal hyperbilirubinemia, characterized by jaundice, is a common and benign disease in neonates, but it is the main cause of hospitalization in the first week after birth (1); neonatal jaundice is the seventh and the ninth leading cause of neonatal death in the early neonatal period (0-6 d) and late neonatal period (7-27 d), respectively (2,3). Even though most hyperbilirubinemia patients have a favorable prognosis, some severe cases may cause neurotoxicity and lead to neonatal acute bilirubin encephalopathy (ABE) (4), with a risk of neonatal mortality and life-long neurodevelopmental handicaps (5,6). Hyperintensity of the bilateral globus pallidus (GP) on T1-weighted imaging (T1WI) is considered a characteristic imaging manifestation of ABE (7,8). However, the presence of myelin in these structures also produces hyperintensity on T1WI, which is easily confused with the changes caused by brain damage in ABE (9). Furthermore, hyperintensity of the GPs on T1WI in neonates with ABE may be transient and subtle (7); judging GP signal intensity by a radiologist's naked eye alone lacks objectivity and accuracy, which may affect the early diagnosis and treatment of ABE, especially in neonates who may already have early brain damage although the GP signal on T1WI has not yet increased. Therefore, conventional T1WI is insufficient to meet diagnostic needs.
Radiomics is an emerging field that translates medical images into quantitative data (10,11). The internal texture information of the GPs can be mined deeply by extracting high-throughput features, and the biological information can be displayed objectively and quantitatively.
The purpose of this study was to assess the value of a clinical-radiomics model based on T1WI for predicting ABE in neonates and to compare its diagnostic value with that of experienced radiologists' visual diagnosis, providing a new tool for the early diagnosis and individualized monitoring of ABE.

Patients
This retrospective study was approved by the hospital ethics committee, and the requirement for informed consent was waived because of the retrospective nature of the study. One hundred and eleven neonates were recruited between October 2014 and March 2019. Figure 1 shows the pathway of ABE and healthy neonate inclusion and exclusion. Clinical features including age, gender, weight, gestational week, maternal pregnancy history, meconium-stained amniotic fluid, premature rupture of membranes, singleton or multiple-birth pregnancy, and type of delivery were collected from all subjects. To construct a stable and generalizable model, all subjects (n = 111) were randomly divided into a training group (n = 78) and a validation group (n = 33) at a ratio of 7:3. The training group was used to construct the models, which were then verified in the validation group.
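The 7:3 random division described above can be illustrated with a minimal sketch. It is written in Python rather than the R software used in the study; the function name, the fixed seed, and the use of integer subject IDs are assumptions made for illustration, and the study's actual randomization procedure is not specified in the text.

```python
import random


def split_train_validation(subject_ids, train_fraction=0.7, seed=0):
    """Randomly split subject IDs into a training and a validation group.

    Hypothetical helper: the seed and rounding rule are illustrative choices.
    """
    ids = list(subject_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))  # 0.7 * 111 rounds to 78
    return ids[:n_train], ids[n_train:]


# 111 subjects, as in the study; the IDs themselves are placeholders.
train_ids, validation_ids = split_train_validation(range(111))
print(len(train_ids), len(validation_ids))  # 78 33
```

With 111 subjects, rounding 70% gives 78 training and 33 validation subjects, matching the group sizes reported above.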

MRI acquisition
All subjects underwent MRI examination on a GE Signa HDxT 1.5T MRI scanner with a dedicated eight-channel head and neck unite coil. Ten percent chloral hydrate (0.5 mL/kg) was administered via anal enema for sedation 30 min before examination. MRI

Radiologists' visual diagnosis
Radiologist 1 (JY) and radiologist 2 (JW), with 15 and 20 years of experience in neonatal radiology, respectively, independently provided what they thought was the most likely diagnosis for all subjects (n = 111) based on T1WI. For subjects on whom radiologists 1 and 2 disagreed, a review was conducted by radiologist 3 (YM), with 25 years of experience, to determine the radiologists' final visual diagnosis. Radiologists were blinded to all clinical and diagnosis information except age at the time of MRI scanning. Images were displayed in a random sequence.

Segmentation and feature extraction
MRI data were transferred to a personal computer and processed using 3D Slicer software (http://www.slicer.org). The regions of interest were manually outlined along the boundary of the bilateral GPs on the T1WI slice with the largest area of the bilateral GPs and on its adjacent upper and lower slices. Then, a volume of interest was automatically generated by the software. One hundred and six features of each GP (212 features of bilateral GPs) for each neonate were extracted by 3D Slicer software. All radiomics features are summarized in Supplementary material E1.

Feature selection and prediction model building

Clinical feature and model
The clinical features in the training and validation groups were compared using an independent samples t-test (for normally distributed data) or Mann-Whitney U-test (for non-normally distributed data) for continuous variables and a chi-squared test for categorical variables.
Features with P < 0.05 in the univariate analysis were selected and fed into the multivariate logistic regression analysis using a backward stepwise elimination method based on Akaike's information criterion. The clinical model was established by applying multivariate logistic regression.
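For reference, Akaike's information criterion balances goodness of fit against model size. In its standard form, with k the number of estimated parameters and \hat{L} the maximized likelihood of the fitted logistic model (both symbols are introduced here for illustration):

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

Backward stepwise elimination starts from the model containing all candidate features and repeatedly removes the feature whose removal lowers the AIC the most, stopping when no removal lowers it further.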

Radiomics feature and model
Dimensionality reduction of the radiomics features was then performed. First, intraclass correlation coefficients were used to assess the intra-observer repeatability of radiomics features. The image segmentation of all cases was performed by observer one, with 15 years of experience (JY); observer two, with 10 years of experience (YL), then used the same method to segment 30 subjects selected from the whole sample by stratified sampling, to reduce possible bias and ensure the reliability of segmentation and feature extraction. Observers were blinded to clinical and group information. Features with intraclass correlation coefficients > 0.90 were retained, which indicated high robustness and satisfactory reproducibility. Second, maximal relevance and minimal redundancy was performed to eliminate redundant and irrelevant features, and 30 features were retained. Then, the least absolute shrinkage and selection operator algorithm was performed to identify the most valuable features. Finally, the radiomics score (rad-score) was calculated from the selected features.

FIGURE 1
Flow diagram showing the pathway of ABE and healthy neonate inclusion and exclusion. ABE, acute bilirubin encephalopathy; TSB, total serum bilirubin; BIND, bilirubin-induced neurologic dysfunction.

FIGURE 2
The workflow of clinical-radiomics analysis in the current study. VOI, volume of interest; ICC, intraclass correlation coefficients; mRMR, maximal relevance and minimal redundancy; Lasso, least absolute shrinkage and selection operator; ROC, receiver operating characteristic; DCA, decision curve analysis.

Clinical-radiomics model and nomogram
The combined clinical-radiomics model was established to predict the risk of ABE using multivariate logistic regression analysis based on the selected clinical features and radiomics features. To ensure the easy use of the model, the combined clinical-radiomics model was further visualized as a nomogram.
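In general form, a multivariate logistic regression model of this kind relates the predicted probability P of ABE to its predictors through the logit link. The sketch below assumes the three predictors named for the nomogram (gestational week, type of delivery, and rad-score); GW and Delivery are shorthand introduced here, and the coefficients \beta_0 to \beta_3 are placeholders, not the fitted values of the study.

```latex
\ln\frac{P}{1-P} = \beta_0 + \beta_1\,\mathrm{GW} + \beta_2\,\mathrm{Delivery} + \beta_3\,\text{rad-score},
\qquad
P = \frac{1}{1 + e^{-(\beta_0 + \beta_1\,\mathrm{GW} + \beta_2\,\mathrm{Delivery} + \beta_3\,\text{rad-score})}}
```

A nomogram presents the same model graphically: each term of the linear predictor is rescaled to a points axis, and the total points map to the predicted probability.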

Evaluation
The discrimination performances of the radiologists' visual diagnosis, clinical model, radiomics model, and clinical-radiomics model were assessed by using receiver operating characteristic (ROC) curve analysis. The area under the ROC curve (AUC), accuracy, sensitivity, and specificity were calculated. DeLong's test was used to compare the statistical differences between the AUCs of the training and validation groups. The calibration performance of the combined model was tested by using calibration curves accompanied by the Hosmer-Lemeshow test; P > 0.05 was considered to indicate good fit. The clinical benefit of the combined model was assessed by using decision curve analysis.
The statistical analyses were performed using R software (v. 3.5.3, http://www.R-project.org), and a two-sided P < 0.05 was considered statistically significant. The workflow is shown in Figure 2.
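The discrimination metrics named in this section can be illustrated with a minimal sketch. It is written in Python rather than the R software used in the study; the function names, the toy labels and scores, and the 0.5 cut-off are assumptions for illustration and are not data or settings from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_metrics(labels, scores, threshold):
    """Accuracy, sensitivity, and specificity at a given score cut-off."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy data (hypothetical): 1 = ABE, 0 = healthy control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print(roc_auc(labels, scores))                 # 0.875
print(confusion_metrics(labels, scores, 0.5))  # all three metrics equal 0.75
```

The pairwise definition of the AUC used here is equivalent to the area under the empirical ROC curve; it is quadratic in the sample size, which is acceptable for cohorts of this size.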

Clinical feature and model
Sixty-one neonates with ABE (median age, 9 days; interquartile range, 7-10 days; 35 male) and 50 healthy neonates (median age, 11.5 days; interquartile range, 6-19.75 days; 38 male) were included for the training and validation groups. Demographic and clinical features data are provided in Table 1.
The clinical features were further analyzed by univariate and multivariate logistic regression analysis in the training group (Table 2). Gestational week, meconium-stained amniotic fluid, and type of delivery were significant in the univariate analysis (P < 0.05) and were included in the multivariate analysis. Multivariate logistic regression analysis indicated that gestational week and type of delivery (P < 0.05) were independent risk factors of ABE, and they were incorporated into the clinical model.

Radiomics features and model
One hundred features of left GP and 106 features of right GP were considered stable with intra-observer stability (intraclass correlation coefficients ranges: 0.901-1.000 and 0.941-1.000, respectively). These features measured by observer one were selected for subsequent analysis.
Ten radiomics features remained after dimensionality reduction, and the coefficients of these features are presented in Figure E1; these features were used to calculate the corresponding rad-score.
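As a general illustration only, a rad-score derived from a Lasso-selected feature set is typically a linear combination of the selected features. In the sketch below, x_j denotes the value of the j-th selected feature, and \beta_0 and \beta_j are placeholder symbols for the intercept and coefficients; the fitted values of the study are not reproduced here.

```latex
\text{rad-score} = \beta_0 + \sum_{j=1}^{10} \beta_j x_j
```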

Clinical-radiomics model and nomogram construction
Combined with the clinical features and the rad-score, the clinical-radiomics model was established. Meanwhile, the nomogram was established based on the clinical-radiomics model to individually estimate the risk of ABE for each neonate (Figure 3).

Model performance evaluation
The ROC curves and discriminative performance of the clinical model, radiomics model, and clinical-radiomics model are shown in Figures 4A, B. The calibration curve results showed good consistency between the nomogram-predicted probability of ABE and the actual ABE observed in the training group and validation group (Hosmer-Lemeshow test; P = 0.137 and 0.362, respectively) (Figures 5A, B). The decision curve analysis applied to predict ABE is shown in Figure 6.
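The quantity plotted in a decision curve is the net benefit at each threshold probability. The following minimal Python sketch uses the standard net-benefit definition; the function names and toy data are assumptions for illustration, and the study's own analysis was performed in R.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating every case whose predicted risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - fp / n * weight


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every neonate regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical): 1 = ABE, 0 = healthy control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print(net_benefit(labels, probs, 0.5))     # 0.25
print(net_benefit_treat_all(labels, 0.5))  # 0.0
```

A model shows clinical benefit at a given threshold when its net benefit exceeds both the treat-all reference and the treat-none reference, whose net benefit is zero.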

Radiologists' visual diagnosis
Radiologists' information is shown in Table Figure E2.
DeLong's test showed that the discriminative performance of the clinical model and clinical-radiomics model in the training and validation groups was increased compared to radiologists' visual diagnosis (Table 4).

Discussion
Conventional MRI lacks quantitative indicators of brain damage in ABE, so the risk of ABE cannot be assessed objectively and comprehensively, especially in neonates who may have brain damage while their GPs do not show hyperintensity on T1WI. In this study, we developed and validated a clinical-radiomics model based on T1WI for individualized prediction of ABE, which demonstrated good discrimination, calibration, and clinical benefit. Notably, the discriminative performance of the clinical model and clinical-radiomics model in the training and validation groups was higher than that of radiologists with extensive experience in pediatric radiology. These results indicate that ABE is difficult to identify early on T1WI by the naked eye. Moreover, the sensitivity of the radiologists' visual diagnosis was low, whereas that of the models was considerably higher. The nomogram is easy to use and may facilitate personalized risk stratification and treatment decision-making for neonates with ABE, potentially providing a visualized and precise clinical support tool.
ABE is brain damage caused by unconjugated bilirubin passing through the blood-brain barrier (14). The neurotoxicity of unconjugated bilirubin is highly selective, implicating the GPs, the substantia nigra reticulata, subthalamic nuclei, brain stem, auditory, vestibular, and oculomotor nuclei, the hippocampus, and cerebellum (15); it particularly implicates the GPs, which is related to their active metabolism (16). The mechanism of unconjugated bilirubin-induced neuronal damage has not been fully elucidated. Existing hypotheses include excitotoxicity (17), bilirubin-induced neuroinflammation (18,19), and oxidative stress mechanisms (20). Bilirubin-induced neuronal damage is reversible in the early stages; consequently, early identification of ABE risk factors and intervention is important to prevent ABE, reduce sequelae, and improve prognosis. Gestational week and type of delivery were considered independent risk factors of ABE and were incorporated into the clinical model. For neonates, the earlier the gestational week, especially earlier than 37 weeks, the higher the risk of ABE. This was attributed to the lower liver enzyme activity in premature infants, which affected the binding of human serum albumin and bilirubin; in addition, premature infants were often treated with antibiotics, which disrupted the intestinal microecological environment. These factors caused bilirubin accumulation and led to ABE (21). The type of delivery also affected the risk of ABE; neonates delivered by cesarean section were more likely to develop ABE, which may be related to anesthesia (22,23).
In the radiomics model, the selected features comprised one first-order feature (Left_Skewness), two shape features (Right_SurfaceVolumeRatio and Left_Maximum2DDiameterRow), and seven texture features (Right_RunEntropy, Left_Correlation, Right_SmallAreaLowGrayEmphasis, Right_Imc1, Right_Correlation, Left_GrayLevelNonUniformity_2, and Left_Idmn). Smaller skewness reflected a left-skewed intensity distribution, in which more voxels had high intensity. Macroscopically, higher intensity of the left globus pallidus increased the risk of ABE. Smaller Left_Maximum2DDiameterRow and larger Right_SurfaceVolumeRatio were closely correlated with the risk of ABE. We speculated that this was because the neurotoxicity of unconjugated bilirubin increases the influx of calcium ions, stimulates the activity of proteolytic enzymes, and leads to apoptosis (20, 24), changing the shape and volume of the GP. The texture features describe the distribution of voxel intensity and the spatial relationship between local adjacent voxels; they comprehensively reflect the intrinsic properties of the image and can quantify the complexity of the texture of the bilateral GPs in neonates with ABE. However, associating a single texture feature with complex biological processes remains a challenge. The texture features included in the radiomics model may reflect the heterogeneity of the bilateral GPs in neonates with ABE. Liu et al. (25) also established models to distinguish between ABE and healthy neonates based on T1WI, and the best model obtained good diagnostic efficiency with an AUC of 0.946. Wu et al. (26) integrated multimodal MRI with deep-learning approaches to diagnose ABE; the combination of three modalities (T1WI, T2WI, and diffusion-weighted imaging) achieved an AUC of 0.991 ± 0.007. However, these studies did not consider clinical factors. More importantly, they did not set up independent validation sets to verify the established models. Our study combined clinical and radiomics models and achieved good prediction performance; the independent validation group verified the established model, which reduced the risk of overfitting.
Our study had several limitations. First, this was a retrospective study, and potential bias may have been introduced in the selection of research subjects; some clinical data such as albumin and blood type were not considered. Furthermore, all data were acquired on the same scanner, the sample size was small, the inclusion period was long, and external validation was lacking; therefore, larger prospective studies based on multicenter data are needed to verify the performance of the clinical-radiomics model. Lastly, the regions of interest in this study were manually placed, which may inevitably lead to certain errors. We have started to develop an automatic segmentation tool for GPs, which is intended to segment the GPs more accurately and robustly in the future.

FIGURE 3
The nomogram based on the clinical-radiomics model, including gestational week, type of delivery, and rad-score. For each neonate, the total points were calculated by adding up the points of each variable and translated into the risk of ABE. Type of delivery denotes spontaneous labor or cesarean delivery.

Conclusions
In conclusion, we developed and validated a combined clinical-radiomics model based on T1WI to predict ABE. The application of the nomogram could potentially provide a visualized and precise clinical support tool.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the First Affiliated Hospital of Dalian Medical University Ethics Committee. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.