A radiomics model based on preoperative gadoxetic acid–enhanced magnetic resonance imaging for predicting post-hepatectomy liver failure in patients with hepatocellular carcinoma

Background Post-hepatectomy liver failure (PHLF) is a fatal complication after liver resection in patients with hepatocellular carcinoma (HCC). It is of clinical importance to estimate the risk of PHLF preoperatively.
Aims This study aimed to develop and validate a prediction model based on preoperative gadoxetic acid–enhanced magnetic resonance imaging to estimate the risk of PHLF in patients with HCC.
Methods A total of 276 patients were retrospectively included and randomly divided into training and test cohorts (194:82). Clinicopathological variables were assessed to identify significant indicators for PHLF prediction. Radiomics features were extracted from the normal liver parenchyma at the hepatobiliary phase, and the reproducible, robust, and non-redundant ones were retained for modeling. Prediction models were developed using clinicopathological variables (Clin-model), radiomics features (Rad-model), and their combination.
Results The PHLF incidence rate was 24% in the whole cohort. The combined model, consisting of albumin–bilirubin (ALBI) score, indocyanine green retention test at 15 min (ICG-R15), and Rad-score (derived from 16 radiomics features), outperformed the Clin-model and the Rad-model. It yielded an area under the receiver operating characteristic curve (AUC) of 0.84 (95% confidence interval (CI): 0.77–0.90) in the training cohort and 0.82 (95% CI: 0.72–0.91) in the test cohort. The model demonstrated good calibration according to the Hosmer–Lemeshow test and the calibration curve. The combined model was visualized as a nomogram for estimating individual risk of PHLF.
Conclusion A model combining clinicopathological risk factors and a radiomics signature can be applied to identify patients at high risk of PHLF and serve as a decision aid when planning surgical treatment in patients with HCC.


Introduction
Liver resection remains the mainstay of curative-intent treatment for hepatocellular carcinoma (HCC). With advances in surgical techniques and perioperative management in recent years, extended and complex liver resections are increasingly performed (1). This makes individualized preoperative evaluation increasingly important to avoid an insufficient remnant liver volume and impaired liver function after surgery, the so-called post-hepatectomy liver failure (PHLF). At present, PHLF remains a potentially fatal complication of liver resection and is the predominant cause of perioperative mortality (2), with a reported incidence as high as 40% (3).
Precise evaluation of liver function makes it possible to predict PHLF preoperatively. Previous studies have explored blood biochemistry tests, the indocyanine green (ICG) test (4), clinical scoring systems such as the Child-Pugh score (5) and the Model for End-Stage Liver Disease (MELD) score (6), and computed tomography (CT)-based remnant liver volume (7) for the prediction of PHLF. However, the overall performance of these factors has been suboptimal. A more accurate, non-invasive approach for comprehensive liver function evaluation is urgently needed.
Gadoxetic acid (Primovist®) is a T1 magnetic resonance imaging (MRI) contrast medium widely used in clinical practice for liver lesion detection and characterization. In contrast to extracellular contrast media, it is actively taken up by hepatocytes at 10-40 min after administration (the so-called hepatobiliary phase) (8). Recent studies have shown that gadoxetic acid-enhanced MRI is promising for the quantitative evaluation of liver function (9,10). Classically, the methods used are based on the measurement of signal intensity (for instance, relative liver enhancement, liver-to-muscle ratio, or liver-to-spleen ratio), T1 relaxometry (e.g., T1 reduction rate), or dynamic contrast-enhanced MRI parameters (including hepatic extraction fraction) (11). These gadoxetic acid-enhanced MRI-derived parameters have shown a good correlation with the ICG test and clinical scoring systems (Child-Pugh grades and MELD score), indicating a potential value in the prediction of PHLF (10,12,13). However, when measuring signal intensity or T1 relaxation time, regions of interest (ROIs) with a limited diameter are most often placed in a single selected slice, which may not fully represent the function of the whole liver. Furthermore, the placement of the ROI is subjective, potentially reducing reproducibility. In addition, the measurement of T1 relaxation time or dynamic contrast-enhanced MRI parameters often requires additional scanning sequences (14).
Radiomics is a burgeoning technique that extracts a large number of features from routine clinical medical images and transforms them into mineable data for quantitative analysis (15). The basic assumption of radiomics is that subtle pathophysiological alterations at the cellular or molecular level can be reflected by signal changes on images. Quantifying these imaging features and analyzing them with advanced algorithms or deep learning techniques can help clinicians address clinical issues such as disease diagnosis, prognosis, or prediction of treatment response. In the field of hepatobiliary imaging, previous studies have demonstrated that radiomics can significantly improve diagnostic and prognostic accuracy in HCC, such as the prediction of microvascular invasion (15), tumor differentiation (16), and early recurrence after hepatectomy (17).
In this study, we tested whether radiomics analysis of gadoxetic acid-enhanced MR images can be used to predict PHLF in patients undergoing surgery for HCC. The hypothesis was that radiomics analysis can detect subtle imaging features reflecting varying levels of liver function.

Study design and patient selection
The research protocol of this single-center, retrospective study was reviewed and approved by the Institutional Review Board of Southwest Hospital, Army Medical University (No. (B) KY2021068). Written informed consent was waived due to the retrospective nature of this study.
Consecutive patients undergoing hepatectomy between January 2017 and March 2019 were retrieved according to the following inclusion and exclusion criteria. The inclusion criteria were 1) HCC histopathologically confirmed in the resected specimen and 2) preoperative gadoxetic acid-enhanced MRI within 4 weeks before hepatectomy. The exclusion criteria were 1) anti-cancer treatment before hepatectomy, including radiofrequency ablation, hepatectomy, transarterial chemoembolization, portal vein embolization, targeted therapy, and immunotherapy, and 2) insufficient imaging quality (such as motion artifacts). Finally, 276 patients were included in this study, and they were randomly divided into training and test cohorts at a ratio of 7:3. The training cohort was used exclusively for model development, while the test cohort was used to validate the performance of the model. Figure 1 gives more details about this process.
The reporting of this study followed the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) guidance (18). The CLAIM checklist is provided in Supplementary Table S1. The process of model development is illustrated in Figure 2.
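The random 7:3 division into training and test cohorts can be illustrated with a minimal Python sketch. The seed, the function name `split_cohort`, and the rounding rule are assumptions made for illustration only; the text does not state how the split was implemented, and rounding up is used here simply because it reproduces the reported 194:82 split.

```python
import math
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2021):
    """Randomly split patient IDs into a training list and a test list."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (hypothetical) keeps the split reproducible
    rng.shuffle(ids)
    # Assumption: the training size is rounded up; ceil(0.7 * 276) = 194.
    n_train = math.ceil(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


train_ids, test_ids = split_cohort(range(276))
print(len(train_ids), len(test_ids))  # 194 82
```

A plain random split like this does not guarantee equal PHLF incidence in both cohorts, which is why baseline balance between the cohorts is checked separately.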

Definition of PHLF
PHLF was defined according to the International Study Group of Liver Surgery (ISGLS) standard: an increased international normalized ratio (INR) and hyperbilirubinemia (above the normal range of the local laboratory) on postoperative day 5 or afterwards (21). According to this definition, the patients were grouped into PHLF group and non-PHLF group.
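As a minimal sketch of how the definition above could be applied to a single set of laboratory values, the following Python function checks both criteria against laboratory-specific upper limits. The function name, argument names, and the example limits are hypothetical; the actual reference ranges depend on the local laboratory.

```python
def meets_isgls_phlf(inr, bilirubin, postoperative_day,
                     inr_upper_limit, bilirubin_upper_limit):
    """Return True if both INR and bilirubin exceed the local upper limits
    on postoperative day 5 or later (the definition as described in the text)."""
    if postoperative_day < 5:
        return False
    return inr > inr_upper_limit and bilirubin > bilirubin_upper_limit


# Hypothetical upper limits: INR 1.2, total bilirubin 21.0 (units as used locally).
print(meets_isgls_phlf(1.6, 45.0, 5, 1.2, 21.0))  # True
print(meets_isgls_phlf(1.6, 45.0, 3, 1.2, 21.0))  # False
```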

MR imaging acquisition
All patients underwent preoperative gadoxetic acid-enhanced MRI on a 3.0 T scanner (Magnetom Trio, Siemens Healthcare) with a six-channel body coil. Dynamic contrast-enhanced images were acquired using a T1-weighted 3D volume-interpolated breath-hold sequence before, at the time of aortic enhancement, and 60 s, 180 s, 5 min, and 15 min after administration of the contrast medium. Gadoxetic acid (Primovist, Bayer Pharma, Berlin, Germany, 0.1 ml/kg body weight) was injected through an antecubital vein at a rate of 1.0 ml/s, followed by a saline flush at the same rate. The hepatobiliary phase was obtained at 15 min after injection. Detailed scanning parameters are provided in Supplementary Table S2.

Delineation of normal liver tissue and inter-observer agreement evaluation
Delineation of normal liver tissue (excluding blood vessels, bile ducts, and cysts) was performed on images obtained in the hepatobiliary phase using the open-source software ITK-SNAP (http://www.itksnap.org/). Initially, 30 MR images were randomly selected for volume of interest (VOI) delineation by two researchers (with 2 years and 20 years of liver MRI experience, respectively) independently to evaluate the reproducibility and stability of the extracted radiomics features. The inter-observer agreement was measured by the intraclass correlation coefficient (ICC) on the VOI-based feature

FIGURE 1 Flowchart of patient selection.

Imaging preprocessing and radiomics feature extraction
Before feature extraction, all images were interpolated to a voxel size of 1 × 1 × 1 mm³, and the intensity histogram was discretized with a bin width of 25. A Python package, pyradiomics (https://github.com/AIM-Harvard/pyradiomics), was used to extract radiomics features from the manually delineated VOI. The terminology of the radiomics features extracted by pyradiomics is in accordance with the Image Biomarker Standardisation Initiative (24). The following categories of features were extracted: (1) shape, including 2D and 3D (n = 14); (2) first-order statistics (n = 18); (3) gray level co-occurrence matrix-derived features (n = 24); (4) gray level run length matrix-derived features (n = 16); (5) gray level size zone matrix-derived features (n = 16); (6) gray level dependence matrix-derived features (n = 14); (7) neighboring gray tone difference matrix-derived features (n = 5); and (8) the above features extracted from the wavelet-transformed images (n = 744). In total, 851 features were extracted.
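The reported feature counts can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 3D wavelet transform produces eight sub-band images and that shape features are computed only on the original image; both assumptions match the usual behavior of pyradiomics but are not stated explicitly in the text.

```python
# Feature counts per class on the original image, as reported in the text.
ORIGINAL_IMAGE_COUNTS = {
    "shape": 14,
    "firstorder": 18,
    "glcm": 24,
    "glrlm": 16,
    "glszm": 16,
    "gldm": 14,
    "ngtdm": 5,
}

# Assumption: a 3D wavelet decomposition yields 8 sub-bands (LLL, LLH, ..., HHH),
# and shape features are not recomputed on the filtered images.
N_WAVELET_SUBBANDS = 8

n_original = sum(ORIGINAL_IMAGE_COUNTS.values())
n_non_shape = n_original - ORIGINAL_IMAGE_COUNTS["shape"]
n_wavelet = n_non_shape * N_WAVELET_SUBBANDS
n_total = n_original + n_wavelet

print(n_original, n_wavelet, n_total)  # 107 744 851
```

Under these assumptions, the 93 non-shape features computed on each of the eight wavelet sub-bands account for the 744 wavelet features, and adding the 107 original-image features gives the reported total of 851.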

Radiomics feature selection and radiomics model construction
In the training cohort, radiomics feature selection for model construction involved two steps. First, after normalization of the radiomics features by the z-score method, Spearman correlation analysis was performed among the features, and only one feature of each pair with a correlation coefficient >0.99 was kept in order to reduce redundancy. Second, the filtered features were fed into a least absolute shrinkage and selection operator (LASSO) regression analysis to identify the most informative features and avoid potential overfitting. The hyperparameter lambda (λ) in LASSO was determined by fivefold cross-validation. Features with nonzero coefficients were selected for model development (termed the "Rad-model").
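The first selection step (z-score normalization followed by a Spearman redundancy filter) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the filter keeps the first feature encountered in each highly correlated pair, and the absolute value of the correlation is used, which is an assumption. The LASSO step is not shown; it would normally rely on a dedicated library.

```python
import math


def zscore(values):
    """Standardize values to zero mean and unit (sample) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_redundant(features, threshold=0.99):
    """Keep a feature only if |rho| <= threshold against every feature kept so far."""
    kept = []
    for name in features:
        if all(abs(spearman(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: f2 increases monotonically with f1 (rho = 1), f3 does not.
raw = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.1, 6.3, 8.2, 10.9],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
normalized = {name: zscore(values) for name, values in raw.items()}
print(drop_redundant(normalized))  # ['f1', 'f3']
```

Because Spearman correlation depends only on ranks, the z-score normalization does not change the outcome of this filter; it matters for the subsequent LASSO step, where coefficient penalties are scale-dependent.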

Clinical model construction
To identify independent risk factors for PHLF, univariable regression analysis of the clinicopathological variables was performed in the training cohort. Variables associated with PHLF at p < 0.05 were entered into a multivariable logistic regression analysis. A clinical model (termed the "Clin-model") was then constructed using the clinicopathological variables with p < 0.05 in the multivariable regression analysis.

Combined model construction
A radiomics risk score (Rad-score) was then calculated for each patient as a linear combination of the features included in the Rad-model, weighted by the corresponding coefficients. The clinicopathological variables in the Clin-model and the Rad-score were then combined to construct a combined model through logistic regression analysis. The final model was determined by a backward stepwise selection strategy using the likelihood ratio test, choosing the model with the minimum Akaike information criterion (AIC).

FIGURE 2 Workflow of the development of a radiomics-based model.
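The Rad-score and combined-model calculations described in this subsection can be sketched in Python as follows. All coefficients, feature names, and weights below are hypothetical placeholders; the actual values are given only in the supplementary material, and whether the Rad-score includes an intercept is not stated in the text.

```python
import math


def rad_score(feature_values, coefficients, intercept=0.0):
    """Linear combination of the selected features weighted by their coefficients."""
    return intercept + sum(
        coefficients[name] * feature_values[name] for name in coefficients
    )


def combined_model_probability(albi, icg_r15, rad, weights):
    """Logistic regression: probability = 1 / (1 + exp(-linear predictor))."""
    linear_predictor = (
        weights["intercept"]
        + weights["albi"] * albi
        + weights["icg_r15"] * icg_r15
        + weights["rad_score"] * rad
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical coefficients and feature values, for illustration only.
coefficients = {"wavelet_feature_a": 0.5, "wavelet_feature_b": -0.25}
feature_values = {"wavelet_feature_a": 1.2, "wavelet_feature_b": 0.4}
score = rad_score(feature_values, coefficients)
print(round(score, 3))  # 0.5
```

In backward stepwise selection, candidate models with one variable removed are compared in turn, and the model with the lowest AIC is retained until no removal lowers the AIC further.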

Statistical analysis
Continuous variables with normal distribution were expressed as mean ± standard deviation and compared using the Mann-Whitney U test between the non-PHLF and PHLF groups. Categorical variables were presented as number (percentage) and compared using the chi-square test or Fisher's exact test. The performance of the models was evaluated in terms of discrimination, calibration, and clinical usefulness in both the training and test cohorts. Discrimination was assessed by the area under the receiver operating characteristic curve (AUC). Calibration was assessed visually using the calibration curve. The goodness of fit of the model was measured by the Hosmer-Lemeshow test, with a p-value > 0.05 indicating a good fit. Clinical usefulness of the model was evaluated by decision curve analysis (DCA). All statistical analyses were performed using R software (R Foundation for Statistical Computing, Vienna, Austria). A two-sided p < 0.05 was regarded as statistically significant.
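Two of the metrics named above, the AUC and the net benefit used in decision curve analysis, have simple closed-form definitions that can be sketched in Python. The analyses in the study were performed in R; the functions and toy data below are illustrative assumptions and not the authors' code.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher than
    a random negative case, with ties counted as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def net_benefit(labels, scores, threshold):
    """Net benefit at a risk threshold pt: TP/n - FP/n * pt / (1 - pt)."""
    n = len(labels)
    true_pos = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    false_pos = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


# Toy data: 1 = PHLF, 0 = non-PHLF, with hypothetical predicted risks.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
print(round(auc(labels, scores), 3))               # 0.889
print(round(net_benefit(labels, scores, 0.5), 3))  # 0.333
```

A decision curve plots the net benefit over a range of thresholds and compares it with the "treat-all" and "treat-none" strategies; the latter has a net benefit of zero at every threshold.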

Patient basic characteristics
There were 238 men and 38 women among the 276 included patients, with a majority of patients <55 years (71.4%). According to the ISGLS criteria, 65 patients were diagnosed with PHLF, and the incidence rate of PHLF was 24% in the entire cohort. The training cohort contained 194 patients, and the test cohort contained 82 patients. The baseline characteristics of the two cohorts were balanced, with p > 0.05 for all variables, including the PHLF incidence. Table 1 provides detailed information about the entire, training, and test cohorts.

Clinical model construction
Based on univariable and multivariable logistic regression analyses, three significant clinicopathological variables were identified: platelet count, ALBI score, and ICG-R15 (p < 0.05) (Table 2). The Clin-model was based on these three variables. The AUC of the Clin-model was 0.74 (95% confidence interval, CI: 0.65-0.83) in the training cohort and 0.71 (95% CI: 0.57-0.84) in the test cohort (Table 3). The formula of the Clin-model is provided in Supplementary Material S1.

Radiomics feature selection and model construction
Among the 851 extracted radiomics features, 494 features (58%) showed an ICC ≥ 0.75, and these features were subjected to the two-step feature selection strategy. In the first step, 315 features    Table 3). The difference in performance between the Clin-model and the Rad-model was not significant (DeLong test, p = 0.24).

Combined model construction
The individual Rad-score was calculated as a linear combination of the variables included in the Rad-model, weighted by the corresponding coefficients (Supplementary Figure S1 and Supplementary Material S2). A third model, the combined model, was then constructed by selecting the model with the minimum AIC value; it includes ALBI score, ICG-R15, and Rad-score.

Performance evaluation of the combined model
The combined model yielded an AUC of 0.84 (95% CI: 0.77-0.90) in the training cohort, with a sensitivity of 0.78 and a specificity of 0.81 (Figure 4A). In the test cohort, it yielded an AUC of 0.82 (95% CI: 0.72-0.91), with a sensitivity of 0.93 and a specificity of 0.67 (Table 3). The AUC difference was significant between the combined model and the Clin-model (p < 0.05), but not between the combined model and the Rad-model (p = 0.08, Table 3). The combined model has been visualized as a nomogram (Figure 5) for clinical use. An online tool to facilitate its calculation is available at https://onlinetools.shinyapps.io/onlineTool/.
The optimal cutoff value of the model was set at 0.28. The calibration curve showed good agreement between the probabilities predicted by the combined model and the observed PHLF rate (Hosmer-Lemeshow test p > 0.05) (Figures 4B, C). The DCA plot illustrated that, compared with the "treat-all" and "treat-none" strategies, the net benefit was higher for the combined model than for the Clin-model and the Rad-model, implying that the combined model is clinically useful (Figure 4D).
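To make the role of the cutoff concrete, the sketch below dichotomizes predicted probabilities at 0.28 and computes sensitivity and specificity. The toy data are hypothetical, and treating a probability equal to the cutoff as high risk is an assumption; the text does not specify how the cutoff was derived or how ties are handled.

```python
def sensitivity_specificity(labels, probabilities, cutoff=0.28):
    """Classify patients as high risk when probability >= cutoff (assumption:
    inclusive) and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for y, p in zip(labels, probabilities):
        predicted_high_risk = p >= cutoff
        if y == 1 and predicted_high_risk:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_high_risk:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: 1 = PHLF, 0 = non-PHLF, with hypothetical predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 0]
probabilities = [0.60, 0.30, 0.20, 0.10, 0.35, 0.05, 0.27]
sens, spec = sensitivity_specificity(labels, probabilities)
print(round(sens, 2), round(spec, 2))  # 0.67 0.75
```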

Discussion
To predict PHLF in patients with HCC, we developed and validated a nomogram model combining two clinicopathological variables (ALBI score and ICG-R15) and one radiomics variable (Rad-score) derived from radiomics analysis of preoperative T1-weighted gadoxetic acid-enhanced MRI. This prediction model yielded an AUC of 0.82 in the test cohort, indicating a promising tool for clinical utility.
Until now, only a few studies have explored radiomics for the prediction of PHLF. Zhu et al. proposed a nomogram model including ICG-R15 and a radiomics signature based on the hepatobiliary phase of gadoxetic acid-enhanced MRI from 101 patients (25). The model yielded an AUC of 0.89 in predicting PHLF in patients undergoing major liver resection. However, the study did not further validate the model in an independent test cohort. Chen et al. developed a combined model incorporating platelet count, tumor size, and a radiomics score derived from preoperative gadoxetic acid-enhanced MRI for predicting PHLF (26). They validated the model at another medical center, obtaining an AUC of 0.84. However, they did not present their model as a formula or nomogram, which makes it hard to reproduce or translate into clinical use. In addition, their radiomics analysis was based on a single MRI slice per patient, which may not fully reflect the liver function. There are also two radiomics models based on CT for PHLF prediction (27,28). Those studies had rather limited sample sizes (112 and 186 cases), which may explain the unusual finding that the AUCs in their respective test cohorts were higher than those in their training cohorts (27,28).
The Rad-model alone showed effective prediction performance, almost comparable to that of our combined model. A majority of the radiomics features in the Rad-model (13/16) were wavelet-derived features. These features describe low- and high-frequency signal components, representing homogeneity and heterogeneity of the liver tissue (29). Unfortunately, the two previously published studies on radiomics of the hepatobiliary phase of gadoxetic acid-enhanced MRI, by Zhu and Chen as mentioned above, did not use a wavelet filter, so it is not possible to make comparisons regarding the specific radiomics features. However, wavelet-derived features do frequently appear in other gadoxetic acid-enhanced MRI radiomics models, for instance, in the prediction of microvascular invasion (30) or tumor grading for HCC (31), indicating that they capture important structural features of hepatic tumor or parenchyma (14).
Another variable in our prediction model is ICG-R15. This is consistent with Zhu's PHLF prediction model, in which ICG-R15 was the only clinical predictor (25). Currently, ICG-R15 still serves as a reference standard in the quantitative evaluation of liver function before liver resection and plays an essential role in the treatment management of HCC patients (32,33). Nevertheless, the role of the ICG test as an independent risk factor for PHLF prediction remains controversial, as it can be influenced by many factors, such as blood flow or hyperbilirubinemia (32,34). This might explain why only approximately half of the currently available studies (5/11) could successfully use ICG to predict PHLF, as shown in a systematic review (3).
Our model also includes ALBI grade as a predictor. ALBI is a simple and objective scoring system that uses just two common biochemistry tests (serum albumin and bilirubin) for the quantitative evaluation of liver function in HCC patients (19). It was proposed to overcome the limitations of the conventional Child-Pugh scoring system and has proven to be a reliable, effective tool for liver function evaluation, applicable in several different geographic regions (19). Xiang et al. have shown that ALBI could predict PHLF with an AUC of 0.64 in the test cohort (28). In the multivariable regression analysis for the Clin-model, ALBI grade was an independent risk factor for PHLF (odds ratio: 3.2, Grade 2/3 vs. Grade 1). However, neither Child-Pugh nor MELD score was a significant risk factor for PHLF in our cohort.
This research has some limitations to be acknowledged. First, the retrospective nature of this study carries inherent selection bias that could have had an impact on the results. However, this issue was partially compensated for by the inclusion of consecutive patients. Second, our model was not validated in an external cohort. Additional validation in larger prospective multicenter cohorts is warranted to generalize our prediction model. Third, the radiomics analysis was performed on the whole normal liver parenchyma, rather than on the future liver remnant (FLR) only. A model based on the FLR only might achieve a higher AUC than presented here. The high AUC observed in our study might be explained by a strong relationship in radiomics features between the FLR and the resected part. Lastly, the resection extent was not included in our prediction model, as it was not significant in the univariable logistic regression analysis. Traditionally, the extent of hepatic resection is regarded as an important indicator for PHLF. However, its role may have diminished with the development of surgical concepts and skills, equipment, perioperative management, and anesthesia techniques. Currently, the occurrence of PHLF is considered a consequence of multiple clinicopathological factors during the perioperative period, including baseline liver/patient characteristics and intraoperative and postoperative factors (35). Interestingly, among the four existing studies that developed radiomics models for PHLF prediction using preoperative imaging (25-28), only one found the resection extent to be significant and included it in the model (28). Furthermore, we compared the prediction performance of our proposed models with and without the resection extent, and the difference was not significant (Supplementary Table S3). Following the principle of parsimony, this variable was not included in our final models. Future studies can further investigate the effect of this variable on the incidence of PHLF.
FIGURE 4 Performance of the combined model for predicting post-hepatectomy liver failure. Area under the receiver operating characteristic curve in the training and test cohorts (A). Calibration curves in the training (B) and test cohorts (C) illustrated good consistency between the model-predicted probability and the actual probability of PHLF. The red line stands for the combined model, while the green line describes the combined model calibrated by 1,000 bootstrap resamples. The dashed line indicates the ideal situation in which the model-predicted probability perfectly matches the actual probability of PHLF. The decision curve analysis (D) showed that the combined model (green line) yielded the highest net benefit at different risk thresholds of PHLF, compared with the clinical model (red line) and the radiomics model (blue line). Note: AUC, area under the receiver operating characteristic curve; PHLF, post-hepatectomy liver failure.
In conclusion, a prediction nomogram combining clinical risk factors and radiomics signature based on preoperative gadoxetic acid-enhanced MRI was constructed, and it can potentially be an effective tool for predicting liver failure after liver resection in patients with hepatocellular carcinoma.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Ethics statement
The studies involving human participants were reviewed and approved by The Institutional Review Board of Southwest Hospital, Army Medical University. The ethics committee waived the requirement of written informed consent for participation.