Delta-radiomics models based on multi-phase contrast-enhanced magnetic resonance imaging can preoperatively predict glypican-3-positive hepatocellular carcinoma

Objectives: The aim of this study was to investigate the value of a delta-radiomics model based on multi-phase contrast-enhanced magnetic resonance imaging (CE-MRI) for identifying glypican-3 (GPC3)-positive hepatocellular carcinoma (HCC). Methods: One hundred and twenty-six patients with pathologically confirmed HCC (training cohort: n = 88 and validation cohort: n = 38) were retrospectively recruited. Basic information was obtained from medical records. Preoperative multi-phase CE-MRI images were reviewed, and the 3D volumes of interest (VOIs) of the whole tumor were delineated on non-contrast T1-weighted imaging (T1), arterial phase (AP), portal venous phase (PVP), delayed phase (DP), and hepatobiliary phase (HBP). One hundred and seven original radiomics features were extracted from each phase, and delta-radiomics features were calculated. After a two-step feature selection strategy, radiomics models were built using two classification algorithms. A nomogram was constructed by combining the best radiomics model and clinical risk factors. Results: Serum alpha-fetoprotein (AFP) (p = 0.013) was significantly related to GPC3-positive HCC. The optimal radiomics model was composed of eight delta-radiomics features, with AUCs of 0.805 and 0.857 in the training and validation cohorts, respectively. The nomogram integrating the radiomics score and AFP performed excellently (training cohort: AUC = 0.844 and validation cohort: AUC = 0.862). The calibration curve showed good agreement between the nomogram-predicted probabilities and actual GPC3 expression in both the training and validation cohorts. Decision curve analysis further demonstrated the clinical practicality of the nomogram. Conclusion: A delta-radiomics model based on multi-phase CE-MRI can non-invasively predict GPC3-positive HCC and can be a useful method for individualized diagnosis and treatment.


Introduction
Hepatocellular carcinoma (HCC) accounts for 75%-85% of primary liver cancers; primary liver cancer is the sixth most common malignant tumor in humans and the third leading cause of cancer death worldwide (Sung et al., 2021). Despite advances in diagnosis and treatment, the prognosis of HCC patients is still unsatisfactory. Hepatectomy and transplantation are considered the most recommended surgical approaches for HCC treatment. Unfortunately, the 5-year recurrence rate after surgical resection still reaches up to 70% (Llovet et al., 2021).
Glypican-3 (GPC3) is a member of the heparan sulfate proteoglycan family and is overexpressed in most HCCs but not in healthy or nonmalignant livers (Capurro et al., 2003). It can help distinguish alpha-fetoprotein (AFP)-negative HCC from benign nodules, suggesting that GPC3 is a more reliable biomarker than AFP in diagnosing HCC (Llovet et al., 2006;Wang et al., 2006). Furthermore, previous studies have shown that GPC3-positive HCC patients have a worse prognosis (Shirakawa et al., 2009;Yorita et al., 2011;Ning et al., 2012;Fu et al., 2013). GPC3 has been one of the most popular targets in the treatment of HCC in recent years, and several clinical trials have reported that it shows great potential as an immunotherapeutic target for HCC (Ho and Kim, 2011;Zhu et al., 2013;Du et al., 2021). Accordingly, GPC3 plays a vital role in the diagnosis, treatment, and prognosis of HCC. Identifying the expression of GPC3 as early as possible is of great importance to the clinical management of HCC.
Currently, accurate detection of GPC3 expression is mainly achieved through postoperative immunohistochemical examination. Although needle biopsy can test GPC3 expression before surgery, it is an invasive method that cannot reflect the heterogeneity of the entire tumor and is susceptible to sampling variation (Zhou et al., 2018). Recently, several studies have found that GPC3 can be released from the cell surface into peripheral blood, indicating the potential of using serum GPC3 levels in HCC diagnosis (Capurro et al., 2003). However, the results vary among studies and need to be further verified (Zhou et al., 2018;Guo et al., 2020). Thus, a preoperative and non-invasive method is needed for predicting the expression of GPC3.
Radiomics is an emerging imaging technology that uses computational tools to extract high-throughput quantitative imaging features and builds predictive models via statistical methods or machine learning to improve diagnosis and prognosis prediction; it is attracting increasing attention in cancer research (Lambin et al., 2017). Recently, many studies using radiomics have demonstrated favorable performance in preoperatively predicting Ki-67, CK19, microvascular invasion (MVI), and other pathological factors in HCC (Li et al., 2019;Wang et al., 2020;Chong et al., 2021;Zhu et al., 2021). Furthermore, delta radiomics, which integrates time components and radiomics features, provides additional information about the evolution of feature values (Fave et al., 2017;Mokrane et al., 2020). To the best of the authors' knowledge, only one study has preoperatively predicted GPC3 expression by radiomics features extracted from delayed-phase magnetic resonance imaging (MRI) images. The values of other phases and delta information have not been explored yet.
The present study aims to investigate the performance of the radiomics model based on multi-phase contrast-enhanced magnetic resonance imaging (CE-MRI) and evaluate the effect of delta-radiomics features in predicting GPC3-positive HCC preoperatively.

Patients
This retrospective study was approved by the institutional review board, and the requirement for informed consent was waived. From January 2017 to December 2021, 582 pathologically confirmed HCC patients who underwent curative resection were consecutively enrolled. The inclusion criterion was as follows: patients with stage Ia, Ib, and IIa hepatocellular carcinoma according to the China liver cancer (CNLC) staging system (Zhou et al., 2020) who underwent surgical resection as first-line treatment. The exclusion criteria were as follows: 1) patients who had no preoperative gadobenate dimeglumine (Gd-BOPTA)-enhanced 3.0T MRI examination; 2) MRI was performed more than a month before surgery; 3) patients with macrovascular invasion upon preoperative MRI examination; 4) patients with indistinguishable tumor boundaries due to obvious artifacts on MRI images; 5) HCC patients who had previous treatment such as chemotherapy, radiotherapy, transarterial chemoembolization (TACE), and radiofrequency ablation; and 6) immunohistochemical staining for GPC3 was unavailable. The status of GPC3 expression was recorded according to the pathological report and was categorized as GPC3-positive or GPC3-negative. Finally, 126 HCC patients who met the criteria were included and randomly split into the training cohort (n = 88) and validation cohort (n = 38) at a ratio of 7:3 using stratified sampling (Figure 1).

Laboratory test and MRI protocol
According to the hospital information system, we collected patients' basic information, preoperative blood test, and biochemical results, including age, gender, hepatitis B and C immunology, cirrhosis, AFP level, platelet count (PLT), prothrombin time (PT), international normalized ratio (INR), total bilirubin (TBIL), serum albumin (ALB), and alanine aminotransferase/aspartate aminotransferase (ALT/AST).

Histopathological analysis
Histopathological evaluation was available after hepatectomy for HCC in all patients. At the participating hospital, all surgical specimens were routinely fixed in a 10% formaldehyde solution. Two pathologists who were blinded to MRI information jointly evaluated the surgical specimens using a standard seven-point sampling method (Liao et al., 2021). A mouse anti-human glypican-3 monoclonal antibody (#MAB-0617 Maixin-Bio, Fujian, China) was used for immunohistochemical labeling of GPC3. GPC3 staining was considered positive when the brown reaction product was present in at least 1 (tumoral) hepatocyte (Libbrecht et al., 2006).

Radiological assessment
All MR images were reviewed independently by two abdominal radiologists (X.J.C with 3 years of abdominal MRI experience and C.Y with 10 years of abdominal MRI experience) who were aware of the diagnosis of HCC but blinded to the status of GPC3 expression and other clinical data. Divergences between two readers were discussed until a final consensus was achieved. Radiological features in accordance with the Liver Imaging Reporting and Data System (version 2018) (Chernyak et al., 2018) were assessed as follows: (a) tumor margins; (b) tumor capsule; (c) arterial phase hyperenhancement; (d) non-peripheral washout; (e) peritumoral arterial enhancement; (f) tumor hypointensity on HBP; (g) peritumoral hypointensity on HBP; (h) mosaic architecture; (i) intratumoral fat; (j) intratumoral hemorrhage; and (k) intratumoral necrosis.

Clinical-radiological model
All clinical factors and qualitative radiological features in the training cohort were analyzed first using univariate logistic regression analyses. Those factors with a p value less than 0.05 in univariate analyses were entered into multivariate logistic regression analyses to find the independent predictors, and then the

Radiomics analysis
The main flow chart of radiomics analysis is shown in Figure 2. The N4 bias-field correction algorithm was applied to correct the inhomogeneity of MR images. The volumes of interest (VOIs), defined as the whole tumor without peritumoral vessels or bile ducts, were manually delineated by a radiologist (X.J.C) on T1WI, AP, PVP, DP, and HBP images using 3D Slicer software (https://www.slicer.org/). The details of VOI segmentation are shown in the Supplementary Method. If multiple lesions were found, only the largest one was delineated. After an interval of 2 months, repeated segmentation was performed on 30 randomly selected patients by another radiologist (H.T.D). The two radiologists who performed the segmentation were both blinded to the GPC3 status and other clinical information. The segmentation reproducibility was assessed using the Dice similarity coefficient (DSC), and the reproducibility of radiomics features was assessed by the intra-class correlation coefficient (ICC).
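The segmentation agreement check described above can be illustrated with a minimal sketch. It assumes that each VOI has been flattened into a sequence of 0/1 voxel labels; the function name and the two example masks are hypothetical and are not taken from the study.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labeled as tumor by both readers
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    # Total tumor voxels across the two masks
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0  # convention (assumption): two empty masks agree perfectly
    return 2.0 * overlap / total


# Hypothetical flattened VOIs drawn by two readers on the same phase
reader_1 = [0, 1, 1, 1, 1, 0, 0, 0]
reader_2 = [0, 0, 1, 1, 1, 1, 0, 0]
dsc = dice_coefficient(reader_1, reader_2)
print(f"DSC = {dsc:.2f}")
```

A DSC of 1 means identical masks and 0 means no overlap, so the per-phase means above 0.9 reported in the Results indicate close agreement between readers.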
The ratio of GPC3-negative patients to GPC3-positive patients was 1:3.065 in this study, which reveals a data imbalance. Thus, we used the Synthetic Minority Over-sampling TEchnique (SMOTE) method (the 'BorderlineSMOTE' class from the scikit-learn-compatible imbalanced-learn package) to oversample the GPC3-negative group in the training cohort. Subsequently, features in the training cohort were normalized by the z-score, and features in the validation cohort were normalized using the mean and standard deviation values derived from the training cohort. We used a two-step feature selection procedure to reduce feature dimension and select robust features. First, the minimal-redundancy-maximal-relevance (mRMR) algorithm (the 'pymrmr' package in Python) was used to rank feature importance. Briefly, input features were ranked by maximizing mutual information (MI) to class labels and minimizing MI with other features (Peng et al., 2005). Second, the top 20 features ranked by mRMR were used for further selection by the recursive feature elimination (RFE) algorithm with 10-fold cross-validation (the 'RFECV' class from scikit-learn).
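The normalization step above, in which validation features are scaled with statistics learned from the training cohort only, can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper names and values are hypothetical, and the use of the population standard deviation is an assumption (it matches the convention of scikit-learn's StandardScaler).

```python
from statistics import mean, pstdev


def fit_zscore(train_values):
    """Learn the mean and standard deviation from the training cohort only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)  # population SD (assumption)
    if sigma == 0:
        raise ValueError("constant feature cannot be z-scored")
    return mu, sigma


def apply_zscore(values, mu, sigma):
    """Scale any cohort with the training-derived statistics."""
    return [(v - mu) / sigma for v in values]


# Hypothetical values of one radiomics feature
train_feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
valid_feature = [3.0, 9.0, 11.0]

mu, sigma = fit_zscore(train_feature)
train_scaled = apply_zscore(train_feature, mu, sigma)
valid_scaled = apply_zscore(valid_feature, mu, sigma)  # no validation statistics used
print(mu, sigma, valid_scaled)
```

Reusing the training mean and standard deviation keeps information from the validation cohort out of model development.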
Frontiers in Physiology frontiersin.org
We built preliminary models based on each of the five phases and on the delta1, delta2, and delta3 features, respectively, using logistic regression (LR) and support vector machine (SVM). Indicators such as area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, and specificity were used for model evaluation. Subsequently, features in the preliminary models with an AUC higher than 0.75 in both the training cohort and the validation cohort were combined to construct the fusion models. The fusion model with the best discriminative power was selected as the final model, and the prediction probability of the final model was used as the radiomics signature.
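The evaluation indicators and the screening rule described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the labels and scores are made up, the 0.5 classification threshold is an assumption, and the function names are hypothetical. The AUC is computed as the fraction of positive-negative pairs ranked correctly, which is equivalent to the area under the ROC curve.

```python
def auc_score(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a given probability threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    tn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 0)
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def passes_screen(auc_train, auc_valid, cutoff=0.75):
    """Screening rule from the text: AUC above the cutoff in both cohorts."""
    return auc_train > cutoff and auc_valid > cutoff


# Hypothetical labels (1 = GPC3-positive) and predicted probabilities
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
auc = auc_score(labels, scores)
metrics = confusion_metrics(labels, scores)
print(auc, metrics, passes_screen(0.772, 0.755))
```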

Nomogram construction and evaluation
A nomogram was built by integrating the clinical-radiological risk factors and the radiomics signature in the training cohort into multivariable logistic regression and was assessed in the validation cohort. Calibration curves were utilized to analyze the agreement between the predicted and observed GPC3 status. Decision curve analysis was conducted to determine the clinical utility of the nomogram.
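Decision curve analysis compares the net benefit of a model against the "treat all" and "treat none" strategies across threshold probabilities. The sketch below uses the standard net benefit definition, TP/N minus FP/N weighted by pt/(1 - pt); the data and function names are hypothetical and do not reproduce the study's curves.

```python
def net_benefit(labels, probs, pt):
    """Net benefit of acting on predictions at threshold probability pt (0 < pt < 1)."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(labels, pt):
    """Net benefit when every patient is classified as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


# Hypothetical outcomes (1 = GPC3-positive) and nomogram probabilities
labels = [1, 1, 1, 0]
probs = [0.9, 0.8, 0.6, 0.3]
pt = 0.5

nb_model = net_benefit(labels, probs, pt)
nb_all = net_benefit_treat_all(labels, pt)
nb_none = 0.0  # "treat none" has zero net benefit by definition
print(nb_model, nb_all, nb_none)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both reference strategies, as it does in this toy example.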

Statistical analyses
Continuous variables with normal distribution were compared using Student's t-test, and those with non-normal distribution were compared using the Mann-Whitney U-test. Categorical variables were compared using the chi-squared test or Fisher's exact test. The DeLong test was used to compare the AUCs of different models. The Hosmer-Lemeshow test (HL test) was used to evaluate the goodness of fit of the nomogram. All statistical analyses were performed using R software (version 4.1.0; http://www.r-project.org). Two-sided p < 0.05 was considered to indicate statistical significance.

Basic characteristics and clinical model performance
Comparisons of clinical and radiological characteristics between the training cohort and the validation cohort are summarized in Table 1. No statistically significant difference was observed between the two cohorts (p = 0.243-1.0), except for gender (p = 0.038) and non-peripheral washout (p = 0.023).

Construction and evaluation of radiomics models
First, preliminary models were constructed and evaluated. The performance of each preliminary model in the training cohort and the validation cohort is shown in Supplementary Table S2. Among the five single-phase radiomics models, only the one derived from DP had an AUC higher than 0.75 in both the training and the validation cohorts. In delta1, no model met the criteria. In delta2 and delta3, models based on delta2 AP-T1, delta2 PVP-T1, delta2 HBP-T1, delta2 PVP-AP, delta3 PVP-T1, and delta3 HBP-T1 showed satisfactory results (AUC higher than 0.75 in both training and validation cohorts), and their features were combined in different permutations and combinations to construct the fusion models.
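One way to enumerate candidate fusion models from the qualifying feature groups is sketched below. It assumes that a fusion model is any combination of at least two of the six delta feature groups named above; whether the authors also included the DP features, or how they numbered the fusion models, is not stated in the text, so the enumeration is purely illustrative.

```python
from itertools import combinations

# Feature groups whose preliminary models exceeded an AUC of 0.75 in both cohorts
qualifying_groups = [
    "delta2 AP-T1",
    "delta2 PVP-T1",
    "delta2 HBP-T1",
    "delta2 PVP-AP",
    "delta3 PVP-T1",
    "delta3 HBP-T1",
]

# Assumption: a fusion candidate combines at least two groups
fusion_candidates = [
    combo
    for size in range(2, len(qualifying_groups) + 1)
    for combo in combinations(qualifying_groups, size)
]

print(len(fusion_candidates))
print(fusion_candidates[:3])
```

Each candidate would then be refit and evaluated in the same way as the preliminary models.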
The performance of all fusion models is presented in Table 2. The logistic regression model based on delta2 AP-T1, delta2 HBP-T1, and delta2 PVP-AP (fusion9) showed the best overall performance, achieving AUCs of 0.862 (95% CI: 0.795-0.912) and 0.851 (95% CI: 0.717-0.959) for the SMOTE-balanced training cohort and the validation cohort, respectively; it is referred to as the optimal model hereafter. The selected features and their corresponding coefficients are shown in Table 3. The radiomics score was calculated from the predicted probability of the optimal model. The AUCs of the radiomics score were 0.805 in the training cohort and 0.851 in the validation cohort (Figures 3A, B).
The mean DSC of VOI segmentation and the mean ICC of radiomics features in different phases are shown in Supplementary Tables S4-S6. The mean DSC of VOIs on the five phases was higher than 0.9, and the mean ICC of radiomics features from different delta phases was higher than 0.8. All ICCs of the radiomics features in the optimal model were higher than 0.8.

Performance of the nomogram
By combining AFP and the radiomics signature of the optimal model through logistic regression, a comprehensive model was built and visualized as a nomogram (Figure 4A). The nomogram yielded an AUC of 0.844 (95% CI: 0.748-0.941) in the training cohort and 0.862 (95% CI: 0.745-0.979) in the validation cohort (Figures 3A, B), both significantly higher than those of the clinical model (p < 0.001 in both cohorts). The accuracy, sensitivity, and specificity of the nomogram were 0.773, 0.742, and 0.864 in the training cohort and 0.789, 0.793, and 0.778 in the validation cohort, respectively. The threshold of the nomogram in the training cohort was 0.78, as determined by the Youden index. The box plots (Figures 4B, C) displayed the distribution of the nomogram-predicted probability in the GPC3+ and GPC3− groups, which differed significantly (p < 0.001 in both cohorts). Calibration curves (Figure 4D) showed good agreement between the nomogram-predicted probability and actual GPC3 status in both the training cohort (HL test, p = 0.846) and the validation cohort (HL test, p = 0.632). Decision curve analysis (Figure 4E) demonstrated that the nomogram obtained more clinical net benefit than the "treat all" and "treat none" strategies.
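The Youden index mentioned above selects the threshold that maximizes sensitivity + specificity - 1. The following minimal sketch uses made-up labels and probabilities; the function name is hypothetical, and scanning every observed probability as a candidate threshold is an assumption about the implementation.

```python
def youden_threshold(labels, probs):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    best_t, best_j = None, -1.0
    for t in sorted(set(probs)):  # each observed probability is a candidate cut-off
        tp = sum(1 for y, p in zip(labels, probs) if p >= t and y == 1)
        tn = sum(1 for y, p in zip(labels, probs) if p < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical labels (1 = GPC3-positive) and nomogram probabilities
labels = [1, 1, 1, 0, 0]
probs = [0.9, 0.8, 0.4, 0.5, 0.2]
threshold, j_stat = youden_threshold(labels, probs)
print(threshold, round(j_stat, 3))
```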

Discussion
In this retrospective study, we aimed to develop and validate a radiomics-based nomogram to preoperatively predict the status of GPC3 expression in HCC patients. The clinical model, single-phase radiomics models, and delta-radiomics models were built, and the delta-radiomics models (especially the delta2 radiomics models) had higher AUCs than the single-phase radiomics models. The final comprehensive model, consisting of the delta2 AP-T1, delta2 HBP-T1, and delta2 PVP-AP radiomics signature and AFP, achieved the best overall performance and had more clinical benefit. Previous studies (Zhao et al., 2021) have reported that the serum AFP level was significantly associated with GPC3 expression. Our study reached the same result. The potential mechanism may be that the level of GPC3 induction is controlled by alpha-fetoprotein regulator 2 (Afr2) (Morford et al., 2007). During the construction of the clinical model, only the AFP level was significantly related to the GPC3 expression status. Despite its unsatisfactory sensitivity and accuracy, the AFP-based clinical model achieved a high specificity of 0.909 in the training cohort for predicting GPC3 expression. The final comprehensive model, which integrated AFP, displayed good performance as well.
Due to the imbalanced proportion of GPC3 expression in our data, we employed the "BorderlineSMOTE" strategy (Wang et al., 2015) to solve the problem. The BorderlineSMOTE algorithm is an improved oversampling algorithm based on SMOTE, which uses only the minority-class samples near the class boundary to synthesize new samples, thereby improving the internal distribution of samples. Moreover, we used two classical and common machine learning algorithms, namely, logistic regression and support vector machine, to train the radiomics models. In terms of SVM, we used the linear kernel because only it can output the predicted probability of each sample. Our results showed that LR and SVM performed similarly in model development.
In our study, we extracted radiomics features from three-dimensional VOIs. The entire tumor inherently grows spatially and forms heterogeneously; the VOIs therefore incorporate more texture features and geometrical information than two-dimensional regions of interest (Xu et al., 2019). We not only extracted radiomics features from five different phase images but also calculated delta-radiomics features. Delta radiomics may provide more information about the blood supply and metabolism of tumors. As for delta radiomics, our results showed that in the validation cohort, direct subtraction delta features (AUC range: 0.548-0.808) and relative subtraction delta features (AUC range: 0.464-0.862) had better performance than standardized delta features (AUC range: 0.473-0.724). The best combination of delta features was delta2 AP-T1, delta2 HBP-T1, and delta2 PVP-AP, and most of them were texture features. Among them, IDMN (inverse difference moment normalized) is a measure of the local homogeneity of an image, and IMC2 quantifies the complexity of the texture. These two features from delta2 AP-T1 may reflect the texture changes from T1WI to the arterial phase and may correspond to the arterial enhancement. Many previous studies have supported that hepatobiliary phase hypointensity on liver-specific contrast agent-enhanced MRI increases the diagnostic sensitivity for detecting HCC (Cortis et al., 2016;Li et al., 2021). Features from delta2 HBP- [...] 0.879 and 0.871 in the training and validation cohorts, respectively. In our study, radiomics models from DP had an AUC of 0.772 and 0.755 in the training and validation cohorts, respectively. The difference between the two studies may be due to differences in sample populations, scanner parameters, and MRI contrast agents.
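The direct and relative subtraction delta features discussed above can be illustrated with a short sketch. The formulas used here (late minus early, and that difference divided by the early value) are common definitions and are an assumption; the exact definitions used in this study, including the standardized variant, are not given in this text. Feature names and values are hypothetical.

```python
def direct_delta(late, early):
    """Direct subtraction: absolute change of a feature between two phases."""
    return late - early


def relative_delta(late, early):
    """Relative subtraction: change expressed as a fraction of the earlier phase."""
    if early == 0:
        raise ValueError("relative delta is undefined when the early value is 0")
    return (late - early) / early


# Hypothetical feature values for one tumor on T1WI and the arterial phase (AP)
t1_features = {"texture_a": 2.0, "texture_b": 4.0}
ap_features = {"texture_a": 3.0, "texture_b": 3.0}

delta_direct = {k: direct_delta(ap_features[k], t1_features[k]) for k in t1_features}
delta_relative = {k: relative_delta(ap_features[k], t1_features[k]) for k in t1_features}
print(delta_direct)
print(delta_relative)
```

The relative form removes the feature's scale, which may make changes comparable across features with different ranges.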
We further analyzed the predictive effect of the optimal delta-radiomics model in our study and found that the model achieved an AUC of 0.957 [0.907-1] in the subgroup population with hepatocellular carcinoma smaller than 5 cm, which was higher than the radiomics model in Gu's study.
Recently, adoptive cell therapy has been emerging in advanced HCC, and many clinical trials are investigating CAR T cells targeting GPC3 (Rochigneux et al., 2021). An effective preoperative estimation of GPC3 presence can help clinicians choose and customize appropriate therapeutic strategies for patients. Chen et al. (2021) used the IDEAL IQ MRI R2* map to evaluate glypican-3 expression and achieved an AUC of 0.881, a sensitivity of 0.859, and a specificity of 0.842. However, the IDEAL IQ sequence is not a routine sequence in liver MRI examination, and the usage of R2* needs further validation. Our proposed radiomics-based nomogram is an effective and economical tool to preoperatively predict GPC3 expression and is expected to help clinicians make appropriate treatment decisions.
Our study has several limitations. First, our study is a single-center retrospective study. The sample size is relatively small, and external validation in other centers is needed. Second, we only focused on multi-phase CE-MRI radiomics features, and multimodal MRI radiomics features such as those from T2WI and DWI could be explored in the future. Third, our study used Gd-BOPTA CE-MRI, and the hepatobiliary phase was acquired at about 90 min after contrast medium injection. Further validation based on gadoxetate disodium CE-MRI is needed. Fourth, manual segmentation with semi-automatic segmentation tools still takes a lot of time. Registration and deep learning-based auto-segmentation can be used in future studies.
In conclusion, a delta-radiomics model based on multi-phase CE-MRI can non-invasively predict GPC3-positive HCC. Integrated with the serum AFP level, the comprehensive nomogram achieved a satisfactory prediction of GPC3 expression status, which will be beneficial for clinical treatment decision making.

Data availability statement
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

Author contributions
Conceptualization: YL, ZH, and HD; methodology: YL, ZH, and LG; data curation (e.g., collected patients, imaging performed): ZH, HD, XC, XC, CY, and RY; formal analysis: ZH and LG; editing and review of the manuscript performed by all authors. All authors read and approved the final manuscript and agreed to be accountable for all aspects of the work. All authors contributed to the article and approved the submitted version.

Funding
This study has been supported by the Natural Science Foundation of Fujian Province (CN) (Award Number: 2022J02033).