Validation of the GALAD model and establishment of a new model for HCC detection in Chinese patients

Background The GALAD model is a statistical model used to estimate the likelihood of hepatocellular carcinoma (HCC) in patients with chronic liver disease (CLD). Studies in other ethnic populations have shown that it has high sensitivity and specificity; however, whether it can be applied to Chinese patients remains to be determined. This study was conducted to verify the performance of the GALAD model in a Chinese cohort and to construct a new model better suited to Chinese populations. Methods A total of 512 participants were enrolled and divided into a training set and a validation set. The training set included 80 patients with primary liver cancer, 139 patients with CLD and 87 healthy people. Receiver operating characteristic (ROC) curve analysis was used to evaluate how well the GALAD model identified liver cancer, and the GAADPB model, which includes gender, age, AFP, DCP, total protein and total bilirubin, was established by logistic regression. The validation set (75 HCC patients and 130 CLD patients) was used to evaluate the performance of the GAADPB model. Results GALAD and GAADPB achieved excellent performance (area under the ROC curve [AUC], 0.925 and 0.945, respectively) and outperformed GAAP, Doylestown, BALAD-2, aMAP, AFP, AFP-L3%, DCP and the combined detection of AFP, AFP-L3 and DCP (AUCs: 0.894, 0.870, 0.648, 0.545, 0.879, 0.782, 0.820 and 0.911) for detecting HCC from CLD in the training set. For early-stage HCC (BCLC 0/A), GAADPB had the highest sensitivity compared with GALAD, ADP and DCP (56.3%, 53.1%, 40.6% and 50.0%, respectively). In the test set, GAADPB also performed better than GALAD (AUC, 0.896 vs 0.888). Conclusions The new GAADPB model was powerful and stable, performing better than GALAD and the other models, and it may also be promising for predicting HCC prognosis. Further studies in real-world HCC patients in China are needed.


Introduction
With approximately 906,000 new cases and 830,000 deaths globally in 2020, hepatocellular carcinoma (HCC) is the sixth most commonly diagnosed cancer and the third leading cause of cancer-related mortality (1). The etiology of HCC varies geographically. Worldwide, most cases are related to HBV infection, whereas HCV infection is the most common cause of HCC in some Western countries and Japan (2). In China, HBV infection is the leading cause of HCC (3). Early-stage HCC can be cured by surgical resection, ablation or liver transplantation, whereas patients diagnosed at an advanced stage have limited treatment options and a poor prognosis (3). Many patients in China are already at an advanced stage at the time of diagnosis. Given this situation, early surveillance of HCC among high-risk populations is of great importance.
For surveillance of high-risk groups, abdominal ultrasound every six months, with or without serum alpha-fetoprotein (AFP) testing, is recommended. However, ultrasound results are operator-dependent, and its sensitivity varies from 47% to 84% (4). Obesity may also impair the performance of ultrasound (5). Because ultrasound alone has these limitations as a surveillance test, AFP is often used in combination with it. Although combining the two methods increases the detection rate, it also increases false-positive results and cost (6). In addition, patients with chronic viral hepatitis can also have elevated AFP levels (5). To detect HCC more accurately at an early stage, other serum biomarkers, such as the Lens culinaris agglutinin-reactive fraction of AFP (AFP-L3) and des-gamma-carboxy prothrombin (DCP), have been introduced (7). However, no single serum marker has been shown to meet the clinical needs of early HCC surveillance (8,9).
The GALAD score, together with the Doylestown, BALAD and aMAP scores, which are derived from combinations of variables such as gender, age, AFP, AFP-L3, DCP, bilirubin, platelets and albumin, has shown high sensitivity for HCC detection in previous studies (9,10). However, the study populations of these reports differed from Chinese HCC patients: HBV infection is the most common etiology of HCC in China, whereas HCV infection was the main cause in the other ethnic populations studied.
This study aimed to assess the performance of the GALAD model for HCC diagnosis and explore a better model for Chinese patients.

Study populations
This study enrolled 307 participants in the training set, consisting of 80 HCC patients, 140 non-cancer controls with CLD [100 patients with liver cirrhosis (LC) and 40 patients with chronic hepatitis B (CHB)] and 87 healthy controls (HC). The testing set comprised 205 participants, including 75 HCC patients and 130 CLD controls (106 LC/24 CHB). All participants were recruited from Shulan Hospital in Hangzhou between 2019 and 2021. For all patients, the diagnosis was established by histologic examination of tumor tissue or by characteristic findings on medical imaging, such as computed tomography (CT) or magnetic resonance imaging (MRI), according to clinical practice guidelines. We collected 29 clinical characteristics and other variables for all participants, including age, gender, etiology of liver disease, laboratory results and baseline tumor characteristics at the time of diagnosis (Tables 1, S1). Written informed consent was

Inclusion criteria
HCC patients were diagnosed according to the "Guidelines for the diagnosis and treatment of primary liver cancer": 1) histopathological diagnosis; or 2) imaging diagnosis by computed tomography (CT), magnetic resonance imaging (MRI), contrast-enhanced ultrasonography (CEUS) or Gd-EOB-DTPA-enhanced MRI (EOB-MRI), where A) for a liver nodule ≤2 cm, at least two imaging modalities showed typical features of HCC; B) for a liver nodule >2 cm, at least one imaging modality showed typical features of HCC; or C) in the absence of a liver nodule, AFP was positive and at least one imaging modality showed typical features of HCC.
CHB patients were diagnosed according to "The guideline for the prevention and treatment of CHB infection" from the Chinese Society of Hepatology: HBV infection for more than 6 months with persistently or repeatedly elevated alanine aminotransferase (ALT), or hepatitis lesions identified by liver biopsy.
LC patients were diagnosed according to the "Chinese guidelines on the management of liver cirrhosis": 1) histopathological diagnosis; or 2) imaging diagnosis, with ultrasound (US), CT or MRI showing splenomegaly without space-occupying lesions in the liver.
Healthy participants were people with normal physical examination results at Shulan Hospital who met the following criteria: 1) no family history of cancer and no history of diagnosis or treatment of liver disease; 2) no HBV or HCV infection; 3) normal liver function, renal function and routine blood tests; 4) normal ultrasound findings in the liver and gallbladder system; and 5) normal liver FibroScan results.

Exclusion criteria
(1) Subjects with HCC and other concurrent tumors; (2) subjects who could not be sampled or whose samples were insufficient in volume or of unacceptable quality; (3) subjects with liver metastases or who had received HCC treatment (such as surgery, ablation, radiotherapy or chemotherapy).

AFP, AFP-L3%, and DCP assay
About 10 mL of peripheral blood was collected from each participant. For HCC and CLD patients, blood samples were drawn before treatment. Serum AFP, AFP-L3% and DCP were measured by chemiluminescence microparticle immunoassay (Hotgen Biotech Co., Ltd, Beijing, China). The quantitative ranges of AFP, AFP-L3, AFP-L3% and DCP were 0.6-20,000 ng/mL, 0.6-20,000 ng/mL, 5-50% and 0.6-20,000 ng/mL, respectively. Values outside these ranges were replaced by the corresponding limit of the range. If both AFP and AFP-L3 were outside their ranges, AFP-L3% (AFP-L3/AFP) was set to 10% to represent AFP-L3% positivity.
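The range-handling rule above can be sketched as a small preprocessing step. This is a minimal illustration, assuming that an out-of-range value is replaced by the nearer range limit and that the 10% substitution applies only when both AFP and AFP-L3 fall outside their range; the dictionary keys and the function names `clamp_to_range` and `preprocess_markers` are hypothetical and not part of the original study.

```python
# Quantitative ranges quoted in the text (AFP, AFP-L3 and DCP in ng/mL; AFP-L3% in %).
QUANT_RANGES = {
    "AFP": (0.6, 20000.0),
    "AFP-L3": (0.6, 20000.0),
    "AFP-L3%": (5.0, 50.0),
    "DCP": (0.6, 20000.0),
}


def is_out_of_range(name, value):
    """True if a raw reading lies outside the assay's quantitative range."""
    lo, hi = QUANT_RANGES[name]
    return value < lo or value > hi


def clamp_to_range(name, value):
    """Replace an out-of-range reading by the nearer range limit."""
    lo, hi = QUANT_RANGES[name]
    return min(max(value, lo), hi)


def preprocess_markers(raw):
    """Clamp all four markers; if both AFP and AFP-L3 are out of range,
    set AFP-L3% to 10% (hypothetical reading of the rule in the text)."""
    cleaned = {name: clamp_to_range(name, raw[name]) for name in QUANT_RANGES}
    if is_out_of_range("AFP", raw["AFP"]) and is_out_of_range("AFP-L3", raw["AFP-L3"]):
        cleaned["AFP-L3%"] = 10.0
    return cleaned


# Made-up readings: AFP and AFP-L3 below range, DCP above range.
example = preprocess_markers({"AFP": 0.2, "AFP-L3": 0.1, "AFP-L3%": 3.0, "DCP": 35000.0})
```

Clamping keeps every participant in the analysis with a finite value, which matters because AFP and DCP are later log-transformed.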

Performance evaluation of five models for the discrimination of HCC from CLD
We compared five previously developed models for their ability to discriminate HCC from CLD in our sample. The GALAD score was calculated from five variables (age, gender, AFP, AFP-L3 and DCP) (11). The GAAP score was calculated from four variables (age, gender, AFP and DCP) (12). The Doylestown score was calculated from five variables (age, gender, AFP, ALP and ALT) (13). The BALAD-2 score was calculated from five variables (bilirubin, albumin, AFP, AFP-L3 and DCP) (14). The aMAP score was calculated from five variables (age, gender, total bilirubin, albumin and PLT) (15). Table S2 shows the detailed equations of these models. Cutoff values were based on published thresholds (GALAD: -0.63, GAAP: -0.65, Doylestown: 0.5, BALAD-2: 0.66 and aMAP: 60). The performance of each model was evaluated by calculating the area under its receiver operating characteristic (ROC) curve (AUC).
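As an illustration of how such a published score is applied, the sketch below computes GALAD with the coefficients commonly cited for the original GALAD model (11) and compares it with the -0.63 cutoff quoted above. The coefficients, the sex coding (male = 1) and the DCP units are assumptions taken from the published model rather than from Table S2, which is not reproduced here; note that the original GALAD model used DCP in mAU/mL, whereas this study reports DCP in ng/mL. The function names `galad_z` and `galad_positive` are hypothetical.

```python
import math

GALAD_CUTOFF = -0.63  # published threshold quoted in the text


def galad_z(age, male, afp, afp_l3_pct, dcp):
    """GALAD linear score using the commonly cited published coefficients.

    Assumptions: male coded 1, female 0; AFP in ng/mL, AFP-L3 in %,
    DCP in the units of the original model; AFP and DCP must be > 0."""
    sex = 1 if male else 0
    return (-10.08
            + 0.09 * age
            + 1.67 * sex
            + 2.34 * math.log10(afp)
            + 0.04 * afp_l3_pct
            + 1.33 * math.log10(dcp))


def galad_positive(z, cutoff=GALAD_CUTOFF):
    """Classify as GALAD-positive when the score is at or above the cutoff."""
    return z >= cutoff


# Made-up example: 50-year-old man, AFP 10 ng/mL, AFP-L3 10%, DCP 10.
z = galad_z(age=50, male=True, afp=10, afp_l3_pct=10, dcp=10)
```

The other four scores follow the same pattern (a linear predictor compared with a fixed cutoff), differing only in their variables and coefficients.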

Model development
Multivariate logistic regression with a forward-backward stepwise approach was used to assess the association of 20 variables (including demographics, HCC biomarkers, liver function, routine blood tests and blood coagulation parameters; see Tables 1, S1) with HCC. Participants with missing data were excluded from the statistical analysis. AFP, AFP-L3 and DCP values were log-transformed for the logistic regression analysis because of their extreme skewness (11). Age has been considered a useful variable for HCC risk assessment (11,14,15). Therefore, we built a new HCC risk assessment model, called GAADPB, based on the variables selected by multivariate analysis (gender, AFP, DCP, TP and TB) plus age. The formula is as follows:

$$\mathrm{GAADPB\ score} = 0.176 + 0.162 \times \mathrm{gender} + 0.002 \times \mathrm{age} + 0.178 \times \log_{10}\mathrm{AFP} + 0.164 \times \log_{10}\mathrm{DCP} - 0.007 \times \mathrm{TP} - 0.002 \times \mathrm{TB}$$

The probability of having HCC was calculated as:

$$P(\mathrm{HCC}) = \frac{\exp(\mathrm{score})}{1 + \exp(\mathrm{score})}$$
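The GAADPB formula and its logistic transformation translate directly into code. The following sketch assumes that gender is coded 1 for male and 0 for female, that AFP and DCP are in ng/mL, and that TP and TB are in the units of Table 1 (taken here as g/L and μmol/L); these codings are not stated in the text. No GAADPB cutoff is given in this section, so the sketch stops at the probability. The function names are hypothetical.

```python
import math


def gaadpb_score(gender, age, afp, dcp, tp, tb):
    """Linear predictor of GAADPB with the coefficients reported in the text.

    Assumed coding: gender = 1 for male, 0 for female; AFP and DCP > 0 (ng/mL)."""
    return (0.176
            + 0.162 * gender
            + 0.002 * age
            + 0.178 * math.log10(afp)
            + 0.164 * math.log10(dcp)
            - 0.007 * tp
            - 0.002 * tb)


def hcc_probability(score):
    """P(HCC) = exp(score) / (1 + exp(score))."""
    return math.exp(score) / (1.0 + math.exp(score))


# Made-up patient: male, 50 years, AFP 100, DCP 100, TP 70, TB 10.
s = gaadpb_score(gender=1, age=50, afp=100, dcp=100, tp=70, tb=10)
p = hcc_probability(s)
```

For this made-up patient the score is 0.612, giving a probability of roughly 0.65.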

Statistical analysis methods
Statistical analysis and figures were produced with SPSS (version 22.0; IBM/SPSS Inc., Chicago, IL) or R (version 3.4.4). Continuous variables are presented as medians (interquartile range) and categorical variables as frequencies (percentages). Differences in characteristics were tested using the Wilcoxon rank-sum test for continuous variables and the chi-square test for categorical variables. ROC curves were analyzed in SPSS, and sensitivity, specificity and AUC were used to summarize the performance of models and biomarkers. ROC curves were compared using the roc.test function of the pROC package in R (paired = TRUE, method = "delong"). All statistical tests were two-sided, and P ≤ 0.05 was considered statistically significant.
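For readers working outside R, the paired DeLong comparison performed by roc.test can be sketched in plain Python. This is an illustrative reimplementation of the DeLong covariance estimator, not pROC's code; it assumes that higher scores indicate HCC, that both models were scored on the same participants in the same order, and that ties count as 0.5. The function names are hypothetical.

```python
import math


def _psi(case_score, control_score):
    """Mann-Whitney kernel: 1 if the case scores higher, 0.5 on ties, else 0."""
    if case_score > control_score:
        return 1.0
    if case_score == control_score:
        return 0.5
    return 0.0


def auc_with_placements(labels, scores):
    """Return (AUC, case placements V10, control placements V01); label 1 = HCC."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(x, c) for c in controls) / len(controls) for x in cases]
    v01 = [sum(_psi(x, c) for x in cases) / len(cases) for c in controls]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def delong_paired(labels, scores1, scores2):
    """Two-sided DeLong test for two correlated (paired) ROC AUCs."""
    auc1, v10_1, v01_1 = auc_with_placements(labels, scores1)
    auc2, v10_2, v01_2 = auc_with_placements(labels, scores2)
    m, n = len(v10_1), len(v01_1)
    # Var(AUC1 - AUC2) = Var1 + Var2 - 2 Cov, split into case and control parts.
    var_diff = ((_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
                + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n)
    if var_diff <= 0:
        # Zero variance (e.g. identical score vectors): no detectable difference.
        return auc1, auc2, 0.0, 1.0
    z = (auc1 - auc2) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return auc1, auc2, z, p


# Made-up scores for six participants (three HCC, three controls).
labels = [1, 1, 1, 0, 0, 0]
s1 = [0.9, 0.8, 0.7, 0.1, 0.2, 0.75]
s2 = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3]
auc1, auc2, z, p = delong_paired(labels, s1, s2)
```

Because both AUCs are computed on the same participants, the covariance term is what distinguishes the paired test from comparing two independent AUCs.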

Demographic data and clinical characteristics of enrolled participants in training set
A total of 307 participants were enrolled in the training set, including 80 HCC patients, 140 CLD patients (100 LC/40 CHB) and 87 HC participants. The demographic data, clinical characteristics and laboratory results of the study participants are described in Tables 1, S1. HCC patients were significantly older than those with CLD (median 54.5 vs 52 years; P=0.021), and the proportion of males was significantly higher among HCC patients than in the CLD population (92.5% vs 75.7%; P=0.002). In addition, 41.3% of HCC patients were at a very early/early stage (BCLC 0/A).
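The reported sex comparison can be checked with a 2×2 Pearson chi-square test. The counts below are reconstructed from the reported percentages (74/80 = 92.5% and 106/140 ≈ 75.7% male) and are therefore an assumption; the text also does not state whether a continuity correction was used, and this sketch applies none. With these counts the uncorrected test gives χ² ≈ 9.64 and P ≈ 0.002, in line with the reported value. The function name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test without continuity correction for the table
    [[a, b], [c, d]] (rows = groups, columns = male/female)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts reconstructed from the reported percentages (assumption):
# HCC: 74 male / 6 female; CLD: 106 male / 34 female.
chi2, p = chi2_2x2(74, 6, 106, 34)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```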

Screening candidate biomarkers for model construction
To identify potential biomarkers that can distinguish HCC patients from non-HCC controls in the Chinese population, we used two approaches. First, we applied logistic regression analysis to screen for independent factors associated with HCC. After excluding one CHB patient with incomplete data, 80 HCC, 100 LC and 39 CHB samples were included in the logistic regression analysis, covering 20 variables from demographic characteristics, routine blood tests, liver function tests and blood coagulation tests (Tables 1, S1). Gender, AFP, DCP, TP and TB were identified as independent factors for HCC (Table 2).
We also compared the levels of 18 blood indicators between HCC patients and patients at high risk for HCC (CLD group; Tables 1, S1) to identify indicators with significant differences. In our samples, the levels of AFP, DCP, AFP-L3, WBC, PLT, ALT, AST, ALP, GGT and ALB differed significantly between the HCC and CLD groups (P<0.05). We further compared these variables among disease subgroups (Figures 1, S1) and found that AFP, DCP, AFP-L3, AST and GGT levels in the HCC group were significantly higher than those in the LC and CHB subgroups. However, TP, RBC, HGB, ALP, DBIL, ALB, A/G, PT and INR were at similar levels in the HCC and LC groups, and TB, WBC and ALT could not discriminate HCC from CHB. In addition, PLT levels in the HCC group were significantly higher than those in the LC group but significantly lower than those in the CHB and HC groups. These results suggest that AFP, DCP, AFP-L3, AST and GGT may be able to discriminate HCC patients from non-HCC patients.
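A between-group comparison of a single indicator with the Wilcoxon rank-sum test can be sketched as follows. This minimal version uses the large-sample normal approximation with a tie correction and no continuity correction, which is an assumption: SPSS and R may instead report exact P values or apply a continuity correction, depending on sample size and settings. The function names and the values in the example are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U for group x, z, two-sided P)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p


# Made-up indicator values for two small groups.
u, z, p = rank_sum_test([1, 2, 3], [4, 5, 6])
```

A rank-based test suits markers such as AFP and DCP, whose distributions are highly skewed.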

Development of a new HCC risk assessment model
To build a more suitable HCC diagnostic model, we followed the approach used to develop GALAD (16), using HCC patients and groups at high risk for HCC to construct the model. We constructed three multivariate models: model 1 included the independent factors identified above; model 2 included gender combined with the blood indicators that differed significantly between the HCC and CLD groups; and model 3 included all candidate biomarkers obtained from the two approaches. The performance of each model is shown in Table 3. The AUCs of models 1, 2 and 3 were 0.941, 0.927 and 0.94, respectively. These results suggest that, although AFP-L3, AST and GGT levels alone differed significantly between the HCC and non-HCC groups, these variables did not improve the performance of HCC diagnostic models. Furthermore, the age variable was filtered out by logistic regression analysis. However, numerous studies (15-17) have included age in HCC-related models, and the "Guidelines for the diagnosis and treatment of primary liver cancer" also list men over 40 years old as part of the high-risk group for HCC. Therefore, we added age to model 1 to develop a new model for HCC diagnosis, GAADPB. The performance of GAADPB (AUC=0.941, Table 3) was similar to that of model 1 and slightly higher than that of GALAD, although this difference was not statistically significant (P=0.0545).

Subgroup analysis of model performance
We analyzed the performance of GAADPB in differentiating HCC from the different disease subgroups (training set) and from healthy controls; the results are shown in Figure 2 and Table 4. The AUC of GAADPB for differentiating HCC from LC was 0.939, which was significantly higher than that of GALAD (AUC_LC = 0.913; P = 0.01), AFP (AUC_LC = 0.874; P = 0.001) and DCP (AUC_LC = 0.801; P < 0.001). In addition, the AUCs of GAADPB for differentiating HCC from the CHB and HC subgroups were 0.946 and 0.991, respectively, similar to those of GALAD (AUC_CHB = 0.954, P = 0.453; AUC_HC = 0.991, P = 0.857) and significantly higher than those of AFP (AUC_CHB = 0.879, P = 0.003; AUC_HC = 0.953, P = 0.005) and DCP (AUC_CHB = 0.871, P = 0.009; AUC_HC = 0.94, P = 0.001). Collectively, GAADPB performed significantly better than single protein biomarkers across subgroups; it was better than GALAD at distinguishing HCC from LC and similar to GALAD at distinguishing HCC from CHB and HC.
We also analyzed the performance of GAADPB in different cancer subgroups in the training set. GAADPB distinguished HCC significantly better than individual protein biomarkers in many HCC subgroups and performed almost identically to GALAD (Figure 3 and Table S3). A diagnostic model should maximize the detection rate while avoiding overtreatment. We therefore compared the sensitivity of GAADPB and GALAD at a specificity of 90%. GAADPB appeared more sensitive than GALAD for detecting HCC subgroups with very early/early stage (BCLC 0/A), small size (diameter < 3 cm), a single lesion, absent PPVT, absent metastases, AFP negativity (<20 ng/mL) and DCP negativity (<40 ng/mL), especially for (very) early-stage (BCLC 0/A), small (diameter < 3 cm) and AFP-negative HCC, where its sensitivity exceeded that of GALAD by 12.1%, 12.5% and 12.5%, respectively. These results suggest that GAADPB may perform better than GALAD in these more challenging detection settings.
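Sensitivity at a fixed specificity, as used above, can be read off the empirical ROC curve. The sketch below is one simple way to do this, assuming that a participant is called positive when the score is at or above a threshold, that the observed scores serve as candidate thresholds, and that the highest sensitivity among thresholds reaching at least 90% specificity is reported. SPSS or pROC may interpolate between ROC points instead, so results can differ slightly. The function name and the scores are hypothetical.

```python
def sensitivity_at_specificity(case_scores, control_scores, target_spec=0.90):
    """Highest empirical sensitivity among thresholds with specificity >= target.

    A participant is called positive when score >= threshold; the observed
    scores serve as candidate thresholds. Returns (sensitivity, threshold),
    with threshold None if no candidate reaches the target specificity."""
    best_sens, best_thr = 0.0, None
    for thr in sorted(set(case_scores) | set(control_scores)):
        spec = sum(s < thr for s in control_scores) / len(control_scores)
        if spec < target_spec:
            continue
        sens = sum(s >= thr for s in case_scores) / len(case_scores)
        if best_thr is None or sens > best_sens:
            best_sens, best_thr = sens, thr
    return best_sens, best_thr


# Made-up scores for illustration only.
controls = [i / 10 for i in range(10)]  # 0.0, 0.1, ..., 0.9
cases = [0.5, 0.85, 0.95, 0.99]
sens, thr = sensitivity_at_specificity(cases, controls)
```

Fixing specificity first and then comparing sensitivity puts two models on the same false-positive budget, which is the comparison reported for GAADPB and GALAD.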

Performances of GAADPB in testing set
To evaluate the stability of GAADPB in distinguishing HCC from high-risk populations, we used an independent testing set for model validation. Demographic data and clinical characteristics of the participants in the testing set are presented in Table 5. In the testing set, the AUC of GAADPB for distinguishing HCC from CLD patients was 0.896, with a sensitivity of 76.7% at a specificity of 90% (Figure 4). Performance in the different disease subgroups is also shown in Figure 4: the AUC of GAADPB was 0.889 for distinguishing HCC from LC and 0.928 for distinguishing HCC from CHB. At a specificity of 90%, the sensitivity was 74.7% in both the LC and CHB subgroups. We further assessed GAADPB performance in different cancer subgroups; at a specificity of 90%, GAADPB still had higher sensitivity than the individual biomarkers for detecting HCC subgroups with very early/early stage (BCLC 0/A), small size (diameter < 3 cm), a single lesion, absent PPVT, absent metastases, AFP negativity (<20 ng/mL) and DCP negativity (<40 ng/mL) (Figure 5 and Table S4).
These results indicate that GAADPB is a stable and robust diagnostic tool for differentiating HCC from high-risk populations.

Discussion
About 350 million people worldwide are infected with HBV, and the lifetime risk of HCC among HBV carriers ranges from 10% to 25% (18). Most HCC cases in China are related to HBV infection (19), as was the case in our study population. Current surveillance methods based on ultrasound and AFP are not sensitive enough to detect early HCC, so a more effective, objective and accurate surveillance method for the Chinese population is needed.
Several HCC risk scoring systems have been developed to estimate the risk of HCC in patients with CHB, to discriminate HCC from CLD, to stage HCC, and for other purposes (11)(12)(13). Because they are all HCC-related models, we wanted to determine whether they could be used to diagnose HCC. We therefore evaluated and compared the performance of these five models and three protein biomarkers for HCC diagnosis in our training sample. GALAD had the highest accuracy for HCC detection, with an AUC of 0.925 (Table 6). The combined detection of AFP, AFP-L3 and DCP (AUC=0.91) and GAAP (AUC=0.89) also distinguished HCC patients from the non-HCC population well. However, BALAD-2 and aMAP performed poorly for HCC detection, with AUCs even significantly lower than that of an individual protein marker (AUC_AFP = 0.876; P < 0.001 for both BALAD-2 vs AFP and aMAP vs AFP), which may be related to the settings in which these models were developed (15,17).
Figure: Sensitivity comparison of GAADPB and GALAD at 90% specificity in different cancer subgroups.
To build a more suitable HCC diagnostic model, we followed the approach used to develop GALAD and constructed a new model for HCC diagnosis, GAADPB (11). Unlike GALAD, our multivariate logistic regression analysis identified total protein (TP) and total bilirubin (TB) as independent factors associated with HCC. These indicators reflect hepatic synthetic function and underlying liver function, and related liver-function indicators (bilirubin and albumin) are also included in BALAD-2 and aMAP (11,12). We excluded AFP-L3 because its contribution was small in our study, consistent with the GAAP study (12). Although age was filtered out by logistic regression analysis, previous studies have shown that age is associated with HCC incidence, and it is included in many models (11,13,15); we therefore included age in our model. In the training set, GAADPB (AUC=0.941) performed better than GALAD (AUC=0.925), GAAP (AUC=0.894) and the other models. In the validation set, GAADPB (AUC=0.896) also performed better than GALAD (AUC=0.888) and the single serum markers. We also assessed GAADPB performance for distinguishing HCC from … (8). In addition, the aMAP score has been shown to be strongly associated with HCC development in patients with chronic hepatitis and with late recurrence after radiofrequency ablation in patients with HBV-related HCC (21). In the current study, GAADPB had a higher AUC than BALAD-2 and aMAP for diagnosing liver cancer and, compared with GALAD, it incorporates indicators of liver injury such as TB; this suggests that GAADPB may also be worth evaluating for predicting liver cancer progression, recurrence and prognosis.
Early detection of HCC has recently become a research hotspot. Many new technologies, such as the multitarget HCC blood test (mt-HBT) and liquid biopsy, have been used to improve early cancer detection, but they come at a higher cost (22). Our results showed that GAADPB had higher sensitivity than the individual biomarkers (Figure 3) for detecting HCC subgroups with very early/early stage (BCLC 0/A), small size (diameter < 3 cm), a single lesion, absent PPVT, absent metastases, AFP negativity (<20 ng/mL) and DCP negativity (<40 ng/mL), and this performance remained stable in the testing set. Therefore, for early-stage HCC that is difficult to diagnose by imaging and traditional biomarkers, GAADPB might serve as an economical auxiliary diagnostic method.
This study had some limitations. First, age did not show a significant association with HCC occurrence in our training set, although previous reports have demonstrated such an association (11,13,15); this may be related to the relatively small sample size compared with the number of independent variables in the multivariate analyses. We plan to increase the sample size and further validate and refine the GAADPB model to improve the reliability of our findings. Second, this was a retrospective study based on data from a single medical center. In addition, the diagnostic performance of DCP was higher than that of Doylestown in the validation set (AUC of DCP: 0.884, 95% CI: 0.841-0.927; Doylestown: 0.845, 95% CI: 0.794-0.896; P <.05). Our future work will focus on validating the GAADPB model in real-world HBV-related HCC in Southeast China.

Conclusions
In summary, GALAD performed excellently in discriminating HCC among Chinese patients with CLD. Our GAADPB model, which additionally incorporates TB and TP, performed better than GALAD, especially in detecting early-stage HCC, and may also be promising for predicting the prognosis of HCC patients. Further studies are needed to confirm the value of the GAADPB model in Chinese patients.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Ethics statement
The studies involving human participants were reviewed and approved by Shulan Hangzhou Hospital. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.