Construction and validation of a machine learning-based nomogram to predict the prognosis of HBV associated hepatocellular carcinoma patients with high levels of hepatitis B surface antigen in primary local treatment: a multicenter study

Background: Hepatitis B surface antigen (HBsAg) clearance is associated with improved long-term outcomes and a reduced risk of complications. This study aimed to assess the effect of HBsAg levels in HCC patients undergoing TACE and sequential ablation. In addition, we created a nomogram to predict the prognosis of HCC patients with high HBsAg levels (≥1000 U/L) after local treatment.
Method: This study retrospectively evaluated 1008 HBV-HCC patients who underwent TACE combined with ablation at Beijing Youan Hospital and Beijing Ditan Hospital from January 2014 to December 2021, including 334 patients with low HBsAg levels and 674 patients with high HBsAg levels. The high HBsAg group was divided into the training cohort (N=385), internal validation cohort (N=168), and external validation cohort (N=121). The clinical and pathological features of patients were collected, and independent risk factors were identified using Lasso-Cox regression analysis to develop a nomogram. The performance of the nomogram was evaluated by the C-index, receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA) curves in the training and validation cohorts. Patients were classified into high-risk and low-risk groups based on the risk scores of the nomogram.
Result: After PSM, mRFS was 28.4 months (22.1-34.7 months) and 21.9 months (18.5-25.4 months) in the low HBsAg and high HBsAg groups, respectively (P<0.001). The nomogram includes age, BCLC stage, tumor size, globulin, GGT, and bile acids. The C-index (0.682, 0.666, and 0.740) and the 1-, 3-, and 5-year AUCs of the training, internal validation, and external validation cohorts showed good discrimination of the nomogram. Calibration curves and DCA curves suggested good accuracy and net clinical benefit. The nomogram classified patients with high HBsAg levels into low-risk and high-risk groups according to the risk of recurrence. There was a statistically significant difference in RFS between the two groups in the training, internal validation, and external validation cohorts (P<0.001).
Conclusion: High levels of HBsAg were associated with tumor progression. The nomogram developed and validated in the study had good predictive ability for patients with high HBsAg levels.


Introduction
Primary liver cancer is the sixth most common cancer and the second leading cause of cancer death worldwide, and it poses a huge economic and disease burden because of its high morbidity and mortality rates (1,2). China is the country with the highest hepatocellular carcinoma (HCC) occurrence, and the overall incidence of HCC is expected to continue to climb (3). HCC occurs most often in the setting of chronic liver inflammation and is mainly induced by hepatitis B virus (HBV) infection (4), which is a key risk factor for liver cirrhosis and HCC, capable of increasing the risk of HCC approximately 20-fold (5-7). For early HCC, surgical resection, liver transplantation, and ablation are recommended treatments. Studies have shown that ablation has five-year survival rates similar to those of surgical treatment, with fewer complications (8,9). However, the recurrence rate after ablation remains high, with a five-year recurrence rate of 50-70% (10). Transcatheter arterial chemoembolization (TACE) is the only guideline-recommended global standard of care for intermediate-stage HCC, yet the median progression-free survival time (mPFS) is only 5 months (11). Therefore, diagnosis and treatment of HCC is an increasingly important public health problem.
The first serologic marker of HBV infection is hepatitis B surface antigen (HBsAg), which can be detected from 2 to 12 weeks after infection with HBV (12). HBsAg clearance, which is currently regarded as the functional cure of chronic hepatitis B (CHB), is associated with improved long-term outcomes and a reduced risk of complications (13,14). The decline in HBsAg during antiviral therapy is relatively slow, and seroclearance is faster at low serum HBsAg levels (<1000 U/L) (15,16). Previous studies revealed that high serum levels of HBsAg increase the risk of developing HCC and are associated with a worse prognosis in patients who have already developed HCC (17). Nevertheless, the prognostic impact of serum HBsAg levels in patients after TACE and sequential ablation therapy needs to be further confirmed.
HBV-HCC prognosis is linked to several factors, including tumor burden, AFP, disease stage, ALBI, and NLR (18,19), and nomograms for HBV-HCC have also been reported (20-22). However, to our knowledge, no nomogram is available for HCC patients with high HBsAg levels after local treatment. We compared the outcomes of HCC patients with high HBsAg levels (≥1000 U/L) and low HBsAg levels (<1000 U/L) undergoing TACE and sequential ablation, and we used propensity score matching to minimize selection bias. In addition, we created a nomogram to predict the prognosis of HCC patients with high HBsAg levels after local treatment in order to guide clinical decisions more accurately.

Patient selection
This study retrospectively evaluated 1008 HBV-HCC patients who underwent TACE combined with ablation at Beijing Youan Hospital and Beijing Ditan Hospital from January 2014 to December 2021. The diagnosis of HCC was based on the guideline of the American Association for the Study of Liver Diseases (AASLD) (1,23). The patients at Youan Hospital consisted of 553 patients with a high level of HBsAg and 334 patients with a low level of HBsAg. To build a reliable model, the high-HBsAg patients from Youan Hospital were divided into the training cohort (N=385) and the internal validation cohort (N=168). Furthermore, 121 patients from Ditan Hospital were used as an independent external validation cohort to verify the external applicability of the nomogram. The inclusion criteria were as follows: (1) aged 18-80 years; (2) received TACE combined with ablation; (3) Child-Pugh class A or B; and (4) no other treatment before ablation. The exclusion criteria were as follows: (1) second primary malignant tumors; (2) incomplete clinical follow-up data; and (3) advanced HCC (Figure 1).
The study was approved by the Medical Ethics Committee of Youan Hospital and Ditan Hospital and was performed in compliance with the standards of the Helsinki Declaration. The requirement for informed consent was waived because the study was deemed to pose no additional risk to patients and the data were deidentified.

TACE procedure
TACE was conducted by experienced interventional radiologists. Under local anesthesia, percutaneous right femoral artery puncture was performed with a modified Seldinger technique. Angiography was conducted with a 5-F catheter (Terumo, Tokyo, Japan) to identify the arterial supply to tumors and to assess the patency of the portal vein. When applicable, a microcatheter was inserted into the tumor-feeding artery to inject a mixture of doxorubicin (Pfizer Inc., New York, NY, USA) and lipiodol (Guerbet, Villepinte, France), followed by embolization using embolic materials such as gelfoam or polyvinyl alcohol particles. The blood flow was monitored until complete vessel occlusion was observed. TACE was repeated thereafter if the lesion was not completely necrotic and the active portion exceeded 50% of the baseline value.

Ablation procedure
Ablation was performed under the guidance of computed tomography (CT) and magnetic resonance imaging (MRI) by a qualified interventionalist. The number of electrodes was determined by the size of the tumor. Routine disinfection and intravenous anesthesia were applied around the puncture points. During RFA, after the baseline impedance was measured, the power was gradually increased from 80 W to 200 W until the maximum impedance was reached. The electrode tip temperature was kept below 20°C by pumping cold saline into the electrode chamber. To achieve complete ablation, a safety margin of 0.5 cm around the tumor was ablated. After ablation, the needle track was ablated to prevent postoperative bleeding and tumor implantation along the needle track. Arteriography-enhanced CT was performed immediately after treatment to evaluate the success of the procedure and its complications.

Follow-up
All patients underwent regular follow-up at the outpatient clinics. Tumor responses were evaluated approximately 4-6 weeks after ablation by CT or MRI. Patients were examined every 3 months during the first year and every 6 months thereafter. Follow-up included blood tests, liver function tests, and imaging examination to detect tumor recurrence. The study endpoint was recurrence-free survival (RFS).

FIGURE 1
Screening flow chart of enrolled patients.

Statistical analysis
Baseline variables were summarized as medians or as counts and percentages, and differences between the groups were compared with the t-test, chi-square test, Mann-Whitney U test, or Kruskal-Wallis test. Survival and recurrence were calculated using the Kaplan-Meier method, and the log-rank test was used for comparison. Lasso regression was performed for risk factor selection, and the selected variables were entered into multivariate Cox regression analysis to identify independent risk factors for tumor recurrence. A nomogram to predict recurrence was then built from these independent risk factors. Subsequently, the performance of the nomogram was validated in the internal validation and external validation cohorts. According to the nomogram scores, the patients were classified into low-risk and high-risk groups, and their recurrence rates were predicted. Receiver operating characteristic (ROC) curves were plotted, and the areas under the curves (AUCs) were calculated to evaluate prognostic value. Calibration curves and the Hosmer-Lemeshow test were used to assess the predictive ability of the nomogram. To estimate the clinical utility of the nomogram, decision curve analysis (DCA) was conducted by calculating the net benefits for a range of threshold probabilities.
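The net benefit that underlies a decision curve can be illustrated with a short sketch. It assumes a binary outcome at one fixed time horizon with no censoring, which is a simplification of the time-to-event setting of this study; the function names and the toy data are hypothetical and are not taken from the authors' analysis.

```python
from typing import Sequence


def net_benefit(risks: Sequence[float], events: Sequence[int], threshold: float) -> float:
    """Net benefit of intervening on everyone whose predicted risk is >= threshold.

    Assumption: `events` holds 1 (recurred by the horizon) or 0 (did not),
    with no censored patients.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(risks)
    true_pos = sum(1 for r, e in zip(risks, events) if r >= threshold and e == 1)
    false_pos = sum(1 for r, e in zip(risks, events) if r >= threshold and e == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(events: Sequence[int], threshold: float) -> float:
    """Reference strategy: intervene on every patient regardless of predicted risk."""
    prevalence = sum(events) / len(events)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical predicted risks and observed outcomes for five patients.
toy_risks = [0.9, 0.7, 0.6, 0.2, 0.1]
toy_events = [1, 1, 0, 0, 0]
for pt in (0.2, 0.5, 0.8):
    print(pt, net_benefit(toy_risks, toy_events, pt), net_benefit_treat_all(toy_events, pt))
```

A decision curve plots these values over a range of thresholds; the model is useful at thresholds where its net benefit exceeds both the treat-all line and zero (the treat-none line).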
To reduce potential selection bias, 1:1 propensity score matching (PSM) was conducted with a matching tolerance of 0.1. Matching was based on baseline variables previously considered clinically relevant in the literature, comprising age, sex, Child-Pugh classification, BCLC stage, tumor size, tumor number, ALT, AST, and AFP.
All data were analyzed with SPSS (version 26.0, IBM, Armonk, NY, USA) and R software (version 4.1.3), and a P-value less than 0.05 was considered statistically significant (two-tailed tests).
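The matching step can be sketched as greedy 1:1 nearest-neighbor matching without replacement. This is an illustration only: it assumes the propensity scores have already been estimated, that the 0.1 tolerance applies to the raw propensity score, and that matching is greedy; none of these details is stated in the text, and the patient identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def match_one_to_one(
    high_group: Dict[str, float],
    low_group: Dict[str, float],
    caliper: float = 0.1,
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    Each patient in `high_group` is paired with the closest still-unmatched
    patient in `low_group`, provided the score difference is within `caliper`.
    """
    available = dict(low_group)  # copy, so the caller's dict is not modified
    pairs: List[Tuple[str, str]] = []
    for h_id, h_score in sorted(high_group.items(), key=lambda kv: kv[1]):
        if not available:
            break
        nearest = min(available, key=lambda c_id: abs(available[c_id] - h_score))
        if abs(available[nearest] - h_score) <= caliper:
            pairs.append((h_id, nearest))
            del available[nearest]  # without replacement
    return pairs


# Hypothetical propensity scores.
high = {"H1": 0.30, "H2": 0.55, "H3": 0.95}
low = {"L1": 0.32, "L2": 0.50, "L3": 0.70}
print(match_one_to_one(high, low))  # H3 stays unmatched: no low-group score within 0.1
```

Patients left without a partner inside the tolerance are dropped, which is why the matched groups are smaller than the original groups.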

Result
A total of 1008 HBV-HCC patients from Beijing Youan Hospital and Beijing Ditan Hospital were screened between January 1, 2014, and December 31, 2021, including 334 patients with low HBsAg levels and 674 patients with high HBsAg levels. After PSM, 293 patients were included in each group (Figure 1). The high HBsAg group was divided into the training cohort (N=385), internal validation cohort (N=168), and external validation cohort (N=121). Follow-up ended on July 1, 2023, and the median follow-up time was 4.05 years (25th-75th percentiles, 2.68-7.05 years).

Efficacy
After PSM, mRFS was 28.4 months (22.1-34.7 months) in the low HBsAg group and 21.9 months (18.5-25.4 months) in the high HBsAg group (Figure 2). Because mRFS was significantly shorter in the high HBsAg group (P<0.001), a nomogram for predicting recurrence was developed for this group in order to prompt clinical interventions.
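Median RFS values of this kind are read off a Kaplan-Meier curve as the first time at which the estimated survival probability falls to 0.5 or below. The following is a minimal sketch with hypothetical follow-up data; it is not the SPSS or R routine used in the study.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    `events` holds 1 for an observed recurrence and 0 for a censored patient.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients both leave the risk set
    return curve


def median_survival(curve: Sequence[Tuple[float, float]]) -> Optional[float]:
    """First time at which survival is <= 0.5, or None if it is never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical RFS times in months (1 = recurrence, 0 = censored).
toy_times = [6, 12, 12, 20, 24, 30]
toy_events = [1, 1, 0, 1, 0, 1]
print(median_survival(kaplan_meier(toy_times, toy_events)))  # 20
```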

The prediction model was built based on the Lasso-Cox regression
Independent prognostic factors of RFS
The cohort from Beijing Youan Hospital was randomly split in a 7:3 ratio into the training (N=385) and internal validation (N=168) sets. The external validation cohort consisted of patients from Beijing Ditan Hospital. There were no statistically significant differences between the three groups (P>0.05), which showed that the data grouping was random and reasonable. Lasso regression was used to screen parameters, and the coefficient paths of the variables are shown in Figure 3A. At the selected penalty, the model performed well with the fewest independent variables (Figure 3B). The screened variables included age, BCLC stage, tumor size, ALB, Palb, GLB, GGT, and bile acids. Variables screened by Lasso regression were further subjected to multivariate Cox regression analysis to identify independent risk factors associated with recurrence (Table 3). The final factors were age (HR: 1.02, 95% CI: 1.01-1.04), BCLC stage (HR: 1.53, 95% CI: 1.22-1.91), tumor size (HR: 1.44, 95% CI: 1.06-1.94), globulin (HR: 1.02, 95% CI: 1-1.04), GGT (HR: 1.01, 95% CI: 1-1.01), and bile acids (HR: 1, 95% CI: 1-1.01).
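Hazard ratios close to 1 can still matter for continuous variables, because a Cox hazard ratio applies per unit of the covariate. Since HR = exp(β), a change of Δ units multiplies the hazard by exp(βΔ) = HR^Δ. The sketch below works this through for the reported age HR of 1.02, assuming age was entered as a continuous variable in years (the coding is not stated in the text).

```python
import math


def hazard_ratio_for_change(hr_per_unit: float, delta: float) -> float:
    """Hazard ratio for a `delta`-unit change in a covariate under a Cox model."""
    beta = math.log(hr_per_unit)  # Cox coefficient recovered from the reported HR
    return math.exp(beta * delta)


# Assumption: age in years. A 10-year difference gives 1.02 ** 10, about 1.22.
print(round(hazard_ratio_for_change(1.02, 10), 3))  # 1.219
```

Under this assumption, a patient 10 years older would carry roughly a 22% higher hazard of recurrence, with all other factors equal.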

Develop the nomogram
The independent predictors found by the Lasso-Cox regression analysis were used to construct a nomogram (Figure 4). In the training cohort, the C-index was 0.682 (95% CI: 0.639-0.725), and the time-dependent ROC curves showed that the 1-, 3-, and 5-year AUCs were 0.741, 0.723, and 0.687, respectively (Figure 5), indicating good predictive ability of our nomogram. The 1-, 3-, and 5-year calibration curves demonstrated satisfactory agreement between the nomogram prediction and actual observation. In addition, the clinical value of the nomogram was evaluated using DCA, which showed net benefits over a reasonable range of threshold probabilities (Figure 6).
Patients were classified into two groups according to the nomogram score: a low-risk group and a high-risk group. In the training cohort, RFS differed significantly (Figure 7) between the low-risk group (N=193) and the high-risk group (N=192) (P<0.001).
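The C-index is the proportion of comparable patient pairs in which the patient with the higher predicted risk recurs first. A minimal sketch of Harrell's C-index follows; the data are hypothetical, and tied event times are simply treated as not comparable, which may differ from the R implementation used in the study.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk_scores: Sequence[float]
) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (months), event indicators, and nomogram risk scores.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.4, 0.6, 0.2]
print(harrell_c_index(toy_times, toy_events, toy_scores))  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported 0.682 lies between the two.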
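The cutoff used for risk grouping is not stated. One common choice, assumed here purely for illustration, is the median nomogram score of the training cohort; with 385 patients and no tied scores, such a split would give 193 and 192 patients. The function name and the scores are hypothetical.

```python
from statistics import median
from typing import Dict, Hashable


def stratify_by_median(scores: Dict[Hashable, float]) -> Dict[Hashable, str]:
    """Label each patient high-risk if the score exceeds the cohort median."""
    cutoff = median(scores.values())
    return {
        patient: ("high-risk" if score > cutoff else "low-risk")
        for patient, score in scores.items()
    }


# Hypothetical total nomogram points for 385 patients, all distinct.
toy_scores = {patient_id: float(patient_id) for patient_id in range(385)}
groups = stratify_by_median(toy_scores)
print(sum(1 for g in groups.values() if g == "low-risk"))   # 193
print(sum(1 for g in groups.values() if g == "high-risk"))  # 192
```

In a validation cohort the cutoff would normally be carried over from the training cohort rather than recomputed, so that the grouping rule stays fixed.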

Validate the nomogram
To further test the reliability and robustness of our prognostic nomogram, internal and external validations were conducted. In the internal and external validation cohorts, the C-indexes of the nomogram for predicting RFS were 0.666 (95% CI: 0.613-0.719) and 0.74 (95% CI: 0.696-0.783), respectively. The time-dependent ROC curves revealed that the 1-, 3-, and 5-year AUCs were 0.702, 0.704, and 0.684 in the internal validation cohort and 0.792, 0.734, and 0.770 in the external validation cohort (Supplementary Figure S1). The calibration curves also matched well (Supplementary Figure S2), and the 1-, 3-, and 5-year DCA curves showed good clinical practicability (Supplementary Figure S3). The patients in the two validation cohorts were also divided into high-risk and low-risk groups. The recurrence rates in the high-risk groups were significantly higher than in the low-risk groups (P<0.001) (Supplementary Figure S4).
Frontiers in Immunology frontiersin.org

Discussion
HCC is one of the most common malignant tumors in the world. In China, the major etiology of HCC is HBV infection, which can promote the development and metastasis of HCC (10,24,25). Using 1:1 PSM, our study found that patients with high HBsAg levels had a higher risk of recurrence than those with low HBsAg levels. To our knowledge, our study is the first to focus on patients with high HBsAg levels (H-HBsAg) who underwent TACE combined with ablation and to develop and validate a nomogram for predicting recurrence in these patients. At present, a recurrence prediction model for H-HBsAg patients is lacking. We therefore created a nomogram by Lasso-Cox regression to predict the prognosis of H-HBsAg patients.
The nomogram contains six factors used to produce the probability of an individual-specific clinical event: age, tumor size, BCLC stage, globulin, GGT, and bile acids. The prediction is read by drawing a vertical line at the location of the corresponding total score so that it intersects the three lines predicting the risk of recurrence; the values at the intersections are the predicted RFS at 1, 3, and 5 years. The C-index and AUCs of the training and validation cohorts were similar, demonstrating adequate discrimination ability. The calibration curves showed good prediction performance of the nomogram. Moreover, the DCA curves indicated reliable clinical applicability. Patients were divided into two risk groups according to the nomogram, and RFS was clearly different (P<0.001), which illustrated that our nomogram could distinguish H-HBsAg patients by their risk of relapse after ablation therapy.
A larger number and size of tumors indicate strong tumor aggressiveness and poor prognosis in HCC; this is uncontroversial and is not discussed further here. Liver weight and portal blood flow velocity are reduced in the elderly, resulting in a lower capacity for repair than in younger patients. Elderly people have lower immunity and faster tumor progression after treatment, leading to higher recurrence rates and worse prognosis (26,27). At present, the BCLC system is regarded as an optimal staging system for tumor stage, treatment regimens, and expected survival. The expected 5-year survival rate is 50-70% for patients with BCLC stage A disease (28,29). Combining BCLC stage with other independent prognostic factors could improve the predictive value for prognosis. GGT may be involved in the balance between oxidants and antioxidants, leading to sustained oxidative stress in tumor cells, which can contribute to the process of cancer (30,31). Various proinflammatory proteins, including immunoglobulins, C-reactive protein, α2-macroglobulin, and fibrinogen, are globulins (32,33). Since human immunoglobulins are mainly metabolized by the liver, patients with severe hepatic dysfunction have a reduced ability to clear immunoglobulins, causing hyperglobulinemia (34,35). Bile acids are synthesized in liver cells and are the end products of cholesterol metabolism (36). The systemic homeostasis of bile acids mainly depends on their enterohepatic circulation, which is of great significance for nutrient absorption and distribution, metabolic regulation, and homeostasis (37). Bile acid metabolism is implicated in tumor progression, and hydrophobic bile acids are promoters of HCC (38,39). In addition, reduced farnesoid X receptor (FXR) signaling during hepatic inflammation leads to a decrease in bile acid transporter proteins, resulting in elevated bile acids and persistent hepatic inflammation, which promote the development of HCC (40,41).
The presence of HBsAg is a serologic marker of HBV infection and is used in clinical diagnosis (42,43). HBsAg appears 1-2 weeks after exposure to HBV and precedes the onset of clinical symptoms and other serologic biochemical indicators of infection. There are still 257 million carriers of HBsAg despite the availability of antiviral therapeutics (44,45). Many studies have shown that the spontaneous HBsAg seroconversion rate is 1% and that persistent HBsAg is associated with a high risk of HCC and a worse prognosis (46,47). Previous studies by our team have also reported that the prognosis of HCC patients with negative HBsAg expression was better than that of patients with positive HBsAg expression (48). In our study, we investigated the role of HBsAg levels in the recurrence of HCC after local treatment and used PSM to reduce bias. The results revealed that HBV-HCC patients with high HBsAg levels have a worse prognosis than those with low HBsAg levels.
In the BCLC guideline, TACE is recommended for BCLC intermediate-stage (stage B) HCC. For early-stage HCC, TACE can mark the tumor and achieve tumor downstaging, thereby reducing the time and increasing the success rate of ablation (49). Studies in China and abroad have suggested that combination therapy with TACE and ablation improves overall and progression-free survival compared with TACE alone (50,51). Unlike conventional univariate screening, the Lasso regression that we used to select variables for Cox regression helps to avoid overfitting. In addition, because our study was a multicenter retrospective study, the nomogram could be validated both internally and externally. Simultaneous examination of comprehensive patient features covering demographics, liver function, tumor load, tumor markers, and inflammatory markers was a major strength of our study. The components of our nomogram are simple and easy to obtain, so clinicians can evaluate the patient's condition in a timely and effective manner.
Several limitations of our study should be addressed. First, the study was retrospective, so selection bias is inevitable, and the conclusions need to be strengthened by further validation in large prospective studies. Although internal and external validations were conducted in a relatively large multicenter sample, external validation in other centers is still required. In addition, the patients included in our study all received TACE combined with ablation. Whether the nomogram is suitable for other treatments such as surgery and liver transplantation requires further investigation. Lastly, the study was conducted only in China, where hepatitis B virus is the principal cause of HCC. Thus, generalization to other populations in which HBV is not a major causative factor for HCC must be made with caution. Nevertheless, we used up to eight years of follow-up to create an accurate and reliable nomogram to better guide clinical practice for HCC patients with high levels of HBsAg. In general, high-risk patients need more frequent clinical surveillance and appropriate interventions to prevent recurrence and progression.

Conclusion
In summary, high levels of HBsAg were associated with tumor progression and poor prognosis. For patients with high HBsAg levels, we created an accurate and reliable nomogram to predict recurrence based on Lasso-Cox regression analysis. The nomogram, which includes age, BCLC stage, tumor size, globulin, GGT, and bile acids, demonstrated adequate discrimination ability and could better guide clinical decisions.

FIGURE 2
Kaplan-Meier plot of RFS for HBV-HCC after PSM.

FIGURE 4
Nomogram, including age, tumor size, BCLC stage, globulin, GGT, and bile acid, for 1-, 3-, and 5-year recurrence-free survival (RFS) in HCC patients with high HBsAg levels. The nomogram is used to obtain the probability of 1-, 3-, and 5-year recurrence by adding up the points identified on the points scale for each variable. T.S, tumor size; BCLC, Barcelona Clinic Liver Cancer; GGT, gamma glutamyl transferase.

FIGURE 7
Kaplan-Meier plots of RFS for the low-risk group and high-risk group in the training cohort.

TABLE 1
Demographics and clinical characteristics before and after PSM.

TABLE 2
Demographics and clinical characteristics for training and validation sets.

TABLE 3
Cox proportional hazards regression to predict recurrence based on Lasso regression.