External validation and improvement of the scoring system for predicting the prognosis in hepatocellular carcinoma after interventional therapy

Background Currently, locoregional therapies, such as transarterial chemoembolization (TACE) and ablation, play an important role in the treatment of hepatocellular carcinoma (HCC). However, an easy-to-use scoring system that predicts recurrence to guide individualized management of HCC with varying risks of recurrence remains an unmet need. Methods A total of 483 eligible HCC patients treated by TACE combined with ablation from January 1, 2017, to December 31, 2019, were included in the temporal external validation cohort and then used to explore possibilities for refinement of the original scoring system. We investigated the prognostic value of baseline variables on recurrence-free survival (RFS) using a Cox model and developed the easily applicable YA score. The performances of the original scoring system and YA score were assessed according to discrimination (area under the receiver operating characteristic curve [AUROC] and Harrell's concordance index [C-statistic]), calibration (calibration curves), and clinical utility (decision curve analysis [DCA] curves). Improvement in predictive ability between the scoring systems was assessed using the Net Reclassification Index (NRI). Finally, the YA score was compared with other prognostic scores. Results During the median follow-up period of 35.6 months, 292 patients experienced recurrence. In the validation cohort, the original scoring system exhibited good discrimination (C-statistic: 0.695) and calibration for predicting the prognosis in HCC. To improve the prediction performance, the independent predictors of RFS, including gender, alpha-fetoprotein (AFP), des-γ-carboxyprothrombin (DCP), tumor number, tumor size, albumin-to-prealbumin ratio (APR), and fibrinogen, were incorporated into the YA score, an improved score. Compared to the original scoring system, the YA score had better discrimination (C-statistic: 0.712 vs. 0.695), with outstanding calibration and clinical net benefit, in both the training and validation cohorts. Moreover, the YA score accurately stratified patients with HCC into low-, intermediate-, and high-risk groups of recurrence and mortality and outperformed other prognostic scores. Conclusion The YA score is associated with recurrence and survival in early- and middle-stage HCC patients receiving local treatment. Such a score would be valuable in guiding follow-up monitoring and the design of adjuvant treatment trials, providing highly informative data for clinical management decisions.


Introduction
Hepatocellular carcinoma (HCC), the most common primary liver cancer, is the sixth most common malignancy worldwide and the third leading cause of cancer-related mortality, with a very high incidence in China. Approximately 72% of HCC occurs in Asia, of which China accounts for 47% (1,2). Currently, locoregional therapies, such as transarterial chemoembolization (TACE) and ablation, play an important role in the treatment of HCC. Local ablation has become the first-line treatment strategy for patients with early-stage HCC and exhibits similar clinical efficacy to surgical resection (3,4). TACE, the recommended treatment modality for BCLC stage B or intermediate-stage HCC, has been proven to prolong overall survival (OS) and recurrence-free survival (RFS) in HCC patients (5,6). However, 50% of patients suffer from recurrence within the first 3 years after local treatment (7), ultimately leading to unfavorable prognoses. Therefore, it is critical to identify patients at high risk of recurrence after locoregional treatment and thereby guide physicians in clinical decision-making and subsequent management.
More recently, several staging or scoring systems for HCC prognosis, including the Barcelona Clinic Liver Cancer (BCLC) stage (8), Child-Turcotte-Pugh (CTP) class (9), tumor-node-metastasis (TNM) stage (10), and albumin-bilirubin (ALBI) grade (11), have been applied to assess the prognosis of HCC patients, but none has been widely accepted as the most accurate. Meanwhile, considerable tumor heterogeneity remains among patients, and many staging systems have inherent limitations and debatable applicability to local treatment, making it difficult to effectively predict the prognosis of HCC patients based on traditional staging systems (12). Moreover, the computational complexity of mathematical models is a shortcoming that limits their application in clinical work. Thus, an easy-to-use prediction scoring system is urgently needed to guide individualized management of HCC with varying risks of recurrence.
Based on the above background, our team developed a novel scoring system, built on HCC patients diagnosed between 2015 and 2016, to predict the risk of recurrence after local treatment, and it has performed well in clinical use (13). The patients were stratified into low-, intermediate-, and high-risk groups of recurrence according to predicted probability, with statistically significant differences in RFS among the subgroups. Although our scoring system has important guiding value for screening out patients at high risk of relapse, it has not been externally validated, so miscalibration may occur owing to differences in cases and settings, resulting in lower utility (14,15). Besides, with the completion of our clinical database, our center now has enough cases for temporal external validation of the scoring system, which may temper overoptimistic expectations of prediction model performance in independent data (16,17). Hence, we designed this study based on patients from 2017-2019 to externally validate our scoring system and to refine it further for more accurate prediction.

Patients and study design
The patients enrolled in the study were from Beijing You'an Hospital, Capital Medical University. A total of 1,053 patients diagnosed in 2017-2019 were screened and 483 eligible patients were ultimately included in the present study (Supplementary Figure S1). Unlike patients in the training cohort, who were screened from January 1, 2015, to December 31, 2016 (13), patients in the temporal external validation cohort were screened from January 1, 2017, to December 31, 2019, with the last follow-up on July 1, 2022. This cohort was also used to explore possibilities for refinement of the existing scoring system.
Inclusion criteria for this study were as follows: (1) age ≥18 years and <75 years; (2) treatment with TACE followed by sequential ablation; (3) complete ablation achieved; (4) complete clinical data. Exclusion criteria were as follows: (1) advanced HCC; (2) history of other malignancies; (3) secondary liver cancer; (4) major surgical treatment within 3 weeks before interventional therapy; (5) autoimmune disease, systemic infection, or inflammation. The diagnosis of HCC was established by histologic findings and/or the American Association for the Study of Liver Diseases criteria (18).
The study protocol was approved by the Ethics Committee of Beijing You'an Hospital and complied with the requirements of the Declaration of Helsinki. As this was a retrospective study, the requirement for written informed consent from patients was waived.

Treatment procedures
All enrolled patients were treated with TACE, which was performed by 2 interventional radiologists with at least 5 years of experience. For the procedure, the right femoral artery was cannulated by percutaneous puncture under local anesthesia. A super-selective microcatheter was then inserted into the artery supplying the tumor. A mixture of adriamycin and iodized oil was injected, followed by embolization with gelatin sponge pledgets or polyvinyl alcohol particles. The end point of embolization was angiographic confirmation of occlusion of the intratumoral vessels, filling with the embolic agent, and loss of tumor staining. Local ablation was performed under the guidance of CT or magnetic resonance imaging (MRI) within 2 weeks after TACE. The skin was first thoroughly disinfected and covered with a sterile drape, after which a local anesthetic was injected and the ablation needle was inserted through the skin. Blood pressure, pulse, respiratory rate, and oxygen saturation were monitored during the procedure. After complete ablation was confirmed, coagulation was performed along the needle tract before the probe was removed to prevent needle tract bleeding. Most importantly, a safety margin of 0.5-1.0 cm was maintained to ensure complete coverage of the tumor and achieve complete ablation.

Follow-up and evaluation
Patients were followed up in the first month after discharge and then once every 3 to 6 months thereafter. Follow-up included routine blood examination, liver function tests, AFP, and CT/MRI examinations. All patients were routinely followed up until July 1, 2022.
The criteria for recurrence were the same as the preoperative diagnostic criteria (18), and early recurrence was defined as tumor recurrence diagnosed within two years after treatment. The definitions of RFS and OS, as well as of treatment after relapse, were consistent with the original manuscript (13).

Statistical analysis
No formal sample size calculation was applied since this was an observational study. Categorical variables are presented as numbers (percentages) and compared using the chi-square test, ANOVA, or Fisher's exact test, while continuous data are expressed as mean ± standard deviation (SD) and analyzed by Student's t-test or the Mann-Whitney U test. Survival curves were plotted by the Kaplan-Meier method and compared by the log-rank test. Moreover, receiver operating characteristic (ROC) analysis was performed to determine the optimal cutoff values.
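As an illustration of the Kaplan-Meier method named above, the following minimal Python sketch computes the product-limit estimate and the median survival time. The analyses in this study were run in R and SPSS; the function names `kaplan_meier` and `median_survival` and the toy inputs are assumptions made for illustration only, not the study's code.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns [(t, S(t))] at each distinct event time.

    times:  follow-up time per patient (e.g. months)
    events: 1 if recurrence was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


def median_survival(curve):
    """Smallest time at which S(t) <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

A return value of `None` corresponds to a median that is "not reached", which is how a group whose estimated survival stays above 50% throughout follow-up would be reported.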
For the validation of the original scoring system, the area under the receiver operating characteristic curve (AUROC) and Harrell's concordance index (C-index) were first used to determine discriminative ability, and the corresponding area under the curve (AUC) values at 1, 2, and 3 years were reported. Meanwhile, calibration curves at different time points (1, 2, and 3 years) were plotted by bootstrapping with 1,000 resamples to evaluate the performance of the scoring system. Then, 1-, 2-, and 3-year decision curve analysis (DCA) was used to investigate the clinical net benefit for decision-making. Finally, improvement in predictive ability between the scoring systems was assessed using the Net Reclassification Index (NRI). To improve the original scoring system, candidate variables were first screened by univariate analysis, and all variables with P < 0.05 were then entered into backward stepwise Cox regression based on the Akaike information criterion (AIC). Eventually, variables with P < 0.05 in multivariable analysis were used to establish the YA score.
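To make the discrimination measure concrete, the sketch below shows one simple way to compute Harrell's C-index for right-censored data. It is an assumption-laden illustration rather than the study's R code: the function name `harrell_c_index` is hypothetical, pairs with tied follow-up times are skipped for simplicity, and tied risk scores count as half-concordant.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by the risk score.

    times:       follow-up time per patient
    events:      1 if recurrence was observed, 0 if censored
    risk_scores: higher value means higher predicted risk of recurrence
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the times differ and the shorter one is an event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the C-statistics in this study are reported.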
In addition, the YA score was compared with other prognostic models, including monocyte-to-lymphocyte ratio (MLR), neutrophil-lymphocyte ratio (NLR), platelet-lymphocyte ratio (PLR), ALBI grade, and platelets-albumin-bilirubin (PALBI) grade. The discrimination of each model was assessed by estimating the AUC at each time point.
All statistical analyses were conducted in R 4.1.2 statistical software (R Foundation for Statistical Computing, Vienna, Austria) and SPSS 26.0 software (SPSS, Chicago, IL, USA). All statistical tests were performed using a two-sided significance level of 0.05.

Baseline characteristics of patients in the validation cohort
The baseline characteristics of patients in the validation cohort are shown in Table 1. Of those, 400 patients (82.8%) were male and 83 (17.2%) were female. The major etiology was hepatitis B virus (HBV) infection; 431 patients (89.2%) had liver cirrhosis and 369 (74.6%) had good liver function (Child-Pugh class A). Concerning the tumor characteristics, 74.3% of patients had a single tumor and 66.3% had a tumor smaller than 3 cm, with most patients having BCLC stage 0 or A disease (88.4%). Patients were predominantly treated with radiofrequency ablation (64%), and a large proportion of patients were treated with a single ablation (88.6%).
There were no significant differences in baseline characteristics between the two cohorts by comparison with the historical data from the training cohort.

Validation of the original scoring system in the validation cohort
The C-statistic in the validation cohort was 0.695 [95% confidence interval (CI): 0.666-0.724]. The AUCs of the time-dependent ROC curve were 0.680, 0.728, and 0.709 for 1-, 2-, and 3-year RFS in the training cohort (13). In the validation cohort, the AUCs at 1, 2, and 3 years were 0.697, 0.787, and 0.813, respectively (Figure 1). All the results suggested that the original scoring system has a good discriminatory ability for RFS in the validation cohort.
Furthermore, the calibration plots showed excellent agreement between the scoring system's predicted probability and the observed probability (Supplementary Figure S2). Also, the DCA plots showed that the scoring system had a favorable clinical net benefit in the validation cohort (Figure 2). According to the scores of the original scoring system (13), the patients were divided into three groups: low-, intermediate-, and high-risk. Kaplan-Meier survival analysis was then performed on the RFS of the three groups. The results showed that the median RFS was 20.7 months (95% CI 17.4-24.1) and 12.4 months (95% CI 8.9-15.8) in the intermediate-risk and high-risk groups, respectively, and was not reached in the low-risk group (P < 0.001), which indicated a significant discriminatory ability of the original scoring system for recurrence risk in the validation cohort (Figure 3).

Improvement of the original scoring system (development of YA score)
Next, the scoring system was further optimized to improve the prediction performance. We reclassified the 2017-2019 cohort (the validation cohort mentioned above) into a new training cohort and a validation cohort, with the baseline information for both sets shown in Table 2.
Based on the HR values of the above seven variables, a scoring system was obtained, with a total score ranging from 0 to 14, calculated by summing the points of the included parameters (Table 4). Patients were re-stratified by equal intervals of the total score, with scores of 0-4 defined as low risk of recurrence, 5-9 as intermediate risk of recurrence, and 10-14 as high risk of recurrence. The resulting score was named the YA score (the score of Beijing You'an Hospital).

FIGURE 1 Comparison of the ROC curves of the original scoring system at different time points in the validation cohort. Abbreviations: ROC, receiver operating characteristics; AUC, area under the curve.

Predictive performance of the YA score in the training cohort
To evaluate the discriminatory power of the YA score, we plotted the ROC curve and calculated the AUC in the training cohort. The C-statistic for the YA score was 0.712 (95% CI: 0.675-0.749). The time-dependent AUCs of the YA score at 1, 2, and 3 years were 0.723, 0.844, and 0.891, respectively (Figure 4), all higher than the corresponding values for the original scoring system (0.697, 0.787, and 0.813), showing the superior discrimination of the YA score.
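The risk grouping defined for the YA score (total scores of 0-4, 5-9, and 10-14) can be sketched as a small lookup. The cutoffs come from the text; the function name `ya_risk_group` is hypothetical, and the per-variable points of Table 4 are not reproduced here, so the function takes an already computed total score as input.

```python
def ya_risk_group(total_score: int) -> str:
    """Map a YA total score (0-14) to its recurrence risk group."""
    if not 0 <= total_score <= 14:
        raise ValueError("YA total score must be between 0 and 14")
    if total_score <= 4:
        return "low"
    if total_score <= 9:
        return "intermediate"
    return "high"
```

For example, a patient with a total score of 7 would fall into the intermediate-risk group under these cutoffs.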
When comparing the Cox model fit with Kaplan-Meier plots, good agreement (calibration) between the predictions from the YA score and the observed probabilities was observed (Supplementary Figure S3). Meanwhile, the calibration plots of the YA score at 1, 2, and 3 years, shown in Supplementary Figure S4, also indicated excellent agreement between the predicted and observed probabilities.
Lastly, the DCA curves suggested that using the YA score to predict RFS could increase the net benefit over the original scoring system (Figure 5).
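For readers unfamiliar with decision curve analysis, the quantity plotted on a decision curve is the net benefit at a given threshold probability. The sketch below uses the standard definition, net benefit = TP/n − (FP/n) × p_t/(1 − p_t), under the simplifying assumption of a binary outcome at a fixed time horizon with no censoring; the survival-based DCA used in this study additionally handles censored follow-up. The function name `net_benefit` and its inputs are illustrative assumptions.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    predicted_risk: predicted probability of recurrence per patient
    outcome:        1 if recurrence occurred by the horizon, 0 otherwise
    threshold:      threshold probability p_t, strictly between 0 and 1
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcome)
    true_pos = sum(1 for p, y in zip(predicted_risk, outcome)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted_risk, outcome)
                    if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)
```

Evaluating this function over a range of thresholds for two models, and plotting the results together with the "treat all" and "treat none" strategies, gives curves of the kind compared in a DCA figure.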
In addition, the NRI was used to evaluate the improvement in risk prediction. The NRI at 1, 2, and 3 years was 0.276 (95% CI: 0.158-0.676), 0.682 (95% CI: 0.443-0.913), and 0.826 (95% CI: 0.657-0.927), respectively, suggesting that the YA score has greater potential for the correct prediction of recurrence than the original scoring system.
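As a rough illustration of what the NRI measures, the sketch below computes a category-based NRI: the net proportion of patients with recurrence who move to a higher risk category under the new score, plus the net proportion of patients without recurrence who move to a lower one. This is a simplified version assuming a binary outcome at a fixed horizon and ordinal risk categories coded as integers; the text does not state which NRI variant was used, and the function name `net_reclassification_index` is hypothetical.

```python
def net_reclassification_index(old_category, new_category, outcome):
    """Category-based NRI of a new score relative to an old score.

    old_category, new_category: ordinal risk category per patient
                                (e.g. 0 = low, 1 = intermediate, 2 = high)
    outcome: 1 if recurrence occurred by the horizon, 0 otherwise
    """
    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0
    for old, new, y in zip(old_category, new_category, outcome):
        if y == 1:
            n_event += 1
            up_event += int(new > old)
            down_event += int(new < old)
        else:
            n_nonevent += 1
            up_nonevent += int(new > old)
            down_nonevent += int(new < old)
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    # Upward moves are correct for events; downward moves for non-events.
    return ((up_event - down_event) / n_event
            + (down_nonevent - up_nonevent) / n_nonevent)
```

A positive value indicates that, on balance, the new score moves patients in the correct direction more often than the old score.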

Predictive performance of the YA score in the validation cohort
In the validation cohort, the time-dependent AUCs of the YA score at 1, 2, and 3 years were 0.811, 0.847, and 0.902, respectively (Supplementary Figure S5), with a C-statistic of 0.787 (95% CI: 0.739-0.834). The calibration curves for the YA score demonstrated good agreement in the validation cohort (Supplementary Figure S6). The DCA curves in the validation set also revealed that using the YA score to predict RFS can increase the net benefit (Supplementary Figure S7).

Clinical application value of the YA score
Based on the YA score, the patients were divided into three groups. KM curves of RFS were then plotted, showing that the median RFS was 18.7 months (95% CI 15.7-21.7) and 13.8 months (95% CI 11.8-15.8) in the intermediate-risk and high-risk groups, respectively, and was not reached in the low-risk group. Notably, by the end of follow-up, more than half of the low-risk group had not yet relapsed, while about 50% of the high-risk group relapsed in the first year, which indicated a significant discriminatory ability of the YA score for HCC patients at high risk of recurrence (P < 0.001) (Figure 6A). The YA score also has an excellent clinical application value for OS (Figure 6B), which was similar to the previous results (13).
In addition, the YA score provided better forewarning of early relapse, with a C-statistic of 0.707 (95% CI: 0.668-0.746). The calibration curves for the probability of 1- and 2-year RFS showed good agreement between prediction and observation in 181 HCC patients with early recurrence (Supplementary Figure S8).

Comparison with other prognostic scores
We compared the predictive capacity of the YA score with those of five conventional prognostic scores. The results suggested that the YA score had markedly better discriminative power than the other five scores (Table 5).

Discussion
The original scoring system established in 2019 was used in a clinical trial to screen out HCC patients who were at high risk of relapse and then gave them anti-PD-1 immunotherapy after local treatment (TACE combined with ablation) to reduce the risk of recurrence. The results showed that the scoring system could stratify patients based on the different risks of relapse. Meanwhile, immunotherapy could effectively reduce the recurrence rate of HCC patients with high relapse risk predicted by our scoring system (19). Even though the scoring system had good clinical value, it was not validated externally due to the limitation of the number of cases at that time.
In this study, we performed temporal external validation of the original scoring system, and the results showed that it had good discrimination, with a C-index of 0.695. Meanwhile, the time-dependent AUCs at 1, 2, and 3 years in the validation cohort were similar to the results in the training cohort. Also, the calibration curve and the DCA curve revealed that the original scoring system had high accuracy and positive net benefit, reconfirming the validity of the scoring system in predicting recurrence in HCC patients. During the external validation process, we improved the original scoring system and finally developed a YA score based on seven variables. Additionally, the C-index of the YA score is better than that of the original scoring system (0.712 vs. 0.695), and the time-dependent AUC also shows significant superiority. Although the KM curves showed no significant difference between the two scores, the YA score could predict the recurrence of HCC patients more accurately, as evaluated by the NRI.

Fibrinogen (mean ± SD), g/L 3.09 ± 0.99 3.07 ± 0.94 0.869
Abbreviations: RFA, radiofrequency ablation; MWA, microwave ablation; AHC, argon-helium cryoablation; HBV, hepatitis B virus; HCV, hepatitis C virus; ALD, alcohol liver disease; AFP, alpha-fetoprotein; ALT, alanine aminotransferase; AST, aspartate aminotransferase; γ-GT, gamma-glutamyltransferase; APR, albumin-to-prealbumin ratio (the APR was estimated as the albumin divided by the prealbumin).

Qiao et al. 10.3389/fsurg.2023.1045213 Frontiers in Surgery
The occurrence and development of HCC is a complex process with numerous contributing factors, and diversity in tumor burden and liver function reserve exerts a crucial impact on the survival and clinical course of HCC (20). Thus, the predictive power of a scoring system could be further improved by a comprehensive evaluation of relevant variables (21). Yet for the convenience of clinical use, most models have included only a few clinical markers or radiological imaging findings, limiting their predictive performance. In our study, the YA score, with 7 risk variables covering tumor burden (tumor size and tumor number), serum tumor markers (AFP and DCP), liver function (APR), coagulation function (fibrinogen), and gender, was established to enhance predictive reliability. Another advantage of the YA score is that all of its parameters are clinically available serologic markers and imaging results that are easily obtained during the routine clinical workup. Moreover, owing to the simplicity of the calculation, a patient's recurrence risk can be easily calculated from the YA score, allowing clinical follow-up decisions to be made.
Male gender is a commonly recognized risk factor for HCC recurrence (22), and increasing evidence indicates that the prognosis of HCC may be related to gender disparity, with males having worse outcomes (23). Apart from that, indicators of liver function are also associated with HCC, as persistent abnormalities in liver function may lead to inflammation, immune microenvironment disorder, and oxidative stress. In recent years, albumin and prealbumin, two serum biomarkers of liver function, have been demonstrated in several studies to be independent predictors of long-term prognosis for HCC (24). The APR was found to strongly predict recurrence after ablation in HCC patients in our previous study (13). In the current study, the HR of APR is 3.46, so it remains an important variable for anticipating the risk of recurrence. Fibrinogen, an acute phase reactant produced by the liver in the presence of malignancy and/or systemic inflammation, is increased in patients with HCC and is emerging as a novel predictor of clinical outcome (25).
Two new variables, tumor number and DCP, were added to the YA score. In contrast to our previous study, tumor number was an independent predictor of RFS in the current study. Including it could enhance the stability of the YA score, because the combination of tumor number and tumor size represents the tumor burden, which is strongly correlated with recurrence after ablation in HCC patients (26-28). Further, combining multiple tumor markers could improve sensitivity and specificity and yield a more reliable prognosis (29-31). The serum level of AFP correlates closely with tumor differentiation and aggressiveness (32), and has also been suggested as an indicator of hepatitis activity and severity, predicting the prognosis of HCC patients (33). DCP, another widely used and highly specific diagnostic marker for HCC, could be a potential prognostic marker (34). In particular, DCP, which has potential significance for the diagnosis of AFP-negative HCC (35), may also be a prognostic supplement for AFP-negative patients. Our previous study did not evaluate DCP for lack of validated data. In the present study, DCP is integrated into the YA score to strengthen its discriminatory ability in special populations.
As a first-line treatment for early-stage HCC, ablation therapy can produce 5-year overall survival comparable to that of surgical resection in patients with early-stage HCC (36, 37). While TACE is primarily recommended for patients with intermediate-stage disease, patients who cannot benefit from curative treatment, despite earlier-stage disease, could be good [...]. However, few studies have examined prognostic markers in HCC patients treated with TACE and ablation, which increases the need for a scoring system for patient selection for combined (TACE + ablation) treatment. The YA score, as a novel scoring system, can effectively predict the prognosis of patients after sequential treatment. Compared with other HCC staging systems, the YA score performs substantially better in predicting recurrence. Our results also reveal critical correlations between tumor burden, tumor markers, liver function, and early recurrence of HCC, providing useful perspectives for the exploration of early recurrence mechanisms. The outstanding predictive power facilitates the early detection of recurrent HCC, thereby reducing patient recurrence and improving the quality of life. Notably, patients were stratified into three subgroups according to the YA score, demonstrating that the YA score provides valid differentiation between patients with different risks of recurrence and death, which helps guide physicians in close monitoring and adjuvant treatment.

FIGURE 4 Comparison of the ROC curves of the YA score at different time points in the training cohort. Abbreviations: ROC, receiver operating characteristics; AUC, area under the curve.
Nevertheless, this study has several limitations. First, it was a retrospective single-center study and is therefore subject to selection bias, although the large number of cases and the temporal external validation improve the generalizability of the scoring system. Second, our scoring system was developed in early-to-middle-stage HCC patients receiving local treatment, so it cannot predict the prognosis of patients with advanced HCC or patients treated with surgery or liver transplantation. With improvements in medical care, however, more patients are being diagnosed at an early stage, which broadens the prospects for the scoring system. Finally, our scoring system lacks external validation in other centers, and multi-center studies with large samples are required for further analysis.

Conclusion
In summary, by externally validating and improving the original scoring system, this study established the YA score, a novel, noninvasive, efficient, and feasible tool for predicting the postoperative prognosis of HCC patients after undergoing TACE plus ablation therapies, providing highly informative data for clinical management decisions.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

FIGURE 5 The DCA curves comparing the original scoring system and the YA score for 1-year (A), 2-year (B), and 3-year (C) RFS. Abbreviations: RFS, recurrence-free survival; DCA, decision curve analysis.

Ethics statement
The study protocol was approved by the Ethics Committee of Beijing You'an Hospital. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.