Accuracy of prognostic serological biomarkers in predicting liver fibrosis severity in people with metabolic dysfunction-associated steatotic liver disease: a meta-analysis of over 40,000 participants

Introduction: A prognostic model to predict liver fibrosis severity in people with metabolic dysfunction-associated steatotic liver disease (MASLD) is very important, but the accuracy of the most commonly used tools is not yet well established.
Objective: The meta-analysis aimed to assess the accuracy of different prognostic serological biomarkers in predicting liver fibrosis severity in people with MASLD.
Methods: Adults ≥18 years of age with MASLD who had a liver biopsy and were assessed with the following scores were included: aspartate aminotransferase-to-platelet ratio index (APRI), fibrosis index-4 (FIB-4), non-alcoholic fatty liver disease fibrosis score (NFS), the body mass index, aspartate aminotransferase/alanine aminotransferase ratio, and diabetes score (BARD score), FibroMeter, FibroTest, enhanced liver fibrosis (ELF), Forns score, and Hepascore. Meta-analyses were performed using a random effects model based on the DerSimonian and Laird method. The risk of bias of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2.
Results: In total, 138 articles were included, of which 86 studies with 46,514 participants met the criteria for the meta-analysis. The results for the summary area under the receiver operating characteristic (sAUROC) curve, according to the prognostic models, were as follows: APRI: advanced fibrosis (AF): 0.78, any fibrosis (AnF): 0.76, significant fibrosis (SF): 0.76, cirrhosis: 0.72; FIB-4: cirrhosis: 0.83, AF: 0.81, AnF: 0.77, SF: 0.75; NFS: SF: 0.81, AF: 0.81, AnF: 0.71, cirrhosis: 0.69; BARD score: SF: 0.77, AF: 0.73; FibroMeter: SF: 0.88, AF: 0.84; FibroTest: SF: 0.86, AF: 0.78; and ELF: AF: 0.87.
Conclusion: The results of this meta-analysis suggest that, when comparing the scores of serological biomarkers with liver biopsies, the following models showed better diagnostic accuracy in predicting liver fibrosis severity in people with MASLD: FIB-4 for any fibrosis, FibroMeter for significant fibrosis, ELF for advanced fibrosis, and FIB-4 for cirrhosis.
Clinical trial registration: [https://clinicaltrials.gov/], identifier [CRD42020180525].

Due to the burden of this disease, early diagnosis of MASLD is an important clinical strategy to prevent its rapid progression to the most severe stages of the disease. According to different international guidelines, liver biopsy is still considered the gold standard for diagnosing liver fibrosis in MASLD (4,5). However, it is an invasive test that is not free of complications and is not recommended for monitoring disease severity (6). Therefore, the clinical practice guidelines for the management of MASLD recommend using non-invasive tests to stage fibrosis before resorting to liver biopsy. These non-invasive methods make it feasible to assess disease progression (7).
Similarly, a systematic review of 38 studies aimed to evaluate the common non-invasive tests, NFS, enhanced liver fibrosis (ELF), transient elastography, and magnetic resonance elastography (MRE), in patients with obesity and SF, AF, or cirrhosis. The evidence showed better accuracy for complex biomarker panels: NFS: summary receiver operating characteristic (SROC): 0.79-0.81 vs. ELF: 0.96; however, the search was restricted to studies published until 2016, in English, in four databases, and in individuals with obesity (9). Finally, a recent meta-analysis of 37 studies evaluated the individual diagnostic performance of liver stiffness measurement by vibration-controlled transient elastography (LSM-VCTE), FIB-4, and NFS to derive diagnostic strategies that could reduce the need for liver biopsies. The AUROC results of individual LSM-VCTE, FIB-4, and NFS for AF were 0.85, 0.76, and 0.73, respectively. However, only two non-invasive serological tests were included, and for just one stage of liver fibrosis (10).
Considering the growing body of evidence and lack of consensus on the diagnostic performance of clinical scores, this systematic review and meta-analysis aimed to assess the accuracy of prognostic serological biomarkers (APRI, FIB-4, NFS, BARD score, FibroMeter, FibroTest, ELF, Forns score, and Hepascore) in predicting liver fibrosis severity in people with MASLD.

Materials and methods
This systematic review and meta-analysis followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines (Supplementary Table S1) (11). The protocol for this meta-analysis was registered in the International Prospective Register of Systematic Reviews database (PROSPERO) under the number CRD42020180525.

Literature search strategy
This systematic review aimed to answer the following research question: What is the diagnostic accuracy of the most clinically used serological biomarkers in predicting liver fibrosis severity in people with MASLD? The strategy was based on the participants, index tests, and target condition (PIT) criteria: P: adults ≥18 years with MASLD; I: APRI, FIB-4, NFS, BARD score, FibroMeter, FibroTest, ELF, Forns score, and Hepascore; and T: liver fibrosis. Liver biopsy was used as the reference standard.
Three authors (SLT, PBR, and COA) independently extracted the following data from the selected articles: first author; year of publication; type of paper; study design; study period; country; institution; number of participants; age (years); sex (percentage of males); race (percentages); BMI (kg/m²); hypertension (percentage of participants); diabetes (percentage of participants); dyslipidemia (percentage of participants); MetS (percentage of participants); laboratory tests (AST, ALT, AST/ALT ratio, platelets, glycosylated hemoglobin (HbA1C), glycemia, triglycerides, and cholesterol); and score models (APRI, FIB-4, NFS, BARD score, FibroMeter, FibroTest, ELF, Forns score, and Hepascore). For diagnostic parameters, we considered cutoff values, AUROC, sensitivity (Sen), specificity (Spe), and the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). When the authors did not report TP, FP, TN, or FN, these were calculated from the Sen and Spe and the number of participants in each study to obtain the values for each model.
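The back-calculation of a 2×2 table from reported sensitivity and specificity can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `reconstruct_2x2` and the example numbers are hypothetical, and the sketch assumes that the number of participants with the target condition (sample size multiplied by prevalence) is known, since the counts cannot be recovered from Sen, Spe, and total sample size alone.

```python
def reconstruct_2x2(sensitivity, specificity, n_diseased, n_non_diseased):
    """Back-calculate (TP, FP, FN, TN) from reported accuracy measures.

    Assumption: the number of participants with and without the target
    fibrosis stage on biopsy is known (e.g. from sample size and prevalence).
    """
    tp = round(sensitivity * n_diseased)      # diseased and test-positive
    fn = n_diseased - tp                      # diseased and test-negative
    tn = round(specificity * n_non_diseased)  # non-diseased and test-negative
    fp = n_non_diseased - tn                  # non-diseased and test-positive
    return tp, fp, fn, tn


# Hypothetical study: 200 participants, 60 with the target stage on biopsy,
# reported Sen = 0.80 and Spe = 0.70.
tp, fp, fn, tn = reconstruct_2x2(0.80, 0.70, n_diseased=60, n_non_diseased=140)
print(tp, fp, fn, tn)
```

Because the counts are rounded to whole participants, the reconstructed table may reproduce the published Sen and Spe only approximately.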

Risk of bias assessment
Three authors (SLT, PBR, and COA) independently assessed the risk of bias in the primary studies using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) (13). QUADAS-2 is a tool for evaluating the quality of primary diagnostic studies by examining quality separately in terms of "risk of bias" and "concerns regarding applicability." Risk of bias assessment items were organized into four domains: patient selection, index test, reference standard, and flow and timing. The applicability of a study was evaluated for the first three key domains and rated as "yes," "no," or "unclear," where "yes" indicated a low risk of bias, "no" indicated a high risk of bias, and "unclear" indicated a lack of sufficient information (13). Disagreements were resolved by consulting a fourth reviewer (RM) to establish a consensus. The methodological quality of individual studies was visualized using the robvis web app, which depicts the plots obtained from these analyses (14).

Data synthesis and analysis
For inclusion in the meta-analysis, the score model had to have been used in at least three studies predicting liver fibrosis severity in people with MASLD. Diagnostic performance statistics were obtained for each study, including Sen, Spe, diagnostic odds ratio (DOR), positive likelihood ratio (LR+), and negative likelihood ratio (LR-), with their respective 95% confidence intervals (95% CI). Summary meta-analytic estimates of the DOR, LR+, and LR- were then obtained using a random effects model, with the between-study variance estimated by the DerSimonian and Laird method. Heterogeneity was evaluated using Cochran's Q statistic and the I² statistic. Cochran's Q test of homogeneity is based on the null hypothesis that all eligible studies have the same underlying effect size. The I² statistic, which represents the variability between studies, was interpreted as follows: values of 0-40%, 40-70%, and 70-100% indicated low, moderate, and high heterogeneity, respectively (15,16). In addition, the summary area under the receiver operating characteristic (sAUROC) curve was obtained using a mixed linear model with known variance estimates according to Reitsma's method. The area under the curve (AUC) values were interpreted as follows: <0.5 indicated low accuracy, 0.6 to 0.79 indicated moderate accuracy, 0.8-0.90 indicated good accuracy, and >0.90 represented excellent accuracy (17). A sensitivity analysis was performed to assess whether the results changed when only studies of the most frequently used scores (FIB-4, APRI, and NFS) were included and the any fibrosis category was excluded (that is, only SF, AF, and cirrhosis were analyzed). All calculations were performed with R version 4. The TP, FP, FN, and TN numbers were extracted to construct the 2×2 tables, and the values for each reported test cutoff were calculated. For studies that did not report these numbers, they were calculated from the prevalence, sensitivity, specificity, and sample size.
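To make the pooling step concrete, the following is a minimal sketch of how per-study accuracy measures and a DerSimonian and Laird random effects estimate could be computed. It is an illustration under stated assumptions, not the authors' R code: the function names, the 0.5 continuity correction for zero cells, the pooling on the log DOR scale, and the example 2×2 tables are all assumptions made for this sketch. The bivariate Reitsma model used for the sAUROC is not shown.

```python
import math


def diagnostic_stats(tp, fp, fn, tn):
    """Per-study accuracy measures from a 2x2 table against liver biopsy."""
    # Assumption: add 0.5 to every cell when any cell is zero (a common
    # continuity correction); the source does not say how zeros were handled.
    if min(tp, fp, fn, tn) == 0:
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    return {
        "sen": sen,
        "spe": spe,
        "lr_pos": sen / (1 - spe),
        "lr_neg": (1 - sen) / spe,
        "dor": (tp * tn) / (fp * fn),
        # Large-sample variance of log(DOR)
        "var_log_dor": 1 / tp + 1 / fp + 1 / fn + 1 / tn,
    }


def dersimonian_laird(effects, variances):
    """Random effects pooling with the DerSimonian and Laird tau^2 estimator."""
    w = [1 / v for v in variances]                     # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0    # between-study variance
    w_star = [1 / (v + tau2) for v in variances]       # random effects weights
    estimate = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I^2 in percent
    return {"estimate": estimate, "se": se, "tau2": tau2, "q": q, "i2": i2}


# Hypothetical 2x2 tables (TP, FP, FN, TN) from three studies of one score.
tables = [(48, 42, 12, 98), (30, 20, 10, 90), (55, 35, 25, 120)]
stats = [diagnostic_stats(*t) for t in tables]
pooled = dersimonian_laird(
    [math.log(s["dor"]) for s in stats],
    [s["var_log_dor"] for s in stats],
)
low = math.exp(pooled["estimate"] - 1.96 * pooled["se"])
high = math.exp(pooled["estimate"] + 1.96 * pooled["se"])
print(f"Pooled DOR {math.exp(pooled['estimate']):.2f} (95% CI {low:.2f}-{high:.2f})")
```

Pooling is done on the log scale because ratio measures such as the DOR and likelihood ratios are approximately normally distributed only after log transformation; the same function could be applied to log LR+ and log LR- given their variances.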
The diagnostic accuracy of the index tests was evaluated in the following dichotomized groups: any fibrosis (AnF) (F0 vs. F1-4), SF (F0-1 vs. F2-4), AF (F0-2 vs. F3-4), and cirrhosis (F0-3 vs. F4).
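The four dichotomizations can be expressed as thresholds on the histological fibrosis stage. The short sketch below is illustrative only; the names `THRESHOLDS` and `has_target_condition` are hypothetical, and stages are assumed to be coded as the integers 0 to 4 for F0 to F4.

```python
# Lowest fibrosis stage that counts as "positive" for each target condition.
THRESHOLDS = {
    "AnF": 1,        # F0 vs. F1-4
    "SF": 2,         # F0-1 vs. F2-4
    "AF": 3,         # F0-2 vs. F3-4
    "cirrhosis": 4,  # F0-3 vs. F4
}


def has_target_condition(stage, target):
    """Return True if a biopsy stage (0-4) is positive for the given target."""
    if stage not in range(5):
        raise ValueError("fibrosis stage must be an integer from 0 to 4")
    return stage >= THRESHOLDS[target]


# A participant staged F2 is positive for AnF and SF but not AF or cirrhosis.
print({target: has_target_condition(2, target) for target in THRESHOLDS})
```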

Identification and selection of studies
The search strategy identified 2,002 articles. Of these, 640 articles were duplicates, leaving 1,362 for title and abstract assessment. At this stage, 1,183 articles were excluded: 353 on other populations with chronic hepatitis; 130 on patients on autoimmune medication; 74 on animal studies; 198 on alcoholic liver disease; and 428 that did not involve the evaluation or validation of model performance. One hundred and seventy-nine studies were read in full, of which 41 studies were excluded: 26 studies did not include patients diagnosed with hepatic fibrosis; 10 on alcoholic liver disease; and 5 duplicates. Thus, 138 articles were included in this systematic review, of which 86 met the eligibility criteria and were included in the meta-analysis (Figure 1).
Table 1. Characteristics of the studies included in the systematic review.
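The study flow counts reported above are internally consistent, which can be checked with a few lines of arithmetic. The variable names are illustrative; all numbers are taken from the paragraph above.

```python
identified = 2002
duplicates = 640
screened = identified - duplicates  # records for title and abstract assessment

excluded_at_screening = {
    "other populations with chronic hepatitis": 353,
    "patients on autoimmune medication": 130,
    "animal studies": 74,
    "alcoholic liver disease": 198,
    "no evaluation or validation of model performance": 428,
}
full_text = screened - sum(excluded_at_screening.values())  # read in full

excluded_at_full_text = {
    "no patients diagnosed with hepatic fibrosis": 26,
    "alcoholic liver disease": 10,
    "duplicates": 5,
}
included = full_text - sum(excluded_at_full_text.values())  # systematic review

print(screened, full_text, included)
```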

Analysis of the quality and risk of bias in the included studies
The quality assessment was performed using the QUADAS-2 tool, as shown in Figure 2. Studies of patients with MASLD and other morbid conditions were considered a high applicability concern because of issues relating to whether a consecutive or random sample of patients was enrolled, the use of a case-control design, and inappropriate inclusions such as populations selected for diabetes, obesity, high levels of transaminases, or age.

Meta-analysis results
For inclusion in the meta-analysis, the score model had to have been used in at least three studies predicting liver fibrosis severity in people with MASLD. Only seven scores (APRI, FIB-4, NFS, BARD score, FibroMeter, FibroTest, and ELF) were used in at least three studies to evaluate the four degrees of liver fibrosis severity (AnF, SF, AF, and cirrhosis) and were therefore meta-analyzed (Supplementary Figure S1).

Diagnosis of SF (F0-F1 vs. F2-F4)
The DOR of the BARD score in the diagnosis of SF was 5.98 (95% CI 2.62-13.66), the LR+ was 2.49 (95% CI 1.72-3.61), the LR- was 0.46
The sensitivity analysis showed that there were no changes in the results when only tests with more than 40% of participants (APRI, FIB-4, NFS, and BARD score) and severities (SF, AF, and cirrhosis) were included (Supplementary Figures S40-S58; Supplementary Table S8).

Discussion
This systematic review and meta-analysis aimed to assess the accuracy of different prognostic serological biomarkers in predicting liver fibrosis severity in people with MASLD. The performance of the serological biomarkers varied according to the degree of severity of liver fibrosis. For any fibrosis, all the models had moderate accuracy. For significant fibrosis, the FibroMeter, FibroTest, and NFS models had good accuracy, and APRI, FIB-4, and BARD score had moderate accuracy. For advanced fibrosis, the ELF, FibroMeter, FIB-4, and NFS models had good accuracy, and BARD score, FibroTest, and APRI had moderate accuracy. Finally, for cirrhosis, only FIB-4 showed good accuracy, while APRI and NFS had moderate diagnostic accuracy.
The APRI showed moderate diagnostic accuracy across all degrees of liver fibrosis severity, from AnF to cirrhosis, results that are consistent with previous meta-analyses reporting moderate accuracy in assessing AF with this prognostic model. In addition, different studies have reported inconsistencies in predicting liver fibrosis using this score (8,96). Therefore, due to conflicting results regarding the effectiveness of the APRI score, the MASLD practice guideline of the AASLD, American College of Gastroenterology, and American Gastroenterological Association recommends using the FIB-4 or NFS score to identify patients with MASLD with stage 3 or 4 fibrosis (6). Our results support this recommendation, as FIB-4 showed good diagnostic accuracy for AF and cirrhosis, and NFS showed good diagnostic accuracy for SF and AF.
As science has advanced, several serum tests have been developed using either direct biomarkers (reflecting the pathophysiology of hepatic fibrogenesis) or indirect biomarkers (reflecting functional changes in the liver), alone or in combination (57). Complex panels (such as FibroMeter and ELF) have been shown to be more accurate and reproducible for detecting AF than simple panels (159). Our results support these findings, suggesting that both models have good diagnostic accuracy for AF, whereas simple panels such as APRI and BARD score, although cheaper, easier to calculate, and widely available, are not as accurate as complex panels (159).
Different studies have consistently reported that the ELF model provides good results in the assessment of AF, including the 2021 National Institute for Health and Care Excellence guidelines, which established that for the assessment and treatment of people with MASLD, the ELF score is considered "the most cost-effective and appropriate test for AF in adults with MASLD" (160). However, the reality of clinical practice is different, as the ELF score is not accessible to frontline health professionals, which may represent a barrier to the detection of liver fibrosis (9,57).
The FibroTest also showed good diagnostic performance for the assessment of SF in this review. FibroTest and FibroMeter are models that include the analysis of extracellular matrix substances directly involved in the progression of fibrosis and have better Sen and Spe, suggesting that the inclusion of a direct marker of liver fibrosis in a non-invasive test can improve its diagnostic accuracy (8,9).
Another relevant result was that only three models were assessed for AnF: APRI, FIB-4, and NFS. These models are considered simple scores; that is, none of the complex models analyzed in this review was evaluated for this severity. Therefore, there is still a lack of studies evaluating the complex models in the assessment of AnF, as most scores have focused on the importance of histological determinants of severe fibrosis and its relevance in the development of future disease. However, the identification of AnF in community settings will allow for the implementation of early lifestyle interventions and consequently inform the decision to refer to secondary care in severe cases (62,134).
MASLD is also strongly correlated with MetS. Of the 138 included studies, 54.6% reported at least some component of this syndrome. Two recent reviews have suggested that MASLD is both a cause and a consequence of MetS (161,162). This is because liver fat is a marker of the metabolic abnormalities that characterize MetS, and the possibility of MASLD should be considered in all patients diagnosed with MetS under any of the different sets of criteria (161,162). In the present review, the mean values for both transaminases were above normal, indicating that the studies were conducted in populations with at least some alteration in the serological tests of the liver. Among people with MASLD and normal transaminase levels, 16-24% may have AF, with the sAUROC for the BARD score, FIB-4, and NFS ranging from 0.71 to 0.85 (99,152).
In this review, we found a mean BMI of 32.8 kg/m² in the total study population, which is considered grade-I obesity. The findings of a meta-analysis suggest that there is evidence of a high predictive value of abdominal obesity as an indicator of increased risk of metabolic disorders and cardiovascular disease, as well as evidence supporting the cause-and-effect relationship between abdominal obesity and MASLD (163). A recent review showed that there is less evidence when evaluating the tests in populations of patients with obesity, and non-invasive tests tend to perform less favorably in these populations due to differences in BMI and alanine aminotransferase levels. This may mean that serum-based scores derived in the liver clinical setting, in groups with different hepatic risk profiles, do not adequately reflect the accuracy of these tests in the obese population (9).
Conversely, the present results of prognostic models showing moderate diagnostic accuracy may also be related to the fact that this meta-analysis included a larger number of studies, heterogeneous populations and their variables, and all degrees of fibrosis severity compared to previous meta-analyses (9,10). Although the objective of non-invasive models is not to replace the biopsy, our results highlight the importance of using these models in the evaluation of MASLD patients with suspected liver fibrosis, which determines the prognosis of the disease, as well as the usefulness and feasibility of performing these tests, given the lack of other methods in primary care for these patients (159).

Limitations
Our meta-analysis has limitations. First, there was no stratification of the different models by age, race, weight, or morbidities, only by stage of fibrosis, since few of the included studies were clinical trials comparing homogeneous populations. Another limitation of the present study is the non-inclusion of imaging biomarkers such as MRE. The decision not to include these biomarkers was made to focus on the serological biomarkers recommended by the guidelines and to provide a more comprehensive assessment of their performance. However, this is a study with a large sample of participants, with low heterogeneity between the different studies, which aims to contribute to the generalization of results based on possible limitations in health services.

Conclusion
The findings of this meta-analysis suggest that when comparing the scores of serological biomarkers with liver biopsies for predicting liver fibrosis severity in people with MASLD, the FIB-4 has good predictive diagnostic accuracy for any fibrosis, the FibroMeter has good predictive diagnostic accuracy for significant fibrosis, the ELF has good predictive diagnostic accuracy for advanced fibrosis, and the FIB-4 has good diagnostic accuracy for cirrhosis. These non-invasive serological biomarkers can thus be considered as an alternative to determine the prognosis of this disease.

FIGURE 1
PRISMA 2020 flowchart of the study selection process.

TABLE 1
Characteristics of studies included in the systematic review.

TABLE 2
Comparison of serological biomarkers in predicting liver fibrosis severity in people with MASLD: DOR, LR+, and LR−.