
ORIGINAL RESEARCH article

Front. Endocrinol., 29 October 2025

Sec. Translational and Clinical Endocrinology

Volume 16 - 2025 | https://doi.org/10.3389/fendo.2025.1681686

From traditional metabolic markers to ensemble learning: comparative application of machine learning models for predicting NAFLD risk in adolescents

Chenming Zhang1†, Bin Niu2,3†, Rong Wang2,3†, Liaoyun Zhang2*
  • 1Academy of Medical Sciences, Shanxi Medical University, Taiyuan, China
  • 2Department of Infectious Diseases, The First Hospital of Shanxi Medical University, Taiyuan, China
  • 3Graduate School, Shanxi Medical University, Taiyuan, China

Background: Non-alcoholic fatty liver disease (NAFLD) is increasingly prevalent among adolescents and poses a significant public health challenge. Due to limitations in imaging and invasive diagnostic methods such as liver biopsy, there is a pressing need for accurate, cost-effective, and non-invasive risk prediction tools. This study aims to develop and compare multiple machine learning (ML) models to predict NAFLD risk in adolescents using routine anthropometric and laboratory data from the National Health and Nutrition Examination Survey (NHANES) 2011–2020 dataset.

Methods: Data from 2,132 U.S. adolescents (NHANES 2011–2020) were analyzed. Nine machine learning (ML) models were developed using features selected by Light Gradient Boosting Machine (LightGBM). Performance was assessed by AUC, accuracy, sensitivity, precision, F1-score, and calibration. The Extra Trees (ET) model was further compared with TyG-based logistic regression models. Model interpretability was evaluated using SHapley Additive exPlanations (SHAP), and an interactive online prediction tool was deployed.

Results: NAFLD prevalence was 13.0%. The ET model achieved the best overall performance (AUC = 0.784, ACC = 0.773, Kappa = 0.320), outperforming other ML algorithms and TyG-based models, which showed higher sensitivity but poorer precision. SHAP analysis identified waist circumference, triglycerides, insulin, and HDL as key predictors, revealing nonlinear threshold effects. The online tool allows individualized risk estimation based on routine clinical variables.

Conclusion: The ET-based ML model provides an accurate and interpretable approach for adolescent NAFLD risk prediction. By surpassing traditional metabolic indicators and offering an accessible web-based calculator, it supports scalable, cost-effective early screening and targeted prevention strategies.

1 Introduction

Non-alcoholic fatty liver disease (NAFLD) is characterized by excessive hepatic fat accumulation, histologically defined as steatosis in more than 5% of hepatocytes, and is closely associated with insulin resistance (1). NAFLD has become a leading cause of chronic liver disease worldwide, with an estimated global prevalence of 32% (40% in males and 26% in females) (2). In 2024, the European Association for the Study of the Liver (EASL) recommended replacing the term NAFLD with metabolic dysfunction–associated steatotic liver disease (MASLD) (3). However, because the NHANES dataset and most prior epidemiological studies still use the term NAFLD, it is retained in the present study. The prevalence of NAFLD has been reported to reach nearly 70% among overweight individuals (4), and while disease progression is often slow, it can lead to fibrosis, cirrhosis, hepatocellular carcinoma, or end-stage liver disease in a subset of patients (5). In recent years, pediatric NAFLD has risen in parallel with the global obesity epidemic, highlighting the urgent need for early detection and prevention strategies (6).

Despite its growing incidence, there is no consensus on standardized diagnostic criteria for NAFLD in adolescents. Liver biopsy remains the diagnostic gold standard but is invasive and unsuitable for large-scale use (7). Non-invasive imaging techniques, such as vibration-controlled transient elastography (VCTE), point shear wave elastography (pSWE), two-dimensional shear wave elastography (2D-SWE), and magnetic resonance elastography (MRE), as well as MRI-based methods (e.g., corrected T1 mapping, diffusion-weighted imaging), offer promising alternatives but face limitations in cost, availability, and pediatric accuracy (8).

Parallel to advances in imaging, artificial intelligence (AI) has gained prominence in healthcare, with machine learning (ML) enabling more objective risk prediction and individualized treatment strategies (9, 10). Traditional NAFLD risk assessment often relies on clinician judgment or simple indices, which may lack precision. In this context, the triglyceride–glucose (TyG) index and its derivatives have been proposed as surrogate markers of insulin resistance, showing predictive value in NAFLD and metabolic disorders (11, 12). However, these linear constructs are based on limited variables and may not capture the multifactorial, nonlinear nature of NAFLD. This gap highlights the promise of ML methods in refining risk prediction, particularly in youth populations.

Recent ML studies have developed predictive models for progression from NAFLD to more severe outcomes such as NASH, fibrosis, and hepatocellular carcinoma in adults (13, 14). Although some research has applied ML to adolescents, limitations remain, including reliance on complex predictors and lack of validation in real-world settings (15–17). Therefore, the present study aimed to identify key predictors using robust feature selection strategies and to develop an interpretable ML-based system for predicting adolescent NAFLD. By leveraging readily obtainable clinical and laboratory indicators, this approach provides a cost-effective and scalable tool to support early screening and intervention.

2 Materials and methods

2.1 Data sources and study population

The National Health and Nutrition Examination Survey (NHANES) is a population-based cross-sectional survey of U.S. adults and children, publicly available for epidemiological and clinical research. Data from the 2011–2020 cycles were used, and individuals aged 11–20 years were included. Further details are available on the NHANES website (https://www.cdc.gov/nchs/nhanes/index.html).

Sociodemographic variables included age, sex, and race/ethnicity (Mexican American, Other Hispanic, Non-Hispanic White, Non-Hispanic Black, and other races). Anthropometric measures were height, weight, waist circumference (WC), and body mass index (BMI). Laboratory parameters included white blood cell count (WBC), red blood cell count (RBC), platelet count (PLT), hemoglobin (HB), glycated hemoglobin (HbA1c), total cholesterol (TC), triglycerides (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL), fasting glucose (GLU), and fasting insulin. NAFLD was defined as alanine aminotransferase (ALT) >26 IU/L in males and >22 IU/L in females, without viral hepatitis, consistent with prior NHANES-based studies (18, 19).
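As a minimal sketch of the outcome definition above, the following Python function assigns the ALT-based NAFLD label. The `Participant` record, its field names, and the string coding of sex are hypothetical conveniences rather than NHANES variable names; only the sex-specific cut-offs (>26 IU/L in males, >22 IU/L in females) and the exclusion of viral hepatitis come from the text. Treating participants with missing ALT as excluded is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Sex-specific ALT cut-offs in IU/L, as stated in the text.
ALT_CUTOFF = {"male": 26.0, "female": 22.0}


@dataclass
class Participant:  # hypothetical record, not NHANES variable names
    sex: str                # "male" or "female"
    alt: Optional[float]    # alanine aminotransferase, IU/L
    viral_hepatitis: bool   # True if viral hepatitis is present


def nafld_label(p: Participant) -> Optional[int]:
    """Return 1 (NAFLD), 0 (no NAFLD), or None (excluded from analysis)."""
    if p.viral_hepatitis or p.alt is None:
        return None  # viral hepatitis excluded; missing ALT also excluded (assumption)
    return int(p.alt > ALT_CUTOFF[p.sex])  # strict "greater than" cut-off


cohort = [
    Participant("male", 30.0, False),
    Participant("female", 22.0, False),
    Participant("female", 25.0, False),
    Participant("male", 40.0, True),
]
labels = [nafld_label(p) for p in cohort]
print(labels)  # [1, 0, 1, None]
```

Because the cut-off is strict, a female participant with ALT of exactly 22 IU/L is labeled non-NAFLD.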

The NHANES protocol was approved by the National Center for Health Statistics (NCHS) Research Ethics Review Board. Written informed consent was obtained at the time of data collection. As only de-identified, publicly available data were used, no additional institutional approval was required.

2.2 Feature selection

The Light Gradient Boosting Machine (LightGBM) package in Python was first used to rank variables by importance, and the top 10 predictors were retained based on AUC contribution. Because LightGBM also served as a benchmark model, feature stability was confirmed with a consensus strategy combining L1-penalized logistic regression, Boruta, and permutation importance. Six predictors (WC, TG, insulin, GLU, weight, BMI) were consistently identified, reducing bias toward tree-based methods (Supplementary Figures S1, S2).
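The consensus check can be sketched with scikit-learn as a simple voting scheme. This is an illustrative reconstruction, not the authors' code: scikit-learn's `GradientBoostingClassifier` stands in for LightGBM (whose `LGBMClassifier` exposes the same `feature_importances_` attribute), Boruta is omitted because it requires a separate package, and the top-k cut-off, the L1 strength `C=0.1`, and the two-vote threshold are hypothetical choices. The data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def consensus_features(X, y, names, top_k=6, min_votes=2, seed=0):
    """Keep features that at least `min_votes` of three selectors agree on."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    votes = {n: 0 for n in names}

    # Selector 1: non-zero coefficients of an L1-penalized logistic regression.
    scaler = StandardScaler().fit(X_tr)
    l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    l1.fit(scaler.transform(X_tr), y_tr)
    for n, c in zip(names, l1.coef_[0]):
        if abs(c) > 1e-8:
            votes[n] += 1

    # Selector 2: top-k impurity importances of a boosted-tree model
    # (stand-in for LightGBM).
    gb = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
    for i in np.argsort(gb.feature_importances_)[::-1][:top_k]:
        votes[names[i]] += 1

    # Selector 3: top-k permutation importances (drop in AUC) on held-out data.
    perm = permutation_importance(gb, X_va, y_va, scoring="roc_auc",
                                  n_repeats=10, random_state=seed)
    for i in np.argsort(perm.importances_mean)[::-1][:top_k]:
        votes[names[i]] += 1

    selected = [n for n in names if votes[n] >= min_votes]
    return selected, votes


# Synthetic data: 4 informative features (x0-x3) plus 6 noise features, ~13% positives.
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           n_redundant=0, shuffle=False, weights=[0.87],
                           random_state=0)
names = [f"x{i}" for i in range(10)]
selected, votes = consensus_features(X, y, names)
print(selected, votes)
```

Requiring agreement between a linear and a tree-based selector is what guards against the tree-method bias mentioned above; a feature ranked highly by only one family of methods does not survive.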

2.3 Model construction, evaluation and validation

Using Python libraries including scikit-learn, XGBoost, and LightGBM, we constructed and evaluated nine supervised algorithms: artificial neural network (ANN), decision tree (DT), Extra Trees (ET), gradient boosting (GB), k-nearest neighbors (KNN), LightGBM, random forest (RF), support vector machine (SVM), and XGBoost. To address class imbalance (13% NAFLD vs. 87% non-NAFLD), the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training set within each fold. Hyperparameters were optimized via grid search with five-fold stratified cross-validation (Supplementary Table S1), and final models were retrained on the full training set. Model performance was assessed by discrimination, calibration, and clinical utility.
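The key point of this design is that SMOTE is applied only to the training portion of each fold, so synthetic minority samples never leak into the validation data. The sketch below illustrates this with scikit-learn and NumPy for the ET model. To stay dependency-light it re-implements a minimal version of SMOTE (interpolating between a minority sample and one of its k nearest minority neighbors) instead of using the `imbalanced-learn` package. The parameter grid, `k=5`, and the synthetic data are assumptions, as is applying SMOTE again before refitting on the full training set.

```python
import itertools

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors


def smote(X, y, k=5, seed=0):
    """Minimal SMOTE: synthesize class-1 samples until both classes are equal in size."""
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)                     # column 0 is the point itself
    base = rng.integers(0, len(X_min), n_new)         # random minority seed points
    neigh = idx[base, rng.integers(1, k + 1, n_new)]  # one of their k neighbors
    gap = rng.random((n_new, 1))                      # interpolation factor in [0, 1)
    X_syn = X_min[base] + gap * (X_min[neigh] - X_min[base])
    return np.vstack([X, X_syn]), np.concatenate([y, np.ones(n_new, dtype=y.dtype)])


def cv_auc(params, X, y, n_splits=5, seed=0):
    """Mean validation AUC with SMOTE applied to the training fold only."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for tr, va in skf.split(X, y):
        X_res, y_res = smote(X[tr], y[tr], seed=seed)
        model = ExtraTreesClassifier(random_state=seed, **params).fit(X_res, y_res)
        scores.append(roc_auc_score(y[va], model.predict_proba(X[va])[:, 1]))
    return float(np.mean(scores))


X, y = make_classification(n_samples=800, n_features=10, weights=[0.87],
                           random_state=1)
grid = {"n_estimators": [100, 200], "max_depth": [None, 6]}
results = {combo: cv_auc(dict(zip(grid, combo)), X, y)
           for combo in itertools.product(*grid.values())}
best = dict(zip(grid, max(results, key=results.get)))

# Refit on the full training data (oversampled once more) with the best settings.
final_model = ExtraTreesClassifier(random_state=0, **best).fit(*smote(X, y))
print(best, round(results[tuple(best.values())], 3))
```

Oversampling before splitting would place interpolated copies of validation-fold cases in the training folds and inflate the cross-validated AUC; keeping `smote` inside the loop avoids that.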

To compare with traditional indicators, three metabolic indices were included: the triglyceride-glucose index (TyG), TyG-BMI, and TyG-waist circumference (TyG-WC). The TyG index was calculated as:

$$\mathrm{TyG} = \ln\left[\frac{\mathrm{TG}\,(\mathrm{mg/dL}) \times \mathrm{FPG}\,(\mathrm{mg/dL})}{2}\right]$$

where TG is triglycerides and FPG is fasting plasma glucose. Each index was first used in logistic regression models, followed by multivariate models combining TyG and its derivatives. These were systematically compared with ET models trained on the same dataset.
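The TyG formula and its derivatives are straightforward to compute. The sketch below assumes the commonly used product definitions TyG-BMI = TyG × BMI and TyG-WC = TyG × WC (cm), which the text does not spell out, and fits one single-index logistic regression model per index on synthetic data; the simulated outcome is purely illustrative and carries no information about real NAFLD risk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def tyg(tg_mg_dl, fpg_mg_dl):
    """TyG = ln[TG (mg/dL) x FPG (mg/dL) / 2]."""
    tg = np.asarray(tg_mg_dl, dtype=float)
    fpg = np.asarray(fpg_mg_dl, dtype=float)
    return np.log(tg * fpg / 2.0)


def tyg_bmi(tg, fpg, bmi):
    return tyg(tg, fpg) * np.asarray(bmi, dtype=float)    # assumed definition


def tyg_wc(tg, fpg, wc_cm):
    return tyg(tg, fpg) * np.asarray(wc_cm, dtype=float)  # assumed definition


print(round(float(tyg(100, 90)), 3))  # ln(4500) -> 8.412

# Synthetic cohort; the outcome is simulated from TyG-WC for illustration only.
rng = np.random.default_rng(0)
n = 500
tg = rng.lognormal(mean=np.log(80), sigma=0.5, size=n)
fpg = rng.normal(92, 8, size=n)
bmi = rng.normal(23, 4, size=n)
wc = rng.normal(78, 12, size=n)
risk = tyg_wc(tg, fpg, wc)
y = rng.binomial(1, 1 / (1 + np.exp(-(risk - np.median(risk)) / 60)))

aucs = {}
for name, index in {"TyG": tyg(tg, fpg), "TyG-BMI": tyg_bmi(tg, fpg, bmi),
                    "TyG-WC": risk}.items():
    # One single-index logistic model per index; scaling aids convergence.
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(index.reshape(-1, 1), y)
    aucs[name] = roc_auc_score(y, model.predict_proba(index.reshape(-1, 1))[:, 1])
print({k: round(v, 3) for k, v in aucs.items()})
```

For example, TG = 100 mg/dL and FPG = 90 mg/dL give TyG = ln(4500) ≈ 8.41. Because each model uses a single linear term, its ranking of participants (and hence its AUC) is that of the index itself, which is the "limited dimensionality" the comparison with ET is meant to probe.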

For interpretability, SHapley Additive exPlanations (SHAP) values were used to quantify each feature’s contribution. SHAP identified the most influential predictors, revealed nonlinear effects, and enabled individualized risk profiling. To enhance accessibility, we developed a user-friendly online prediction tool using Streamlit (https://jd82bumajen97hthfgjsmr.streamlit.app/; source code available on GitHub: https://github.com/moresaying98/NAFLD-adolescence). The interface follows the logic of SHAP force plots: after inputting individual anthropometric and laboratory values, it outputs a personalized NAFLD risk probability along with feature-specific contributions for intuitive interpretation.
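SHAP force plots rest on the additivity property of Shapley values: the baseline (expected) model output plus the sum of per-feature contributions equals the prediction for that individual. The sketch below illustrates this by computing exact interventional Shapley values through brute-force enumeration of feature subsets, averaging the model output over a background sample for features outside each coalition. It is a from-scratch illustration on synthetic data, not the `shap` library's `TreeExplainer` algorithm, and it is only practical for a handful of features because its cost grows as 2^n.

```python
import itertools
import math

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier


def exact_shapley(predict, x, background):
    """Exact interventional Shapley values for one instance `x`.

    Features outside a coalition are taken from the background rows and the
    model output is averaged over them; cost grows as 2**n_features.
    """
    n = len(x)

    def value(coalition):
        Z = background.copy()
        cols = list(coalition)
        Z[:, cols] = x[cols]  # fix coalition features to the instance's values
        return float(np.mean(predict(Z)))

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for S in itertools.combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return value(()), phi  # baseline (expected output) and contributions


X, y = make_classification(n_samples=400, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)


def predict(Z):
    return model.predict_proba(Z)[:, 1]


base, phi = exact_shapley(predict, X[0], X[:50])
# Additivity, the property a force plot visualizes: base + sum(phi) == f(x).
print(round(base, 3), np.round(phi, 3), round(base + phi.sum(), 3),
      round(predict(X[:1])[0], 3))
```

In a force plot, positive entries of `phi` are the red arrows pushing the prediction above the baseline and negative entries are the blue arrows pulling it down.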

2.4 Statistical analysis

Statistical analyses were performed using R (version 4.3.0) and Python (version 3.10.6). Normally distributed variables were expressed as mean ± standard deviation (SD) and compared using independent t-tests. Non-normally distributed variables were expressed as median (Q1, Q3) and compared with Mann–Whitney U tests. Categorical variables were presented as n (%) and compared using chi-square or Fisher’s exact tests. A two-sided p < 0.05 was considered statistically significant.
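The group comparisons above map directly onto SciPy functions. In the sketch below, using the Shapiro–Wilk test to decide normality and switching to Fisher's exact test when any expected 2×2 cell count is below 5 are assumptions, since the text does not state these criteria. The example contingency table is reconstructed from the reported sex proportions (63.67% of 278 and 49.24% of 1,854 participants male), giving approximately 177 and 913 males, respectively.

```python
import numpy as np
from scipy import stats


def compare_continuous(a, b, alpha=0.05):
    """t-test if both groups look normal (Shapiro-Wilk), else Mann-Whitney U."""
    _, p_a = stats.shapiro(a)
    _, p_b = stats.shapiro(b)
    if p_a > alpha and p_b > alpha:
        _, p = stats.ttest_ind(a, b)
        return "t-test", p
    _, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", p


def compare_categorical(table):
    """Chi-square test; Fisher's exact test for sparse 2x2 tables."""
    table = np.asarray(table)
    _, p, _, expected = stats.chi2_contingency(table)  # Yates correction on 2x2 by default
    if table.shape == (2, 2) and (expected < 5).any():
        _, p = stats.fisher_exact(table)
        return "Fisher exact", p
    return "chi-square", p


# Sex by NAFLD status, counts reconstructed from the reported percentages.
#             male  female
sex_table = [[177, 101],   # NAFLD (n = 278)
             [913, 941]]   # non-NAFLD (n = 1,854)
test_name, p_sex = compare_categorical(sex_table)
print(test_name, p_sex < 0.001)  # chi-square True

# Skewed synthetic values (e.g. TG-like) fall through to the Mann-Whitney U test.
rng = np.random.default_rng(0)
method, p_tg = compare_continuous(rng.lognormal(4.6, 0.5, 278),
                                  rng.lognormal(4.3, 0.5, 1854))
print(method, p_tg)
```

The reconstructed sex table reproduces the reported p < 0.001.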

3 Results

Of the 8,012 adolescents initially screened, 83 participants with viral hepatitis were excluded, leaving a preliminary sample of 7,929 individuals. A further 5,638 participants were excluded because of missing laboratory data (TG and GLU), reducing the sample size to 2,291. An additional 93 individuals were excluded due to incomplete physical examination data, resulting in 2,198 eligible participants. Finally, 66 participants were excluded because of missing laboratory parameters, yielding a final analytic sample of 2,132 adolescents. Among them, 1,854 participants were identified as NAFLD-free, while 278 were classified as having NAFLD. A detailed flowchart of the participant selection process is presented in Figure 1.

Figure 1
Flowchart showing the selection process for adolescents from NHANES 2011-2020. Starting with 8,012 participants, 83 with viral hepatitis are excluded, leaving 7,929. Next, 5,638 without TG and GLU data are excluded, leaving 2,291 participants. Exclusion of 93 without examination data leaves 2,198. Finally, exclusion of 66 without laboratory data results in 2,132 participants, of which 278 have NAFLD and 1,854 do not.

Figure 1. Flowchart of participant selection, showing exclusion criteria and the final sample of 2,132 adolescents used for model development.

A total of 2,132 participants met the inclusion criteria. Among them, 1,854 (86.96%) were classified as NAFLD-negative, while 278 (13.04%) were classified as having NAFLD. The baseline characteristics and group-wise comparisons are summarized in Table 1. Regarding anthropometric measures, participants in the NAFLD group were significantly taller (167.69 ± 10.38 cm) and had higher weight [82.00 (65.10, 101.50) kg], BMI [28.50 (23.65, 34.68)], and waist circumference [94.10 (80.40, 110.45) cm] than those in the non-NAFLD group (all p < 0.001). In terms of hematological indicators, the NAFLD group showed significantly higher white blood cell count [6.40 (5.30, 7.70)], red blood cell count [5.03 (4.72, 5.36)], hemoglobin [14.40 (13.40, 15.40)], and platelet count [257.50 (216.25, 293.00)] than the non-NAFLD group (all p < 0.05). In contrast, HDL was significantly lower in the NAFLD group than in the non-NAFLD group [45.00 (39.00, 54.00) vs. 52.00 (45.00, 61.00), p < 0.001]. Additionally, the NAFLD group exhibited significantly higher levels of TG, LDL, TC, GLU, HbA1c, and insulin than the non-NAFLD group (all p < 0.05). Among demographic variables, the proportion of males was significantly higher in the NAFLD group than in the non-NAFLD group (63.67% vs. 49.24%, p < 0.001). The racial/ethnic distribution also differed significantly between the two groups (p < 0.001).

Table 1

Table 1. Baseline characteristics of participants stratified by NAFLD status.

In this study, feature selection was performed using the LGBM (LightGBM) algorithm implemented in Python. Initially, all variables were ranked by feature importance scores derived from an initial LGBM classifier, and the top 15 variables were selected for further analysis (Figure 2). These variables were then added to the model sequentially in order of descending importance, and a series of LGBM classifiers was constructed to assess the incremental contribution of each variable to model performance, evaluated using the area under the receiver operating characteristic curve (AUC). As shown in Figure 3, the AUC increased with the sequential addition of variables but plateaued after the 10th variable, indicating no substantial performance gain beyond that point. Therefore, the top 10 variables were selected for final model development: WC, insulin, TG, PLT, height, GLU, WBC, TC, RBC, and HDL.
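The incremental-addition procedure behind Figure 3 can be sketched as a loop over importance-ranked features. Here scikit-learn's `GradientBoostingClassifier` is again a stand-in for LGBM, the data are synthetic, and the plateau rule (the smallest number of features whose cross-validated AUC is within 0.005 of the best) is a hypothetical formalization of "no substantial performance gain"; the article does not state a numerical criterion.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           weights=[0.87], random_state=0)

# Step 1: rank features by importance from an initial boosted-tree model.
# (Illustration only: in practice, rank on the training split to avoid leakage.)
ranker = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
order = np.argsort(ranker.feature_importances_)[::-1]

# Step 2: add features one at a time and record the cross-validated AUC.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
curve = []
for k in range(1, len(order) + 1):
    model = GradientBoostingClassifier(n_estimators=50, random_state=0)
    scores = cross_val_score(model, X[:, order[:k]], y, cv=cv, scoring="roc_auc")
    curve.append(scores.mean())


def plateau_k(curve, tol=0.005):
    """Smallest number of features whose AUC is within `tol` of the best AUC."""
    best = max(curve)
    return next(k for k, auc in enumerate(curve, start=1) if auc >= best - tol)


k_star = plateau_k(curve)
print(k_star, [round(a, 3) for a in curve])
```

A tolerance-based rule like this favors the smaller model when extra features add only noise-level AUC gains, which matches the rationale given for stopping at ten predictors.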

Figure 2
Horizontal bar chart titled “Top 15 Feature Importance,” showing features ranked by importance. WC is the most important, followed by Insulin, TG, PLT, Height.cm, GLU, WBC, TC, RBC, HDL, HGB, LDL, BMI, Weight.kg, and HbA1c. Importance is measured along the x-axis. Bars are in descending order of importance.

Figure 2. Variable importance ranking from the Light Gradient Boosting Machine (LGBM) model.

Figure 3
Bar and line chart depicting feature contribution versus AUC performance for the top ten features. Feature importance is shown on the left y-axis, and Mean AUC is on the right y-axis. Each feature's bar contrasts with a line graph displaying AUC trends across features. The chart includes features like WC, Insulin, TG, and others. Bars are shaded, while red and black lines show the AUC performance, highlighting variations across features.

Figure 3. Feature selection using LGBM, with AUC plateauing after the 10th variable; top 10 predictors retained.

Nine machine learning models were developed and evaluated using the top ten selected features. Figure 4 displays the ROC curves for all models in both the training and testing datasets. In the training set, AUCs ranged from 0.804 (SVM) to 1.000 (RF), with most models achieving values above 0.90. In the independent test set, AUCs were more modest, ranging from 0.671 (DT) to 0.788 (SVM). Specifically, the test-set AUCs (95% CI) were: ANN, 0.715 (0.656–0.770); DT, 0.671 (0.609–0.738); ET, 0.784 (0.724–0.845); GB, 0.762 (0.700–0.825); KNN, 0.740 (0.686–0.790); LGBM, 0.739 (0.675–0.808); RF, 0.760 (0.700–0.827); SVM, 0.788 (0.729–0.849); and XGBoost, 0.768 (0.707–0.830). Detailed classification metrics, including accuracy, sensitivity, specificity, precision, F1-score, and Kappa, are summarized in Table 2.

Figure 5 shows the confusion matrices of the nine models on the testing set. The proportion of correctly classified non-NAFLD participants ranged from 63.4% (KNN) to 83.1% (ET), while correct identification of NAFLD cases ranged from 41.0% (DT) to 74.7% (KNN). Models such as ET, RF, GB, and SVM demonstrated relatively high true negative rates, whereas KNN and GB achieved comparatively higher true positive rates. Detailed counts and proportions for each cell are displayed in Figure 5.

Figure 6 presents the calibration curves for all models. Most algorithms showed acceptable agreement between predicted and observed probabilities, although calibration varied. In the test set, Brier scores ranged from 0.074 to 0.246, with ET, LGBM, GB, and XGBoost lying closer to the reference line, while SVM and DT deviated more substantially. Figure 7 displays the decision curve analysis (DCA). Across a wide range of threshold probabilities, tree-based ensemble models generally achieved higher net clinical benefit than single classifiers.

Among the nine models evaluated, the Extra Trees (ET) algorithm achieved the best overall performance. Although its test-set AUC (0.784) was marginally below that of SVM (0.788), ET had the lowest Brier score (0.074) and the highest net clinical benefit. ET was therefore selected as the optimal model for subsequent comparison with traditional metabolic indicators.
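For reference, the metrics reported in Table 2 and the Brier scores above can all be derived from true labels and predicted probabilities. The sketch below computes them with scikit-learn at an assumed classification threshold of 0.5 and adds a percentile-bootstrap 95% CI for the AUC; the article does not specify its threshold or CI method, so both are assumptions, and the data are toy or synthetic.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, brier_score_loss, cohen_kappa_score,
                             confusion_matrix, f1_score, precision_score,
                             roc_auc_score)


def evaluate(y_true, prob, threshold=0.5):
    """Threshold-free (AUC, Brier) and thresholded classification metrics."""
    y_true, prob = np.asarray(y_true), np.asarray(prob)
    pred = (prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, pred, labels=[0, 1]).ravel()
    return {
        "AUC": roc_auc_score(y_true, prob),
        "accuracy": accuracy_score(y_true, pred),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": precision_score(y_true, pred, zero_division=0),
        "F1": f1_score(y_true, pred, zero_division=0),
        "kappa": cohen_kappa_score(y_true, pred),
        "Brier": brier_score_loss(y_true, prob),
    }


def bootstrap_auc_ci(y_true, prob, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (resampling participants)."""
    rng = np.random.default_rng(seed)
    y_true, prob = np.asarray(y_true), np.asarray(prob)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, len(y_true), len(y_true))
        if y_true[idx].min() == y_true[idx].max():
            continue  # resample contains one class only; AUC undefined
        aucs.append(roc_auc_score(y_true[idx], prob[idx]))
    low, high = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(low), float(high)


metrics = evaluate([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.9])
print(metrics)  # AUC 0.75, accuracy 0.5, kappa 0.0, Brier 0.185

# Synthetic probabilities with ~13% prevalence, for the bootstrap CI.
rng = np.random.default_rng(1)
y_sim = rng.binomial(1, 0.13, 500)
p_sim = np.clip(0.13 + 0.3 * (y_sim - 0.13) + rng.normal(0, 0.1, 500), 0, 1)
print(bootstrap_auc_ci(y_sim, p_sim, n_boot=200))
```

With a 13% prevalence, accuracy and specificity are dominated by the majority class, which is why Kappa and precision are informative complements when comparing models.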

Figure 4
Side-by-side ROC curve plots for model comparison. Panel A shows training set performance with Random Forest having the highest AUC of 1.000. Panel B shows test set performance with Extra Trees having the highest AUC of 0.784. Both panels measure true positive rate against false positive rate, with a legend detailing different models and their AUC values.

Figure 4. ROC curves for nine machine learning models. (A) Training set. (B) Testing set.

Table 2

Table 2. Performance comparison of nine machine learning models for NAFLD prediction in the testing set.

Figure 5
Nine confusion matrices for different machine learning models: ANN, Decision Tree, Extra Trees, Gradient Boosting, KNN, LightGBM, Random Forest, SVM, and XGBoost. Each matrix is divided into four quadrants, displaying the percentage and count of true positive, true negative, false positive, and false negative predictions for NAFLD classification. Darker shades indicate higher values. A rate color bar is on the right.

Figure 5. Confusion matrices of nine machine learning models.

Figure 6
Panel A shows a calibration curve for the training set with various machine learning models, compared against a perfectly calibrated line. Panel B shows similar results for the test set. Models include ANN, Decision Tree, Extra Trees, Gradient Boosting, KNN, LightGBM, Random Forest, SVM, and XGBoost, each with corresponding confidence intervals. The x-axis is mean predicted value, and the y-axis is the fraction of positives.

Figure 6. Calibration curves of nine machine learning models. (A) Training set. (B) Testing set.

Figure 7
Graph A and B depict decision curve analyses for training and test sets, respectively. Both graphs plot net benefit against threshold probability, comparing models including ANN, Decision Tree, Extra Trees, Gradient Boosting, KNN, LightGBM, Random Forest, SVM, and XGBoost. Each model is represented with distinct colored lines. A black line indicates treating all, and a dashed gray line indicates treating none.

Figure 7. Decision curve analysis (DCA) of nine machine learning models.
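Decision curves such as those in Figure 7 plot the net benefit of acting on a model's predictions at each threshold probability p_t, defined as TP/n − (FP/n) × p_t/(1 − p_t), against the "treat all" and "treat none" (net benefit = 0) strategies. The sketch below implements this standard formula with NumPy; the toy labels and probabilities are illustrative only.

```python
import numpy as np


def net_benefit(y_true, prob, thresholds):
    """Net benefit of treating participants whose predicted risk >= threshold."""
    y_true, prob = np.asarray(y_true), np.asarray(prob)
    n = len(y_true)
    result = []
    for pt in thresholds:
        treat = prob >= pt
        tp = np.sum(treat & (y_true == 1))
        fp = np.sum(treat & (y_true == 0))
        # False positives are penalized by the odds of the threshold probability.
        result.append(tp / n - fp / n * pt / (1 - pt))
    return np.array(result)


def net_benefit_treat_all(y_true, thresholds):
    """Reference strategy that treats everyone; 'treat none' is always 0."""
    prevalence = np.mean(y_true)
    pt = np.asarray(thresholds, dtype=float)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


y = [0, 0, 1, 1]
p = [0.1, 0.6, 0.4, 0.9]
print(net_benefit(y, p, [0.2, 0.5]))         # model: 0.4375 at 0.2, 0 at 0.5
print(net_benefit_treat_all(y, [0.2, 0.5]))  # treat all: 0.375 at 0.2, 0 at 0.5
```

One useful property: the treat-all curve crosses zero where p_t equals the outcome prevalence (about 0.13 in this cohort), so a model adds value only where its curve lies above both reference lines.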

Figure 8 and Tables 3, 4 compare the ET model with logistic regression models based on the TyG index and its derivatives. In the training set, AUCs ranged from 0.675 (TyG) to 0.958 (ET), while in the test set they ranged from 0.675 (TyG) to 0.784 (ET). Among the TyG-based models, derivatives such as TyG-BMI (AUC = 0.748), TyG-WC (AUC = 0.760), and multi-feature combinations (AUC up to 0.768) showed improved discrimination over TyG alone but remained inferior to ET. On the test set, the ET model achieved higher overall accuracy (0.773), precision (0.324), and Kappa (0.320), reflecting more balanced classification. In contrast, TyG-derived models often reached higher sensitivity (e.g., TyG-WC = 0.823) but at the expense of reduced precision and agreement, suggesting a tendency to overclassify positive cases. In summary, the ET model outperformed commonly used TyG-based traditional indicators, providing more reliable and balanced predictive performance.

Figure 8
Receiver Operating Characteristic (ROC) curves for different models. Panel A shows basic models on the training set, with curves for TyG, TyG-BMI, TyG-WC, and ET. Panel B displays the same models on the test set. Panel C illustrates logistic regression models on the training set, with combinations of TyG and clinical variables. Panel D shows these models on the test set. The x-axis represents the false positive rate, and the y-axis represents the true positive rate. Each curve includes an Area Under the Curve (AUC) value indicating model performance.

Figure 8. ROC curves comparing the Extra Trees (ET) model with TyG-based logistic regression models. (A) Training set – basic models. (B) Test set – basic models. (C) Training set – logistic regression models. (D) Test set – logistic regression models.

Table 3

Table 3. Performance comparison between the ET model and TyG-based indicators in the testing set.

Table 4

Table 4. Performance comparison of the ET model and logistic regression models based on TyG and its derived indices in the testing set.

SHAP analysis confirmed waist circumference (WC), triglycerides (TG), insulin, red blood cell count (RBC), and HDL as the most influential predictors of adolescent NAFLD, with glucose and platelet count also contributing (Figure 9). The SHAP summary plot (Figure 9B) showed that higher WC, TG, and insulin levels increased risk, whereas higher HDL was protective. Dependence plots (Figure 10) further revealed nonlinear threshold effects, such as sharp risk increases at elevated WC and TG. At the individual level, SHAP force plots (Figure 11) decomposed predictions into feature-specific contributions; for the representative case shown, the predicted probability was 0.56 for NAFLD versus 0.44 for non-NAFLD. The online risk prediction tool developed in this study adopts the same framework: users input individual clinical and laboratory values, and the system generates a SHAP-style explanation of their personalized NAFLD risk. Together, these visualizations support both population-level interpretation and individual-level application.

Figure 9
Graph A depicts a horizontal bar chart of feature importance using SHAP values, with WC being the highest. Graph B displays a beeswarm plot of SHAP values, indicating feature impacts, with color gradation from blue to red signifying low to high feature values.

Figure 9. (A) Ranked feature importance of the ET model. (B) SHAP summary (beeswarm) plot showing direction and magnitude of feature contributions.

Figure 10
Nine scatter plots displaying SHAP values against different variables: WC, TG, RBC, HDL, Insulin, Height, GLU, TC, and PLT. Each plot features blue data points, a fitted line, and a red dashed line indicating a reference level. Patterns show various degrees of correlation and trends between SHAP values and each variable.

Figure 10. SHAP dependence plots for the nine most influential variables, illustrating nonlinear associations.

Figure 11
Diagram comparing two data scenarios labeled A and B. Both charts feature horizontal arrows indicating various value contributions. Chart A shows higher values for GLU, WBC, height, RBC, TG, and WC, while HDL and insulin are lower. The function value is 0.56. Chart B is the inverse with HDL and insulin higher, and other values lower, resulting in a function value of 0.44. Both charts use red for higher and blue for lower values.

Figure 11. SHAP force plots for the first test-set participant: (A) predicted probability for NAFLD = 0.56; (B) predicted probability for non-NAFLD = 0.44.

4 Discussion

In this study, we developed predictive models for adolescent NAFLD using NHANES data (2011–2020) and nine supervised algorithms. To address class imbalance (13% vs. 87%), SMOTE was applied during training. SHAP analysis of the final model identified WC, TG, insulin, HDL, and RBC count as the most influential predictors. While these are established risk factors, the ML framework added value by quantifying their relative contributions, capturing nonlinear effects, and enabling individualized prediction through SHAP analysis. Comparative evaluation showed that the Extra Trees (ET) model outperformed commonly used TyG-based indices and achieved the most balanced performance across discrimination, accuracy, and agreement metrics. Finally, we deployed the ET model as an online risk calculator to support practical application in adolescent NAFLD screening.

Over the past decade, the prevalence of NAFLD in the United States has increased from 34.4% to 38.1%, paralleling the rise in obesity and type 2 diabetes mellitus (T2DM) (20). Among children and adolescents with obesity, the prevalence is approximately 36.1% and is expected to rise further with the global obesity epidemic (21). Pediatric NAFLD often persists into adulthood and can progress to fibrosis, cirrhosis, or other complications, underscoring the need for early detection. However, the optimal timing, frequency, and modality of screening remain unclear, and current evidence in adolescents is limited. While liver biopsy is the diagnostic gold standard, its invasiveness and cost preclude large-scale use (22). Conventional ultrasonography is more practical but has limited sensitivity for mild steatosis (23). The controlled attenuation parameter (CAP) has been proposed as a first-line screening tool in the general population, providing a more objective and quantifiable assessment of hepatic fat and serving as a useful adjunct to conventional ultrasound (24). However, its performance appears less reliable in pediatric populations, likely because differences in body habitus and abdominal fat distribution compromise imaging accuracy (25). Magnetic resonance imaging–derived proton density fat fraction (MRI-PDFF) provides the most accurate noninvasive quantification of hepatic fat and performs well in children, but its high cost and technical demands restrict routine use. Consequently, recent research has emphasized the need for reliable serum biomarkers for large-scale adolescent NAFLD screening (26). In addition, recent studies have applied machine learning specifically to pediatric and adolescent populations, including an NHANES-based adolescent model and a multi-algorithm pediatric study, both of which reported encouraging predictive performance and provided interpretable insights into feature contributions (27, 28).

Using the LGBM algorithm, we initially ranked variables by feature importance and identified the top ten predictors: WC, insulin, TG, PLT, height, GLU, WBC, TC, RBC, and HDL. These factors are well documented in adults: WC is the strongest body composition predictor of NAFLD (29, 30), insulin resistance is a central driver (31), and TG accumulation is a key pathological hallmark (32). Platelet and red blood cell indices have also been implicated in liver injury and repair processes (33–36). Although these associations are established, their relative contributions and interactions in adolescents remain understudied. To assess robustness, we checked the LGBM-based selection against a consensus strategy combining L1-penalized logistic regression, Boruta, and permutation importance, which consistently highlighted overlapping predictors. This suggests that our feature selection was not driven solely by tree-based methods and provides a stable foundation for subsequent model development.

Comparative evaluation of nine supervised algorithms demonstrated that the Extra Trees (ET) model achieved the most consistent overall performance across discrimination, classification, calibration, and clinical utility. In ROC analysis, ET achieved a test-set AUC of 0.784, marginally below SVM (0.788) and higher than the remaining seven algorithms, indicating strong discriminative ability. Confusion-matrix results further supported balanced classification: ET had the highest true negative rate (83.1%) while maintaining high overall accuracy. In terms of calibration, the ET model produced the lowest Brier score (0.074), with curves closely aligned with the reference line, indicating reliable probability estimates. Decision curve analysis (DCA) also showed that ET provided the greatest net clinical benefit across a wide range of threshold probabilities, outperforming the other eight models. Taken together, these findings indicate that ET offered the most robust balance of discrimination, reliability, and clinical applicability, supporting its selection as the optimal algorithm for subsequent comparison with traditional metabolic indices. Consistent with previous studies, ensemble tree-based methods such as ET and RF have repeatedly shown strong generalization in predicting chronic diseases, including NAFLD (37–39). Our study extends this evidence to adolescents and, to our knowledge, represents the first application of ET in this population.

When compared with logistic regression models based on the TyG index and its derivatives, the ET model consistently demonstrated superior predictive balance. Although some TyG-derived models, particularly TyG-WC (sensitivity = 0.823), achieved higher sensitivity than ET, this came at the cost of lower precision and overall accuracy, reflecting a tendency to overclassify positive cases. By contrast, ET achieved the highest AUC among these models (0.784, versus up to 0.768 for multi-feature TyG combinations), along with better precision (0.324) and agreement (Kappa = 0.320), offering a more reliable performance profile. These results suggest that while TyG indices capture important aspects of insulin resistance, their limited dimensionality constrains their predictive value. By integrating complex nonlinear interactions, ET provides superior discrimination and more balanced performance, making it a more appropriate tool for adolescent NAFLD risk prediction.

Consistent with the Results, SHAP analysis identified WC, TG, insulin, RBC, and HDL as the most influential predictors of adolescent NAFLD, with glucose and platelet count also contributing (Figure 9). Beyond confirming known risk factors, the model quantified their relative importance and revealed nonlinear effects. Dependence plots indicated that higher WC and TG sharply increased risk, while elevated HDL exerted a protective but nonlinear effect (Figure 10). At the individual level, SHAP force plots illustrated how multiple features jointly shaped predictions; in the representative case, WC and TG drove positive contributions, while HDL and insulin reduced the predicted risk (Figure 11). These results provide clinically interpretable insights at both population and patient levels.

From a clinical perspective, the ET-based model is practical because it uses routinely available anthropometric and laboratory measures, enabling scalable screening in adolescents. The nonlinear thresholds observed for WC, TG, and HDL may help inform candidate cutoffs for early intervention, while the accompanying online tool provides individualized risk estimates to support tailored prevention and patient engagement.

This study has several limitations. It was based on cross-sectional NHANES data, restricting causal inference, and NAFLD diagnosis relied on biochemical indicators rather than liver biopsy, which may have caused misclassification. Although multiple feature selection strategies were applied, important genetic or environmental factors might have been overlooked. In addition, the absence of external cohort validation and the U.S.-only adolescent sample limit generalizability to other populations. Nonetheless, the study’s strengths include the use of a large, nationally representative dataset with rigorous inclusion criteria, adjustment for key confounders, and the development of an accessible online prediction tool, together supporting its reliability and clinical relevance.

5 Conclusion

The Extra Trees model developed in this study showed the best predictive performance among the models compared for identifying adolescents at risk of NAFLD. Based on this model, an interactive web-based prediction tool was constructed that allows clinicians to estimate individual NAFLD risk quickly from routine clinical indicators. The model may improve early identification and risk stratification of NAFLD in youth and has the potential to reduce unnecessary imaging examinations and laboratory testing, supporting cost-effective and personalized prevention strategies in clinical practice.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Ethics statement

Written informed consent was obtained from the minor(s)’ legal guardian/next of kin for the publication of any potentially identifiable images or data included in this article.

Author contributions

CZ: Conceptualization, Formal analysis, Software, Visualization, Writing – original draft. BN: Methodology, Data curation, Writing – review & editing. RW: Software, Validation, Visualization, Writing – review & editing. LZ: Supervision, Project administration, Funding acquisition, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research and/or publication of this article. This work was supported by the Key Laboratory Construction Project of Shanxi Province for Major Infectious Disease Prevention and Treatment, funded by the Shanxi Provincial Department of Science and Technology and the Shanxi Provincial Health Commission (Official Document Reference: Jin-Ke-Ji-Fa (2020) No. 12). The project was officially filed as a provincial-level science and technology initiative on February 21, 2020, and supported laboratory construction and related research activities in the field of infectious disease control.

Acknowledgments

The authors thank the participants and staff of the National Health and Nutrition Examination Survey 2011–2020 for their valuable contributions.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fendo.2025.1681686/full#supplementary-material

References

1. Han SK, Baik SK, and Kim MY. Non-alcoholic fatty liver disease: Definition and subtypes. Clin Mol Hepatol. (2023) 29:S5–16. doi: 10.3350/cmh.2022.0424

2. Teng ML, Ng CH, Huang DQ, Chan KE, Tan DJ, Lim WH, et al. Global incidence and prevalence of nonalcoholic fatty liver disease. Clin Mol Hepatol. (2023) 29:S32–42. doi: 10.3350/cmh.2022.0365

3. European Association for the Study of the Liver (EASL), European Association for the Study of Diabetes (EASD), European Association for the Study of Obesity (EASO). EASL-EASD-EASO clinical practice guidelines on the management of metabolic dysfunction-associated steatotic liver disease (MASLD). Obes Facts. (2024) 17:374–444. doi: 10.1159/000539371

4. Quek J, Chan KE, Wong ZY, Tan C, Tan B, Lim WH, et al. Global prevalence of non-alcoholic fatty liver disease and non-alcoholic steatohepatitis in the overweight and obese population: a systematic review and meta-analysis. Lancet Gastroenterol Hepatol. (2023) 8:20–30. doi: 10.1016/S2468-1253(22)00317-X

5. Sanyal AJ, Castera L, and Wong VWS. Noninvasive assessment of liver fibrosis in NAFLD. Clin Gastroenterol Hepatol. (2023) 21:2026–39. doi: 10.1016/j.cgh.2023.03.042

6. Lee EJ, Choi M, Ahn SB, Yoo JJ, Kang SH, Cho Y, et al. Prevalence of nonalcoholic fatty liver disease in pediatrics and adolescents: a systematic review and meta-analysis. World J Pediatr. (2024) 20:569–80. doi: 10.1007/s12519-024-00814-1

7. Starekova J, Hernando D, Pickhardt PJ, and Reeder SB. Quantification of liver fat content with CT and MRI: state of the art. Radiology. (2021) 301:250–62. doi: 10.1148/radiol.2021204288

8. Selvaraj EA, Mózes FE, Jayaswal ANA, Zafarmand MH, Vali Y, Lee JA, et al. Diagnostic accuracy of elastography and magnetic resonance imaging in patients with NAFLD: A systematic review and meta-analysis. J Hepatol. (2021) 75:770. doi: 10.1016/j.jhep.2021.04.044

9. Handelman GS, Kok HK, Chandra RV, Razavi AH, Lee MJ, and Asadi H. eDoctor: machine learning and the future of medicine. J Intern Med. (2018) 284:603–19. doi: 10.1111/joim.12822

10. Deo RC. Machine learning in medicine. Circulation. (2015) 132:1920–30. doi: 10.1161/CIRCULATIONAHA.115.001593

11. Malek M, Khamseh ME, Chehrehgosha H, Nobarani S, and Alaei-Shahmiri F. Triglyceride glucose-waist to height ratio: a novel and effective marker for identifying hepatic steatosis in individuals with type 2 diabetes mellitus. Endocrine. (2021) 74:538–45. doi: 10.1007/s12020-021-02815-w

12. Xue Y, Xu J, Li M, and Gao Y. Potential screening indicators for early diagnosis of NAFLD/MAFLD and liver fibrosis: Triglyceride glucose index-related parameters. Front Endocrinol. (2022) 13:951689. doi: 10.3389/fendo.2022.951689

13. Chang D, Truong E, Mena EA, Pacheco F, Wong M, Guindi M, et al. Machine learning models are superior to noninvasive tests in identifying clinically significant stages of NAFLD and NAFLD-related cirrhosis. Hepatology. (2023) 77:546–57. doi: 10.1002/hep.32655

14. Wang H, Cheng W, Hu P, Ling T, Hu C, Chen Y, et al. Integrative analysis identifies oxidative stress biomarkers in non-alcoholic fatty liver disease via machine learning and weighted gene co-expression network analysis. Front Immunol. (2024) 15:1335112. doi: 10.3389/fimmu.2024.1335112

15. Huneault HE, Gent AE, Cohen CC, He Z, Jarrell ZR, Kamaleswaran R, et al. Validation of a screening panel for pediatric metabolic dysfunction–associated steatotic liver disease using metabolomics. Hepatol Commun. (2024) 8:e0375. doi: 10.1097/HC9.0000000000000375

16. Li M, Shu W, Zunong J, Amaerjiang N, Xiao H, Li D, et al. Predictors of non-alcoholic fatty liver disease in children. Pediatr Res. (2022) 92:322–30. doi: 10.1038/s41390-021-01754-6

17. Razmpour F, Daryabeygi-Khotbehsara R, Soleimani D, Asgharnezhad H, Shamsi A, Bajestani GS, et al. Application of machine learning in predicting non-alcoholic fatty liver disease using anthropometric and body composition indices. Sci Rep. (2023) 13:4942. doi: 10.1038/s41598-023-32129-y

18. Vos MB, Abrams SH, Barlow SE, Caprio S, Daniels SR, Kohli R, et al. NASPGHAN clinical practice guideline for the diagnosis and treatment of nonalcoholic fatty liver disease in children: recommendations from the Expert Committee on NAFLD (ECON) and the North American Society of Pediatric Gastroenterology, Hepatology and Nutrition (NASPGHAN). J Pediatr Gastroenterol Nutr. (2017) 64:319–34. doi: 10.1097/MPG.0000000000001482

19. Gulati R, Gulati K, Durrani HM, Sahni H, Mhanna MJ, Kaelber DC, et al. Missed opportunities in guideline-based fatty liver screening among 3.5 million children. Acad Pediatr. (2024) 24:815–9. doi: 10.1016/j.acap.2024.01.019

20. Wong RJ and Cheung R. Trends in the prevalence of metabolic dysfunction-associated fatty liver disease in the United States, 2011–2018. Clin Gastroenterol Hepatol. (2022) 20:e610–3. doi: 10.1016/j.cgh.2021.01.030

21. Shaunak M, Byrne CD, Davis N, Afolabi P, Faust SN, and Davies JH. Non-alcoholic fatty liver disease and childhood obesity. Arch Dis Child. (2021) 106:3–8. doi: 10.1136/archdischild-2019-318063

22. Rinella ME, Neuschwander-Tetri BA, Siddiqui MS, Abdelmalek MF, Caldwell S, Barb D, et al. AASLD Practice Guidance on the clinical assessment and management of nonalcoholic fatty liver disease. Hepatology. (2023) 77:1797–835. doi: 10.1097/HEP.0000000000000323

23. Ferraioli G and Soares Monteiro LB. Ultrasound-based techniques for the diagnosis of liver steatosis. World J Gastroenterol. (2019) 25:6053–62. doi: 10.3748/wjg.v25.i40.6053

24. Karlas T, Petroff D, Sasso M, Fan JG, Mi YQ, de Lédinghen V, et al. Individual patient data meta-analysis of controlled attenuation parameter (CAP) technology for assessing steatosis. J Hepatol. (2017) 66:1022–30. doi: 10.1016/j.jhep.2016.12.022

25. Papachristodoulou A, Kavvadas D, Karamitsos A, Papamitsou T, Chatzidimitriou M, and Sioga A. Diagnosis and staging of pediatric non-alcoholic fatty liver disease: is classical ultrasound the answer? Pediatr Rep. (2021) 13:312–21. doi: 10.3390/pediatric13020039

26. European Association for the Study of the Liver (EASL), European Association for the Study of Diabetes (EASD), European Association for the Study of Obesity (EASO). EASL-EASD-EASO Clinical Practice Guidelines on the management of metabolic dysfunction-associated steatotic liver disease (MASLD). J Hepatol. (2024) 81:492–542. doi: 10.1159/000539371

27. Zhang ZY, Wu HY, Ma RW, Feng B, Yang R, Chen XG, et al. Machine Learning-Based predictive model for adolescent metabolic syndrome. Sci Rep. (2025) 15:3274. doi: 10.1038/s41598-025-88156-4

28. Sayyari A, Magsudy A, Moeinipour Y, Hosseini A, Amiri H, Arzaghi M, et al. Investigation of predictive factors for fatty liver in children and adolescents using artificial intelligence. Front Pediatr. (2025) 13:1537098. doi: 10.3389/fped.2025.1537098

29. Lee JH, Jeon S, Lee HS, and Kwon YJ. Cutoff points of waist circumference for predicting incident non-alcoholic fatty liver disease in middle-aged and older Korean adults. Nutrients. (2022) 14:2994. doi: 10.3390/nu14142994

30. Lee JH, Jeon S, Lee HS, and Kwon YJ. Association between waist circumference trajectories and incident non-alcoholic fatty liver disease. Obes Res Clin Pract. (2023) 17:398–404. doi: 10.1016/j.orcp.2023.09.005

31. Khan RS, Bril F, Cusi K, and Newsome PN. Modulation of insulin resistance in nonalcoholic fatty liver disease. Hepatology. (2019) 70:711–24. doi: 10.1002/hep.30429

32. Esler WP and Cohen DE. Pharmacologic inhibition of lipogenesis for the treatment of NAFLD. J Hepatol. (2024) 80:362–77. doi: 10.1016/j.jhep.2023.10.042

33. Gasparyan AY, Ayvazyan L, Mukanova U, Yessirkepov M, and Kitas GD. The platelet-to-lymphocyte ratio as an inflammatory marker in rheumatic diseases. Ann Lab Med. (2019) 39:345–57. doi: 10.3343/alm.2019.39.4.345

34. Chen M, Wang B, Huang J, Zhao J, Chen J, and Chen G. The role of platelet-related parameters for the prediction of NAFLD in OSAHS patients. BMC Pulm Med. (2022) 22:487. doi: 10.1186/s12890-022-02291-6

35. García-Núñez A, Jiménez-Gómez G, Hidalgo-Molina A, Córdoba-Doña JA, León-Jiménez A, and Campos-Caro A. Inflammatory indices obtained from routine blood tests show an inflammatory state associated with disease progression in engineered stone silicosis patients. Sci Rep. (2022) 12:8211. doi: 10.1038/s41598-022-11926-x

36. Yanagisawa H, Maeda H, Noguchi I, Tanaka M, Wada N, Nagasaki T, et al. Carbon monoxide-loaded red blood cells ameliorate metabolic dysfunction-associated steatohepatitis progression via enhancing AMP-activated protein kinase activity and inhibiting Kupffer cell activation. Redox Biol. (2024) 76:103314. doi: 10.1016/j.redox.2024.103314

37. Lanjewar MG, Parab JS, Shaikh AY, and Sequeira M. CNN with machine learning approaches using ExtraTreesClassifier and MRMR feature selection techniques to detect liver diseases on cloud. Clust Comput. (2023) 26:3657–72. doi: 10.1007/s10586-022-03752-7

38. Md AQ, Kulkarni S, Joshua CJ, Vaichole T, Mohan S, and Iwendi C. Enhanced preprocessing approach using ensemble machine learning algorithms for detecting liver disease. Biomedicines. (2023) 11:581. doi: 10.3390/biomedicines11020581

39. Lim DYZ, Chung GE, Cher PH, Chockalingam R, Kim W, and Tan CK. Use of machine learning to predict onset of NAFLD in an all-comers cohort-development and validation in 2 large Asian cohorts. Gastro Hep Adv. (2024) 3:1005–11. doi: 10.1016/j.gastha.2024.06.007

Keywords: machine learning, non-alcoholic fatty liver disease, adolescents, feature selection, public health

Citation: Zhang C, Niu B, Wang R and Zhang L (2025) From traditional metabolic markers to ensemble learning: comparative application of machine learning models for predicting NAFLD risk in adolescents. Front. Endocrinol. 16:1681686. doi: 10.3389/fendo.2025.1681686

Received: 07 August 2025; Accepted: 15 October 2025;
Published: 29 October 2025.

Edited by:

Redhwan Ahmed Al-Naggar, National University of Malaysia, Malaysia

Reviewed by:

Bikash Sadhukhan, Techno International New Town, India
Maria Teofila Vicente Herrero, University of Balearic Islands, Spain

Copyright © 2025 Zhang, Niu, Wang and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Liaoyun Zhang, zlysgzy@163.com

†These authors have contributed equally to this work