ORIGINAL RESEARCH article

Front. Endocrinol.

Sec. Translational and Clinical Endocrinology

Volume 16 - 2025 | doi: 10.3389/fendo.2025.1681686

From Traditional Metabolic Markers to Ensemble Learning: Comparative Application of Machine Learning Models for Predicting NAFLD Risk in Adolescents

Provisionally accepted
  • 1 Shanxi Medical University, Taiyuan, China
  • 2 First Hospital of Shanxi Medical University, Taiyuan, China

The final, formatted version of the article will be published soon.

Background: Non-alcoholic fatty liver disease (NAFLD) is increasingly prevalent among adolescents and poses a major public health concern. Because imaging and biopsy are unsuitable for large-scale screening, there is an urgent need for accurate, non-invasive prediction tools.

Methods: Data from 2,132 U.S. adolescents (NHANES 2011–2020) were analyzed. Nine machine learning (ML) models were developed using features selected by Light Gradient Boosting Machine (LightGBM). Performance was assessed by AUC, accuracy, sensitivity, precision, F1-score, and calibration. The Extra Trees (ET) model was further compared with TyG-based logistic regression models. Model interpretability was evaluated using SHapley Additive exPlanations (SHAP), and an interactive online prediction tool was deployed.

Results: NAFLD prevalence was 13.0%. The ET model achieved the best overall performance (AUC = 0.784, ACC = 0.773, Kappa = 0.320), outperforming other ML algorithms and TyG-based models, which showed higher sensitivity but poorer precision. SHAP analysis identified waist circumference, triglycerides, insulin, and HDL as key predictors, revealing nonlinear threshold effects. The online tool allows individualized risk estimation based on routine clinical variables.

Conclusion: The ET-based ML model provides an accurate and interpretable approach for adolescent NAFLD risk prediction. By surpassing traditional metabolic indicators and offering an accessible web-based calculator, it supports scalable, cost-effective early screening and targeted prevention strategies.
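For readers unfamiliar with the quantities named in the abstract, the following minimal Python sketch shows how threshold-based metrics (accuracy, sensitivity, precision, F1-score, Cohen's kappa) can be derived from a binary confusion matrix, and how the triglyceride-glucose (TyG) index is commonly defined. It is an illustration under stated assumptions, not the authors' code: the function names `tyg_index` and `binary_metrics` are hypothetical, the TyG formula is the widely cited ln(TG × FPG / 2) with both values in mg/dL (the abstract does not state which TyG variant was used), and the confusion-matrix counts in the usage example are invented, not study results.

```python
import math


def tyg_index(triglycerides_mg_dl, fasting_glucose_mg_dl):
    """Commonly cited TyG definition: ln(TG [mg/dL] * FPG [mg/dL] / 2).

    Assumption: the abstract does not specify the exact TyG variant used.
    """
    if triglycerides_mg_dl <= 0 or fasting_glucose_mg_dl <= 0:
        raise ValueError("inputs must be positive")
    return math.log(triglycerides_mg_dl * fasting_glucose_mg_dl / 2.0)


def binary_metrics(tp, fp, fn, tn):
    """Threshold-based metrics from binary confusion-matrix counts."""
    n = tp + fp + fn + tn
    if n == 0 or (tp + fn) == 0 or (tp + fp) == 0:
        raise ValueError("counts must include actual and predicted positives")

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # recall for the positive (NAFLD) class
    precision = tp / (tp + fp)     # positive predictive value
    if precision + sensitivity == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_chance_neg = ((fn + tn) / n) * ((fp + tn) / n)
    p_chance = p_chance_pos + p_chance_neg
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not taken from the study.
    print(binary_metrics(tp=40, fp=10, fn=20, tn=30))
    print(tyg_index(100.0, 100.0))
```

Kappa is worth reporting alongside accuracy when the outcome is imbalanced, as with the 13.0% prevalence here, because a model that predicts the majority class for everyone can reach high accuracy with a kappa near zero.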

Keywords: machine learning, non-alcoholic fatty liver disease, adolescents, feature selection, public health

Received: 07 Aug 2025; Accepted: 15 Oct 2025.

Copyright: © 2025 Zhang, Niu, Wang and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Liaoyun Zhang, zlysgzy@163.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.