ORIGINAL RESEARCH article

Front. Nutr.

Sec. Nutrition and Metabolism

Volume 12 - 2025 | doi: 10.3389/fnut.2025.1616229

Machine learning prediction of Metabolic dysfunction-associated fatty liver disease risk in American adults using body composition: Explainable analysis based on SHapley Additive exPlanations

Provisionally accepted
Yan Hong1, Xinrong Chen2, Ling Wang3, Fan Zhang1, ZiYing Zeng1, Weining Xie4*
  • 1Affiliated Guangdong Hospital of Integrated Traditional Chinese and Western Medicine of Guangzhou University of Chinese Medicine, Guangzhou University of Chinese Medicine, Foshan, Guangdong Province, China
  • 2First Clinical Medical College, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong Province, China
  • 3First Affiliated Hospital of Guangxi University of Chinese Medicine, Nanning, Guangxi Zhuang Region, China
  • 4Infectious Disease Department, Guangdong Provincial Hospital of Integrated Traditional Chinese and Western Medicine, Foshan, Guangdong Province, China

The final, formatted version of the article will be published soon.

Metabolic dysfunction-associated fatty liver disease (MAFLD) is a prevalent and progressive liver disorder closely linked to obesity and metabolic dysregulation. Traditional anthropometric measures such as body mass index (BMI) are limited in their ability to capture fat distribution and associated risk. This study aimed to develop and validate machine learning (ML) models for predicting MAFLD using detailed body composition metrics and to explore the relative contributions of adipose tissue features through explainable ML techniques. Data from the 2017-2018 National Health and Nutrition Examination Survey (NHANES) were used to construct predictive models based on anthropometric, demographic, lifestyle, and clinical variables. Six ML algorithms were implemented: decision tree (DT), support vector machine (SVM), generalized linear model (GLM), gradient boosting machine (GBM), random forest (RF), and XGBoost. The Boruta algorithm was used for feature selection, and model performance was evaluated using cross-validation and a validation set. SHapley Additive exPlanations (SHAP) were employed to interpret feature contributions. Among the six models, the GBM algorithm exhibited the best performance, achieving area under the receiver operating characteristic curve (AUC) values of 0.875 (training) and 0.879 (validation), with minimal fluctuations in sensitivity and specificity. SHAP analysis identified visceral adipose tissue (VAT), BMI, and subcutaneous adipose tissue (SAT) as the most influential predictors. VAT had the highest SHAP value, underscoring its central role in MAFLD pathogenesis. This study demonstrates the effectiveness of integrating body composition features with machine learning techniques for MAFLD risk prediction. The GBM model offers robust predictive accuracy and interpretability, with potential applications in clinical decision-making and public health screening strategies. SHAP analysis provides meaningful insights into the relative importance of adiposity measures, reinforcing the value of fat distribution metrics beyond conventional obesity indices.
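
For readers unfamiliar with how SHAP attributes a single prediction to individual features, the following is a minimal sketch of exact Shapley value computation on a toy model. It is not the authors' GBM pipeline and uses no NHANES data: the logistic model, its coefficients, the background rows, and the example person are all hypothetical values chosen for illustration, and the features are assumed to be standardized. The sketch enumerates every feature coalition, which is feasible only for a handful of features; practical SHAP implementations use faster algorithms.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical toy risk model (NOT the paper's GBM): a logistic function of
# three standardized adiposity features. Coefficients are illustrative only.
FEATURES = ("VAT", "BMI", "SAT")
WEIGHTS = {"VAT": 1.2, "BMI": 0.8, "SAT": 0.3}
INTERCEPT = -0.5


def predict(x):
    """Return the toy model's predicted probability for one feature dict."""
    z = INTERCEPT + sum(WEIGHTS[f] * x[f] for f in FEATURES)
    return 1.0 / (1.0 + exp(-z))


def value(subset, x, background):
    """Expected prediction when only the features in `subset` are known.

    Features outside `subset` are taken from each background row in turn,
    and the resulting predictions are averaged.
    """
    total = 0.0
    for row in background:
        mixed = {f: (x[f] if f in subset else row[f]) for f in FEATURES}
        total += predict(mixed)
    return total / len(background)


def shapley_values(x, background):
    """Exact Shapley values by enumerating every coalition of other features."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        contrib = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                gain = value(s | {f}, x, background) - value(s, x, background)
                contrib += weight * gain
        phi[f] = contrib
    return phi


# Hypothetical background sample and individual (standardized units).
background = [
    {"VAT": -1.0, "BMI": -0.5, "SAT": 0.0},
    {"VAT": 0.0, "BMI": 0.0, "SAT": 0.5},
    {"VAT": 1.0, "BMI": 0.5, "SAT": -0.5},
]
person = {"VAT": 2.0, "BMI": 1.0, "SAT": 0.5}

phi = shapley_values(person, background)
base_value = value(set(), person, background)  # average background prediction

# Rank features by absolute contribution, largest first.
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum, together with the base value, to the model's prediction for that individual, which is the additive property that gives SHAP its name. Ranking features by the mean absolute contribution across many individuals yields the kind of global importance ordering reported in the abstract.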

Keywords: MAFLD, body composition, machine learning, SHAP, NHANES

Received: 22 Apr 2025; Accepted: 09 Jun 2025.

Copyright: © 2025 Hong, Chen, Wang, Zhang, Zeng and Xie. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Weining Xie, Infectious Disease Department, Guangdong Provincial Hospital of Integrated Traditional Chinese and Western Medicine, Foshan, Guangdong Province, China
