AUTHOR=Bao Shixue , Jin Qiankai , Wang Tieqiao , Mao Yushan , Huang Guoqing TITLE=Predicting the risk of lean non-alcoholic fatty liver disease based on interpretable machine models in a Chinese T2DM population JOURNAL=Frontiers in Endocrinology VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2025.1626203 DOI=10.3389/fendo.2025.1626203 ISSN=1664-2392 ABSTRACT=Background: Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease and poses a serious threat to public health. Although lean NAFLD is less prevalent than obese NAFLD, it should not be overlooked. This study aimed to construct interpretable machine learning models for predicting lean NAFLD risk in patients with type 2 diabetes mellitus (T2DM). Methods: This study enrolled 1,553 individuals with T2DM who received health care at the First Affiliated Hospital of Ningbo University, Ningbo, China, from November 2019 to November 2024. Features were screened using the Boruta algorithm and the Least Absolute Shrinkage and Selection Operator (LASSO). Linear discriminant analysis (LDA), logistic regression (LR), Naive Bayes (NB), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost) were used to construct risk prediction models for lean NAFLD in T2DM patients. The area under the receiver operating characteristic curve (AUC) was used to assess the predictive capacity of the models. Additionally, SHapley Additive exPlanations (SHAP) analysis was used to quantify the contribution of each individual feature to the predictions of the machine learning models. Results: The prevalence of lean NAFLD in the study population was 20.3%. Eight variables, including age, body mass index (BMI), and alanine aminotransferase (ALT), were identified as independent risk factors for lean NAFLD. Ten predictive factors, including BMI, ALT, and aspartate aminotransferase (AST), were selected for constructing the risk prediction models. The RF model outperformed the other machine learning algorithms, achieving an AUC of 0.739 (95% confidence interval [CI]: 0.676–0.802) in the training set, and it also showed the best predictive value in the internal validation set, with an AUC of 0.789 (95% CI: 0.722–0.856). In addition, SHAP analysis identified triglycerides (TG), ALT, gamma-glutamyl transferase (GGT), BMI, and uric acid (UA) as the five variables with the greatest influence on the RF model's predictions. Conclusion: Lean NAFLD risk models built for the Chinese T2DM population, particularly the RF model, facilitate early prevention and intervention of lean NAFLD, thereby reducing the risks of intrahepatic and extrahepatic adverse outcomes.
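The abstract reports each AUC with a 95% CI but does not say how the intervals were computed. As a minimal sketch under that caveat, the AUC can be computed as the Mann-Whitney probability that a randomly chosen lean-NAFLD case receives a higher risk score than a randomly chosen non-case, with an approximate 95% CI from the Hanley-McNeil (1982) standard error. The function names are hypothetical, and this is not necessarily the authors' method; DeLong or bootstrap intervals are common alternatives.

```python
import math
from typing import Sequence, Tuple


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc: float, n_pos: int, n_neg: int,
                     z: float = 1.96) -> Tuple[float, float]:
    """Approximate CI for an AUC using the Hanley-McNeil standard error (assumed method)."""
    q1 = auc / (2.0 - auc)              # P(two random positives both outrank one negative)
    q2 = 2.0 * auc * auc / (1.0 + auc)  # P(one positive outranks two random negatives)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc * auc)
           + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    se = math.sqrt(max(var, 0.0))
    # Clip to the valid AUC range [0, 1].
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


if __name__ == "__main__":
    # Toy labels (1 = lean NAFLD, hypothetical) and model risk scores.
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    auc = auc_mann_whitney(y, s)
    lo, hi = hanley_mcneil_ci(auc, n_pos=sum(y), n_neg=len(y) - sum(y))
    print(f"AUC = {auc:.3f} (95% CI: {lo:.3f}-{hi:.3f})")
```

With only four toy cases the interval is very wide; with validation cohorts of a few hundred patients, as in this study, the same formula yields intervals of roughly the width reported above.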