AUTHOR=Chen Hao, Zhang Jingjing, Chen Xueqin, Luo Ling, Dong Wenjiao, Wang Yongjie, Zhou Jiyu, Chen Canjin, Wang Wenhao, Zhang Wenbin, Zhang Zhiyi, Cai Yongguang, Kong Danli, Ding Yuanlin TITLE=Development and validation of machine learning models for MASLD: based on multiple potential screening indicators JOURNAL=Frontiers in Endocrinology VOLUME=Volume 15 - 2024 YEAR=2025 URL=https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2024.1449064 DOI=10.3389/fendo.2024.1449064 ISSN=1664-2392 ABSTRACT=Background: Multifaceted factors play a crucial role in the prevention and treatment of metabolic dysfunction-associated steatotic liver disease (MASLD). This study aimed to use multifaceted indicators to construct machine learning models for MASLD risk prediction and to explore the core factors within these models. Methods: MASLD risk prediction models were constructed with seven machine learning algorithms using, respectively, all variables, insulin-related variables, demographic characteristics variables, and other indicators. Subsequently, the partial dependence plot (PDP) method and SHapley Additive exPlanations (SHAP) were used to explain the roles of important variables in the models and to select the optimal indicators for constructing the MASLD risk model. Results: Ranking feature importance in the Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) models constructed using all variables showed that homeostasis model assessment of insulin resistance (HOMA-IR) and triglyceride glucose-waist circumference (TyG-WC) were the first and second most important variables in both models. The MASLD risk prediction model constructed using the 10 most important variables was superior to the previous model. 
The PDP and SHAP methods were further used to screen the best indicators (HOMA-IR, TyG-WC, age, aspartate aminotransferase (AST), and ethnicity) for constructing the model, and the mean area under the curve (AUC) of the resulting models was 0.960. Conclusions: HOMA-IR and TyG-WC are core factors in predicting MASLD risk. Ultimately, our study constructed the optimal MASLD risk prediction model using HOMA-IR, TyG-WC, age, AST, and ethnicity.
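
The two core indicators named in the abstract are derived from routine fasting measurements. The abstract does not state which formulas the authors used, so the sketch below assumes the commonly cited definitions: HOMA-IR = fasting insulin (µU/mL) × fasting glucose (mmol/L) / 22.5, TyG = ln[triglycerides (mg/dL) × fasting glucose (mg/dL) / 2], and TyG-WC = TyG × waist circumference (cm). The function names, parameter names, and sample values are illustrative only and do not come from the study.

```python
import math


def homa_ir(fasting_insulin_uU_ml: float, fasting_glucose_mmol_l: float) -> float:
    """HOMA-IR under the commonly cited definition (assumed, not from the abstract)."""
    return fasting_insulin_uU_ml * fasting_glucose_mmol_l / 22.5


def tyg_index(triglycerides_mg_dl: float, fasting_glucose_mg_dl: float) -> float:
    """Triglyceride-glucose (TyG) index: natural log of TG x glucose / 2."""
    if triglycerides_mg_dl <= 0 or fasting_glucose_mg_dl <= 0:
        raise ValueError("triglycerides and glucose must be positive")
    return math.log(triglycerides_mg_dl * fasting_glucose_mg_dl / 2.0)


def tyg_wc(triglycerides_mg_dl: float, fasting_glucose_mg_dl: float,
           waist_cm: float) -> float:
    """TyG-WC: TyG index multiplied by waist circumference in centimetres."""
    return tyg_index(triglycerides_mg_dl, fasting_glucose_mg_dl) * waist_cm


if __name__ == "__main__":
    # Hypothetical participant values, chosen only to exercise the functions.
    print(f"HOMA-IR: {homa_ir(10.0, 5.0):.2f}")
    print(f"TyG-WC:  {tyg_wc(150.0, 100.0, 90.0):.1f}")
```

Note the unit mismatch between the two indices under these definitions: HOMA-IR takes glucose in mmol/L, whereas TyG takes glucose in mg/dL, so a shared glucose column would need converting before both are computed.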