- 1Department of Pediatrics, Yonsei University College of Medicine, Gangnam Severance Hospital, Seoul, Republic of Korea
- 2Department of Family Medicine, Yonsei University College of Medicine, Yongin Severance Hospital, Yongin-si, Republic of Korea
- 3Biostatistics Collaboration Unit, Yonsei University College of Medicine, Seoul, Republic of Korea
- 4Department of Healthcare Research Team, Health Promotion Center, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea
Background: Metabolic dysfunction-associated steatotic liver disease (MASLD) is increasingly being diagnosed in young adults and is associated with long-term hepatic complications. Early detection remains challenging in asymptomatic individuals, highlighting the need for accurate and non-invasive risk assessment tools.
Methods: We developed and validated a machine learning (ML)-based model to predict MASLD in adults aged 20–40 years. A total of 13,047 participants from the Gangnam Severance Hospital were included in the training set, and 1,335 participants from the Yongin Severance Hospital were included in the external validation set. MASLD was defined as hepatic steatosis on ultrasonography with at least one cardiometabolic risk factor. Three models were constructed using stepwise variable addition: Model 1 (age, sex), Model 2 (Model 1 + body mass index [BMI], mean blood pressure), and Model 3 (Model 2 + bioelectrical impedance analysis [BIA] metrics, including percentage of body fat [PBF] and skeletal muscle index [SMI]). Logistic regression (LR), random forest (RF), and extreme gradient boosting (XGB) were also applied.
Results: In internal validation, Model 3 achieved the highest area under the receiver operating characteristic curve (AUROC): 0.90 (LR), 0.91 (RF), and 0.91 (XGB), with accuracies up to 0.81. External validation confirmed a strong performance with AUROCs of 0.89 (LR), 0.88 (RF), and 0.88 (XGB). BMI and PBF were the strongest predictors, whereas a higher SMI was unexpectedly associated with greater MASLD risk.
Conclusions: Our ML-based model using non-invasive parameters accurately predicted MASLD risk in young adults and may facilitate early screening in clinical practice.
1 Introduction
Metabolic dysfunction-associated steatotic liver disease (MASLD), formerly known as non-alcoholic fatty liver disease (NAFLD), is the most common chronic liver disease, with an estimated global prevalence of approximately 38% among adults, and poses a significant public health burden worldwide (1). Recent epidemiological data suggest that the burden of MASLD is rapidly increasing not only in Western countries, but also in Asian populations, including Korea, due to shifts toward Westernized diets, sedentary lifestyles, and rising rates of obesity and type 2 diabetes (2). While traditionally regarded as a condition of middle-aged and older adults, MASLD is now being increasingly diagnosed in young adults (3), with reported prevalence rates ranging from 10% to 30% in individuals under 40 years of age (4, 5).
The early onset of MASLD is particularly concerning, as it may lead to a longer duration of metabolic liver injury and an elevated lifetime risk of adverse outcomes, such as type 2 diabetes, cardiovascular disease, cirrhosis, and mortality (5, 6). However, early identification of MASLD in asymptomatic young individuals remains challenging, as liver enzyme levels may be within normal ranges and imaging is not routinely performed in low-risk populations (7). Although ultrasonography is widely used to detect hepatic steatosis in clinical practice, it is limited by operator dependence and accessibility in large-scale screening settings (7). Liver biopsy, while definitive, is invasive and unsuitable for young, asymptomatic populations (8). Therefore, there is an urgent need for accurate, non-invasive, and easily applicable screening tools that can stratify MASLD risk, particularly in younger age groups.
In recent years, machine learning (ML) techniques have shown considerable promise in enhancing disease prediction by integrating diverse clinical and metabolic variables (9, 10). Applying these methods to routinely collected non-invasive parameters, such as anthropometric indices, blood pressure, and bioelectrical impedance analysis (BIA) metrics, may provide a practical and scalable approach for early risk assessment of MASLD, particularly among young adults. However, evidence on the performance of ML-based models using BIA-derived information remains limited, particularly for Asian populations.
Therefore, in this study, we aimed to develop and validate an ML-based model to predict MASLD using non-invasive clinical and body composition parameters in Korean adults aged 20–40 years. Specifically, we compared the predictive performance of logistic regression (LR), random forest (RF), and extreme gradient boosting (XGB) algorithms in routine health check-up settings.
2 Methods
2.1 Ethics statements
This study conformed to the ethical guidelines outlined in the 1975 Declaration of Helsinki and was approved by the Institutional Review Board (IRB) of the Yonsei University Gangnam Severance Hospital (IRB number: 3-2024-0221). All participants provided written informed consent prior to data collection.
2.2 Study participants
In this study, 13,047 young Korean adults aged 20–40 years who participated in the Gangnam Severance Hospital Check-up (GSHC) between January 2017 and December 2023 were included in the training set. A total of 1,335 young adults aged 20–40 years who participated in the Yongin Severance Hospital Check-up (YSHC) from June 2020 to July 2022 were included in the test set. Figure 1 illustrates the study design and workflow.
Figure 1. Flowchart of the participant selection. GSHC, Gangnam Severance Hospital Check-up; BIA, bioelectrical impedance analysis; AST, aspartate aminotransferase; ALT, alanine aminotransferase; HBV, hepatitis B virus; HCV, hepatitis C virus; ALD, alcohol-associated/related liver disease; MetALD, metabolic dysfunction and alcohol associated steatotic liver disease; SLD, steatotic liver disease; YSHC, Yongin Severance Hospital Check-up.
2.3 Anthropometric measurements and blood pressure
Height was recorded to the nearest 0.1 cm, and body weight was measured using an electronic scale to the nearest 0.01 kg. Waist circumference (cm) was measured by a trained nurse at the midpoint between the lower margin of the least palpable rib and the top of the iliac crest in the horizontal plane of the waist. Obesity was defined as BMI ≥25 kg/m², and abdominal obesity was defined as waist circumference ≥90 cm for males and ≥85 cm for females according to Asia-Pacific criteria (5). Mean blood pressure (MBP) was calculated as [2×systolic blood pressure + diastolic blood pressure]/3.
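The definitions in this section can be illustrated with a minimal Python sketch. The function and parameter names are hypothetical and are not taken from the study's analysis code; sex is assumed to be passed as the string "male" or "female". The MBP function follows the formula exactly as written above.

```python
def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """BMI in kg/m^2 from weight (kg) and height (cm)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def mean_blood_pressure(sbp: float, dbp: float) -> float:
    """MBP as stated in Section 2.3: (2 x SBP + DBP) / 3."""
    return (2.0 * sbp + dbp) / 3.0


def is_obese(bmi: float) -> bool:
    """Obesity: BMI >= 25 kg/m^2 (Asia-Pacific criteria)."""
    return bmi >= 25.0


def has_abdominal_obesity(waist_cm: float, sex: str) -> bool:
    """Abdominal obesity: waist >= 90 cm (males) or >= 85 cm (females)."""
    # Assumption: sex is coded as the string "male" or "female".
    threshold = 90.0 if sex == "male" else 85.0
    return waist_cm >= threshold
```

For example, `is_obese(body_mass_index(80.0, 175.0))` evaluates the obesity criterion for a person weighing 80 kg at a height of 175 cm.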
2.4 Questionnaire
Trained interviewers managed the questionnaire distribution. The survey included questions about coexisting conditions, such as diabetes mellitus, hypertension, dyslipidemia, alcohol consumption, smoking habits, and physical activity levels. Moderate physical activity was defined as engaging in moderate-to-vigorous physical activity for > 150 minutes per week (11).
2.5 BIA parameters
Soft lean mass (SLM), percentage of body fat (PBF), total body fat mass (TBF), visceral fat area (VFA), abdominal subcutaneous fat (ASF), and skeletal muscle mass (SMM) were measured using multifrequency bioelectrical impedance analysis (ACCUNIQ BC 720; SELVAS Healthcare, Korea). High PBF was defined as ≥25% for men and ≥35% for women (12). The skeletal muscle index (SMI) was determined by dividing SMM by BMI (13). Low skeletal muscle index (LSMI) was defined as SMI within the lowest quintile for individuals of the same sex aged 20–40 years, based on the sarcopenia criteria established by Janssen et al. (14).
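As a rough illustration of these BIA-derived definitions, the sketch below computes SMI, the high-PBF flag, and a sex-specific lowest-quintile cutoff. All names are hypothetical. The percentile interpolation rule (the default of Python's `statistics.quantiles`) and the treatment of values exactly at the cutoff are assumptions, since the text does not specify them.

```python
import statistics
from typing import Sequence


def skeletal_muscle_index(smm_kg: float, bmi: float) -> float:
    """SMI as defined in Section 2.5: skeletal muscle mass divided by BMI."""
    return smm_kg / bmi


def has_high_pbf(pbf_percent: float, sex: str) -> bool:
    """High PBF: >= 25% for men, >= 35% for women."""
    # Assumption: sex is coded as the string "male" or "female".
    threshold = 25.0 if sex == "male" else 35.0
    return pbf_percent >= threshold


def lowest_quintile_cutoff(smi_values: Sequence[float]) -> float:
    """20th percentile of SMI among same-sex participants aged 20-40 years.

    Assumption: uses the default interpolation of statistics.quantiles.
    """
    return statistics.quantiles(smi_values, n=5)[0]


def has_low_smi(smi: float, cutoff: float) -> bool:
    """LSMI flag. Assumption: values at or below the cutoff are counted as low."""
    return smi <= cutoff
```

The cutoff would be computed separately for men and women, and each participant's SMI would then be compared against the cutoff for their own sex.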
2.6 Ultrasonographic analyses
The diagnosis of steatotic liver disease (SLD) was based on abdominal ultrasound results obtained using a 3.5 MHz probe (HDI 5000 at GSHC and EPIQ 7 at YSHC; Philips, Bothell, WA, USA). Ultrasound examinations were performed by one of the three experienced radiologists at each center who were blinded to the participant information. SLD was defined as the presence of at least two of the following ultrasonographic features: (1) a diffuse increase in liver parenchymal echogenicity compared to that in the kidney or spleen, (2) attenuation of the ultrasound beam, and (3) poor visualization of intrahepatic structures (15). Each feature was scored, with 2 indicating a definite presence, 1 indicating a probable presence, and 0 indicating an absence. The total fatty liver score ranged from 0 to 6, with 1–2 indicating mild fat infiltration, 3–4 indicating moderate infiltration, and 5–6 indicating severe infiltration. A score of 0 indicated no hepatic steatosis (15).
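The grading scheme described above can be sketched as follows. This is an illustrative helper with hypothetical names, covering only the 0–6 total score and its mapping to severity categories.

```python
from typing import Tuple


def fatty_liver_grade(echogenicity: int, attenuation: int,
                      poor_visualization: int) -> Tuple[int, str]:
    """Return (total score, grade) from the three ultrasonographic features.

    Each feature is scored 0 (absent), 1 (probable), or 2 (definite).
    """
    scores = (echogenicity, attenuation, poor_visualization)
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("each feature score must be 0, 1, or 2")
    total = sum(scores)
    if total == 0:
        grade = "none"
    elif total <= 2:
        grade = "mild"
    elif total <= 4:
        grade = "moderate"
    else:
        grade = "severe"
    return total, grade
```

For instance, definite echogenicity with probable beam attenuation and clear visualization of intrahepatic structures gives a total of 3, which falls in the moderate range.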
2.7 Definition of MASLD
MASLD was defined as SLD with one or more cardiometabolic risk factors after excluding individuals with alcoholic liver disease (ALD), metabolic dysfunction and alcohol-associated steatotic liver disease (MetALD), or hepatitis B or C virus infection, based on recent guidelines (8, 16). ALD was defined as hepatic steatosis associated with significant alcohol consumption (> 60 g/day for males and > 50 g/day for females), regardless of metabolic conditions. MetALD was defined as MASLD coexisting with moderate alcohol intake (30–60 g/day in males and 20–50 g/day in females). Cryptogenic SLD was characterized as steatotic liver disease with no clear underlying cause.
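The classification rules in this section can be expressed as a short decision function. The sketch below is illustrative only: names and return labels are hypothetical, the order in which exclusions are checked is an assumption, and cases the text does not address (for example, moderate alcohol intake without cardiometabolic risk factors) are returned under a generic label rather than guessed.

```python
def alcohol_category(grams_per_day: float, sex: str) -> str:
    """Categorize daily alcohol intake using the thresholds in Section 2.7."""
    # Assumption: sex is coded as the string "male" or "female".
    low, high = (30.0, 60.0) if sex == "male" else (20.0, 50.0)
    if grams_per_day > high:
        return "significant"
    if grams_per_day >= low:
        return "moderate"
    return "below_moderate"


def classify_sld(has_sld: bool, n_risk_factors: int, grams_per_day: float,
                 sex: str, has_viral_hepatitis: bool) -> str:
    """Assign an SLD category for one participant."""
    if not has_sld:
        return "no_SLD"
    if has_viral_hepatitis:
        return "excluded_viral_hepatitis"
    category = alcohol_category(grams_per_day, sex)
    if category == "significant":
        return "ALD"  # regardless of metabolic conditions
    if n_risk_factors >= 1:
        return "MetALD" if category == "moderate" else "MASLD"
    # Not specified further in the text; kept as a generic label here.
    return "other_SLD"
```

Only participants labelled "MASLD" or "no_SLD" by such a rule would correspond to the MASLD and normal groups compared in the analysis.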
2.8 Statistical analyses
The baseline characteristics of the participants were compared between groups (normal and MASLD) in the training and test sets, using the independent t-test for continuous variables and the chi-squared test for categorical variables. Continuous variables were presented as mean ± standard deviation, and categorical variables were summarized as counts and percentages.
Multivariable LR analyses were performed to identify the factors associated with MASLD, with the results presented as odds ratios (ORs) and 95% confidence intervals (CIs). Before model fitting, multicollinearity among BMI and fat-related BIA parameters (PBF, TBF, VFA, and ASF) was assessed using the variance inflation factor (VIF). Three models were constructed using non-invasive parameters: Model 1 included age and sex; Model 2 incorporated age, sex, anthropometric measurements, and blood pressure; and Model 3 additionally included BIA parameters. To assess whether the association between PBF and MASLD varied according to BMI, we incorporated an interaction term between BMI and PBF into an additional multivariable logistic regression model. The interaction term (BMI × PBF) was modeled per 10-unit increment of the product term to aid interpretability. Given the statistical significance of the interaction, we further developed BMI-stratified logistic regression models (normal BMI vs. obesity) and evaluated model discrimination using AUROC values. The prediction models were trained on the training set using LR, RF, and XGB. Internal validation was performed using 5-fold cross-validation for hyperparameter optimization, and external validation was performed with the test set (from YSHC). All hyperparameters used in the final models are provided in Supplementary Table 1 to ensure reproducibility. Receiver operating characteristic (ROC) curve analyses were conducted to evaluate the discriminative performance of the models. The area under the ROC curve (AUROC), sensitivity, specificity, and accuracy were calculated for each model. Pairwise comparisons of the AUROCs between the models were performed using the DeLong method. In addition, the AUROC of the LR model was compared with that of an established marker, the hepatic steatosis index (HSI), using the same DeLong test to evaluate relative discriminative performance for MASLD (17).
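To make the AUROC concrete, the following sketch computes it directly from its probabilistic interpretation: the proportion of MASLD/non-MASLD pairs in which the participant with MASLD receives the higher predicted score, with ties counted as one half. This is a plain-Python illustration with hypothetical names, not the routine used in the study (which was carried out in SAS and R), and it does not implement the DeLong comparison.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in sample size, which is acceptable for illustration; rank-based implementations are preferable for cohorts of the size analyzed here.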
We computed Youden’s index, positive predictive value (PPV), negative predictive value (NPV), and F1-score to further assess diagnostic performance. Model calibration was evaluated using the calibration intercept, calibration slope, and Brier score, and was visually assessed using calibration plots. Furthermore, the area under the precision–recall curve (AUPRC) was calculated to better assess discrimination in the presence of class imbalance. Finally, decision curve analysis (DCA) was performed to evaluate the clinical utility and net benefit of each model across a range of threshold probabilities. A nomogram-based scoring system was constructed from the results of the multivariable LR analysis to estimate the probability of MASLD. The contributions of the variables in the models were assessed using Shapley additive explanations (SHAP).
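The threshold-based metrics and the Brier score listed above follow directly from a 2×2 confusion matrix and the predicted probabilities. The sketch below uses their standard definitions; the function names are hypothetical and the counts in the usage note are made up for illustration, not study data.

```python
from typing import Dict, Sequence


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "youden": sensitivity + specificity - 1.0,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```

With illustrative counts of 80 true positives, 20 false positives, 180 true negatives, and 20 false negatives, this gives a sensitivity of 0.80, a specificity of 0.90, and a Youden's index of 0.70.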
All analyses were conducted using SAS (version 9.4; SAS Inc., Cary, NC, USA) and R (version 4.4.1; R Foundation for Statistical Computing, Vienna, Austria; http://www.R-project.org), with a p-value <0.05 considered statistically significant.
3 Results
3.1 Baseline characteristics
Table 1 shows baseline characteristics based on the presence of MASLD in the training and test sets. In the training set, participants with MASLD were older, predominantly male, and had higher BMI, WC, MBP, glucose, total cholesterol, triglycerides, aspartate aminotransferase (AST), alanine aminotransferase (ALT), HSI, SLM, PBF, TBF, VFA, and ASF levels, whereas HDL levels were lower than those in the control group (all p<0.001). Similar trends were observed in the test set, with significant differences in the same variables between MASLD and normal participants.
Supplementary Table 2 presents the baseline characteristics of the training and test sets. Participants in the test set had a lower proportion of obesity and higher BMI, weight, MBP, DBP, LDL, HSI, PBF, TBF, ASF, and LSMI than did those in the training set. Additionally, alcohol consumption was lower, and the prevalence of moderate physical activity was higher, in the test set.
3.2 LR analyses for MASLD
Multicollinearity among BMI and fat-related BIA parameters (PBF, TBF, VFA, and ASF) was assessed using the VIF. TBF, VFA, and ASF showed high multicollinearity with BMI; therefore, only PBF was included as the fat-related parameter in the final model (Supplementary Table 3).
Multivariable LR analyses identified significant factors associated with MASLD (Table 2). In Model 1, age and male sex were significantly and positively associated with MASLD. In Model 2, BMI and MBP were added to Model 1, and age, male sex, BMI, and MBP were positively associated with MASLD. Model 3 incorporated PBF and SMI, as BIA parameters, and these factors were significantly associated with MASLD. Age, male sex, BMI, MBP, PBF, and SMI were positively associated with MASLD.
To further examine whether the association between PBF and MASLD differed according to BMI level, we incorporated an interaction term between BMI and PBF into an additional logistic regression model. The interaction term was statistically significant (p = 0.036), indicating that the effect of PBF on MASLD risk varied across BMI strata. In BMI-stratified analyses, PBF showed a stronger association with MASLD in the normal BMI group (OR 1.23, 95% CI 1.20–1.26), whereas the association was attenuated in the obesity group (OR 1.17, 95% CI 1.14–1.20) (data not shown).
3.3 ROC analyses of each model for predicting MASLD
The ROC analyses revealed progressive improvements in the AUROC values across the models (Table 3). In both internal and external validation, model performance improved with the sequential addition of variables. Model 1, which included only age and sex, demonstrated moderate discriminative ability. The inclusion of BMI and MBP in Model 2 resulted in an increase in predictive accuracy across all machine learning algorithms. Model 3, which additionally incorporated PBF and SMI, yielded the highest performance.
In internal validation, Model 3 achieved AUROCs of 0.90 for LR, 0.91 for RF, and 0.91 for XGB, with corresponding AUPRCs of 0.77–0.80 and low Brier scores (0.108–0.119), indicating good agreement between predicted and observed outcomes. The calibration intercepts were close to 0 and the slopes were near 1 across models, suggesting well-calibrated predictions. PPV, NPV, and F1 scores also increased progressively, with the highest values observed in Model 3. The calibration plots showed that the predicted probabilities from LR and XGBoost were closely aligned with the ideal 45° reference line, whereas those from the random forest model demonstrated a slightly larger deviation (Supplementary Figures 1A-C). DCA further demonstrated that all three algorithms in Model 3 provided greater net benefit than the treat-all or treat-none strategies across clinically relevant threshold probabilities (Figure 2A).
Figure 2. Decision curve analysis for Model 3 in internal and external validation datasets. (A) Decision curve analysis of Model 3 in the internal validation dataset. (B) Decision curve analysis of Model 3 in the external validation dataset. XGB, extreme gradient boosting.
In external validation, the corresponding AUROCs were 0.89, 0.88, and 0.88, with AUPRCs of 0.71–0.73 and low Brier scores (0.120–0.127), indicating stable predictive performance (Table 3). The calibration intercepts ranged from −0.42 to −0.32 and the slopes were near 1, demonstrating good overall agreement. The calibration plots showed that the predicted probabilities from LR and XGBoost were closely aligned with the ideal 45° reference line, whereas those from the random forest model demonstrated a slightly larger deviation (Supplementary Figures 1D-F). DCA for Model 3 showed that all algorithms provided greater net benefit than the treat-all or treat-none strategies across clinically relevant threshold probabilities (Figure 2B).
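For readers unfamiliar with decision curve analysis, the net benefit at a threshold probability pt is conventionally computed as TP/n − (FP/n) × pt/(1 − pt), where the treat-none strategy has a net benefit of 0. The sketch below implements this standard formula with hypothetical function names; it is not the study's code.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of treating everyone with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit when every participant is classified as positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all value and 0, which is the comparison summarized in Figure 2.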
In the pairwise comparison, Model 2 was superior to Model 1, whereas Model 3 demonstrated superior predictive performance to both Models 1 and 2 across all comparisons (Supplementary Table 4). In addition, in the internal validation, the AUROC of the LR model (0.901) was significantly higher than that of the hepatic steatosis index (HSI; 0.893, p < 0.001). In the external validation, the AUROCs of the two models (0.886 vs. 0.889) were comparable, with no statistically significant difference (p = 0.631).
Consistent with these association findings, the discriminatory performance of PBF also differed by BMI category (Supplementary Figure 2). In the internal validation dataset, AUROC values were 0.85 for the normal BMI group and 0.73 for the obesity group. A similar pattern was observed in the external validation dataset, with AUROCs of 0.83 and 0.72, respectively. These results indicate that PBF provides greater incremental predictive value for MASLD among individuals with normal BMI.
3.4 Scoring system with nomogram
A scoring system based on the multivariable LR model was developed to calculate the probability of MASLD (Supplementary Figure 3). The probability is determined using the following formula:
Probability (MASLD) = 1/(1 + exp(−y)).
where y is computed as:
y = −14.743 + 0.083×Age − 2.167×Sex + 0.319×BMI + 0.020×MBP + 0.111×PBF + 0.010×SMI.
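The two formulas above translate directly into code. The sketch below is an independent illustration rather than the authors' calculator: function names are hypothetical, and because the numeric coding of Sex and the scaling of SMI are not stated in this section, both are passed through unchanged and must match the coding used when the model was fitted.

```python
import math

# Coefficients copied from the formula for y in Section 3.4.
INTERCEPT = -14.743
COEFFICIENTS = {
    "age": 0.083,
    "sex": -2.167,   # numeric coding of Sex is not stated here (assumption: as in the fitted model)
    "bmi": 0.319,
    "mbp": 0.020,
    "pbf": 0.111,
    "smi": 0.010,    # SMI scaling is not stated here (assumption: as in the fitted model)
}


def masld_linear_predictor(age: float, sex: float, bmi: float,
                           mbp: float, pbf: float, smi: float) -> float:
    """Linear predictor y of the logistic model."""
    values = {"age": age, "sex": sex, "bmi": bmi,
              "mbp": mbp, "pbf": pbf, "smi": smi}
    return INTERCEPT + sum(COEFFICIENTS[k] * values[k] for k in COEFFICIENTS)


def masld_probability(age: float, sex: float, bmi: float,
                      mbp: float, pbf: float, smi: float) -> float:
    """Probability (MASLD) = 1 / (1 + exp(-y))."""
    y = masld_linear_predictor(age, sex, bmi, mbp, pbf, smi)
    return 1.0 / (1.0 + math.exp(-y))
```

Because the model is logistic, a linear predictor of y = 0 corresponds to a predicted probability of exactly 0.5, and each unit increase in BMI multiplies the odds by exp(0.319).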
To enhance clinical applicability, we also developed an interactive web-based MASLD probability calculator that computes disease probability using these coefficients. The source code has been made publicly available on GitHub (https://github.com/endosong/MASLD-Probability-Calculator.git). An example of probability estimation using this calculator is presented in Figure 3.
Figure 3. Example of MASLD probability estimation using the web-based MASLD probability calculator. MASLD, metabolic dysfunction-associated steatotic liver disease.
3.5 Contribution of the variables
Figure 4 presents SHAP summary plots demonstrating the contribution of each variable to MASLD prediction across logistic regression (4A), random forest (4B), and XGB (4C) models in the external validation dataset. In all three models, male sex and BMI were identified as the parameters with the highest contributions to the prediction, followed by PBF. In the LR model, the next most influential parameters were age, SMI, and MBP, in that order, while the order shifted to age, MBP, and SMI for RF, and MBP, age, and SMI for XGB.
Figure 4. SHAP summary plot for contribution of the variables for predicting MASLD using the training set. (A) SHAP summary plot of the prediction model using logistic regression analysis using the training set. (B) SHAP summary plot of the prediction model using random forest using the training set. (C) SHAP summary plot of the prediction model using XGB using the training set. The color of the plot indicates whether a parameter has a relatively high or low value within the participant dataset. The horizontal position represents the degree of influence of the parameter on the prediction, with the placement reflecting a stronger or weaker impact. SHAP, Shapley additive explanation; MASLD, metabolic dysfunction-associated steatotic liver disease; BMI, body mass index; PBF, percentage of body fat; MBP, mean blood pressure; SMI, skeletal muscle index; XGB, extreme gradient boosting.
Supplementary Figure 4 presents the SHAP values of the LR model in the external validation dataset stratified by age group. In all age groups, sex and BMI showed the highest contributions to MASLD prediction. The relative importance of age, PBF, SMI, and MBP varied across age groups.
Supplementary Figure 5 shows the SHAP values of the LR model in the external validation dataset stratified by BMI category. In the normal BMI group, sex showed the highest contribution to MASLD prediction, followed by BMI, PBF, age, SMI, and MBP. In the obesity group, BMI had the greatest contribution, followed by sex, PBF, age, MBP, and SMI.
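To give a sense of how variable contributions of this kind can be obtained for a logistic regression model, the sketch below uses one common formulation: for a linear model with features treated as independent, the SHAP value of a feature on the log-odds scale is its coefficient multiplied by the deviation of the feature from its mean. This is an assumption about the general technique, not a description of the software or settings used in this study; all names and the toy data in the usage note are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, float]


def feature_means(rows: List[Row], names: List[str]) -> Row:
    """Mean of each named feature over a background dataset."""
    return {k: sum(r[k] for r in rows) / len(rows) for k in names}


def linear_shap_values(coefs: Row, means: Row, x: Row) -> Row:
    """Per-feature contribution on the log-odds scale: beta * (x - mean)."""
    return {k: coefs[k] * (x[k] - means[k]) for k in coefs}


def mean_abs_shap(coefs: Row, rows: List[Row]) -> Row:
    """Global importance: mean absolute contribution of each feature."""
    means = feature_means(rows, list(coefs))
    totals = {k: 0.0 for k in coefs}
    for row in rows:
        for k, phi in linear_shap_values(coefs, means, row).items():
            totals[k] += abs(phi)
    return {k: totals[k] / len(rows) for k in coefs}
```

With toy coefficients `{"bmi": 0.5, "age": 0.1}` and two toy participants who differ only in BMI (20 vs. 30), BMI receives a mean absolute contribution of 2.5 and age a contribution of 0, illustrating why a feature's ranking depends on both its coefficient and its spread in the dataset, including within strata such as BMI category.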
4 Discussion
In this study, we developed and validated ML-based models to predict MASLD in young adults using simple, non-invasive parameters derived from routine health check-ups. Model performance progressively improved with the sequential addition of variables, and Model 3, which incorporated BIA-derived measures, showed the best overall performance with high AUROC and AUPRC values and good calibration. In the comparison among algorithms within Model 3, LR, random forest, and XGBoost demonstrated comparable discriminative performance and clinical utility in both internal and external validation, as shown by ROC and decision curve analyses, indicating stable and consistent predictive performance across datasets. Moreover, our model was not inferior to the HSI, a conventional MASLD marker that requires blood testing, despite relying solely on non-invasive parameters. Our findings are consistent with those of previous studies, showing that obesity and visceral adiposity are key contributors to MASLD pathogenesis in younger populations (5, 18). Importantly, our model incorporated routinely collected non-invasive variables, making it highly applicable to primary care and health check-up settings where advanced imaging is not routinely available.
Several ML-based models have been developed for predicting MASLD using accessible clinical and demographic data (19–21). A recent study from China developed machine learning models to predict fatty liver disease using physical and biochemical variables in a general adult population with internal validation (22). Another large-scale Chinese study (n = 10,007) using eight basic variables (e.g., age, waist/hip circumference, comorbidities) developed ML models with strong performance, with AUROCs of 0.798–0.806 in internal testing and 0.831 in external validation; multilayer perceptron and XGBoost also performed well with AUROCs of 0.823 and 0.784, respectively (19). Another recent model built on key metabolic indices, such as waist circumference, homeostatic model assessment of insulin resistance, triglycerides, and glucose, achieved a mean AUROC of 0.960, underscoring the relevance of insulin resistance-related features (20). In addition, a European study demonstrated that advanced MASLD outcomes, including metabolic dysfunction-associated steatohepatitis and fibrosis, could be predicted with high accuracy (AUROC 0.719–0.994) by using 19 routine clinical indicators (21). However, these models were primarily developed for middle-aged Western or Chinese populations and did not incorporate direct body composition data. Furthermore, conventional models and markers such as the HSI require blood tests, which limits their applicability in large-scale or community-based screening settings. In contrast, our study focused on a young Asian cohort and developed a model using only simple, non-invasive variables readily available in routine health check-up settings, namely age, sex, blood pressure, and body composition measures from BIA, and validated the model with an independent test set. This approach highlights the potential for early MASLD risk assessment using easily obtainable data, even in non-hospital or community settings, without the need for laboratory or imaging tests.
Despite relying on these basic parameters, our model demonstrated a high predictive performance. This highlights its practicality and strong potential for real-world applications, particularly in primary care and large-scale screening programs, where simplicity, cost efficiency, and accessibility are critical.
Importantly, our findings demonstrate that incorporating BIA-derived parameters significantly improves model performance. In our study, the model performance improved with each stepwise addition of variables. Model 1 showed moderate accuracy (AUROC up to 0.74), whereas Model 2, which included anthropometric and blood pressure data, improved significantly (AUROC up to 0.88) in the external validation. Model 3, which added BIA-derived parameters, achieved the highest performance, with AUROCs of 0.90–0.91 (internal validation) and 0.88–0.89 (external validation), significantly outperforming Model 2 and highlighting the value of body composition data in enhancing MASLD prediction.
Among these parameters, BMI and PBF emerged as the most influential contributors to MASLD prediction in all models, as confirmed by SHAP analysis. This finding is biologically plausible and consistent with existing pathophysiological knowledge. BMI reflects overall adiposity, which is closely linked to insulin resistance, hepatic fat accumulation, and systemic inflammation, which are the hallmarks of MASLD pathogenesis (23). However, BMI alone cannot distinguish between fat and lean mass (24). Therefore, the inclusion of PBF offers complementary information by specifically quantifying the proportion of body fat directly related to ectopic fat deposition and lipotoxicity in the liver (25, 26). Adipose tissue, particularly when in excess, secretes pro-inflammatory adipokines (e.g., TNF-α, IL-6) and reduces adiponectin levels, contributing to systemic insulin resistance and hepatic steatosis (27). Thus, the combined use of BMI and PBF provides a more nuanced assessment of metabolic risk than either metric alone, allowing for better identification of young individuals at risk of MASLD despite potentially normal body weight.
Our study found that a higher SMI was independently associated with an increased risk of MASLD, which contrasts with the conventional view that greater muscle mass is metabolically protective (28). Several factors could explain this unexpected finding. First, SMI derived from BIA may overestimate muscle mass in individuals with increased visceral fat or fluid retention, potentially confounding its association with metabolic risk (29, 30). Second, emerging evidence suggests that muscle quantity alone does not adequately capture metabolic health (31). The metabolic function of skeletal muscles is strongly influenced by muscle quality, including fiber type composition and fat infiltration (myosteatosis) (32). Individuals with obesity often exhibit a predominance of type II (fast-twitch), glycolytic, and insulin-resistant fibers, which may contribute to systemic insulin resistance, despite having greater muscle mass (33). Moreover, myosteatosis, characterized by ectopic fat accumulation within the muscle tissue, has been associated with hepatic steatosis, systemic inflammation, and metabolic dysfunction, independent of total muscle quantity (34). Consistent with our findings, a recent meta-analysis reported that sarcopenia significantly increased the risk of MASLD, whereas SMI alone was not associated with MASLD (28), further supporting the notion that muscle mass is not a sufficient marker of metabolic health. These findings underscore the limitations of relying solely on muscle mass and highlight the importance of incorporating measures of muscle strength or function, as these parameters may better reflect the true muscle quality and metabolic resilience (35).
Despite these strengths, several limitations of this study must be acknowledged. The study population was limited to Korean adults undergoing health checkups, which may limit generalizability to other ethnic or clinical populations. Additionally, the cross-sectional nature of the data precluded causal inferences. Although BIA may be less accurate than gold-standard methods, such as dual-energy X-ray absorptiometry or magnetic resonance imaging, it offers several practical advantages. BIA is non-invasive, rapid, cost-effective, and easily applicable in routine clinical and screening settings, making it particularly suitable for large-scale population studies (36).
5 Conclusion
In conclusion, we developed and validated an ML-based model that accurately predicts MASLD in young Korean adults using simple, non-invasive parameters routinely collected during health checkups, including basic clinical measures and BIA-derived body composition data. This model demonstrates that MASLD risk can be effectively estimated using easily accessible variables such as BMI, blood pressure, and BIA indices, even in non-hospital settings, thereby facilitating early detection and preventive management in the community. Importantly, our findings also highlight the added clinical value of incorporating PBF, particularly for identifying high-risk individuals with normal BMI who may otherwise be overlooked. This underscores the practical utility and scalability of the model for early MASLD risk assessment in real-world primary care and large-scale screening settings. Further validation in diverse populations and longitudinal studies are warranted to confirm the model's broad applicability and long-term prognostic value.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
The studies involving humans were approved by the Institutional Review Board (IRB) of the Yonsei University Gangnam Severance Hospital (IRB number: 3-2024-0221). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
KS: Conceptualization, Data curation, Investigation, Writing – original draft, Writing – review & editing. Y-JK: Conceptualization, Investigation, Writing – original draft, Writing – review & editing. EL: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Software, Validation, Writing – original draft. YY: Investigation, Methodology, Project administration, Resources, Writing – original draft. SB: Investigation, Methodology, Project administration, Resources, Writing – original draft. HL: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. HC: Conceptualization, Funding acquisition, Investigation, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing.
Funding
The author(s) declared that financial support was not received for this work and/or its publication.
Conflict of interest
The authors declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fendo.2025.1701729/full#supplementary-material
Keywords: metabolic dysfunction-associated steatotic liver disease, body composition, body mass index, percentage of body fat, young adult
Citation: Song K, Kwon Y-J, Lee E, Youn YH, Baik SJ, Lee HS and Chae HW (2025) Machine learning-based model for predicting metabolic dysfunction-associated steatotic liver disease using non-invasive parameters in young adults. Front. Endocrinol. 16:1701729. doi: 10.3389/fendo.2025.1701729
Received: 09 September 2025; Revised: 18 November 2025; Accepted: 29 November 2025; Published: 16 December 2025.
Edited by:
Anna Di Sessa, University of Campania Luigi Vanvitelli, Italy
Reviewed by:
Shuwei Weng, First Affiliated Hospital of Fujian Medical University, China
Masanori Nojima, The University of Tokyo, Japan
Copyright © 2025 Song, Kwon, Lee, Youn, Baik, Lee and Chae. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Hye Sun Lee, hslee1@yuhs.ac; Hyun Wook Chae, hopechae@yuhs.ac
†These authors have contributed equally to this work and share first authorship
Eunju Lee3