ORIGINAL RESEARCH article

Front. Med., 14 January 2026

Sec. Hepatobiliary Diseases

Volume 12 - 2025 | https://doi.org/10.3389/fmed.2025.1704441

Development of a machine learning model for hepatic steatosis screening using non-invasive Traditional Chinese Medicine diagnostics and clinical variables: a health checkup study with community screening potential


Ke Zhu1,2†, Lihua Li3†, Zhihui Zhao4, Sheng Zheng5, Bing Lin6, Wenjun Tang7, Weihong Li2,8*
  • 1Department of Traditional Chinese Medicine Orthopedics and Traumatology, The Third Affiliated Hospital, Southern Medical University, Guangzhou, Guangdong, China
  • 2School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
  • 3Department of Traditional Chinese Medicine, China-Singapore Guangzhou Knowledge City Hospital, Guangzhou, Guangdong, China
  • 4School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
  • 5School of Rehabilitation Medicine, Gannan Medical University, Ganzhou, Jiangxi, China
  • 6Healthcare Management Center, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
  • 7Department of Respiration, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
  • 8School of Traditional Chinese Medicine, Sichuan College of Traditional Chinese Medicine, Mianyang, Sichuan, China

Background: Steatotic liver disease (SLD), underpinned by hepatic steatosis, is a global health concern affecting approximately 30% of the population. Current screening methods primarily rely on laboratory tests and lack broad-spectrum applicability. This study aims to develop a predictive model by selecting from non-invasive Traditional Chinese Medicine (TCM) diagnostics, demographic, and anthropometric variables to enhance early detection of hepatic steatosis.

Methods: Data from 1,703 local residents undergoing health checkup at the health management center of Affiliated Hospital of Chengdu University of Traditional Chinese Medicine between December 2018 and December 2021 were analyzed. Demographic, anthropometric, and TCM diagnostic data were collected using questionnaires and standardized instruments. Hepatic steatosis was diagnosed via ultrasonography. Predictive models were developed using three parametric and six non-parametric algorithms, evaluated through nested five-fold stratified cross-validation. Performance was evaluated in terms of discrimination, classification metrics at the optimal threshold, calibration, and clinical utility.

Results: Eleven variables were selected as predictors: the anthropometric variables body mass index (BMI), weight, and diastolic blood pressure, and the TCM diagnostic indicators HSV_H of nose, T5, phlegm-dampness constitution score, RGB_R of mid tongue, Lab_A of lip, T4, H5, and Lab_A of orbit. Logistic regression (AUC 0.83, 95% CI: 0.809–0.850) and XGBoost (AUC 0.84, 95% CI: 0.818–0.859) achieved the highest AUC among parametric and non-parametric models, respectively. XGBoost showed marginally better performance than logistic regression in AUC and clinical utility. Differences between the two models in classification metrics, calibration slopes, and calibration intercepts were not statistically significant. SHAP analysis identified BMI and body weight as the most influential predictors, alongside substantial contributions from TCM features (HSV_H of nose and T5).

Conclusion: TCM features combined with anthropometric variables can be used to develop a non-invasive screening model for ultrasound-diagnosed hepatic steatosis. Both the XGBoost and Logistic Regression models demonstrated robust performance, though external validation is needed to confirm generalizability. This non-invasive approach offers a practical tool with potential for hepatic steatosis screening in community settings.

Introduction

Steatotic liver disease (SLD, formerly named fatty liver disease), characterized by excessive lipid accumulation in hepatocytes, represents a growing global health burden encompassing metabolic dysfunction-associated steatotic liver disease (MASLD), metabolic dysfunction and alcohol-associated steatotic liver disease (MetALD), alcohol-associated liver disease (ALD), and other subtypes (1). Collectively, these conditions affect approximately 30% of the global population, with regional prevalence ranging from 25% in Western Europe to 44% in Latin America, driven primarily by obesity, type 2 diabetes, and alcohol consumption (2). Notably, SLD has emerged as the predominant etiology of cirrhosis in both the European Union and United States, where MASLD and ALD represent the most common subtypes (2). Given that cirrhosis ranks as the 11th leading cause of mortality worldwide (2), the development of early detection strategies for hepatic steatosis carries significant public health implications.

However, translating this imperative into effective community and point-of-care (POC) practice faces an obstacle. The critical need in these settings is for a screening tool that can first and foremost detect the presence of hepatic steatosis, irrespective of its underlying cause (e.g., MASLD, ALD, or mixed etiology). Most established models are not designed for this etiology-agnostic task. They often rely on venipuncture for etiology-specific biomarkers and are optimized to identify specific subtypes such as MASLD or ALD, causing them to miss other subtypes in real-world populations where etiologies frequently overlap. This gap underscores the need for accessible, non-invasive tools dedicated to the initial, accurate detection of hepatic steatosis itself.

Traditional Chinese Medicine (TCM) posits that internal physiological imbalances manifest through external signs, including complexion, tongue characteristics, and pulse patterns. From the perspective of TCM, the pathological basis of SLD is closely related to phlegm, dampness, blood stasis as well as dysfunction of the liver, spleen and kidney (3), all of which produce observable physical signs. Contemporary research has identified distinguishable tongue and pulse manifestations between SLD patients and healthy controls (4, 5), suggesting potential predictive value of TCM indicators for SLD. Building on these findings, our study aimed to develop a novel hepatic steatosis screening model using TCM diagnostic and common clinical data.

Materials and methods

Data source

This research was a secondary analysis of data from “A Real-World Study for the Medical Data of Four Diagnostic Synergies Centered on Tongue Image Data for Major Diseases” (Trial registration: Chinese Clinical Trial Registry, ChiCTR1800018090, registered 29 August 2018). The parent study aimed to establish a real-world clinical database and investigate the association between tongue manifestation and major diseases. It was approved by the ethics committee of the Affiliated Hospital of Chengdu University of Traditional Chinese Medicine (2018-KL050), and all participants provided written informed consent, which included permission for future research use of anonymized data.

Study population

The study population consisted of participants who underwent routine health checkups at the Affiliated Hospital of Chengdu University of Traditional Chinese Medicine between December 2018 and December 2021. These participants were originally enrolled in the parent study, which also included patients seeking care for chronic diseases. For our analysis, we exclusively focused on the health checkup cohort.

The original inclusion criteria for the parent study were: (1) Age ≥ 18 years and ≤ 75 years; (2) Healthy individuals with no acute or chronic diseases for at least 3 months prior to entering the study, or patients diagnosed with conditions such as hypertension, diabetes, lung cancer, or primary colorectal cancer; (3) Not participating in any other clinical studies; (4) Signed informed consent form by the patient or their immediate family member. Original exclusion criteria were: (1) Individuals with impaired consciousness who cannot express subjective discomfort, or patients with psychiatric disorders; (2) Patients with more than one type of severe secondary progressive malignant tumor or other debilitating diseases; (3) Patients with severe primary diseases affecting one or more major systems (e.g., cardiovascular, hepatic, renal, digestive, or hematopoietic systems); (4) Pregnant or breastfeeding women; (5) Individuals with severe depression or anxiety symptoms; (6) Those currently participating in other clinical trials.

Among the 2,099 health checkup participants enrolled in the original study, the 1,703 who had completed the Doppler ultrasound examination of the liver and gallbladder were included in this secondary analysis (Figure 1).

FIGURE 1
Flowchart depicting the dataset development process. From 2,099 health checkup participants (2018-2021), 396 were excluded for not attending a liver color Doppler ultrasound, leaving 1,703 in the development dataset. This dataset undergoes five-fold stratified cross-validation, shown in a chart dividing folds into training (light purple) and test (dark purple) sets.

Figure 1. Flow diagram of study design.

Collection of demographic, anthropometric, and TCM diagnostic data

In the original study, participants self-reported age and gender on a questionnaire. Height, weight, and systolic and diastolic blood pressure were measured with standardized electronic instruments. Body mass index was calculated from height and weight. TCM diagnostic data were collected with the DAOSH four examinations instrument (Shanghai Food & Drug Administration approval No. 20202200060 for medical devices) and comprised face, tongue, and pulse manifestation data, TCM constitution data, and TCM-specific symptom data. The detailed measurement process for the TCM data is provided in Supplementary Methods.

Assessment of hepatic steatosis

Hepatic steatosis was diagnosed using Doppler ultrasound. The examinations were assessed by experienced radiologists who were blinded to the study’s aim and the participants’ clinical data. The diagnostic criteria required the presence of at least two out of three characteristic findings on abdominal ultrasonography: diffusely increased echogenicity (“bright”) liver with liver echogenicity greater than kidney or spleen, vascular blurring, and deep attenuation of the ultrasound signal (6).

Data partitioning and preprocessing

We employed a nested five-fold cross-validation framework to ensure unbiased performance estimation and prevent overfitting. The design consisted of two hierarchical levels. In the outer loop for performance evaluation, the dataset was partitioned into 5 stratified folds preserving outcome variable proportions. Each fold served once as the independent test set while the remaining four formed the training set. Final performance metrics were obtained by aggregating predictions across all five outer test sets. Within each training set, an inner five-fold cross-validation was conducted for model optimization tasks.
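The two-level partitioning described above can be sketched in plain Python. This is a minimal illustration of nested stratified splitting only, not the authors' code (they report using scikit-learn); the function names `stratified_folds` and `nested_cv_splits` and the toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample positions into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def nested_cv_splits(labels, k_outer=5, k_inner=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested CV."""
    outer = stratified_folds(labels, k_outer, seed)
    for i, test_idx in enumerate(outer):
        train_idx = [idx for j, fold in enumerate(outer) if j != i for idx in fold]
        train_labels = [labels[idx] for idx in train_idx]
        # Inner folds are built from the outer training set only.
        inner = stratified_folds(train_labels, k_inner, seed + i + 1)
        inner_splits = []
        for m, val_pos in enumerate(inner):
            val = [train_idx[p] for p in val_pos]
            fit = [train_idx[p] for n, fold in enumerate(inner) if n != m for p in fold]
            inner_splits.append((fit, val))
        yield train_idx, test_idx, inner_splits


# Hypothetical toy labels with 25% positives (close to the reported prevalence).
labels = [1] * 25 + [0] * 75
splits = list(nested_cv_splits(labels))
```

Each outer test fold is touched only for final evaluation; tuning and threshold selection would use the `inner_splits` of the corresponding outer training set.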

Data preprocessing steps included outlier detection, missing value handling, and feature transformation. For optimal predictor selection, we combined recursive feature elimination (RFE) with XGBoost feature importance scoring. The detailed description of data preprocessing is provided in Supplementary Methods. All data preprocessing steps were exclusively fitted on the training set before application to the corresponding test set.

Model development and evaluation

The aim of our model was to predict the presence of hepatic steatosis. We employed both parametric and non-parametric algorithms to develop the prediction model. Parametric algorithms included logistic regression, linear discriminant analysis, and lasso regression. Non-parametric algorithms included decision tree, random forest, XGBoost, support vector machine, k-nearest neighbor, and Gaussian naive Bayes. Lasso regression performed both predictor selection and modeling, whereas the other algorithms used predictors selected by RFE. Within each outer training fold, the optimal hyperparameters were determined via an inner five-fold cross-validation with grid search. The combination that yielded the highest mean value of the area under the receiver operating characteristic curve (AUC) across these inner validation folds was selected.

The predictive performance of each model was evaluated on the outer-loop test sets in three respects: discrimination, calibration, and clinical utility. Discrimination was assessed primarily by the AUC value. Furthermore, the optimal probability threshold was determined by maximizing Youden’s index on the inner-loop validation sets. This threshold was subsequently applied to the outer-loop test sets to calculate threshold-specific metrics, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Calibration was visually examined using calibration curves and assessed by the agreement between predicted probabilities and observed outcomes. Clinical utility was quantified using decision curve analysis (DCA), which estimates the net benefit across a range of clinically relevant probability thresholds. Beyond performance, we further employed SHAP (SHapley Additive exPlanations) analysis on the outer-loop test sets to quantify and visualize the contribution of each variable in the selected model.
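The threshold procedure described above can be illustrated as follows. This is a minimal sketch, assuming Youden's index J = sensitivity + specificity − 1 is maximized over the observed validation probabilities; the helper names and the toy validation and test data are hypothetical.

```python
def confusion(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when predicting positive for prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        if p >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_threshold(y_true, y_prob):
    """Return the candidate threshold with the largest Youden's J, and J itself."""
    best_t, best_j = None, -1.0
    for t in sorted(set(y_prob)):
        tp, fp, tn, fn = confusion(y_true, y_prob, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def threshold_metrics(y_true, y_prob, threshold):
    """Sensitivity, specificity, PPV, and NPV at a fixed threshold."""
    tp, fp, tn, fn = confusion(y_true, y_prob, threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


# Hypothetical inner-loop validation predictions: choose the threshold here.
val_y = [0, 0, 0, 0, 1, 0, 1, 1]
val_p = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.60]
threshold, j_value = youden_threshold(val_y, val_p)

# Hypothetical outer-loop test predictions: apply the threshold without re-tuning.
test_y = [0, 1, 0, 1, 0, 0]
test_p = [0.10, 0.30, 0.28, 0.20, 0.05, 0.50]
metrics = threshold_metrics(test_y, test_p, threshold)
```

Selecting the threshold on validation data and only applying it to the test data keeps the reported sensitivity and specificity free of threshold-tuning optimism.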

Sample size estimation

To ensure adequate statistical power and enhance the generalizability of our findings, we utilized all available data from the database. Sample size estimation was performed based on the method proposed by Riley et al. (7), which accounts for four key parameters: outcome prevalence, number of predictor variables, shrinkage factor, and R². In our study, the prevalence of fatty liver was 25.3%, and we anticipated selecting at most 20 predictor variables. With a target shrinkage factor of 0.9 and an R² of 0.1, the calculated minimum required sample size was 1,698. Our dataset met this requirement, confirming its suitability for robust model development.
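The reported minimum can be approximately reproduced from the shrinkage criterion of the Riley et al. approach, n = P / ((S − 1) · ln(1 − R²/S)). The sketch below assumes that the stated R² is the Cox-Snell R² and that this criterion was the binding one; neither is stated explicitly in the text, and the function name is hypothetical.

```python
import math


def min_n_for_shrinkage(n_parameters, r2_cox_snell, shrinkage=0.9):
    """Minimum sample size so that expected shrinkage is at least `shrinkage`.

    Shrinkage criterion: n = P / ((S - 1) * ln(1 - R2_CS / S)).
    """
    return n_parameters / ((shrinkage - 1) * math.log(1 - r2_cox_snell / shrinkage))


# Values taken from the text: 20 candidate predictors, S = 0.9, R2 = 0.1.
n_required = min_n_for_shrinkage(n_parameters=20, r2_cox_snell=0.1, shrinkage=0.9)
n_available = 1703
```

Under these assumptions the formula gives roughly 1,698 participants, which is slightly below the 1,703 available.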

Statistical analysis

Continuous variables were presented as mean ± standard deviation (normal distribution) or median (quartile) (skewed distribution), and categorical variables were presented in frequency or as a percentage. The AUC of two selected models—one parametric and one non-parametric—was compared using the DeLong test, with 95% confidence intervals (CIs) derived from bootstrap resampling (1,000 replicates). Sensitivity and specificity were compared with the McNemar test, whereas bootstrap resampling (also with 1,000 replicates) was applied to compare the positive predictive value, negative predictive value, calibration slope, and calibration intercept across models. A two-sided p-value < 0.05 was considered statistically significant. All data were analyzed using Python version 3.7 (Python Software Foundation, Wilmington, DE, United States) with numpy, scipy, pandas, scikit-learn, matplotlib, XGBoost, SHAP libraries, and self-defined functions.
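As an illustration of the bootstrap confidence interval for AUC mentioned above, the following sketch uses only the standard library. It assumes a simple percentile bootstrap over participants; the text does not specify the bootstrap variant, the DeLong comparison is not shown, and the toy labels and scores are hypothetical.

```python
import random


def auc(y_true, y_score):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        # Skip resamples that contain a single class, where AUC is undefined.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical pooled outer-fold predictions.
y = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.60, 0.70, 0.45, 0.80]
point_auc = auc(y, scores)
ci_low, ci_high = bootstrap_auc_ci(y, scores, n_boot=1000)
```

With only ten observations the interval is very wide; the narrow intervals reported in the abstract reflect the much larger sample of 1,703 participants.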

Results

General characteristics of participants

Participants diagnosed with hepatic steatosis accounted for 25.3%, and the proportion of males, as well as the values for height, weight, body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), were numerically higher in the hepatic steatosis group than in the non-hepatic steatosis group. Age distributions were similar in the two groups (Table 1).

Table 1. General characteristics of 1,703 participants stratified by hepatic steatosis status.

Selected predictors

Once the number of variables reached 70, further increases yielded only very slow gains in AUC. We therefore selected 70 variables within each outer-loop training fold and defined the final predictor set as the intersection of the variables selected across all outer training folds. This yielded 11 variables: the anthropometric indicators weight, BMI, and diastolic blood pressure, and the TCM diagnostic indicators RGB_R of mid tongue, phlegm-dampness constitution score, Lab_A of lip, T4, H5, Lab_A of orbit, T5, and HSV_H of nose. AUC-variable number relationships, RFE-selected and LASSO-selected variables (per outer fold), and variance inflation factor analysis are reported in Supplementary Results.
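The intersection step can be shown in a few lines. The per-fold sets below are hypothetical and far smaller than the 70 variables per fold used in the study; they only illustrate how variables that are not selected in every fold drop out.

```python
# Hypothetical variables selected in each of the five outer training folds.
selected_per_fold = [
    {"BMI", "weight", "DBP", "T5", "age"},
    {"BMI", "weight", "DBP", "T5", "SBP"},
    {"BMI", "weight", "DBP", "T5", "height"},
    {"BMI", "weight", "DBP", "T5", "age"},
    {"BMI", "weight", "DBP", "T5", "SBP"},
]

# Keep only variables chosen in every fold.
final_predictors = sorted(set.intersection(*selected_per_fold))
```

Taking the intersection favors predictors whose selection is stable across resamples, at the cost of discarding variables that are informative in only some folds.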

The distribution of selected variables by hepatic steatosis status is shown in Table 2. Large effect sizes (standardized mean differences (SMD) ≥ 0.8) were observed for weight and BMI, and moderate differences (0.5 ≤ SMD < 0.8) were found in HSV_H of nose and Lab_A of orbit, while small effects (0.2 ≤ SMD < 0.5) were noted for DBP, phlegm-dampness constitution score, and RGB_R of mid tongue. In contrast, the differences for T5, Lab_A of lip, H5, and T4 were trivial (SMD < 0.2).
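The effect-size bands above can be made concrete with a short sketch. It assumes the common definition of the standardized mean difference, the difference in group means divided by the pooled standard deviation; the text does not state the exact formula used, and the toy BMI values are hypothetical.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


def effect_size_label(smd):
    """Map an SMD to the bands used in the text."""
    size = abs(smd)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "trivial"


# Hypothetical BMI values for the steatosis and non-steatosis groups.
bmi_steatosis = [26.0, 28.0, 30.0]
bmi_no_steatosis = [22.0, 24.0, 26.0]
smd_bmi = standardized_mean_difference(bmi_steatosis, bmi_no_steatosis)
label_bmi = effect_size_label(smd_bmi)
```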

Table 2. Comparison of selected predictors between hepatic steatosis and non-hepatic steatosis groups.

Model performance

Discrimination of different models

Among the three parametric algorithms, AUC values were very close, with logistic regression achieving the highest. Among the non-parametric algorithms, XGBoost attained the highest AUC value, followed by random forest, support vector machine, Gaussian naive Bayes, k-nearest neighbor, and decision tree (Table 3). The optimal hyperparameters identified for each algorithm are provided in Supplementary Results.

Table 3. Comparison of the area under the curve (AUC) values for nine algorithms.

The DeLong test indicated that the AUC of the XGBoost model was statistically significantly higher than that of the logistic regression model (AUC difference = 0.01, p-value = 0.012). Graphically, the ROC curves showed that the XGBoost model achieved a marginally higher True Positive Rate (TPR) across various False Positive Rate (FPR) thresholds compared to logistic regression (Figure 2).

FIGURE 2
Receiver Operating Characteristic (ROC) curve comparing XGBoost and Logistic Regression models. The x-axis represents the False Positive Rate, and the y-axis represents the True Positive Rate. Both models show a curve above the diagonal line, indicating better performance than random guessing. XGBoost is represented by a blue line, and Logistic Regression by an orange line.

Figure 2. Comparison of ROC curves of XGBoost and logistic regression.

The subgroup analysis showed that the predictive performance of the XGBoost model, while generally strong, varied across different participant strata. The model achieved higher AUC in males, younger participants (<60 years), and those without hypertension or diabetes. Although slightly lower AUC values were observed in females, older participants, and individuals with comorbidities, the model maintained acceptable discriminatory power (all AUCs > 0.77) in these clinically relevant subgroups (Table 4). Logistic regression echoed this trend but yielded uniformly lower AUC values in every subgroup (Supplementary Results).

Table 4. Area under the curve (AUC) for the XGBoost model across subgroups.

Classification performance at the optimal threshold

The classification performance of XGBoost and logistic regression models at their respective optimal thresholds is summarized in Table 5. For XGBoost, the thresholds ranged from 0.204 to 0.288, while for logistic regression, they ranged from 0.226 to 0.301. Although numerical differences favored XGBoost across all metrics, including sensitivity, specificity, and predictive values, none of these differences were statistically significant.

Table 5. Classification performance of XGBoost and logistic regression models at their optimal probability thresholds.

Calibration curves of XGBoost and logistic regression

The calibration curve of the XGBoost model demonstrated strong agreement between predicted probabilities and observed outcomes. The calibration intercept was 0.08, indicating minimal overall bias, while the calibration slope of 1.09 suggested near-ideal alignment with slight overestimation for higher probabilities. The 95% confidence interval largely overlapped with the ideal calibration line for predicted probabilities below 0.6. For predicted probabilities above 0.6, the observed probabilities were slightly lower than predicted (Figure 3A). Logistic regression also showed good but slightly inferior calibration, with an intercept of 0.14 and slope of 1.16. While still maintaining reasonable agreement between predicted and observed probabilities, the model exhibited minor systematic deviations—overestimating risk in both low (<0.3) and high (>0.6) probability ranges, with slight underestimation at mid-range probabilities (0.3–0.6) (Figure 3B). The numerical differences in calibration intercepts and slopes between the two models, while favoring XGBoost, were not statistically significant (p = 0.499 for intercept; p = 0.366 for slope).
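For readers unfamiliar with calibration intercepts and slopes, one common way to obtain them is to regress the observed outcome on the logit of the predicted probability. The sketch below fits logit(P(y = 1)) = a + b · logit(p̂) by Newton-Raphson and returns (a, b). It is an assumption that the study estimated both parameters jointly in this way; some authors instead estimate the intercept with the slope fixed at 1. The toy data are hypothetical and perfectly calibrated by construction.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def calibration_intercept_slope(y_true, y_prob, n_iter=25):
    """Fit logit(P(y=1)) = a + b * logit(p_hat) by Newton-Raphson; return (a, b)."""
    x = [logit(p) for p in y_prob]  # requires 0 < p < 1
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = sigmoid(a + b * xi)
            w = mu * (1 - mu)
            g0 += yi - mu            # gradient with respect to a
            g1 += (yi - mu) * xi     # gradient with respect to b
            h00 += w                 # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < 1e-10 and abs(db) < 1e-10:
            break
    return a, b


# Hypothetical predictions whose observed event rates match them exactly.
probs = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
outcomes = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
intercept, slope = calibration_intercept_slope(outcomes, probs)
```

For perfectly calibrated predictions the fit returns an intercept near 0 and a slope near 1, which is the reference against which the reported values (0.08 and 1.09 for XGBoost, 0.14 and 1.16 for logistic regression) are read.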

FIGURE 3
Panel A shows a calibration curve for XGB with a 95% confidence interval shaded in blue. The observed probability aligns closely with estimated probability, with a calibration intercept of 0.08 and slope of 1.09. Panel B presents a calibration curve for LR with a 95% confidence interval shaded in orange. The observed probability also aligns with estimated probability, with a calibration intercept of 0.14 and slope of 1.16. Both panels include a dashed line representing ideal calibration.

Figure 3. Comparison of calibration curves of XGBoost and logistic regression. (A) Calibration curve of XGBoost. (B) Calibration curve of logistic regression.

Decision curves of XGBoost and logistic regression

As illustrated in Figure 4, the XGBoost model demonstrated superior performance across a broad range of clinically relevant threshold probabilities (0.2–0.8), maintaining the highest net benefit among all decision strategies. The advantage was most pronounced in the 0.6–0.8 threshold range, where XGBoost’s net benefit exceeded that of logistic regression. Notably, a crossover phenomenon occurred beyond the 0.8 threshold probability, with logistic regression yielding marginally higher net benefits.
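The net benefit plotted in a decision curve follows the standard definition NB = TP/n − (FP/n) · pt/(1 − pt), where pt is the threshold probability. The sketch below computes it for a model and for the treat-all strategy (treat-none is always 0); the function names and toy data are hypothetical.

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit of classifying as positive when predicted probability >= pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Net benefit when everyone is classified as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical outcomes and predicted probabilities.
y = [1, 1, 0, 0, 0, 1, 0, 0]
p = [0.9, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3, 0.05]

nb_model = net_benefit(y, p, pt=0.5)
nb_all = net_benefit_treat_all(y, pt=0.5)
nb_none = 0.0
```

A decision curve repeats this calculation over a grid of threshold probabilities for each strategy.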

FIGURE 4
A decision curve analysis graph showing net benefit versus threshold probability. The blue line represents XGBoost, the orange line represents Logistic Regression, the green dotted line represents Treat All, and the red dash-dot line represents Treat None. XGBoost and Logistic Regression show higher net benefits across various threshold probabilities compared to Treat All and Treat None.

Figure 4. Comparison of decision curves of XGBoost, logistic regression, and different treatment strategies.

Model interpretability

For the XGBoost model, the SHAP summary plot revealed that BMI and weight were the most influential predictors, exhibiting the largest absolute SHAP values. Notably, the TCM facial feature HSV_H of nose and the pulse feature T5 also made substantial contributions. The TCM constitution feature phlegm-dampness constitution score and the anthropometric feature diastolic blood pressure exhibited moderate effects on the model’s output. In contrast, localized TCM facial features (e.g., Lab_A of orbit) showed relatively minor contributions (Figure 5).

FIGURE 5
Scatter plot showing SHAP values for various features influencing a model. Features include Body Mass Index, Weight, and others, with dots colored on a blue to pink gradient indicating low to high feature values. The x-axis represents SHAP values ranging from -1.5 to 1.5, indicating the impact on model output.

Figure 5. Feature contribution analysis of XGBoost using SHAP plot.

Logistic regression has inherent interpretability due to its parametric nature. The final logistic regression equation was derived by averaging coefficients across all five folds of cross-validation:

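$$
\begin{aligned}
\operatorname{logit}(p) ={} & (4.229 \pm 0.118) \times \text{weight} + (3.648 \pm 0.163) \times \text{BMI} \\
& - (1.314 \pm 0.247) \times \text{HSV\_H of nose} + (0.923 \pm 0.088) \times \text{phlegm-dampness constitution score} \\
& - (0.817 \pm 0.081) \times \text{T5} + (0.621 \pm 0.240) \times \text{Lab\_A of orbit} \\
& + (0.618 \pm 0.247) \times \text{DBP} + (0.482 \pm 0.159) \times \text{RGB\_R of mid tongue} \\
& - (0.398 \pm 0.091) \times \text{Lab\_A of lip} - (0.255 \pm 0.148) \times \text{H5} \\
& + (0.149 \pm 0.136) \times \text{T4} - (4.031 \pm 0.213)
\end{aligned}
$$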

As in the XGBoost model, weight, BMI, and HSV_H of nose had a relatively large impact on the predicted probability of hepatic steatosis.
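To show how the averaged equation translates into a predicted probability, the sketch below applies the mean coefficients to a feature vector. It assumes the inputs have already been preprocessed exactly as in the study (the scaling is described only in Supplementary Methods), so the example inputs are hypothetical and the outputs are not clinically meaningful.

```python
import math

# Mean coefficients from the averaged equation (fold-to-fold SDs omitted).
COEFFICIENTS = {
    "weight": 4.229,
    "BMI": 3.648,
    "HSV_H of nose": -1.314,
    "phlegm-dampness constitution score": 0.923,
    "T5": -0.817,
    "Lab_A of orbit": 0.621,
    "DBP": 0.618,
    "RGB_R of mid tongue": 0.482,
    "Lab_A of lip": -0.398,
    "H5": -0.255,
    "T4": 0.149,
}
INTERCEPT = -4.031


def predict_probability(features):
    """Inverse-logit of the linear predictor for preprocessed feature values."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)
    return 1 / (1 + math.exp(-z))


# Hypothetical preprocessed inputs: all features at zero, then weight raised to 1.
baseline = {name: 0.0 for name in COEFFICIENTS}
p_baseline = predict_probability(baseline)
p_heavier = predict_probability({**baseline, "weight": 1.0})
```

Because the coefficients apply to transformed variables, raw measurements such as kilograms or mmHg must not be entered directly.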

Discussion

In this study, we evaluated nine machine learning algorithms to develop a screening model for ultrasound-detected hepatic steatosis based on purely non-invasive indicators. Among the candidate non-parametric and parametric algorithms, XGBoost and logistic regression demonstrated the best AUC values. Compared to logistic regression, XGBoost exhibited a consistent pattern of minor advantages across all evaluated metrics. It achieved a statistically significantly higher AUC, albeit with a marginal absolute difference (< 0.02). This trend of nominal improvement was also observed in sensitivity, specificity, predictive values at the optimal threshold, and calibration parameters, although none of these differences reached statistical significance. Decision curve analysis complemented these findings, showing a consistently higher, yet modest, net benefit for XGBoost across most threshold probabilities. Therefore, the choice between the two models depends on the specific clinical context, with XGBoost offering a slight edge in predictive performance and logistic regression providing simplicity and more straightforward interpretability.

Despite the overall robust performance, both XGBoost and logistic regression models exhibited an attenuation in AUC within specific subgroups such as females, older adults (≥ 60 years), and individuals with hypertension or diabetes. This phenomenon may be attributed to two non-mutually exclusive factors. First, the reduced sample sizes in these strata likely limited the statistical power to robustly capture the complex mapping between predictors and the outcome. Second, pathophysiological specificity in these populations may render the selected indicators less representative; for instance, hormonal influences on fat distribution in females can diminish the sensitivity of BMI (8), while in older, hypertensive, and diabetic subgroups, the widespread use of medications and the presence of subclinical comorbidities can mask the true metabolic risk, thereby diluting the predictive signal of clinical indicators.

To uncover the drivers of predictions from the XGBoost and logistic regression models, we used SHAP analysis and interpreted the logistic regression coefficients, respectively. Both approaches identified BMI and weight as dominant contributors. This strongly aligns with established literature linking obesity to hepatic steatosis via mechanisms like insulin resistance and chronic inflammation (9, 10). Despite the correlation between BMI and weight in the model, the VIF analysis showed acceptable multicollinearity (both VIFs < 5, see Supplementary Results), and both the RFE and LASSO methods selected them as predictors, suggesting they provide complementary information for the prediction. Diastolic blood pressure, another anthropometric variable, also contributed, exhibiting moderate effects on the model’s output. This is consistent with the known bidirectional association between hypertension and non-alcoholic fatty liver disease (NAFLD, now termed MASLD) (11).

Beyond established clinical indicators, our model incorporated a novel set of digitalized TCM metrics derived from three domains: 1) facial and tongue colors, quantified using HSV (Hue-Saturation-Value), Lab (lightness-chromaticity), and RGB (Red-Green-Blue) models—specifically, HSV_H of nose, Lab_A of lip, Lab_A of orbit, and RGB_R of mid tongue; 2) pulse waveforms, from which parameters T4, T5, and H5 were extracted (H and T denote the height and time values of the pulsation trajectory, respectively); and 3) a phlegm-dampness constitution score calculated from self-reported symptoms, quantifying the degree of phlegm-dampness constitution in TCM. The SHAP analysis and logistic regression coefficients both confirmed the significant predictive role of HSV_H of nose. This association with hepatic steatosis may be mediated through oxidative stress. SLD patients show elevated serum levels of oxidative stress markers such as malondialdehyde and 8-isoprostane (12). These may activate the NF-κB pathway, accelerating dermal fibroblast senescence and promoting lipofuscin deposition in skin tissue (13), potentially leading to nasal skin darkening.
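To make the color notation concrete, the sketch below derives an HSV hue and a CIELAB a* value from a single RGB color. It assumes 8-bit sRGB input and a D65 white point; the instrument's actual pipeline (lighting correction, region segmentation, value scaling) is not described here, so the function names and sample colors are hypothetical.

```python
import colorsys


def hsv_hue_degrees(r, g, b):
    """Hue component (0-360 degrees) of an 8-bit RGB color."""
    h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360


def lab_a(r, g, b):
    """CIELAB a* (green-red axis) of an 8-bit sRGB color, D65 white point."""
    def to_linear(c):
        c = c / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)
    # Linear sRGB to CIE XYZ (only X and Y are needed for a*).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    return 500 * (f(x / 0.95047) - f(y / 1.0))


# Pure red has hue 0 and a strongly positive a*; neutral gray has a* near 0.
hue_red = hsv_hue_degrees(255, 0, 0)
a_red = lab_a(255, 0, 0)
a_gray = lab_a(128, 128, 128)
```

In this notation, RGB_R of mid tongue would presumably be the red-channel value of the mid-tongue region, HSV_H of nose its hue, and Lab_A of lip or orbit the a* value of those regions.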

Both T5 and phlegm-dampness constitution score showed relatively high contribution to the prediction. The lower mean T5 value in the hepatic steatosis group, indicating decreased cardiac function, aligns with the existing finding that NAFLD is significantly associated with left ventricular diastolic dysfunction (14). Regarding the phlegm-dampness constitution, its role as a pathological basis of SLD is well-recognized in TCM theory (3), and its epidemiological link to NAFLD has been previously documented (15). Our study further demonstrates the practical utility of this constitution score for hepatic steatosis screening.

In addition, Lab_A of orbit showed a moderate effect size for distinguishing the hepatic steatosis and non-hepatic steatosis groups. However, its impact in the SHAP analysis was minimal, indicating diminished predictive utility in the presence of stronger predictors. Separately, Lab_A of lip, RGB_R of mid tongue, H5, and T4 were ranked as the least important predictors by SHAP analysis, consistent with their small or trivial univariable effect sizes. Overall, building upon previous evidence that tongue features can assist in predicting NAFLD (16, 17), our study extends these findings by identifying the predictive value of facial and pulse characteristics, as well as TCM constitution scores, thereby broadening the range of TCM tools available for hepatic steatosis screening.

A screening model for hepatic steatosis is crucial for early SLD detection. However, existing models (AUC 0.8–0.89) (18–23) are limited by subtype specificity (e.g., NAFLD, MASLD, ASLD) and reliance on blood tests, restricting their broad use. In contrast, our model avoids subtyping and relies solely on non-invasive predictors to directly identify ultrasound-detected hepatic steatosis. Trained on health check-up data from a health-conscious community subset, it offers a viable path for community screening. While the standardized TCM facial, tongue, and pulse indicators currently require specialized tools, upcoming advances in smartphone color correction and wearable sensor technology are expected to enable at-home measurement in the near future (24, 25).

Apart from model design and practicality, we also strengthened methodological rigor. Our study employed cross-validation to evaluate model performance—a method strongly advocated by Collins et al. (26) to mitigate the limitations of simple data splitting. Unlike previous studies relying on single train-test splits, which discard valuable data and introduce instability due to small test sets, our approach leverages repeated resampling to utilize all available data. This reduces optimism bias from overfitting, providing a more precise performance estimate.

Nevertheless, several limitations of this study should be acknowledged. First, the generalizability of the model is limited. It was developed from a hospital check-up cohort that likely represents individuals with higher socioeconomic status and greater health awareness (27, 28), potentially introducing selection bias. The model has only undergone internal validation and exhibited a moderate decline in performance within key subgroups. Furthermore, its applicability across diverse ethnic and racial groups may be constrained by variations in the manifestation of TCM features. Second, limitations related to the diagnostic standard should be considered. Ultrasound has inherent sensitivity constraints in detecting mild hepatic steatosis, which likely resulted in under-ascertainment of cases and consequently restricts the model’s effectiveness in identifying early-stage disease. Third, the absence of data on established predictors—particularly waist circumference—precluded a head-to-head comparison between our model and conventional, widely used indices such as the Fatty Liver Index (FLI). This gap impedes a direct evaluation of the incremental utility offered by our TCM-based model over existing tools.

Future research should pursue the following directions. First, external validation in community-based settings is necessary to verify the model’s utility in broader public health contexts. Second, future studies should intentionally recruit underrepresented populations such as females, older adults, and individuals with diabetes or hypertension to ensure equitable performance. Third, waist circumference data should be collected to enable head-to-head comparison with established tools like the Fatty Liver Index (FLI) and to evaluate the incremental value offered by the TCM-based predictors. Finally, incorporating blood-based biomarkers such as ALT and TG could significantly enhance predictive capacity in high-risk populations. For individuals with conditions like type 2 diabetes or metabolic syndrome, where the pathophysiology of hepatic steatosis is more complex, the addition of biochemical parameters might improve detection sensitivity.

Conclusion

TCM facial, tongue, and pulse manifestations, together with the phlegm-dampness constitution score, can be combined with anthropometric variables to develop a non-invasive screening model for hepatic steatosis identified by ultrasound. Both XGBoost and logistic regression demonstrated strong and comparable performance. The choice between them should be guided by the clinical context, that is, whether inherent interpretability or a slight predictive advantage is prioritized.

Data availability statement

The original contributions presented in this study are included in this article/Supplementary material. Further inquiries can be directed to the corresponding authors.

Ethics statement

The studies involving humans were approved by the Ethics Committee of the Affiliated Hospital of Chengdu University of Traditional Chinese Medicine. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

KZ: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing. LL: Conceptualization, Formal analysis, Methodology, Writing – original draft, Writing – review & editing. ZZ: Data curation, Formal analysis, Writing – review & editing. SZ: Formal analysis, Writing – review & editing. BL: Data curation, Supervision, Writing – review & editing. WT: Data curation, Writing – review & editing. WL: Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the Sichuan Science and Technology Department Project (Grant no. 2024YFFK0036) and in part by the National Natural Science Foundation of China (Grant nos. 82405607, 81873204, 82305091).

Acknowledgments

We thank Jinbo Sun, Jing Jiang, Wenyi Li, and other medical staff of the health management center of the Affiliated Hospital of Chengdu University of Traditional Chinese Medicine for their assistance in examining participants and collecting data.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2025.1704441/full#supplementary-material

References

1. Kanwal F, Neuschwander-Tetri B, Loomba R, Rinella M. Metabolic dysfunction–associated steatotic liver disease: update and impact of new nomenclature on the American Association for the Study of Liver Diseases practice guidance on nonalcoholic fatty liver disease. Hepatology. (2024) 79:1212–9. doi: 10.1097/hep.0000000000000670

2. Israelsen M, Francque S, Tsochatzis E, Krag A. Steatotic liver disease. Lancet. (2024) 404:1761–78. doi: 10.1016/S0140-6736(24)01811-7

3. Cai M, Yang Y. Clinical experience in TCM differential treatment of fatty liver. J Traditional Chinese Med. (2007) 27:115–6.

4. Liu H, Jin Q. Pulse parameters signature analysis of 209 cases with adiposis hepatica (in Chinese). Shanxi J Traditional Chinese Med. (2011) 27:40–1. doi: 10.3969/j.issn.1000-7156.2011.07.027

5. Wang S, Li F, Liang R, Wang Z, Wu J, Chen S, et al. Analysis of tongue characteristics of patients with fatty liver disease in the health check-up population (in Chinese). J Basic Chinese Med. (2007) 11:847–9. doi: 10.3969/j.issn.1006-3250.2007.11.021

6. Farrell G, Chitturi S, Lau G, Sollano J. Guidelines for the assessment and management of non-alcoholic fatty liver disease in the Asia-Pacific region: executive summary. J Gastroenterol Hepatol. (2007) 22:775–7. doi: 10.1111/j.1440-1746.2007.05002.x

7. Riley R, Ensor J, Snell K, Harrell F, Martin G, Reitsma J, et al. Calculating the sample size required for developing a clinical prediction model. BMJ. (2020) 368:m441. doi: 10.1136/bmj.m441

8. Karastergiou K, Smith S, Greenberg A, Fried S. Sex differences in human adipose tissues – the biology of pear shape. Biol Sex Differ. (2012) 3:13. doi: 10.1186/2042-6410-3-13

9. Fabbrini E, Sullivan S, Klein S. Obesity and nonalcoholic fatty liver disease: biochemical, metabolic, and clinical implications. Hepatology. (2010) 51:679–89. doi: 10.1002/hep.23280

10. Cohen J, Horton J, Hobbs H. Human fatty liver disease: old questions and new insights. Science. (2011) 332:1519–23. doi: 10.1126/science.1204265

11. Nakagami H. Mechanisms underlying the bidirectional association between nonalcoholic fatty liver disease and hypertension. Hypertension Res. (2023) 46:539–41. doi: 10.1038/s41440-022-01117-6

12. Gonzalez A, Huerta-Salgado C, Orozco-Aguilar J, Aguirre F, Tacchi F, Simon F, et al. Role of oxidative stress in hepatic and extrahepatic dysfunctions during nonalcoholic fatty liver disease (NAFLD). Oxidative Med Cell Longevity. (2020) 2020:1617805. doi: 10.1155/2020/1617805

13. Rinnerthaler M, Bischof J, Streubel M, Trost A, Richter K. Oxidative stress in aging human skin. Biomolecules. (2015) 5:545–89. doi: 10.3390/biom5020545

14. Gohil N, Tanveer N, Makkena V, Jaramillo A, Awosusi B, Ayyub J, et al. Non-alcoholic fatty liver disease and its association with left ventricular diastolic dysfunction: a systematic review. Cureus. (2023) 15:e43013. doi: 10.7759/cureus.43013

15. Zhu K, Guo Y, Zhao C, Kang S, Li J, Wang J, et al. Etiology exploration of non-alcoholic fatty liver disease from Traditional Chinese Medicine constitution perspective: a cross-sectional study. Front Public Health. (2021) 9:635818. doi: 10.3389/fpubh.2021.635818

16. Wang R, Chen J, Duan S, Lu Y, Chen P, Zhou Y, et al. Noninvasive diagnostic technique for nonalcoholic fatty liver disease based on features of tongue images. Chinese J Integr Med. (2024) 30:203–12. doi: 10.1007/s11655-023-3616-1

17. Jiang T, Guo XJ, Tu L-P, Lu Z, Cui J, Ma X-X, et al. Application of computer tongue image analysis technology in the diagnosis of NAFLD. Comput Biol Med. (2021) 135:104622. doi: 10.1016/j.compbiomed.2021.104622

18. Xiao L, Zeng L, Wang J, Hong C, Zhang Z, Wu C, et al. Development and validation of machine learning-based marker for early detection and prognosis stratification of nonalcoholic fatty liver disease. Adv Sci. (2025) 12:e10527. doi: 10.1002/advs.202410527

19. Zou H, Zhao F, Lv X, Ma X, Xie Y. Development and validation of a new nomogram to screen for MAFLD. Lipids Health Dis. (2022) 21:133. doi: 10.1186/s12944-022-01748-1

20. Li D, Zhang M, Wu S, Tan H, Li N. Risk factors and prediction model for nonalcoholic fatty liver disease in northwest China. Scientific Rep. (2022) 12:13877. doi: 10.1038/s41598-022-17511-6

21. Yue L. Value of a multi-indicator model combining Elast PQ technology, blood lipids, liver function, and uric acid for early diagnosis of alcoholic fatty liver disease. Am J Transl Res. (2025) 17:3050–62. doi: 10.62347/gmgx5873

22. Zhu X, Sun F, Gao X, Liu H, Luo Z, Sun Y, et al. Predictive value of triglyceride glucose index in non-obese non-alcoholic fatty liver disease. BMJ Open. (2025) 15:e083686. doi: 10.1136/bmjopen-2023-083686

23. Hong Y, Chen X, Wang L, Zhang F, Zeng Z, Xie W. Machine learning prediction of metabolic dysfunction-associated fatty liver disease risk in American adults using body composition: explainable analysis based on SHapley Additive exPlanations. Front Nutr. (2025) 12:1616229. doi: 10.3389/fnut.2025.1616229

24. Yang Z, Zhao Y, Yu J, Mao X, Xu H, Huang L. An intelligent tongue diagnosis system via deep learning on the android platform. Diagnostics. (2022) 12:2451. doi: 10.3390/diagnostics12102451

25. Zhao Y, Sun Q, Mei S, Gao L, Zhang X, Yang Z, et al. Wearable multichannel-active pressurized pulse sensing platform. Microsyst Nanoeng. (2024) 10:77. doi: 10.1038/s41378-024-00703-7

26. Collins G, Dhiman P, Ma J, Schlussel M, Archer L, Van Calster B, et al. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ. (2024) 384:e074819. doi: 10.1136/bmj-2023-074819

27. Shin H, Kang H, Lee J, Lim H. The association between socioeconomic status and adherence to health check-up in Korean adults, based on the 2010-2012 Korean national health and nutrition examination survey. Korean J Fam Med. (2018) 39:114–21. doi: 10.4082/kjfm.2018.39.2.114

28. Lee H, Kim S, Neese J, Lee M. Does health literacy affect the uptake of annual physical check-ups?: results from the 2017 US health information national trends survey. Arch Public Health. (2021) 79:38. doi: 10.1186/s13690-021-00556-w

Keywords: hepatic steatosis, machine learning, non-invasive screening, Traditional Chinese Medicine, XGBoost

Citation: Zhu K, Li L, Zhao Z, Zheng S, Lin B, Tang W and Li W (2026) Development of a machine learning model for hepatic steatosis screening using non-invasive Traditional Chinese Medicine diagnostics and clinical variables: a health checkup study with community screening potential. Front. Med. 12:1704441. doi: 10.3389/fmed.2025.1704441

Received: 23 September 2025; Revised: 01 December 2025; Accepted: 09 December 2025;
Published: 14 January 2026.

Edited by:

Rong-Rong He, Jinan University, China

Reviewed by:

Soumyajit Podder, Chang Gung University, Taiwan
Shuwei Weng, First Affiliated Hospital of Fujian Medical University, China
Ran Tong, The University of Texas at Dallas, United States

Copyright © 2026 Zhu, Li, Zhao, Zheng, Lin, Tang and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Weihong Li, lwh@cdutcm.edu.cn

†These authors have contributed equally to this work