- 1Clinical Laboratory Center, Meizhou People's Hospital, Meizhou Academy of Medical Sciences, Meizhou, Guangdong, China
- 2Guangdong Engineering Technological Research Center for Clinical Molecular Diagnosis and Antibody Drugs, Meizhou, Guangdong, China
- 3Department of Cardiology, Meizhou People's Hospital, Meizhou Academy of Medical Sciences, Meizhou, Guangdong, China
Purpose: This study aimed to develop and validate a machine learning model for accurate estimation of trunk fat percentage using readily available anthropometric measures, and to evaluate its discriminative performance for cardiometabolic diseases compared with conventional whole-body fat percentage.
Methods: We utilized data from the National Health and Nutrition Examination Survey (NHANES; 1999–2006 and 2011–2018) as the development cohort (n = 30,443). Trunk fat percentage, measured by dual-energy X-ray absorptiometry (DXA), served as the gold standard. Six regression algorithms were evaluated, with model performance assessed by the coefficient of determination (R2). External validation was performed using the China Health and Retirement Longitudinal Study (CHARLS) cohort (n = 13,524), where the discriminative power for hypertension, dyslipidemia, diabetes, heart disease, and stroke was evaluated using the area under the receiver operating characteristic curve (AUC).
Results: The XGBoost model demonstrated superior performance in the development cohort, achieving an R2 of 0.8509 on the test set. A simplified model utilizing only five variables (sex, waist circumference, height, weight, and age) retained 99.3% of the full model’s accuracy (R2 = 0.8450). In external validation, the machine learning-estimated trunk fat percentage consistently outperformed whole-body fat percentage across all cardiometabolic conditions, with the highest AUC improvement observed for diabetes (trunk fat AUC = 0.6607 vs. whole-body fat AUC = 0.6401; relative improvement of 3.22%). The average relative improvement in AUC across all endpoints was 2.77%.
Conclusion: This study presents a highly accurate and clinically practical machine learning model for trunk fat percentage estimation using five basic anthropometric measurements. External validation confirms that trunk fat percentage is a superior biomarker for identifying cardiometabolic risks compared to whole-body fat percentage. The model provides a reliable tool for non-invasive central adiposity assessment in large-scale epidemiological studies and clinical practice.
Introduction
Obesity represents a critical global public health challenge, with its prevalence among adults having risen markedly over the past decade (1). A key limitation in the field is the reliance on traditional measures like body mass index (BMI), which fails to distinguish between fat and lean mass or to capture the critical aspect of fat distribution (2, 3). This oversight is epitomized by the “metabolically obese, normal-weight” (MONW) phenotype, where individuals with a normal BMI exhibit metabolic abnormalities typically associated with obesity, often driven by excess central adiposity (4).
The health risks of obesity are not solely determined by the total amount of body fat but are profoundly influenced by its anatomical distribution (5). Accumulating evidence underscores that trunk (central) fat is a more potent pathogenic depot than peripheral fat, being strongly associated with insulin resistance, dyslipidemia, systemic inflammation, and an elevated risk of cardiovascular diseases (5–10). Consequently, quantifying central adiposity is essential for accurate risk stratification.
While body fat percentage (BF%) is widely accepted as superior to BMI, it has its own limitation: as a global measure, it does not reveal how much fat is deposited in the high-risk trunk region (11). This has led to growing recognition that trunk fat percentage (TF%), the proportion of trunk tissue mass that is fat, may be a more discriminative indicator of cardiometabolic risk than total adiposity alone (12–14). However, practical assessment of body composition, especially region-specific fat distribution, remains challenging in large-scale settings. Gold-standard techniques such as dual-energy X-ray absorptiometry (DXA) are expensive and not readily available for routine use (15).
This practical constraint has spurred the development of anthropometric prediction equations as simple, cost-effective alternatives. While several models exist for estimating total body fat percentage (16–18), a notable gap remains: no universally applicable equation or machine learning model has been established specifically for predicting trunk fat percentage, despite evidence of its clinical relevance.
Therefore, the primary aim of this study was to develop and validate a novel prediction model for TF% using advanced machine learning algorithms. Leveraging a large sample from the National Health and Nutrition Examination Survey (NHANES), we aimed to create an accurate tool based on simple anthropometric measures. We then externally validated the model using data from the China Health and Retirement Longitudinal Study (CHARLS) and examined the association of predicted TF% with key cardiometabolic conditions (hypertension, diabetes, dyslipidemia, heart disease, and stroke) to assess its incremental utility over conventional BF% in risk assessment.
Materials and methods
Study population
In this study, we used data from two independent surveys to develop and validate a model for estimating trunk fat percentage (Figure 1). The development cohort came from the NHANES cycles covering 1999–2006 and 2011–2018 (19). NHANES did not collect DXA body composition data during the 2007–2010 survey cycles; therefore, data from this period could not be included in the development cohort. The initial pool contained 80,630 participants, to which we applied sequential exclusion criteria. First, we removed individuals under 18 years of age, leaving 46,449 participants. Next, we excluded those without DXA-measured body composition data, leaving 31,258 participants. Finally, we excluded individuals missing any of the key anthropometric variables (height, weight, or waist circumference), which yielded a final analytical sample of 30,443 participants.
For external validation, we used data from CHARLS, which initially included 17,708 participants (20). After data preprocessing and the removal of records with missing values for essential variables such as height, weight, and waist circumference, the usable sample size was 13,524. This cohort was used to assess the model's validity and the association of its estimates with various health conditions.
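The sequential exclusions (80,630 → 46,449 → 31,258 → 30,443) can be expressed as an ordered list of keep-conditions, logging how many records remain after each step so that a participant flow diagram can be regenerated. This is a minimal sketch; the field names (`age`, `trunk_fat_pct`, `height_cm`, `weight_kg`, `waist_cm`) are hypothetical, and the actual NHANES variable codes and file merges are not shown in the source.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, object]
Step = Tuple[str, Callable[[Record], bool]]


def apply_exclusions(records: Sequence[Record], steps: Sequence[Step]) -> Tuple[List[Record], List[Tuple[str, int]]]:
    """Apply keep-conditions in order and log how many records remain after each step."""
    kept = list(records)
    log = [("initial", len(kept))]
    for label, keep in steps:
        kept = [r for r in kept if keep(r)]
        log.append((label, len(kept)))
    return kept, log


def _present(r: Record, *keys: str) -> bool:
    """True when every listed field exists and is not None (i.e. not missing)."""
    return all(r.get(k) is not None for k in keys)


# Hypothetical field names; real NHANES files use survey-specific variable codes.
STEPS: List[Step] = [
    ("aged 18 or older", lambda r: _present(r, "age") and r["age"] >= 18),
    ("has DXA trunk fat", lambda r: _present(r, "trunk_fat_pct")),
    ("has height, weight, waist", lambda r: _present(r, "height_cm", "weight_kg", "waist_cm")),
]
```

Because each step filters the output of the previous one, the logged counts depend on the order of `STEPS`, just as the reported counts depend on the order of the exclusion criteria.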
Dual-energy X-ray absorptiometry measurements
Dual-energy X-ray absorptiometry (DXA) measurements were performed by trained staff using a Hologic QDR 4500A fan-beam X-ray bone densitometer at the Mobile Examination Center. Standard exclusion protocols were applied, including recent use of radiographic contrast material or nuclear medicine tests, self-reported weight over 300 pounds, or height over 6 feet 5 inches. All DXA scans underwent quality control review. Body composition metrics, including lean mass, fat mass, and bone mineral content, were derived using Hologic Discovery software version 12.1, and invalid scans were flagged as missing in the dataset. The primary outcome of this study was trunk fat percentage, calculated as DXA-derived trunk fat mass divided by total trunk mass and expressed as a percentage.
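Because the outcome is a ratio of two DXA masses, a small helper makes the definition explicit and rejects impossible values such as fat mass exceeding total trunk mass. This is a minimal sketch; the function name and validation rules are illustrative assumptions, not part of the NHANES processing.

```python
def trunk_fat_percent(trunk_fat_mass: float, trunk_total_mass: float) -> float:
    """Trunk fat percentage (0-100) from DXA trunk fat mass and total trunk mass.

    Both masses must be in the same unit; the names here are illustrative,
    not NHANES variable names.
    """
    if trunk_total_mass <= 0:
        raise ValueError("total trunk mass must be positive")
    if not 0 <= trunk_fat_mass <= trunk_total_mass:
        raise ValueError("trunk fat mass must lie between 0 and the total trunk mass")
    return 100.0 * trunk_fat_mass / trunk_total_mass


# Hypothetical scan: 12.5 kg of fat within a 40 kg trunk region
print(trunk_fat_percent(12.5, 40.0))  # 31.25
```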
Anthropometric measurements and other covariates
Anthropometric data were collected by trained health technicians following standardized protocols. Height was measured with a stadiometer to the nearest 0.1 cm, and weight was measured with a calibrated digital scale to the nearest 0.1 kg. Body mass index was calculated as weight in kilograms divided by height in meters squared. Waist circumference was measured at the uppermost lateral border of the iliac crest to the nearest 0.1 cm. Demographic information such as age and race/ethnicity was collected through standardized interviews.
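The BMI definition involves a unit conversion that is easy to get wrong when height is recorded in centimeters, as it is here. A minimal sketch (the function name is hypothetical):

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0  # stadiometer readings are recorded in cm
    return weight_kg / height_m ** 2


print(round(bmi(70.0, 175.0), 2))  # 22.86
```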
Biomarker and disease status collection
We analyzed laboratory biomarkers and disease status to evaluate the clinical relevance of trunk fat percentage. Biomarkers from NHANES included triglycerides, total cholesterol, LDL cholesterol, and HDL cholesterol, measured using standard laboratory methods. Fasting blood samples were used for triglyceride and LDL measurements when available. In NHANES, disease status focused on type 2 diabetes and hypertension. The CHARLS dataset provided self-reported or physician-diagnosed conditions, including dyslipidemia, diabetes, heart disease, and stroke. This approach enabled a comparative assessment of how trunk fat percentage relates to cardiometabolic risk across the two populations.
Statistical analysis
Statistical analysis followed a structured pipeline for model development and evaluation. The NHANES development cohort was randomly split into a training set (70%) and an internal testing set (30%). To account for the complex survey design of NHANES, all models incorporated sample weights, both during training, by weighting the loss function, and during performance evaluation, using weighted metrics (21). We evaluated six regression algorithms: Linear Regression served as the baseline model; Ridge Regression introduced L2 regularization; Lasso Regression implemented L1 regularization for feature selection; Elastic Net combined L1 and L2 penalties (22); Random Forest provided an ensemble tree-based approach (23); and XGBoost leveraged gradient boosting (24). This suite of algorithms spans interpretable linear models through complex, non-linear ensemble methods, enabling a robust comparison.
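The 70/30 split and the weighted evaluation metric can be sketched with the standard library alone. The coefficient of determination below uses the same weighted definition as scikit-learn's `r2_score` when `sample_weight` is supplied; the seed, the function names, and the use of plain Python lists are illustrative assumptions rather than the authors' code.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split_indices(n: int, test_frac: float = 0.30, seed: int = 2024) -> Tuple[List[int], List[int]]:
    """Shuffle row indices 0..n-1 and split them into (train, test) lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def weighted_r2(y: Sequence[float], y_hat: Sequence[float], w: Sequence[float]) -> float:
    """Survey-weighted R2 = 1 - sum(w*(y - y_hat)^2) / sum(w*(y - weighted mean of y)^2)."""
    sw = sum(w)
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, y_hat))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return 1.0 - ss_res / ss_tot


train_idx, test_idx = train_test_split_indices(30443)
print(len(train_idx), len(test_idx))  # 21310 9133
```

In a scikit-learn/XGBoost implementation, the same weights would be passed through the `sample_weight` argument of each estimator's `fit` method, which `LinearRegression`, `Ridge`, `Lasso`, `ElasticNet`, `RandomForestRegressor`, and `XGBRegressor` all accept.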
Feature importance was determined using gain-based importance scores for tree-based models and standardized coefficients for linear models (25). We employed feature simplification strategies, including selecting top-ranked features by importance score, applying a threshold to filter out low-impact features, and prioritizing variables based on biological relevance. The performance of simplified models was rigorously assessed via cross-validation, with visual comparisons to evaluate the trade-off between simplicity and accuracy.
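The strategy of keeping top-ranked features until a cumulative importance threshold is reached can be written as a small function. In XGBoost, gain-based scores for a trained booster are available from `Booster.get_score(importance_type="gain")`, which returns a dict mapping feature names to scores and could feed this function directly. The 0.99 default mirrors the >99% figure reported in the Results; the example importances and feature names are made up.

```python
from typing import Dict, List


def select_by_cumulative_importance(importances: Dict[str, float], target: float = 0.99) -> List[str]:
    """Smallest set of top-ranked features whose normalized importance sums to >= target."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= target:
            break
    return selected


# Made-up gain scores for hypothetical features f1..f5
example = {"f1": 50.0, "f2": 30.0, "f3": 15.0, "f4": 4.0, "f5": 1.0}
print(select_by_cumulative_importance(example, target=0.90))  # ['f1', 'f2', 'f3']
```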
Subsequently, hyperparameter tuning for the top-performing models was conducted via a randomized search over a defined parameter space with 50 candidate combinations evaluated using 5-fold cross-validation (26). The final model was selected based on its performance on the internal testing set, using the coefficient of determination (R2) as the primary metric.
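The tuning step combines a random sampler over a parameter grid with a k-fold splitter used to score each candidate. The sketch below implements both with the standard library. The parameter names in the example space are real XGBoost hyperparameters, but the value grid is hypothetical and is not the search space used in the study. In scikit-learn, the same procedure is available as `RandomizedSearchCV(estimator, param_distributions, n_iter=50, cv=5)`.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 once, then return k (train, validation) index pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


def random_search(
    space: Dict[str, Sequence],
    score_fn: Callable[[Dict], float],
    n_candidates: int = 50,
    seed: int = 0,
) -> Tuple[Dict, float]:
    """Draw n_candidates random combinations (with replacement) and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params: Dict = {}
    best_score = float("-inf")
    for _ in range(n_candidates):
        params = {name: rng.choice(list(values)) for name, values in space.items()}
        score = score_fn(params)  # e.g. mean 5-fold CV R2 for these parameters
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical XGBoost search space (not the study's)
space = {
    "max_depth": [3, 4, 5, 6],
    "learning_rate": [0.03, 0.05, 0.1],
    "n_estimators": [200, 400, 800],
    "subsample": [0.8, 1.0],
}
```

In use, `score_fn` would fit a model on each training fold from `kfold_indices` and return the mean validation R2. Unlike scikit-learn's sampler, this sketch may draw the same combination more than once.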
Model validation included analyzing the distribution of residuals (assessed using the Shapiro–Wilk test for normality), detecting outliers via standardized residuals, evaluating stability with learning curves, and visualizing actual versus predicted values. We also conducted subgroup analyses to assess model robustness across sex, age, and BMI categories, evaluating performance variation across these key demographic and clinical strata. Model reliability was quantified through 5-fold cross-validation, calculating the standard deviation of performance metrics to assess prediction stability.
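The residual checks described here reduce to a few summary statistics: the mean and SD of residuals, MAE, and the share of standardized residuals beyond ±2, which can be compared with the roughly 4.55% expected if residuals were exactly normal. Fold scores are summarized as mean ± SD using the population SD (ddof = 0, the NumPy default); that choice is an assumption, because the source does not say which SD was used. The Shapiro–Wilk test itself is available as `scipy.stats.shapiro`. SciPy's documentation notes that for samples larger than 5,000 the p-value may not be accurate, and with samples of this size even small departures from normality tend to be flagged.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple

# Share of |z| > 2 expected under an exact normal distribution: P(|Z| > 2) = erfc(2 / sqrt(2))
EXPECTED_FRAC_BEYOND_2SD = math.erfc(2 / math.sqrt(2))  # about 0.0455


def residual_diagnostics(y: Sequence[float], y_hat: Sequence[float]) -> Dict[str, float]:
    """Summaries of residuals (actual minus predicted)."""
    resid = [yi - fi for yi, fi in zip(y, y_hat)]
    mean = statistics.mean(resid)
    sd = statistics.stdev(resid)  # sample SD of residuals
    if sd == 0:
        raise ValueError("residuals have zero spread; standardization is undefined")
    z = [(r - mean) / sd for r in resid]  # standardized residuals
    return {
        "mean_residual": mean,
        "sd_residual": sd,
        "mae": statistics.mean([abs(r) for r in resid]),
        "frac_beyond_2sd": sum(1 for v in z if abs(v) > 2) / len(z),
    }


def cv_summary(fold_scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and population SD (ddof=0) of per-fold scores, reported as mean ± SD."""
    return statistics.mean(fold_scores), statistics.pstdev(fold_scores)
```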
External validation was performed using the CHARLS dataset. We compared TF% estimated by our optimized XGBoost model with whole-body fat percentage (BF%) estimated from conventional anthropometric equations. The comparative analysis involved receiver operating characteristic (ROC) curves with area under the curve (AUC) evaluation for discriminating cardiometabolic conditions (hypertension, dyslipidemia, diabetes, heart disease, and stroke), correlation analysis, and distribution comparisons (20).
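For each condition, the AUC of a continuous marker (estimated TF% or BF%) equals the probability that a randomly chosen participant with the condition has a higher value than a randomly chosen participant without it, with ties counted as one half. The sketch computes this through the Mann–Whitney rank-sum identity, assuming labels coded 1 (condition present) and 0. It is unweighted; `sklearn.metrics.roc_auc_score` accepts a `sample_weight` argument if survey weights are to be applied, as the study reports doing.

```python
from typing import List, Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC = P(score of a case > score of a non-case), ties counted as 1/2."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based midrank shared by a block of tied scores
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U for the cases
    return u / (n_pos * n_neg)
```

The same function would be applied twice per condition, once to the TF% estimates and once to the BF% estimates, so that both AUCs share one definition.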
All analyses were performed in Python (version 3.9). Key libraries included XGBoost (v1.6.0) for model implementation, Scikit-learn (v1.0) for general machine learning utilities and evaluation, Pandas (v1.4.0) and NumPy (v1.22.0) for data manipulation, SciPy (v1.8.0) for statistical tests, and Matplotlib (v3.5.0) for visualization. Sample weights were incorporated throughout all analyses to properly represent the population-level estimates from both NHANES and CHARLS (27).
Software implementation
We developed a publicly accessible, web-based calculator for estimating trunk fat percentage using our pre-trained XGBoost machine learning model. The application was built using the Flask framework (Python) and provides a user-friendly interface for inputting five anthropometric measurements: gender, waist circumference, height, weight, and age. This tool is designed to facilitate cardiometabolic risk assessment in clinical and research settings. The application is freely available at: https://trunkfatmodel.pythonanywhere.com/.
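The calculator's input handling can be sketched independently of the web framework: parse the five fields, validate them, encode sex, and pass a feature vector to the trained model. Everything here is hypothetical, including the field names, the feature order, and the sex coding (1 = male, 2 = female, borrowed from the NHANES convention). The deployed application's actual conventions are not documented in this article, and the `predict` callable stands in for the pre-trained XGBoost model.

```python
import math
from typing import Callable, List, Mapping

# Hypothetical conventions: the deployed app's field names, feature order, and
# sex coding are not documented; 1 = male, 2 = female mirrors NHANES coding.
FEATURE_ORDER = ["sex", "waist_cm", "height_cm", "weight_kg", "age_years"]
SEX_CODES = {"male": 1.0, "female": 2.0}


def parse_inputs(form: Mapping[str, str]) -> List[float]:
    """Turn raw form strings into a numeric feature vector in FEATURE_ORDER."""
    missing = [k for k in FEATURE_ORDER if not str(form.get(k, "")).strip()]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    sex = str(form["sex"]).strip().lower()
    if sex not in SEX_CODES:
        raise ValueError("sex must be 'male' or 'female'")
    features = [SEX_CODES[sex]]
    for key in FEATURE_ORDER[1:]:
        value = float(form[key])  # raises ValueError for non-numeric text
        if not math.isfinite(value) or value <= 0:
            raise ValueError(f"{key} must be a positive number")
        features.append(value)
    return features


def estimate_tf(form: Mapping[str, str], predict: Callable[[List[float]], float]) -> float:
    """Validate the inputs, then delegate to a pre-trained model's prediction function."""
    return predict(parse_inputs(form))
```

In a Flask view, `request.form` could be passed directly as `form`, since it supports the `get` and item access used here.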
Results
Participant characteristics stratified by trunk fat percentage quartiles
The demographic and clinical characteristics of the study participants, stratified by quartiles of TF% in both the training and testing sets, are presented in Supplementary Table S1. The baseline characteristics were highly consistent between the training and testing sets, supporting the robustness of the data split.
A clear gradient was observed across TF% quartiles for most variables. Participants in higher TF% quartiles were significantly older and had higher BMI, waist circumference, and body weight (all p < 0.001). The sex distribution differed markedly across quartiles, with a substantial majority of males in the lowest quartile (Q1: 74% in the training set) and a predominance of females in the upper quartiles (Q3: 82% in the training set). Significant differences were also noted in the distribution of race/ethnicity across quartiles (p < 0.001).
Regarding cardiometabolic risk factors, the prevalence of diabetes and hypertension increased significantly with increasing TF% quartiles. Similarly, adverse lipid profiles were associated with higher TF%, including elevated levels of LDL cholesterol and total cholesterol, and lower levels of HDL cholesterol (all p-values < 0.001). These consistent trends in both datasets confirm that trunk fat percentage is strongly associated with known metabolic risk factors.
Model selection and performance comparison
The comparative performance of six algorithms for estimating trunk fat percentage is summarized in Figure 2. All models demonstrated strong predictive capability, although performance differed noticeably across algorithms.
Figure 2. (A) R2 scores: test set vs. cross-validation. (B) Model comparison based on mean absolute error. (C) Cross-validation R2 score distribution and stability analysis. (D) Performance vs. stability trade-off across models.
XGBoost emerged as the top-performing model, achieving the highest R2 score on the test set (0.8509, 95% CI: 0.8439–0.8556) and exhibiting excellent cross-validation consistency (mean CV R2 = 0.8477 ± 0.0047). Random Forest closely followed with comparable performance (test R2 = 0.8481, CV R2 = 0.8439 ± 0.0047), demonstrating the superiority of ensemble tree-based methods over traditional linear models for this prediction task (Figures 2A–C).
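This section reports a 95% CI for the test-set R2 without naming the method used to obtain it. One common approach, shown here as an assumption, is the percentile bootstrap: resample test-set rows with replacement, recompute R2 on each resample, and take the 2.5th and 97.5th percentiles. The sketch is unweighted and uses 1,000 resamples by default; both choices are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Unweighted coefficient of determination."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def bootstrap_r2_ci(
    y: Sequence[float], y_hat: Sequence[float], n_boot: int = 1000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for R2 on a fixed set of predictions."""
    rng = random.Random(seed)
    n = len(y)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if max(ys) == min(ys):
            continue  # R2 is undefined when the resampled outcome has no variance
        stats.append(r2(ys, [y_hat[i] for i in idx]))
    if not stats:
        raise ValueError("no valid bootstrap resamples")
    stats.sort()
    m = len(stats)
    lo = stats[int(alpha / 2 * m)]
    hi = stats[min(m - 1, int((1 - alpha / 2) * m))]
    return lo, hi
```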
Among linear models, Linear Regression and Ridge Regression showed identical performance (test R2 = 0.8104, CV R2 = 0.8108 ± 0.0052), suggesting minimal benefit from L2 regularization in this context, possibly because the large training sample already yields stable coefficient estimates. The slightly lower performance of Lasso Regression (test R2 = 0.8057) and Elastic Net (test R2 = 0.7960) may be attributable to the sparsity constraint of L1 regularization, which might have excluded some predictive information, and to a suboptimal balance between the L1 and L2 penalties in Elastic Net. In contrast, tree-based models effectively captured non-linear relationships and interactions, resulting in superior performance.
The stability analysis revealed that tree-based models exhibited superior robustness, with Random Forest and XGBoost showing the lowest cross-validation standard deviations (±0.0047), indicating consistent performance across different data subsets (Figure 2D). This stability advantage, combined with their superior predictive accuracy, underscores the reliability of ensemble methods for trunk fat percentage estimation.
Feature simplification and model optimization
Systematic feature evaluation revealed a minimal set of five readily available clinical measurements: sex, waist circumference, height, weight, and age. These predictors were selected based on a dual rationale: (1) their dominant collective importance in the XGBoost model (>99% of total feature importance), and (2) their universal availability in clinical and field settings. This simplified model retained 99.3% of the full model’s accuracy (R2 = 0.8450 vs. 0.8509) while reducing feature requirements by 44.4% (Figures 3A,B).
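The two headline figures in this paragraph follow from simple ratios. The first uses the reported R2 values; the second assumes the full model had nine candidate predictors, which is the count that a 44.4% reduction to five features implies (the full predictor list is not given in this section).

```latex
\frac{R^2_{\text{simplified}}}{R^2_{\text{full}}} = \frac{0.8450}{0.8509} \approx 0.993,
\qquad
1 - \frac{5}{9} = \frac{4}{9} \approx 0.444
```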
Figure 3. (A) R2 score comparison across feature selection methods. (B) Model MAE comparison by feature selection. (C) Feature quantity vs. performance relationship. (D) Performance loss comparison among feature selection methods.
Feature importance analysis identified sex and waist circumference as the primary determinants, with anthropometric measurements (height, weight) and age providing meaningful incremental value (Supplementary Figure S1). Hyperparameter optimization yielded only marginal improvements (final R2 = 0.8471), confirming the robustness of the baseline configuration. The model demonstrated excellent cross-validation stability (R2 = 0.8434 ± 0.0047), supporting its reliability for clinical application (Supplementary Table S2).
This simplification strategy balances predictive accuracy with practical implementation needs, requiring only basic measurements routinely collected in clinical settings (Figures 3C,D). The optimized model maintains high performance while substantially reducing data collection burden, enhancing its suitability for widespread adoption in both clinical practice and epidemiological research.
Model diagnostic analysis
Comprehensive diagnostic evaluation confirmed the robustness and clinical validity of the simplified XGBoost model for trunk fat percentage estimation. The model demonstrated excellent stability across validation frameworks, with 5-fold cross-validation yielding consistent performance (R2 = 0.8423 ± 0.0049) that closely matched the holdout test set results (R2 = 0.8463), indicating minimal overfitting and strong generalizability (Figure 4B).
Figure 4. (A) Learning curve of the simplified XGBoost model. (B) Predicted vs. actual trunk fat percentage using the simplified XGBoost model. (C) Distribution of absolute prediction errors from the simplified XGBoost model.
Residual analysis revealed a well-behaved error distribution, with a mean residual close to zero (0.025) and a standard deviation of 3.70%. Although formal normality testing indicated a slight departure from a Gaussian distribution (Shapiro–Wilk p < 0.001), the distribution remained favorable for practical application. Outlier analysis identified only 4.9% of predictions with standardized residuals beyond 2 standard deviations, close to the approximately 4.6% expected under a normal distribution (Supplementary Figure S2).
Learning curve analysis indicated adequate model training: the training (R2 = 0.8837) and validation (R2 = 0.8383) curves converged to a modest gap, indicating an appropriate balance of model complexity (Figure 4A). These diagnostics support the model's suitability for practical use, with a mean absolute error of 2.91% providing practical utility for individual-level estimation while maintaining population-level accuracy suitable for epidemiological applications (Figure 4C).
Subgroup analysis for model robustness
To evaluate the generalizability of our simplified XGBoost model, we conducted subgroup analyses across sex, age, and BMI categories (Supplementary Figure S3). The model performed consistently across sex subgroups (male: R2 = 0.800, MAE = 2.73%; female: R2 = 0.773, MAE = 3.10%). Both within-sex R2 values are lower than the overall R2, likely because sex is a primary determinant of TF%, so stratifying by sex removes much of the variance the model explains in the pooled sample. Age-based analysis showed strong performance in individuals under 60 years (R2 = 0.854, MAE = 2.86%) and moderate performance in those aged 60 years or older (R2 = 0.748, MAE = 3.16%). Performance also varied across BMI categories, with reduced accuracy at both extremes of body composition (underweight and severe obesity), as detailed in Supplementary Figure S4. Overall, the model maintains reasonable predictive accuracy across diverse population subgroups, with the strongest performance in younger individuals (<60 years) and those in the normal or overweight BMI categories.
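Subgroup metrics of this kind come from grouping test-set predictions by a stratifying variable and recomputing R2 and MAE within each group. The BMI cut-offs below are the WHO adult classes, with 40 kg/m2 used as the threshold for severe obesity; they are an assumption, since this section does not state the categories used. Metrics are unweighted in this sketch.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def bmi_category(bmi: float) -> str:
    """WHO adult BMI classes (assumed cut-offs, in kg/m2)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    if bmi < 40.0:
        return "obese"
    return "severe obesity"


def metrics_by_group(
    y: Sequence[float], y_hat: Sequence[float], groups: Sequence[Hashable]
) -> Dict[Hashable, Tuple[int, float, float]]:
    """Return {group: (n, R2, MAE)}; R2 is NaN when a group's outcome has no variance."""
    buckets: Dict[Hashable, List[Tuple[float, float]]] = defaultdict(list)
    for yi, fi, g in zip(y, y_hat, groups):
        buckets[g].append((yi, fi))
    out: Dict[Hashable, Tuple[int, float, float]] = {}
    for g, pairs in buckets.items():
        n = len(pairs)
        y_bar = sum(a for a, _ in pairs) / n
        ss_res = sum((a - b) ** 2 for a, b in pairs)
        ss_tot = sum((a - y_bar) ** 2 for a, _ in pairs)
        r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
        mae = sum(abs(a - b) for a, b in pairs) / n
        out[g] = (n, r2, mae)
    return out
```

For the age strata in the text, the grouping labels would be built as, for example, `["<60" if age < 60 else ">=60" for age in ages]`.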
External validation of disease correlation
External validation in the CHARLS cohort (N = 13,044) demonstrated better discriminative performance for trunk fat percentage estimated by our XGBoost model than for conventional whole-body fat percentage across multiple cardiometabolic conditions (Figures 5A,B). The machine learning-derived trunk fat percentage consistently outperformed whole-body fat percentage in discriminating participants with and without each condition, with statistically significant improvements observed for all five cardiometabolic endpoints evaluated.
Figure 5. (A) AUC comparison between TF% and BF% for cardiometabolic diseases. (B) AUC radar comparison between TF% and BF% across diseases. (C) Correlation between TF% and BF%. (D) Fat distribution in TF% vs. BF%.
For diabetes, trunk fat percentage achieved the largest AUC improvement (AUC = 0.6607 vs. 0.6401 for whole-body fat percentage, a 3.22% relative improvement). Similarly, for dyslipidemia and hypertension, trunk fat percentage showed meaningful gains in discrimination (AUC = 0.6531 vs. 0.6342 and 0.6348 vs. 0.6160, representing relative improvements of 2.98% and 3.06%, respectively). Even for stroke, where both measures showed limited discrimination, trunk fat percentage maintained a consistent advantage (AUC = 0.5486 vs. 0.5326).
Distribution analysis revealed a strong correlation between the two fat percentage measures (r = 0.953), suggesting that trunk fat percentage captures much of the information contained in whole-body estimates while adding discriminative value through its focus on central adiposity (Figures 5C,D). This external validation in an independent Chinese cohort supports the generalizability of our machine learning approach and the utility of trunk fat percentage as a risk stratification tool in a population distinct from the development cohort.
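The relative improvements quoted here are computed as 100 × (AUC_TF − AUC_BF) / AUC_BF. Recomputing them from the four-decimal AUCs reproduces 3.22% for diabetes and 2.98% for dyslipidemia; hypertension comes out at 3.05% rather than the reported 3.06%, a difference consistent with rounding of the published AUCs. Heart disease AUCs are not listed in this paragraph, so the 2.77% average across all five endpoints cannot be fully reproduced from this text.

```python
from typing import Dict, Tuple


def relative_improvement(auc_new: float, auc_ref: float) -> float:
    """Relative AUC gain in percent: 100 * (auc_new - auc_ref) / auc_ref."""
    return 100.0 * (auc_new - auc_ref) / auc_ref


# (TF% AUC, BF% AUC) pairs as reported in the text
reported: Dict[str, Tuple[float, float]] = {
    "diabetes": (0.6607, 0.6401),
    "dyslipidemia": (0.6531, 0.6342),
    "hypertension": (0.6348, 0.6160),
    "stroke": (0.5486, 0.5326),
}

for disease, (tf_auc, bf_auc) in reported.items():
    print(f"{disease}: {relative_improvement(tf_auc, bf_auc):.2f}%")
# diabetes: 3.22%
# dyslipidemia: 2.98%
# hypertension: 3.05%
# stroke: 3.00%
```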
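The reported r is a Pearson correlation. `statistics.correlation` provides it only from Python 3.10 onward, so on the Python 3.9 environment listed in the Methods it would need to be written out, as in this sketch. A high r between two markers does not force their AUCs to be equal, because AUC depends on how each marker orders participants with and without a condition rather than on their overall linear agreement.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```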
Clinical interpretation of TF% thresholds
To provide actionable references for cardiometabolic risk assessment, we determined the optimal TF% thresholds for predicting five key conditions in the CHARLS cohort using Youden’s index. The thresholds were 28.6% for hypertension, 28.9% for dyslipidemia, 30.8% for diabetes, 31.8% for heart disease, and 30.3% for stroke, with a median of 30.3% across conditions. These empirically derived values offer data-driven benchmarks; for instance, a TF% exceeding 28.6% suggests a higher probability of hypertension and may warrant closer monitoring. The distribution and trade-offs (sensitivity vs. specificity) of these disease-specific thresholds are detailed in Supplementary Figure S5.
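Youden's index selects the cut-off that maximizes J = sensitivity + specificity − 1. The sketch sweeps every observed TF% value as a candidate threshold in a single sorted pass, classifying a participant as positive when TF% is at or above the threshold (an assumed convention). Counts are unweighted, labels are assumed to be 1 (condition present) or 0, and ties in J keep the first (highest) threshold reached.

```python
from typing import Optional, Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[Optional[float], float, float, float]:
    """Return (threshold, J, sensitivity, specificity) maximizing Youden's J.

    A participant is called positive when score >= threshold.
    """
    n_pos = sum(1 for v in labels if v == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best: Tuple[Optional[float], float, float, float] = (None, float("-inf"), 0.0, 0.0)
    tp = fp = 0
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        # Move every participant with this exact score to the "positive" side
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        sens = tp / n_pos
        spec = 1.0 - fp / n_neg
        j = sens + spec - 1.0
        if j > best[1]:
            best = (t, j, sens, spec)
    return best
```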
Discussion
This study successfully developed and validated a machine learning model to accurately estimate TF% using simple anthropometric measures. Our findings demonstrate that TF% is a more discriminative indicator of cardiometabolic risk than BF%. This underscores that the specific distribution of fat—rather than overall adiposity—is a critical determinant of metabolic health. The sophisticated yet practical XGBoost model effectively captured the complex, non-linear relationships underlying central adiposity.
Trunk fat percentage as a superior risk indicator
The strong, graded associations observed between TF% and the prevalence of diabetes, hypertension, and dyslipidemia underscore the importance of central fat distribution. This aligns with the well-established pathophysiology linking visceral adiposity to insulin resistance and metabolic syndrome (13, 28–30). Crucially, our external validation in an independent cohort shows that TF% consistently outperformed BF% in discriminating key cardiometabolic conditions, despite a high correlation between the two measures. The persistent, albeit modest, improvement in AUC across multiple outcomes suggests that TF% captures pathophysiological processes related to fat distribution (31, 32) that are not fully represented by the global measure of total adiposity. Our results support shifting the emphasis of risk assessment from how much fat a person carries to where that fat is located.
Model performance and clinical translation
Our study advances the field by prioritizing extreme accessibility, in contrast to prior machine learning approaches that relied on specialized imaging or laboratory data for visceral fat assessment (33, 34). A key finding of this study is that machine learning (XGBoost) achieved higher predictive accuracy for trunk fat percentage than traditional linear models. This performance advantage suggests important non-linear relationships and interactions between anthropometric features that linear models cannot capture. Our feature importance analysis, consistent with established physiological principles, identified waist circumference as the most powerful predictor, followed by sex. The analysis also quantified the distinct contributions of weight, height, and age, providing both high accuracy and biological insight into the determinants of central adiposity (30, 35).
A significant finding was that a parsimonious set of five core anthropometric variables proved sufficient to achieve excellent predictive performance. This favorable trade-off between simplicity and accuracy makes the model highly suitable for deployment in diverse clinical and field settings where advanced body composition assessment tools are unavailable (36). The resulting web-based tool enables identification of individuals with adverse fat distribution patterns even in the context of normal BMI, allowing for earlier and more targeted interventions in both clinical practice and population health screening (4, 37–39).
Limitations and future directions
Several limitations should be acknowledged. While DXA-measured trunk fat percentage served as our reference standard and we adhered to NHANES quality protocols, potential calibration differences between DXA devices and across NHANES survey waves should be considered as a measurement variability source. DXA is not a direct measure of visceral adipose tissue volume, as provided by CT or MRI. The cross-sectional nature of the data establishes association but not causality; prospective studies are needed to confirm TF%'s predictive value for incident disease. Although we included major anthropometric variables, unmeasured factors such as diet, physical activity, and genetic predisposition contribute to residual variance. Furthermore, in the external validation (CHARLS), conditions such as dyslipidemia and heart disease were based on self-report, which may introduce misclassification bias and limit the achievable AUC for these endpoints. While externally validated in a distinct Asian cohort, further validation in other global populations is warranted to ensure broad applicability.
Looking ahead, several promising research directions emerge from this work. First, the model's simplicity makes it highly suitable for integration into public health apps or portable screening devices, enabling low-cost, widespread assessment of central adiposity. Second, future studies with larger and more diverse samples could employ deep neural networks to further enhance accuracy and uncover complex, non-linear relationships between anthropometrics and trunk adiposity. Finally, while our model performed well in U.S. and Chinese cohorts, validation in other global populations (e.g., Latin American, African) is essential to confirm its generalizability and optimize its use across different ethnic and geographic settings.
Conclusion
This study provides robust evidence that TF%, predicted with high accuracy using a machine learning model applied to simple anthropometrics, is a superior marker of cardiometabolic risk compared to total body fat percentage. The developed XGBoost model offers a powerful, accessible, and translatable tool for quantifying central adiposity, addressing a critical gap in practical risk assessment. By demonstrating that TF% provides unique predictive information beyond BF%, we underscore the paramount importance of specifically assessing central fat distribution. This model holds significant potential to enhance screening, risk stratification, and personalized management of cardiometabolic diseases.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding authors.
Ethics statement
Ethical approval was not required for the study involving humans in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and the institutional requirements.
Author contributions
LZ: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. XG: Formal analysis, Funding acquisition, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing. HW: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing. CH: Data curation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This study was supported by a grant from the Zhongnanshan Medical Foundation of Guangdong Province (No. ZNSXS-20240023).
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that Generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnut.2026.1715570/full#supplementary-material
References
1. Dai, H, Alsalhe, TA, Chalghaf, N, Riccò, M, Bragazzi, NL, and Wu, J. The global burden of disease attributable to high body mass index in 195 countries and territories, 1990-2017: an analysis of the Global Burden of Disease Study. PLoS Med. (2020) 17:e1003198. doi: 10.1371/journal.pmed.1003198
2. Ashwell, M, Gunn, P, and Gibson, S. Waist-to-height ratio is a better screening tool than waist circumference and BMI for adult cardiometabolic risk factors: systematic review and meta-analysis. Obes Rev. (2012) 13:275–86. doi: 10.1111/j.1467-789X.2011.00952.x
3. Sweatt, K, Garvey, WT, and Martins, C. Strengths and limitations of BMI in the diagnosis of obesity: what is the path forward? Curr Obes Rep. (2024) 13:584–95. doi: 10.1007/s13679-024-00580-1
4. Zhang, Y, Fu, J, Yang, S, Yang, M, Liu, A, Wang, L, et al. Prevalence of metabolically obese but normal weight (MONW) and metabolically healthy but obese (MHO) in Chinese Beijing urban subjects. Biosci Trends. (2017) 11:418–26. doi: 10.5582/bst.2017.01016
5. Mecherques-Carini, M, Albaladejo-Saura, M, Vaquero-Cristóbal, R, Baglietto, N, and Esparza-Ros, F. Validity and agreement between dual-energy X-ray absorptiometry, anthropometry and bioelectrical impedance in the estimation of fat mass in young adults. Front Nutr. (2024) 11:1421950. doi: 10.3389/fnut.2024.1421950
6. Mtintsilana, A, Micklesfield, LK, Chorell, E, Olsson, T, and Goedecke, JH. Fat redistribution and accumulation of visceral adipose tissue predicts type 2 diabetes risk in middle-aged black South African women: a 13-year longitudinal study. Nutr Diabetes. (2019) 9:12. doi: 10.1038/s41387-019-0079-8
7. Lee, JJ, Pedley, A, Hoffmann, U, Massaro, JM, Levy, D, and Long, MT. Visceral and intrahepatic fat are associated with cardiometabolic risk factors above other ectopic fat depots: the Framingham Heart Study. Am J Med. (2018) 131:684–692.e12. doi: 10.1016/j.amjmed.2018.02.002
8. Ganpule-Rao, A, Joglekar, C, Patkar, D, Chinchwadkar, M, Bhat, D, Lubree, H, et al. Associations of trunk fat depots with insulin resistance, β cell function and glycaemia - a multiple technique study. PLoS One. (2013) 8:e75391. doi: 10.1371/journal.pone.0075391
9. Kuang, M, Lu, S, Yang, R, Chen, H, Zhang, S, Sheng, G, et al. Association of predicted fat mass and lean body mass with diabetes: a longitudinal cohort study in an Asian population. Front Nutr. (2023) 10:1093438. doi: 10.3389/fnut.2023.1093438
10. Gorini, S, Camajani, E, Feraco, A, Armani, A, Quattrini, C, Tarsitano, MG, et al. Gender and age differences in weekend eating habits: associations with fat mass percentage in a cross-sectional study. Front Nutr. (2025) 12:1578574. doi: 10.3389/fnut.2025.1578574
11. Park, Y, Kim, NH, Kwon, TY, and Kim, SG. A novel adiposity index as an integrated predictor of cardiometabolic disease morbidity and mortality. Sci Rep. (2018) 8:16753. doi: 10.1038/s41598-018-35073-4
12. Maher, JJ. Trunk fat as a determinant of liver disease. Gastroenterology. (2010) 138:1244–6. doi: 10.1053/j.gastro.2010.02.031
13. Kouda, K, Fujita, Y, Ohara, K, Tachiki, T, Tamaki, J, Yura, A, et al. Associations between trunk-to-peripheral fat ratio and cardiometabolic risk factors in elderly Japanese men: baseline data from the Fujiwara-kyo Osteoporosis Risk in Men (FORMEN) Study. Environ Health Prev Med. (2021) 26:35. doi: 10.1186/s12199-021-00959-9
14. Snijder, MB, Dekker, JM, Visser, M, Bouter, LM, Stehouwer, CDA, Yudkin, JS, et al. Trunk fat and leg fat have independent and opposite associations with fasting and postload glucose levels: the Hoorn Study. Diabetes Care. (2004) 27:372–7. doi: 10.2337/diacare.27.2.372
15. Lee, DH, and Giovannucci, EL. Body composition and mortality in the general population: a review of epidemiologic studies. Exp Biol Med (Maywood). (2018) 243:1275–85. doi: 10.1177/1535370218818161
16. Gómez-Ambrosi, J, Silva, C, Catalán, V, Rodríguez, A, Galofré, JC, Escalada, J, et al. Clinical usefulness of a new equation for estimating body fat. Diabetes Care. (2012) 35:383–8. doi: 10.2337/dc11-1334
17. Lee, DH, Keum, N, Hu, FB, Orav, EJ, Rimm, EB, Sun, Q, et al. Development and validation of anthropometric prediction equations for lean body mass, fat mass and percent fat in adults using the National Health and Nutrition Examination Survey (NHANES) 1999–2006. Br J Nutr. (2017) 118:858–66. doi: 10.1017/S0007114517002665
18. Liu, M, Zhang, Z, Zhou, C, Ye, Z, He, P, Zhang, Y, et al. Predicted fat mass and lean mass in relation to all-cause and cause-specific mortality. J Cachexia Sarcopenia Muscle. (2022) 13:1064–75. doi: 10.1002/jcsm.12921
19. CDC. NHANES Questionnaires, Datasets, and Related Documentation. Available online at: https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/ (Accessed July 1, 2025).
20. Zhao, Y, Hu, Y, Smith, JP, Strauss, J, and Yang, G. Cohort profile: the China Health and Retirement Longitudinal Study (CHARLS). Int J Epidemiol. (2014) 43:61–8. doi: 10.1093/ije/dys203
21. DuGoff, EH, Schuler, M, and Stuart, EA. Generalizing observational study results: applying propensity score methods to complex surveys. Health Serv Res. (2014) 49:284–303. doi: 10.1111/1475-6773.12090
22. Friedman, J, Hastie, T, and Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. (2010) 33:1–22. doi: 10.18637/jss.v033.i01
24. Chen, T, and Guestrin, C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, California: ACM (2016). p. 785–794.
25. Battistella, E, Ghiassian, D, and Barabási, A-L. Improving the performance and interpretability on medical datasets using graphical ensemble feature selection. Bioinformatics. (2024) 40:btae341. doi: 10.1093/bioinformatics/btae341
26. Bergstra, J, and Bengio, Y. Random search for hyper-parameter optimization. J Mach Learn Res. (2012) 13:281–305.
27. Mirel, LB, Mohadjer, LK, Dohrmann, SM, Clark, J, Burt, VL, Johnson, CL, et al. National Health and Nutrition Examination Survey: estimation procedures, 2007-2010. Vital Health Stat 2. (2013) 2:1–17.
28. Wyszyńska, J, and Mazur, A. Excessive body mass and its correlation with hypertension – a review of the literature. Med Rev. (2016) 14:209–19. doi: 10.15584/medrev.2016.2.7
29. van Dis, I, Kromhout, D, Geleijnse, JM, Boer, JMA, and Verschuren, WMM. Body mass index and waist circumference predict both 10-year nonfatal and fatal cardiovascular disease risk: study conducted in 20,000 Dutch men and women aged 20-65 years. Eur J Cardiovasc Prev Rehabil. (2009) 16:729–34. doi: 10.1097/HJR.0b013e328331dfc0
30. Walton, C, Lees, B, Crook, D, Worthington, M, Godsland, IF, and Stevenson, JC. Body fat distribution, rather than overall adiposity, influences serum lipids and lipoproteins in healthy men independently of age. Am J Med. (1995) 99:459–64. doi: 10.1016/s0002-9343(99)80220-4
31. Alser, M, Naja, K, and Elrayess, MA. Mechanisms of body fat distribution and gluteal-femoral fat protection against metabolic disorders. Front Nutr. (2024) 11:1368966. doi: 10.3389/fnut.2024.1368966
32. Hauner, H. Secretory factors from human adipose tissue and their functional role. Proc Nutr Soc. (2005) 64:163–9. doi: 10.1079/PNS2005428
33. Cuesta-Vargas, A, Arjona-Caballero, JM, Olveira, G, de Luis Román, D, Bellido-Guerrero, D, and García-Almeida, JM. Automatic analysis of ultrasound images to estimate subcutaneous and visceral fat and muscle tissue in patients with suspected malnutrition. Diagnostics. (2025) 15:988. doi: 10.3390/diagnostics15080988
34. Palmieri, F, Akhtar, NF, Pané, A, Jiménez, A, Olbeyra, RP, Viaplana, J, et al. Machine learning allows robust classification of visceral fat in women with obesity using common laboratory metrics. Sci Rep. (2024) 14:17263. doi: 10.1038/s41598-024-68269-y
35. Peppa, M, Koliaki, C, Hadjidakis, DI, Garoflos, E, Papaefstathiou, A, Katsilambros, N, et al. Regional fat distribution and cardiometabolic risk in healthy postmenopausal women. Eur J Intern Med. (2013) 24:824–31. doi: 10.1016/j.ejim.2013.07.001
36. Després, J-P, Lemieux, I, Bergeron, J, Pibarot, P, Mathieu, P, Larose, E, et al. Abdominal obesity and the metabolic syndrome: contribution to global cardiometabolic risk. Arterioscler Thromb Vasc Biol. (2008) 28:1039–49. doi: 10.1161/ATVBAHA.107.159228
37. Nuttall, FQ. Body mass index: obesity, BMI, and health: a critical review. Nutr Today. (2015) 50:117–28. doi: 10.1097/NT.0000000000000092
38. Jayedi, A, Soltani, S, Zargar, MS, Khan, TA, and Shab-Bidar, S. Central fatness and risk of all cause mortality: systematic review and dose-response meta-analysis of 72 prospective cohort studies. BMJ. (2020) 370:m3324. doi: 10.1136/bmj.m3324
Keywords: central adiposity, dual-energy X-ray absorptiometry, metabolic risk prediction, percent fat, trunk fat
Citation: Zeng L, Guo X, Wu H and Huang C (2026) Machine learning-based estimation of trunk fat percentage and its association with cardiometabolic risk leveraging two large national cohorts. Front. Nutr. 13:1715570. doi: 10.3389/fnut.2026.1715570
Edited by:
Vijaya Juturu, Independent Researcher, Flemington, NJ, United States
Reviewed by:
Thiago Gonçalves dos Santos Martins, Federal University of São Paulo, Brazil
Mesut Ozdag, University of Central Florida, United States
Yan Xu, The Affiliated Hospital of Qingdao University, China
Copyright © 2026 Zeng, Guo, Wu and Huang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xuemin Guo, guoxuemin@mzrmyy.com; Liangming Zeng, lmzeng.official@outlook.com