Phthalate metabolites and sex steroid hormones in relation to obesity in US adults: NHANES 2013-2016

Background: Obesity and metabolic syndrome pose significant health challenges in the United States (US) and are linked to disruptions in sex hormone regulation. The increasing prevalence of obesity and metabolic syndrome might be associated with exposure to phthalates (PAEs). Further exploration of the impact of PAEs on obesity is crucial, particularly from a sex hormone perspective. Methods: A total of 7780 adult participants in the National Health and Nutrition Examination Survey (NHANES) from 2013 to 2016 were included in the study. Principal component analysis (PCA) coupled with multinomial logistic regression was employed to assess the association between urinary PAEs metabolite concentrations and the likelihood of obesity. Weighted quantile sum (WQS) regression was used to estimate the combined effect of mixed PAEs exposure on sex hormone levels (total testosterone (TT), estradiol and sex hormone-binding globulin (SHBG)). We also used machine learning models to classify obesity status and to identify the variables contributing most to these models. Results: Principal Component 1 (PC1), characterized by mono(2-ethyl-5-carboxypentyl) phthalate (MECPP), mono(2-ethyl-5-hydroxyhexyl) phthalate (MEHHP), and mono(2-ethyl-5-oxohexyl) phthalate (MEOHP) as major contributors, exhibited a negative association with obesity. Conversely, PC2, with monocarboxynonyl phthalate (MCNP), monocarboxyoctyl phthalate (MCOP), and mono(3-carboxypropyl) phthalate (MCPP) as major contributors, showed a positive association with obesity. Mixed exposure to PAEs was associated with decreased TT levels and increased estradiol and SHBG levels. In exploring the interrelations among obesity, sex hormones, and PAEs, models based on the Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) algorithms demonstrated the best classification performance. In both models, sex hormones exhibited the highest variable importance, and certain phthalate metabolites made significant contributions to model performance. Conclusions: Individuals with obesity exhibit lower levels of TT and SHBG, accompanied by elevated estradiol levels. Exposure to PAEs disrupts sex hormone levels, contributing to an increased risk of obesity in US adults. In exploring the interrelationships among these three factors, the RF and XGBoost models demonstrated superior performance, with sex hormones displaying the highest variable importance.


Introduction
The prevalence of central obesity, defined as body mass index (BMI) ≥ 30.0 kg/m², significantly increased from 45.2% in 1999-2000 to 56.7% in 2013-2014 (1). People with obesity are also more likely to develop conditions that threaten population health, such as cardiovascular disease, type 2 diabetes, dyslipidaemia, osteoarthritis, sleep apnoea and certain types of cancer, and have higher all-cause mortality (2)(3)(4). The increase in obesity rates in the population can be attributed to alterations in genetic, lifestyle, and environmental factors and their interactions (5). Research has shown a close correlation between sex steroid hormones and obesity. Testosterone (TT) is the major androgenic steroid hormone in adult males and is responsible for maintaining sperm production, libido, and sexual function (6). Men with obesity exhibit reduced levels of testosterone and sex hormone-binding globulin (SHBG) (7,8). The obesity-associated reduction in testosterone is accompanied by reduced levels of luteinizing hormone (LH), whereas the age-related reduction in testosterone is correlated with increased LH (7), indicating central rather than gonadal dysregulation in obesity. Estradiol, the principal hormone in female reproduction, is vital for the development and maintenance of female reproductive tissues and the regulation of the menstrual cycle (9). Women with overweight and obesity tend to have higher estrogen levels than their normal-weight counterparts (10). Weight loss interventions have been shown to effectively reduce estrogen levels among females with obesity (11). SHBG is a glycoprotein that transports TT and estradiol to target tissues, thereby influencing the bioavailability of these reproductive hormones (12). Observational studies have indicated that lower levels of SHBG are associated with an increased incidence of insulin resistance and type 2 diabetes, independent of sex hormone concentrations (13). Sex steroid hormone drugs have been used to treat obesity and metabolic imbalances (14).
Endocrine-disrupting chemicals (EDCs) are a group of substances with endocrine hormone effects, most of which are artificially synthesized chemicals, such as bisphenol A, phthalates (PAEs), insecticides, and polychlorinated biphenyls (15). These substances can enter the human body through ingestion, inhalation, and skin contact, resulting in a variety of adverse effects, which are mainly characterized by endocrine disruption, disruption of hormone function, and developmental disorders of the reproductive organs, and which in severe cases can induce cancer (16,17). Evidence suggests that EDCs may be associated with a significant increase in the prevalence of metabolic diseases such as obesity (18). PAEs, as EDCs, continue to receive academic attention as risk factors for metabolic diseases such as diabetes mellitus, hypertension, and hyperlipidemia, and for their reproductive toxicity (19,20). PAEs are mainly used as plasticizers in the manufacturing of plastic products. Because plastic products are found everywhere in modern life, people of all ages, from infants to the elderly, are exposed to PAEs in the environment for long periods of time. The variety of PAEs used as plasticizers is large, and their hydrolysis in the human body is complex, yielding metabolites at different stages (21). Previous studies have indicated the presence of 22 phthalate metabolites in human urine (22). Exposure to PAEs may induce hypothalamic-pituitary-gonadal (HPG) axis dysfunction, disrupting the balance of multiple sex hormones within the body (23,24). In addition to causing imbalances in sex hormone levels, exposure to PAEs is closely linked to obesity. The association between PAEs and obesity has been extensively investigated in diverse populations (25)(26)(27)(28). PAEs metabolites exhibit biochemical activity, including the activation of peroxisome proliferator-activated receptors and antiandrogenic effects, which contribute to the development of obesity (29).
Considering the correlation between sex steroid hormones and obesity, we included all three aspects (PAEs, sex steroid hormones, and obesity) in our study. To explore the role that sex steroid hormones play in the increased risk of obesity associated with phthalates, we applied various machine learning models for interpretation. We aimed to investigate the association between PAEs, sex steroid hormones, and obesity from a novel perspective, thereby highlighting the health risks associated with PAEs.

Study design and participants
NHANES is an ongoing cross-sectional survey conducted by the National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention (CDC) to collect health screening data from a nationally representative sample of noninstitutionalized civilian U.S. residents. The dataset for this study contains two cycles of NHANES (2013-2014 and 2015-2016), which include laboratory data on phthalate metabolites and sex steroid hormones (TT, estradiol and SHBG). We initially selected 10090 participants. Of these, 1035 participants were excluded due to factors that interfere with sex hormone levels (including hormone medication use, pregnancy, ovariectomy and menstrual disorders), and a further 391 participants were excluded due to missing data on phthalate variables, resulting in the inclusion of 7780 eligible adult subjects (3915 males and 3865 females). The flow chart for screening participants is shown in Figure 1.

Sex steroid hormones
Sex steroid hormone data are part of the NHANES laboratory data and were measured in blood samples. Briefly, total testosterone and estradiol were determined using isotope dilution liquid chromatography-tandem mass spectrometry (ID-LC-MS/MS). SHBG was quantified indirectly, from a chemiluminescent measurement of the products formed when SHBG reacts with immuno-antibodies. For detailed descriptions of all laboratory test methods, refer to the CDC's Laboratory Methods document (34,35).

Covariates
Several covariates were selected for inclusion in the statistical model based on the characteristics of the population studied. These included typical demographic variables such as age, gender, race, place of birth, and marital status. Educational level was divided into four categories: lower than high school, high school, some college or Associate of Arts (AA) degree, and college graduate or above. The ratio of household income to poverty was divided into three categories (with 1.3 and 3.5 as the two cut points). Smoking, drinking, hypertension, and diabetes were included as baseline health conditions and behaviors of the population. Other key covariates were body mass index (BMI), time of day of serum collection (i.e., morning, afternoon, evening) and urinary creatinine. Considering their possible influence on individual obesity status, we included physical activity and average daily calorie intake as covariates (36).
[...] entire U.S. population. Therefore, we conducted statistical descriptions on the sample data and the weighted data separately. The population was divided into three subgroups based on body mass index (BMI): participants with a BMI of less than 25 were 'Normal weight', those between 25 and 30 were 'Overweight', and those greater than or equal to 30 were 'Obese'. PAEs metabolite levels were derived from urine samples. To control for differences in renal metabolism between individuals, we adjusted for urinary creatinine levels to account for variation in urine dilution (36,38,39), and we adjusted for the timing of blood sample collection to address diurnal fluctuations in sex hormone concentrations (36). For continuous variables, skewness and kurtosis tests were used to test whether the distribution of the data was approximately normal, and non-normal continuous variables were described using the median (IQR). The Mann-Whitney U test and the Kruskal-Wallis rank sum test were used to compare differences between subgroups. For categorical variables, the frequency and percentage were calculated, and the Chi-square test was used to check differences between subgroups. P values for multiple comparisons were adjusted using the Bonferroni correction.
We plotted correlation heat maps and calculated Spearman's correlation coefficients and FDR-corrected (false discovery rate) P values to confirm the correlation between urinary creatinine and phthalate metabolites. Due to the presence of multicollinearity among the phthalate metabolites, principal component analysis was used to obtain principal component variables for the phthalate metabolite variables, which were included in the multinomial logistic regression instead of the original variables. Weighted quantile sum (WQS) regression was used to explore the effects of mixed PAEs exposure on sex steroid hormones and BMI.
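The text does not state which FDR procedure was applied to the Spearman P values. As a minimal sketch, assuming the common Benjamini-Hochberg step-up procedure, the adjustment can be illustrated as follows; the function name `bh_adjust` and the example P values are hypothetical and are not taken from the study.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the paper only says "FDR-corrected"; BH is used here
    purely for illustration.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four metabolite-creatinine correlations.
example = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Each raw P value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.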
We experimented with a variety of machine learning models to build predictive models, with the aim of making full use of the data features to shed light on the impact of phthalates on human health. We tested K-Nearest Neighbor (KNN), Naive Bayes, Support Vector Machines (SVM), Decision Trees (DT), Random Forest (RF), Gradient Boosting Decision Tree (GBDT) and eXtreme Gradient Boosting (XGBoost) algorithms, using cross-validation and calculating the predictive accuracy of each model. 'Accuracy' and 'F1-score' were calculated to compare prediction performance among the models.
Data analysis and machine learning modelling were implemented using R version 4.2.1, with missing values of the independent variables filled in by the "DMwR2" package. Outliers were identified as values lying more than three times the interquartile range (IQR) outside the interquartile interval and were removed from further analysis. Principal component analysis, multinomial logistic regression and WQS regression were implemented with the packages "stats", "nnet" and "gWQS", respectively. The statistical significance level was set at 0.05. The modelling was mainly performed using machine learning algorithms integrated in the "mlr" package (40).
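The outlier rule above can be read as dropping values below Q1 − 3 × IQR or above Q3 + 3 × IQR. The following is a minimal Python sketch under that reading; the authors worked in R, and the function name `drop_extreme_outliers`, the multiplier argument `k`, and the sample values are hypothetical.

```python
from statistics import quantiles


def drop_extreme_outliers(values, k=3.0):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the "three IQR" rule is applied to the quartiles;
    method="inclusive" matches R's default quantile definition (type 7).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical concentrations with one extreme value.
cleaned = drop_extreme_outliers([1, 2, 3, 4, 5, 100])
```

With a multiplier of 3 the rule removes only very extreme values, which is more conservative than the 1.5 × IQR convention used for box plots.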

Characteristics of subjects
Table 1 shows the descriptive statistics of the 7780 subjects included in the analysis, who were divided into three groups based on BMI: 1643 participants were 'Normal weight', 2372 participants were 'Overweight', and 3765 participants were 'Obese'. The median ages for the total group and the three subgroups were 60, 58, 62, and 59, respectively. Non-Hispanic black people were in the majority in each subgroup, followed by non-Hispanic white people. The highest proportion of males was found in the 'Overweight' subgroup. Overall, the 'Overweight' and 'Obese' subgroups had lower educational attainment than the 'Normal weight' subgroup. The 'Obese' subgroup had a smaller proportion of higher incomes. Over half of the participants in the 'Overweight' and 'Obese' subgroups were married. The 'Obese' subgroup had a higher percentage of US-born participants than the overall sample. The 'Normal weight' subgroup had a higher proportion of never-smokers than the overall sample. Notably, the 'Obese' subgroup had a higher rate of diabetes, while the 'Overweight' subgroup had a higher average daily energy intake. The weighted statistical descriptions of the participants can be viewed in Table S1.
Table 2 characterizes the levels and distribution of serum sex steroid hormones (TT, estradiol and SHBG), urinary creatinine and urinary phthalate metabolites by median and interquartile range (IQR). We used the Kruskal-Wallis rank sum test to compare between-group differences in continuous variables across the three subgroups, and the Mann-Whitney U test for pairwise comparisons between subgroups. We found significant differences in the levels and distributions of these laboratory variables across the three subgroups, with the results of the Mann-Whitney U test showing more pronounced differences between the 'Normal weight' and 'Obese' subgroups. Median urinary levels of phthalate metabolites were significantly higher in the 'Obese' subgroup than in the other two subgroups, as were median levels of urinary creatinine (115.0 mg/dL). Serum estradiol levels were higher in the 'Obese' subgroup (21.3 pg/mL), whereas serum SHBG levels

Regression results
To examine the distinct associations among PAEs metabolites, sex steroid hormones, and participants' obesity status, we constructed a multinomial logistic regression model with obesity status as a three-category dependent variable. However, we observed multicollinearity among the phthalate metabolites. Additionally, since these metabolites were measured in urine samples, there was some correlation with urinary creatinine levels. We generated correlation heatmaps to illustrate the relationships between PAEs metabolites and urinary creatinine levels, as depicted in Figure 2. Positive correlations between PAEs metabolites and urinary creatinine were prevalent, and the FDR-corrected P-values remained significant (P < 0.001).
To address the issue of multicollinearity among phthalate metabolites, we employed principal component analysis (PCA) to reduce the dimensionality of the original variables. As shown in Supplementary Figure 1, PCA extracted 11 principal components that accounted for all the variance in the original variables. Notably, the proportion of variance explained by each component beyond the sixth was consistently lower than 0.04. Consequently, we used the first six principal components instead of the original variables; together they explain 90.5% of the total variance in the original variables. The contribution of each phthalate metabolite to each PC is presented in Supplementary Table 3. These principal components were incorporated into multinomial logistic regression to calculate odds ratios (ORs) and corresponding 95% confidence intervals (CI) for the other two categorical endpoints, with 'Normal weight' as the reference (Table 3). The regression results revealed statistically significant associations of PC1 and PC2 with obesity status, with PC2 acting as a risk factor for obesity (OR = 1.082, P < 0.001) and PC1 acting as a protective factor (OR = 0.890, P = 0.003). With 'Normal weight' as the reference, PC5 emerged as a risk factor for overweight (OR = 1.152, P = 0.02), but its association with obesity was not statistically significant (P = 0.136). Since principal components can be interpreted as linear combinations of the original continuous variables, those associated with obesity status can be analyzed based on their composition. As outlined in Table S3, the risk factor PC2 for obesity status was primarily explained by MCNP, MCOP, and MCPP, while the protective factor PC1 was predominantly explained by MECPP, MBP, MEHHP, MEHP, and MEOHP. Concerning sex steroid hormones, TT and SHBG emerged as protective factors for obesity status (OR = 0.997; OR = 0.977), with only estradiol identified as a risk factor (OR = 1.008). Among other covariates, hypertension and diabetes were significant risk factors for obesity (OR = 2.938; OR = 3.31).
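Odds ratios for continuous predictors such as TT are expressed per one-unit change, which is why they sit so close to 1.0. As a hedged illustration, a per-unit odds ratio can be rescaled to a larger change by exponentiating the scaled log-odds. The helper `rescale_odds_ratio` and the 100-unit change are hypothetical choices for illustration; this section does not state the hormone units.

```python
import math


def rescale_odds_ratio(or_per_unit, delta):
    """Odds ratio for a change of `delta` units, given the per-unit odds ratio.

    Uses OR_delta = exp(delta * ln(OR_per_unit)), which assumes the
    log-odds are linear in the predictor, as in logistic regression.
    """
    return math.exp(delta * math.log(or_per_unit))


# Per-unit OR for TT reported in the text; the 100-unit change is an assumption.
or_tt_100 = rescale_odds_ratio(0.997, 100)
```

Under these assumptions, a per-unit OR of 0.997 corresponds to an OR of roughly 0.74 for a 100-unit difference, so a coefficient that looks negligible per unit can still describe a sizeable association across the observed range.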
The WQS regression method was employed to examine the impact of mixed PAEs exposure on sex steroid hormones and BMI (as a continuous variable). This weighted approach combines PAEs metabolites into a 'phthalate index' to address multicollinearity among the original variables. The objective was to obtain interpretable regression coefficients that quantify the combined effect of phthalate metabolites on sex hormones. The WQS regression results indicated that the phthalate index was significantly associated with all three sex steroid hormones and BMI (Table 4). Notably, the phthalate index exhibited a negative correlation with TT (β = -20.85, P < 0.001) and positive correlations with estradiol (β = 3.00, P = 0.001) and SHBG (β = 5.22, P < 0.001). In the negative correlation of the phthalate index with total testosterone, MCOP contributed the most (34.6%), while in the positive correlation with estradiol, MiBP contributed the most (37.0%). In the positive correlation of the phthalate index with SHBG, MEOHP contributed the most (24.3%) (Table 5). Additionally, the phthalate index demonstrated a positive correlation with BMI, with MEP contributing 41.0% of the mean weight.
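The construction of the 'phthalate index' can be sketched as follows. This is a simplified illustration, not the gWQS implementation: gWQS estimates the weights from the data (via bootstrap samples, with weights constrained to be non-negative and to sum to one), whereas here the weights are supplied by hand. The metabolite names "A" and "B", the weights, and the concentrations are all hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_scores(values):
    """Score each value 0-3 according to the quartile of its own sample.

    Values equal to a cut point are placed in the higher bin.
    """
    cuts = quantiles(values, n=4, method="inclusive")  # Q1, Q2, Q3
    return [bisect_right(cuts, v) for v in values]


def wqs_index(exposures, weights):
    """Weighted sum of quartile scores for each participant.

    exposures: dict mapping metabolite name -> list of concentrations
    weights:   dict mapping metabolite name -> non-negative weight (sum = 1)
    """
    if min(weights.values()) < 0 or abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    scores = {name: quartile_scores(vals) for name, vals in exposures.items()}
    n = len(next(iter(exposures.values())))
    return [sum(weights[name] * scores[name][i] for name in exposures)
            for i in range(n)]


# Hypothetical data: two metabolites measured in eight participants.
index = wqs_index(
    {"A": [1, 2, 3, 4, 5, 6, 7, 8], "B": [8, 7, 6, 5, 4, 3, 2, 1]},
    {"A": 0.75, "B": 0.25},
)
```

The outcome (for example TT) is then regressed on this single index together with the covariates, so the reported coefficient describes the change in the outcome per one-quartile increase in the whole mixture, and the weights indicate each metabolite's share of that effect.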
Given the variation in sex hormone levels across genders and ages, we conducted subgroup analyses to explore potential differences (Supplementary Table 5). For age stratification, participants were divided into 'middle-aged' and 'older' subgroups using the median age (60 years) as the cutoff. The stable negative correlation between PC1 and obesity was observed across all four subgroups, although it was not statistically significant in 'Female-mid' and 'Male-mid'. WQS regression was applied to investigate the effects of mixed PAEs exposure on sex hormones in all subgroups (Supplementary Table 6). The results revealed a positive association between BMI and mixed PAEs exposure in all subgroups, along with a negative association between TT and mixed PAEs exposure. Concerning estradiol, a positive correlation with mixed PAEs exposure was observed in all subgroups except for 'Male-mid'. Similarly, a positive correlation with mixed PAEs exposure was noted for SHBG in all subgroups, though the associations in 'Female-mid' and 'Male-mid' were not statistically significant.
Figure 2. Heat map showing Spearman's correlation matrix for concentrations of eleven urinary phthalate metabolites and urinary creatinine levels. The FDR-corrected P values indicate that the correlations are statistically significant (P < 0.001). The color corresponds to the strength of correlations (blue: positive correlation; white: no correlation; red: negative correlation).

Machine learning models
After obtaining the regression results, our next objective was to discern the predominant influences of sex hormones and phthalate metabolites on the population's obesity status. To achieve this, we built prediction models with various algorithms. Independent variables were preprocessed using one-hot encoding, as some machine learning algorithms do not accommodate categorical independent variables. The total sample was randomly divided into an 80% training set and a 20% validation set. A grid search over a predefined hyperparameter space was run on the training set to identify the optimal hyperparameter combination. To mitigate errors from random sampling, 5-fold cross-validation was employed during the search. The hyperparameter combination that minimized the average error was used to construct the prediction model. Subsequently, the validation set was used to evaluate the predictive performance of the model, with 'Accuracy' defined as the ratio of correctly predicted samples to the total number of samples in the validation set. The F1-score is obtained from the confusion matrix of the prediction results; it combines 'Precision' and 'Recall' into a single measure and is calculated as $F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$. Simply put, a higher F1-score indicates higher precision and recall and therefore better predictive power.
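The two evaluation metrics can be computed directly from the true and predicted labels. The sketch below is illustrative only: the text does not say how the F1-score was averaged over the three classes, so macro-averaging (an unweighted mean of per-class F1-scores) is an assumption, and the labels are made up.

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical validation labels for the three obesity categories.
truth = ["Normal", "Normal", "Overweight", "Overweight", "Obese", "Obese"]
preds = ["Normal", "Overweight", "Overweight", "Overweight", "Obese", "Normal"]
acc, f1 = accuracy(truth, preds), macro_f1(truth, preds)
```

Accuracy treats every sample equally, whereas the macro-averaged F1-score gives each class the same weight, which matters here because the three BMI groups differ considerably in size.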
When comparing the predictive performance of all models (Table 6), those with high accuracy also exhibit higher F1-scores.
To ensure an unbiased model selection, we introduced several simple yet classical algorithms, including KNN, Naive Bayes, and the multinomial logistic regression model obtained above. Regression models and Naive Bayes, being less reliant on hyperparameter tuning, are user-friendly and easily interpretable. However, their prediction accuracy falls below 60%, indicating suboptimal performance on our dataset. The KNN algorithm achieves a prediction accuracy of 83.4% with an F1-score of 0.813, with the hyperparameter 'k' set to 1. SVM models with 'radial' and 'polynomial' kernel functions both demonstrate an accuracy of approximately 85%, with F1-scores exceeding 0.83. Although the decision tree model has an accuracy of only 80.4%, the decision tree algorithm remains a foundational concept for numerous more complex algorithms.
The RF algorithm is an extension of the decision tree classification algorithm. In our RF model, we use 300 decision trees (ntree = 300), up to 15 candidate features at each split (mtry = 15), a minimum of 12 samples in each leaf node (nodesize = 12), and a maximum of 350 leaf nodes per tree (maxnodes = 350). With this set of hyperparameters, the RF model achieves an accuracy of 88.4% and an F1-score of 0.87. Despite the randomization method employed by RF, which reduces the risk of overfitting, we exercise control over [...]
The importance of the features in the RF model for the three categorical endpoints was comprehensively evaluated using 'mean decrease accuracy' (41). The top 20 independent variables contributed 83% of the mean decrease accuracy values of all variables (Figure 3). The sex hormone variables emerged as the most significant contributors to the predictive accuracy of the model, followed by age, diabetes mellitus, average daily caloric intake, hypertension, and phthalate metabolite levels. These findings underscore the importance of age, diabetes, and hypertension as crucial factors in predicting obesity status. Furthermore, the substantial variations in sex hormone levels across the subgroups are consistent with the significant contributions of the three sex hormone variables to the predictions of the RF model. Among PAEs metabolites, MEP, MiBP and MCOP exhibited higher mean decrease accuracy values (Figure 3).
Both GBDT and RF are extensions of the decision tree algorithm, but RF is a variant optimized with the bootstrap aggregating (or bagging for short) technique, while GBDT is optimized with the gradient boosting technique. Since our target variable is a three-class obesity status, the hyperparameter 'distribution' for GBDT is set to 'multinomial'. The final GBDT model consists of a total of 400 decision trees (n.trees = 400), with a minimum of 40 observations in the terminal nodes (n.minobsinnode = 40), and a learning rate of 0.9 (shrinkage = 0.9). However, the GBDT model's prediction accuracy is only 77.1%, and the F1-score is only 0.752, indicating suboptimal performance on our dataset. In response to this, we explored the XGBoost algorithm, which is also grounded in the gradient boosting technique.
In our XGBoost model, the learning rate is set to 0.1 (eta = 0.1), the minimum loss reduction required to split a leaf node further is 0.556 (gamma = 0.556), the maximum depth of the trees is 10 (max_depth = 10), the minimum sum of instance weights required in a child node is 1.5 (min_child_weight = 1.5), the proportion of independent variables sampled for each decision tree is 0.5 (colsample_bytree = 0.5), the total number of boosting rounds is 100 (nrounds = 100), and the evaluation metric is the multiclass logarithmic loss (eval_metric = 'mlogloss'). We also constrained the search range of hyperparameters during the hyperparameter search to prevent overfitting. This configuration of hyperparameters for our XGBoost model yields a prediction accuracy of 89.1% and an F1-score of 0.879.
Table 6 note: The classification models were built using machine learning algorithms with adjusted parameters, and accuracy and F1-score were used as model evaluation metrics.
To assess the contribution of each variable in predicting individual obesity status, we applied the SHapley Additive exPlanations (SHAP) tree framework to the XGBoost model with a customized loss model (42). The SHAP value combines the effect of a given variable on its own and the effect of the interaction of that variable with other variables. For a given individual (local interpretation), the sum of the SHAP values for all variables of the model represents the deviation of that individual's prediction from the average predicted propensity of obesity status for the entire dataset. The greater the global SHAP value, the more significant the contribution of the variable to predicting obesity status. The three subplots of Figure 4 depict the global SHAP values for Normal weight (Figure 4A), Overweight (Figure 4B), and Obese (Figure 4C); the top 15 variables account for 73.2%, 67.9%, and 71.2% of the average total SHAP contribution, respectively. As depicted in Figure 4, within the XGBoost model, sex steroid hormones exhibited the most substantial contributions to predicting all obesity conditions. Additionally, age, hypertension, diabetes, urinary creatinine, and certain phthalate metabolites showed high global SHAP values. Notably, individuals with hypertension or diabetes displayed a clear inclination toward obesity, underscoring the significant role of hypertension and diabetes as risk factors for obesity, consistent with the earlier regression findings. Among the top 15 variables contributing most to the prediction of obesity status (Figure 4C), only three phthalate metabolites (MEHP, MCNP, and MCOP) were present, with MEHP being consistently negatively associated with obesity. Diabetes exhibited a negative association with 'Normal weight' status (Figure 4A). Belonging to the 'other races' category pushed predictions toward normal weight, while being Mexican American had the opposite effect. The regression results from the principal component analyses described earlier indicated statistically significant associations between the three sex steroid hormones and obesity status, with a low strength of association (odds ratios approximating 1.0), consistent with the global SHAP values for the sex hormone variables in Figure 4.
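The additive property described above (the SHAP values of one individual sum to the difference between that individual's prediction and a baseline prediction) can be checked on a toy model. The sketch below computes exact Shapley values by brute-force enumeration with a single baseline vector; it is not the TreeSHAP algorithm used for XGBoost models, and `toy_model`, the observation, and the baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation `x`.

    Features outside a coalition are replaced by their `baseline` value.
    Brute force over all coalitions, so only suitable for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical model: one main effect plus one interaction term."""
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, x=[1, 2, 3], baseline=[0, 0, 0])
```

In this toy case the first feature receives its own main effect, while the interaction term is split equally between the second and third features, which illustrates how a SHAP value combines a variable's individual effect with its share of interaction effects.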

Discussion
The analysis revealed noteworthy connections between sex steroid hormones and obesity in the population, with specific principal components of phthalate metabolites also displaying substantial associations with obesity status in US adults. The WQS regression models indicated that mixed exposure to phthalate metabolites was linked to TT, estradiol, and SHBG. Among the predictive models, RF and XGBoost exhibited superior predictive performance for obesity, with sex steroid hormones contributing the most to the model predictions, followed by covariates such as diabetes and hypertension, and by phthalate metabolites.
PAEs, as typical environmental EDCs, have multifaceted and multi-systemic effects on human health (43). This study reveals a positive correlation between PAEs exposure and BMI, with MEP contributing 41.0% to the average weight of the PAEs index. The regression findings indicated a positive association between PC2 of the phthalate metabolites and the likelihood of obesity. MCNP, MCOP, and MCPP were the primary contributors to PC2, which categorizes them as obesity risk factors. Additionally, the fifth principal component (PC5), predominantly composed of MiBP and MBP, displayed a positive correlation with the occurrence of overweight. These findings underscore the complex and interconnected impact of PAEs on human health, particularly in relation to weight-related outcomes. These results align closely with previous research findings. Stahlhut et al. reported associations between PAEs exposure and obesity in adult US males (participants in NHANES 1999-2002) (43). MBP, MCOP, MCNP, MCPP and MECPP were also found to be associated with obesity or BMI in adults participating in the U.S.-based NHANES (44,45). Notably, MCOP exhibited associations not only with BMI but also with waist circumference. In a more recent cohort study involving 942 elderly individuals in China (Li et al.) (46), urinary levels of MEP, MEOHP, MBP, and MMP were positively associated with general obesity in males. Furthermore, another study indicated that MCNP, MCOP, and MEHP play a role in the onset of obesity (47). A comprehensive meta-analysis concluded that MMP, MEP, and MiBP showed positive associations with abdominal obesity, while MEHHP, MECPP, and MCOP exhibited positive correlations with general obesity in adults (48). Their study formed part of a broader understanding of phthalate exposure and its effects on obesity-related outcomes. In a longitudinal cohort study conducted by the Women's Health Initiative (WHI), certain phthalate biomarkers, including MCNP, were found to be positively associated with an increase in visceral adipose tissue (VAT) in postmenopausal women. However, no significant correlation was established between other phthalate biomarkers (MCOP, MCPP, etc.) and either VAT or subcutaneous adipose tissue (SAT) (49). This cumulative evidence supports the intricate relationship between phthalate exposures and obesity across diverse populations and study designs.
Figure 3. Top-20 importance ranking features based on mean decrease accuracy from the RF model. Among PAEs metabolites, MEP, MiBP and MCOP exhibited higher mean decrease accuracy values.
Sex steroid hormones affect the metabolism, distribution, and accumulation of adipose tissue by binding to receptors in adipose tissue, and a decrease in estrogen and/or androgens typically leads to central obesity (50). The reproductive toxicity of PAEs, as confirmed by numerous existing studies, encompasses adverse effects on the HPG axis, including abnormal release of gonadotropin-releasing hormone and gonadotropins, along with dysfunction of sex hormone receptors and steroid hormone synthesis (23). These factors collectively contribute to a heightened prevalence of metabolic disorders (38,51). Conversely, some researchers contend that obesity itself heightens the risk of sex hormone imbalances (52). Lapauw et al. found that low serum SHBG and total testosterone levels were very common in obese men (53). Our results indicated that estradiol was positively associated with obesity, while total testosterone and SHBG levels were negatively associated with obesity. The relationship between these elevated rates and the concomitant presence of chronic sex hormone imbalances and PAEs exposure warrants further investigation.
Building upon this, our analysis showed that combined exposure to phthalate metabolites was associated with elevated estradiol and SHBG levels, coupled with reduced TT levels. The disruptive impact of phthalates on hormonal balance has been established in both animal and human studies, suggesting potential implications for endocrine systems. In particular, DEHP has exhibited anti-androgenic effects and estrogen-mimicking activities both in vivo and in vitro, and it has been associated with decreased TT levels in male animals and humans (54-57). Cathey et al. found that TT was positively associated with MHBP and inversely associated with MEP in women during pregnancy (57). Similarly, urinary MEHP was found to be inversely associated with circulating steroid hormone levels in adult men (58). Drawing on data from NHANES 2015-2016, a study involving 1768 adults measured 16 urinary phthalate metabolites and three serum sex hormones. Among males, TT levels displayed a negative association with MnBP, MEHHP, MECPP, MEP, and MiBP. Conversely, among females, natural logarithm-transformed estradiol increased by 0.18 pg/mL and 0.15 pg/mL with each one-unit rise in the natural logarithm-transformed concentration of MEHP and MNP, respectively (30). In a cross-sectional study involving 614 women aged 45-54 years, an association was identified between phthalate exposure and an increase in estradiol levels (59). However, when comparing the relationship between SHBG and phthalate metabolites, disparities emerged between previous studies and our current research. Some studies reported that elevated exposure to MECPP, MEOHP, MEHHP, and MBzP was linked to decreased SHBG levels, but not to increased TT levels (58). We attribute these variations to differences in the study population and our comprehensive approach to phthalate exposure analysis.
MCOP contributed the most (34.6%) to the negative correlation between the phthalate index and TT, MiBP contributed the most (37.0%) to the positive correlation between the phthalate index and estradiol, and MEOHP contributed the most (24.3%) to the positive correlation between the phthalate index and SHBG. The correlation between TT and MCOP aligns with findings from previous research. A study focusing on midlife women revealed a negative correlation between MCOP and TT levels (Δ%: -2.08%; 95% CI, -3.66 to -0.47) (60). Further, a study involving 1179 children aged 6-19 years demonstrated that MiBP, MCOP, and MBzP were generally negatively associated with estradiol and TT, while positively associated with SHBG (61). However, the correlation between estrogen and MiBP in our study shows some differences compared to previous studies. In one report, MCOP and MBzP exhibited a positive association with estrogen, while MEP, MiBP, and MEOHP demonstrated an inverse correlation with estrogen (62). In a study focused on 297 women of childbearing age, MiBP was linked to a 0.01 (95% CI: -0.01, 0.00) decrease in natural logarithm-unit levels of estradiol. Additionally, a study involving 297 girls aged 12 to 19 in the NHANES (2013-2016) found that MBzP was positively associated with SHBG, while MCNP and MECPP showed an inverse association with SHBG (62).
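The contribution percentages above are the weights that each metabolite receives in the phthalate index. As a minimal sketch of how such an index is assembled, assuming each metabolite is scored 0-3 by sample quartile and the weights are already known, the index is the weighted sum of the quartile scores. In an actual WQS analysis the weights are estimated from the data (typically by bootstrap with constraints that they are non-negative and sum to 1); that estimation step is not shown here. The function names, metabolite labels, and numbers below are hypothetical and are not taken from the study.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_scores(values):
    """Score each value 0-3 according to the sample quartile it falls in."""
    cuts = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    # bisect_left counts the cut points strictly below v, so v <= Q1 scores 0.
    return [bisect_left(cuts, v) for v in values]


def wqs_index(exposures, weights):
    """Weighted sum of quartile scores for each individual.

    exposures: dict mapping metabolite name -> list of concentrations
               (all lists must have the same length, one entry per person).
    weights:   dict mapping the same names -> non-negative weight;
               the weights must sum to 1. They are taken as given here.
    """
    if set(exposures) != set(weights):
        raise ValueError("exposures and weights must have the same keys")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    lengths = {len(v) for v in exposures.values()}
    if len(lengths) != 1:
        raise ValueError("all exposure lists must have the same length")

    scores = {name: quartile_scores(vals) for name, vals in exposures.items()}
    n = lengths.pop()
    return [
        sum(weights[name] * scores[name][i] for name in exposures)
        for i in range(n)
    ]


# Hypothetical data for eight individuals and two made-up metabolites.
example_exposures = {
    "metabolite_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "metabolite_b": [8, 7, 6, 5, 4, 3, 2, 1],
}
example_weights = {"metabolite_a": 0.75, "metabolite_b": 0.25}  # illustrative only
example_index = wqs_index(example_exposures, example_weights)
print(example_index)
```

Because the weights sum to 1, each weight can be read as the share of the index attributable to that metabolite, which is how the percentages reported above are interpreted.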
In this study, variations in the influence of PAEs metabolites on estrogen and SHBG were observed compared to the existing literature. To further investigate the reasons behind these differences, we stratified the study population based on age and gender. In the "Male-mid" group, we observed a negative correlation between PAEs exposure and estrogen levels, while in the other subgroups the correlation was positive. This finding aligns with results reported in certain earlier studies. In male adolescents, there was a negative correlation between PAEs and estrogen (β = -0.137, 95% CI: -0.263, -0.011), as well as TT (β = -0.189, 95% CI: -0.375, -0.002) (63). Data from the U.S. population showed that exposure to PAEs, both individually and as a mixture, was inversely associated with estradiol levels and the ratio of TT to estradiol in children (61). In a cross-sectional investigation, urinary DEHP metabolites and MEHP exhibited a notable positive correlation with serum estradiol levels (64). This aligns with a previous study, which indicated a positive correlation between urinary DEHP metabolites (MEHP, MEOHP, MEHHP) and estradiol levels in polyvinyl chloride production workers (65). On the contrary, DEHP was linked to reduced serum estradiol levels in postmenopausal women (36), a finding corroborated by prior studies that reported non-significant negative results (66,67). Gender-specific analyses indicated that phthalate exposure has a distinct impact on various sex steroid hormones. The effect of PAEs exposure on sex hormones was most prominent among middle-aged individuals, whereas its effect on BMI increase was more pronounced in elderly females.
To explore the association between phthalate metabolites and obesity in the population from the perspective of sex hormones, we used multiple machine learning algorithms to model the data from the studied cohort. Of these algorithms, RF and XGBoost exhibited the best classification performance. During the interpretation of the models, we observed that sex steroid hormones tended to have the highest variable importance. Hypertensive and diabetic individuals had a higher risk of obesity, consistent with the regression results. Phthalate metabolites also contributed significantly to the model classification performance. During the interpretation of the XGBoost model, a relatively clear negative correlation was found between MEHP and obesity. However, a study by Desvergne et al. found that MEHP, a selective PPARγ modulator, is capable of disrupting lipid and carbohydrate metabolism, thereby increasing the risk of obesity (68). For the negative association with obesity exhibited by MEHP in this study, we speculate that it is related to the multiple correlations among this class of phthalate metabolites. Considering that the hydrolysis products of DEHP are complex, MEHP could perhaps serve as an intermediate marker of the DEHP hydrolysis process; more research is required to elucidate the mechanisms involved.
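Figure 4 ranks variables by the mean of their absolute SHAP values. As a minimal sketch of that ranking step only, assuming a matrix of SHAP values has already been computed (one row per individual, one column per variable), the ordering can be obtained as follows. The function name and the numbers are hypothetical; computing the SHAP values themselves requires a fitted model and an explainer, which are not shown.

```python
def rank_by_mean_abs_shap(shap_values, feature_names, top_k=15):
    """Rank features by mean absolute SHAP value, largest first.

    shap_values:   list of rows, one per individual; each row holds one
                   SHAP value per feature, in the order of feature_names.
    feature_names: list of feature labels.
    top_k:         number of top-ranked features to return.
    Returns a list of (feature_name, mean_abs_shap) pairs.
    """
    if not shap_values:
        raise ValueError("shap_values must contain at least one row")
    if any(len(row) != len(feature_names) for row in shap_values):
        raise ValueError("each row must have one value per feature")

    n = len(shap_values)
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, mean_abs), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]


# Hypothetical SHAP values for three individuals and three made-up features.
example_shap = [
    [0.10, -0.90, 0.05],
    [0.20, 0.70, -0.05],
    [-0.30, 0.80, 0.00],
]
example_names = ["feature_x", "feature_y", "feature_z"]
for name, value in rank_by_mean_abs_shap(example_shap, example_names):
    print(name, round(value, 3))
```

Taking the absolute value before averaging matters: a variable that pushes some individuals strongly toward a class and others strongly away from it would otherwise average out to near zero and appear unimportant.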
In this study, we used principal component analysis to deal with the collinearity among PAEs metabolites and combined it with sex hormone status to explore the association between PAEs and obesity in the population. We tried multiple classes of predictive models, and the results showed that XGBoost and RF performed better; such ensemble models have good interpretability and can fully exploit potentially meaningful associations among numerous features. It is hoped that ensemble machine learning algorithms will be considered and attempted in a wide range of bioinformatic research. This study has certain limitations. Firstly, the method used to adjust for urine dilution could bias the conclusions. As urinary creatinine levels may be affected by factors including age, sex, and kidney disease, statistical estimation with traditional creatinine adjustment may be influenced under certain circumstances (69). Using urinary creatinine to adjust for urine dilution may therefore bias the chemical exposure estimates and, in turn, the association with the health outcome. In addition, we used single-point urine samples instead of 24-hour urine samples to assess phthalate exposure, a choice that could introduce measurement error. Replication of these findings is crucial, and further studies are warranted for validation. Secondly, we formed a composite variable by weighting 11 PAEs metabolites to examine the relationship between overall exposure and hormones. At present, we have not explored the influence of each individual PAEs metabolite on hormones.
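To make the dilution limitation concrete, the sketch below shows the traditional ratio form of creatinine adjustment, in which a urinary metabolite concentration is divided by the urinary creatinine concentration. It assumes the metabolite is reported in ng/mL and creatinine in mg/dL; the function name and the numbers are hypothetical, and the sketch is not a statement of the exact adjustment used in this study (creatinine can also be handled as a model covariate).

```python
def creatinine_adjust(metabolite_ng_per_ml, creatinine_mg_per_dl):
    """Return the metabolite concentration in micrograms per gram creatinine.

    Unit check: 1 ng/mL equals 1 microgram/L, and 1 mg/dL equals 0.01 g/L,
    so (microgram/L) / (g/L) = metabolite / (creatinine * 0.01)
                             = metabolite * 100 / creatinine.
    """
    if creatinine_mg_per_dl <= 0:
        raise ValueError("creatinine must be positive")
    return metabolite_ng_per_ml * 100.0 / creatinine_mg_per_dl


# Hypothetical example: the same measured metabolite concentration in a
# dilute sample (low creatinine) and a concentrated sample (high creatinine).
measured = 10.0  # ng/mL, illustrative only
for creatinine in (50.0, 200.0):  # mg/dL, illustrative only
    adjusted = creatinine_adjust(measured, creatinine)
    print(creatinine, adjusted)
```

The adjusted value depends as much on creatinine as on the metabolite itself, so any factor that shifts creatinine excretion (age, sex, kidney function) also shifts the exposure estimate, which is the source of the potential bias described above.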
In future research, gender- and age-specific subgroup analyses could uncover unique patterns in the link between PAEs and obesity, shedding light on different roles within these demographics. Additionally, employing advanced statistical methods, such as deep learning on longitudinal data, may provide a more comprehensive understanding of the potential relationship between PAEs exposure and obesity, capturing the long-term effects of PAEs in the development of obesity. Finally, integrating in vivo and in vitro methods, such as cell culture experiments or animal models, could contribute to a more in-depth understanding of the biological impact of PAEs on sex hormones and their mechanistic relationship with obesity.

Conclusions
Our study explores the impact of phthalate exposure on sex steroid hormone levels and the propensity for obesity in adults. The results from the PCA indicate that PC2, primarily composed of MCNP, MCOP, and MCPP, shows a positive association with obesity. Regarding sex hormones, estradiol is positively correlated with obesity, whereas TT and SHBG exhibit negative associations. Notably, combined exposure to phthalate metabolites is associated with increased estradiol and SHBG levels and decreased TT levels. Among the machine learning algorithms utilized, the RF and XGBoost models demonstrate the highest capability to distinguish adult obesity status. Additionally, the interpretation of both models underscores the effectiveness of sex steroid hormones as predictor variables, supporting the recommendation to consider them in future obesity-related studies.

FIGURE 4 Global explainability of physical state. Global explainability of the XGBoost model for the top 15 most important variables (ranked in order of importance based on the mean of the absolute SHAP values). Each dot represents the SHAP value of one variable for one individual; dot color encodes the value of the variable, with yellow and purple indicating high and low values, respectively. A positive or negative SHAP value on the x-axis implies that the variable contributes to a positive or negative estimate of physical state for a given individual. (A) Global explainability of normal weight state. (B) Global explainability of overweight state. (C) Global explainability of obesity state.

TABLE 1 Continued
were instead lower (45.5 nmol/L). Remarkably, the median level of serum TT was much higher in the 'Overweight' subgroup (268.0 ng/dL) than in the 'Normal weight' and 'Obese' subgroups. Weighted statistical descriptions of serum sex steroid hormones, urinary creatinine, and urinary phthalate metabolites are shown in Supplementary Table 2.

TABLE 2
Descriptive statistics for phthalate metabolites and sex steroid hormones in different subgroups.

TABLE 3
Results of multinomial logistic regression.

The results of the multinomial logistic regression are expressed as ORs and 95% CIs, with 'Normal weight' as the reference. Bolded ORs indicate statistical significance (P<0.05). The categorical variables are referenced to the selected categories and the corresponding ORs are obtained. 1 Family PIR represents the ratio of family income to poverty. 2 Average daily energy intake (kcal/day). 3 Principal components (PC) consisting of the original PAEs variables.

TABLE 4
Results of weighted quantile sum regression.
*Regression coefficients and 95% CI for mixed exposures in the weighted quantile sum regression.
the values of mtry, nodesize, and maxnodes to further mitigate the potential for overfitting during model training.

TABLE 5
Composition of phthalate metabolites in mixed exposures.
*Original variables with the largest contribution to the Phthalates index.

TABLE 6
Classification performance of all models.