Machine Learning to Identify Metabolic Subtypes of Obesity: A Multi-Center Study

Background and objective: Clinical characteristics of obesity are heterogeneous, but the current diagnostic classification is based simply on BMI or metabolic healthiness. The purpose of this study was to use machine learning to explore a more precise classification of obesity subgroups to inform individualized therapy.
Subjects and Methods: In a multi-center study (n=2495), we used unsupervised machine learning to cluster patients with obesity from Shanghai Tenth People's Hospital (n=882, main cohort) based on three clinical variables (AUCs of glucose and of insulin during OGTT, and uric acid). Verification of the clustering was performed in three independent cohorts from external hospitals in China (n = 130, 137, and 289, respectively). A healthy normal-weight cohort (n=1057) served as controls.
Results: Machine learning revealed four stable, metabolically distinct obesity clusters in each cohort. Metabolic healthy obesity (MHO, 44% of patients) was characterized by a relatively healthy metabolic status with the lowest incidence of comorbidities. Hypermetabolic obesity-hyperuricemia (HMO-U, 33% of patients) was characterized by extremely high uric acid and a greatly increased incidence of hyperuricemia (adjusted odds ratio [AOR] 73.67 to MHO, 95%CI 35.46-153.06). Hypermetabolic obesity-hyperinsulinemia (HMO-I, 8% of patients) was distinguished by overcompensated insulin secretion and a greatly increased incidence of polycystic ovary syndrome (AOR 14.44 to MHO, 95%CI 1.75-118.99). Hypometabolic obesity (LMO, 15% of patients) was characterized by extremely high glucose, decompensated insulin secretion, and the worst glucolipid metabolism (diabetes: AOR 105.85 to MHO, 95%CI 42.00-266.74; metabolic syndrome: AOR 13.50 to MHO, 95%CI 7.34-24.83). The assignment of patients in the verification cohorts to the main model showed a mean accuracy of 0.941 across all clusters.
Conclusion: Machine learning automatically identified four subtypes of obesity in terms of clinical characteristics in four independent patient cohorts. This proof-of-concept study provided evidence that precise diagnosis of obesity is feasible and could potentially guide therapeutic planning and decisions for different subtypes of obesity.
Clinical Trial Registration: www.ClinicalTrials.gov, NCT04282837.


INTRODUCTION
The effects of weight loss treatments on patients with obesity vary greatly between cohorts and individuals. This may relate to the heterogeneity of the disease in terms of clinical presentation and pathogenesis (1)(2)(3)(4)(5). Conventional classification of obesity relies mainly on a single dimension, e.g., body mass index (BMI) (6) or healthy/unhealthy metabolism (7). However, the coarse classification made by BMI does not accurately reflect the complexity and heterogeneity of obesity (6,8). The metabolic healthy/unhealthy classification criteria are also controversial (7,9,10), and the proportion of patients in the unhealthy group can vary substantially, ranging from 25% to 94% in reported studies (11). Towards precision treatment, a more refined metabolic classification of obesity phenotypes is needed for personalized diagnosis, with the aim of identifying patients at elevated risk of certain metabolic disorders or obesity comorbidities at the initial diagnostic visit. This kind of refined classification can provide a more precise diagnosis and enable more individualized preventive interventions and early treatments (12).
Artificial intelligence techniques have been quickly adopted in medicine. Data-driven machine learning modeling provides a method to mine large, multi-dimensional data for refined classification and quantitative analysis. The application of machine learning to the obesity field is emerging but limited (13)(14)(15)(16). Recent work shows encouraging preliminary evidence that some latent phenotypes of obesity could be revealed by machine learning (14)(15)(16). However, these obesity classification paradigms do not consider an important clinical factor, i.e., metabolic abnormality, are limited to data measured with specific devices that are not routinely available in clinical practice, or lack external validation.
The purpose of this study was to develop a refined obesity classification criterion through an unsupervised machine learning approach in the setting of a multi-center study, where four independent study cohorts (a total of 1438 patients with obesity and 1057 normal-weight controls) were used for obesity classification and validation, using three common/key clinical variables representing a multi-dimensional characterization of obesity progression in terms of metabolism, hormone, inflammation, and oxidation (17).

Study Design
We conducted a multicenter study (ClinicalTrials No. NCT04282837) with approval from a local ethical committee and an institutional review board of the participating institutions. As shown in Figure 1, we retrospectively collected four patient cohorts (BMI ≥ 24 kg/m² according to the WHO criteria for overweight/obesity (6)) and one normal-weight control cohort from four different hospitals in P.R. China. We used one patient cohort (main cohort) for model learning and the remaining three patient cohorts (verification cohorts) for verification. Detailed data analyses of the identified obesity subgroups were performed on the main cohort with a comparison to the control cohort.

Study Sites and Population
Patients' inclusion and exclusion criteria are shown in the Supplemental Appendix. From January 2010 to December 2019, at the Endocrinology and Metabolic Center of Shanghai Tenth People's Hospital (main site), 2094 patients (the full cohort at the main site) with obesity were included, and 882 patients were included in Cohort-1 (main cohort) after exclusion (400 men and 482 women, median age 29 years, median BMI 35.9 kg/m²). Cohort-1 consisted of two sub-cohorts: Cohort-1A included 632 outpatients with obesity (296 men and 336 women, median age 28 years, median BMI 33.6 kg/m²). Most of these patients had relatively lower BMIs, mild to moderate obesity comorbidities, and were treated mainly with lifestyle interventions and weight-loss drugs. Cohort-1B included 250 inpatients with morbid obesity (104 men and 146 women, median age 31 years, median BMI 39.3 kg/m²) who were candidates for bariatric surgery according to the American Society of Metabolic and Bariatric Surgery/The Obesity Society/American Association of Clinical Endocrinologists guidelines (18) that are adjusted for Chinese patients. Pre-surgical examinations were performed before bariatric surgeries. Three follow-ups were conducted at 3, 6, and 12 months after surgery.
For the three verification cohorts, Cohort-2 included 130 patients with morbid obesity from Nanjing Drum Tower Hospital (60 men and 70 women, median age 30 years, median BMI 38.7 kg/m²). Cohort-3 included 137 patients with morbid obesity from Chengdu Third People's Hospital (50 men and 87 women, median age 29 years, median BMI 38.0 kg/m²). Cohort-4 included 289 patients with moderate to morbid obesity from Shanghai East Hospital (81 men and 208 women, median age 30 years, median BMI 36.8 kg/m²). In addition, we collected a normal-weight healthy cohort (Cohort-0, n=1057) along with the main cohort from the main site, to use as controls (223 men and 834 women, median age 30 years, median BMI 21.2 kg/m²). These participants presented to clinics for routine health examinations. Key patient characteristics are shown in Table 1.
Details on the measurement, calculation, and definition of endocrinological and metabolic disorders for each cohort were included in the Supplemental Appendix.

Key Clinical Variables Selection for Machine Learning
Based on the consensus of our study team consisting of multiple expert physicians in obesity/endocrinology, the clinical variables we used to build classification models should be those related to metabolism, hormones, inflammation, and antioxidation, which represent the underlying progression mechanisms of obesity comorbidities. We selected key clinical variables out of hundreds of metabolic parameters based on the following criteria: (i) essential to characterize obesity, (ii) routinely acquired/measured in clinics, and (iii) easy to interpret with a physical meaning. We also intended to select a small number of variables to improve the generalizability of the classification models. Based on these criteria, we performed a data-driven experiment to select potential variables and optimal model parameters for the classification.

Unsupervised Modeling by Clustering Algorithms
We used and compared two clustering algorithms [i.e., k-means (19) and two-step (20)] for machine learning. Clustering was implemented by SPSS Modeler version 18 (IBM, Chicago, USA).
All variables were normalized (mean value of 0 and standard deviation [SD] of 1) before the cluster analysis. The k-means clustering was implemented with different k values (maximum iterations of 30 and change tolerance of 0.00001) and the one with minimum silhouette widths was used in the end. In the two-step clustering, the first step estimates the optimal number of clusters on the basis of silhouette width and the second step performs hierarchical clustering using log-likelihood as a distance measure and Schwarz's Bayesian criterion for clustering. Considering the notable differences in key patient characteristics between the sexes, we built a model with two sub-models that were trained separately on female and male patients, and the classification results of the sub-models were then pooled for further analysis.
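The normalization and k-means steps described above can be sketched as follows. This is a minimal NumPy illustration, not the SPSS Modeler implementation used in the study: the function names are hypothetical, the "change tolerance" is assumed to mean the largest shift of any cluster center between iterations, and the random initialization is an assumption. The sketch only reports the mean silhouette width for each candidate k; by the usual convention a higher mean silhouette indicates better-separated clusters, and how the study's software applied its own selection criterion is not reproduced here.

```python
import numpy as np

def zscore(X):
    """Standardize each column to mean 0 and SD 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def _nearest(X, centers):
    """Index of the closest center (Euclidean distance) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, max_iter=30, tol=1e-5, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers).

    max_iter and tol mirror the settings quoted in the text; treating tol
    as the maximum center shift is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        labels = _nearest(X, centers)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return _nearest(X, centers), centers

def mean_silhouette(X, labels):
    """Mean silhouette width over all samples (singletons score 0)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue
        a = D[i, own].sum() / (n_own - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

def silhouette_by_k(X, k_values=(2, 3, 4, 5, 6)):
    """Mean silhouette width for each candidate number of clusters."""
    Z = zscore(X)
    return {k: mean_silhouette(Z, kmeans(Z, k)[0]) for k in k_values}
```

To mirror the sex-specific sub-models, the same functions would be run separately on the female and male subsets and the resulting labels pooled afterwards.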

Clustering on Main Cohort and Verification on Three Other Cohorts
Clustering algorithms were first applied to the main cohort, Cohort-1, and the resulting clusters were used as the main classification model. Verification was performed by applying the same clustering algorithms to Cohort-1A, Cohort-1B, Cohort-2, Cohort-3, and Cohort-4, separately, and the resulting clusters of each verification cohort were compared to the clusters of the main model in terms of patients' distribution percentages and characteristics in each cluster. Cluster names were assigned according to the characteristics of the three classification variables. To further measure the generalizability of the classification models, we also assigned patients in each verification cohort to the clusters derived from the main model, according to the similarity of a patient's characteristics to each of the clusters in the main model. The similarity was calculated as the Euclidean distance (for k-means clustering) or log-likelihood distance (for two-step clustering) from the nearest cluster center derived from the main model. Then sensitivity, specificity, and accuracy for clustering, as well as the inter-cluster Jaccard coefficients (21), were calculated.
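The assignment step for the k-means case and the agreement measures can be illustrated with a short sketch. The function names are hypothetical, and two assumptions are made that the text does not specify: the verification data are already standardized on the same scale as the main-model centers, and the cluster labels from the independent clustering have already been matched to the main-model labels.

```python
import numpy as np

def assign_to_centers(X, centers):
    """Assign each patient (row of X) to the nearest main-model center."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def _ratio(num, den):
    """Safe division; returns NaN when the denominator is zero."""
    return num / den if den else float("nan")

def cluster_agreement(assigned, reference, cluster):
    """One-vs-rest agreement for a single cluster.

    assigned:  labels from assignment to the main model
    reference: labels from clustering the verification cohort independently
    """
    a = np.asarray(assigned) == cluster
    r = np.asarray(reference) == cluster
    tp = int(np.sum(a & r))
    tn = int(np.sum(~a & ~r))
    fp = int(np.sum(a & ~r))
    fn = int(np.sum(~a & r))
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, len(a)),
        "jaccard": _ratio(tp, tp + fp + fn),  # |A and R| / |A or R|
    }
```

Averaging the per-cluster values over all clusters gives summary figures comparable in form to a mean accuracy or mean Jaccard coefficient.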

Missing Value Imputation
The areas under the curve (AUC) of glucose (glucose AUC) and insulin (insulin AUC) were calculated using the trapezoidal rule at four data points of 0, 30, 60, and 120 min during the oral glucose tolerance test (OGTT) for patients in Cohort-1 and Cohort-2. However, since the four-time-point OGTT was not a routine measurement for patients in Cohort-3 and Cohort-4 (which is not uncommon in certain hospitals), certain time points of the OGTT data were missing for these patients. Thus, we built a group of linear regression models using the complete OGTT data available in Cohort-1, and employed one to three time points of the OGTT to estimate the four-time-point glucose AUC and insulin AUC. The linear regression models for the estimation of glucose AUC and insulin AUC were trained and tested on 70% and 30%, respectively, of the data from Cohort-1 using the stepwise method. The F test was performed with P < 0.05 for inclusion and P > 0.1 for exclusion, and an outlier tolerance of 0.0001. The estimates showed an average adjusted R² in the test set of 0.954 (range: 0.860-0.996) for glucose AUC and 0.873 (range: 0.643-0.984) for insulin AUC (Tables S1 and S2).
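The AUC calculation and the idea of estimating it from fewer time points can be sketched as follows. The function names are hypothetical. The regression here is an ordinary least-squares fit on a fixed, user-chosen subset of time points; it stands in for the stepwise procedure with F-test entry and removal criteria and does not reproduce it, and the 70%/30% train/test split is left out for brevity.

```python
import numpy as np

# OGTT sampling times in minutes, as stated in the text.
OGTT_TIMES = np.array([0.0, 30.0, 60.0, 120.0])

def ogtt_auc(values, times=OGTT_TIMES):
    """Trapezoidal AUC; values has shape (4,) or (n_patients, 4)."""
    values = np.asarray(values, dtype=float)
    widths = np.diff(times)
    mids = (values[..., 1:] + values[..., :-1]) / 2.0
    return np.sum(widths * mids, axis=-1)

def fit_auc_from_subset(values, subset_idx):
    """Least-squares fit of the four-point AUC on a subset of time points.

    values:     complete OGTT data, shape (n_patients, 4)
    subset_idx: column indices of the time points available elsewhere
    Returns [intercept, slope_1, ..., slope_m].
    """
    values = np.asarray(values, dtype=float)
    y = ogtt_auc(values)
    A = np.column_stack([np.ones(len(values)), values[:, subset_idx]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_auc(partial_values, coef):
    """Estimate the four-point AUC from the available time points only."""
    partial_values = np.atleast_2d(np.asarray(partial_values, dtype=float))
    return coef[0] + partial_values @ coef[1:]
```

For example, a glucose curve of 5, 9, 8, and 6 mmol/L at 0, 30, 60, and 120 min gives 30(5+9)/2 + 30(9+8)/2 + 60(8+6)/2 = 885 mmol/L·min. When all four time points are used as predictors the fit is exact, because the trapezoidal AUC is itself a linear combination of the four values (weights 15, 30, 45, and 30).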

Statistical Analysis
Clinical implications of variables related to metabolism and morbidity were compared with respect to the four obesity clusters. Continuous variables were expressed as the median (interquartile range 25-75%), since most of them had a skewed distribution. Variables not normally distributed were logarithmically or square root transformed before statistical analysis, which was performed with SPSS version 26 (IBM, Chicago, USA). Differences for continuous variables were assessed by performing ANOVA or ANCOVA, as appropriate. Bonferroni correction was used for the post hoc analysis. Differences in ratio variables were assessed by the Chi-square test. To identify the odds of obesity comorbidities in different subgroups, a binary logistic regression analysis was performed, and the odds ratio (OR) or the OR adjusted for sex and age (AOR) and the corresponding 95% confidence interval (CI) were calculated. For all analyses, p values were two-tailed and p < 0.05 was considered statistically significant.
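As a simple illustration of how an odds ratio and its 95% CI arise, the unadjusted OR can be computed directly from a 2x2 table with the standard Wald interval on the log scale. This is a sketch with a hypothetical function name and hypothetical counts; it is not the sex- and age-adjusted logistic regression used in the study, which requires fitting a model with covariates.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: patients with the comorbidity in the subgroup of interest
    b: patients without it in the subgroup of interest
    c: patients with the comorbidity in the reference group (e.g., MHO)
    d: patients without it in the reference group
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only (not study data).
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

Small cell counts inflate the standard error of the log odds ratio, which is one reason a confidence interval can become very wide.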

Baseline Obesity Classification
Of the 2094 patients (i.e., the full cohort at the main site), 300 (14%) and 1794 (86%) were overweight and obese, respectively, in terms of BMI. The proportions of metabolically unhealthy patients varied substantially: 36-93% of overweight and 55-97% of obese patients were classified as unhealthy, depending on the criteria used (Figure S1A-D). In the BMI-based categorization, while the incidence of several metabolic diseases (e.g., hypertension, metabolic syndrome, and hyperuricemia) increased gently along with the BMI categories, there were no obvious differences in the clinical characteristics among the four BMI subgroups (Figure S1E-F).

Obesity Subtypes Identified by Machine Learning/Clustering
Data-driven experiments selected the following three variables as key clustering factors: 1) glucose AUC, reflecting the severity of disturbances in energy metabolism; 2) insulin AUC, reflecting the compensatory balance of hormones to the increased somatogenic need; and 3) uric acid (UA), reflecting the inflammation and oxidation in the body. In Cohort-1 (the main cohort), k-means yielded four distinct clusters with minimum silhouette widths (Figure 2A). The two-step clustering showed similar clustering results to k-means (Jaccard similarity 0.831, Figure S2A). The cluster centers are shown in Table 2 and Table S3. Here we report the results on k-means only (see Supplemental Appendix for two-step results). The four clusters that resulted from Cohort-1 were as follows (Figure 3A): MHO, HMO-U, HMO-I, and LMO. HMO-U, HMO-I, and LMO can be grouped under a single notion of metabolically unhealthy obesity (MUO), in contrast to MHO. Figure 3B shows the glucose and insulin curves during OGTT across the four clusters.

Verification of the Obesity Subtyping on Other Cohorts
In each verification cohort, there were also four distinct clusters generated with similar patient distributions and characteristics to the corresponding clusters of the main model (Figure 2A). In Cohort-4, the proportions of patients clustered into LMO and HMO-I were relatively small, which may be related to the estimation of missing values for glucose AUC and insulin AUC, respectively.
As shown in Figure 2B and Table 3, in terms of the assignment of patients into the clusters derived from the main model, the mean assignment accuracy was 0.941, ranging from 0.908 to 0.967; the mean Jaccard similarity coefficient was 0.882, ranging from 0.815 to 0.934 for assignment-generated clusters vs. independent k-means-generated clusters, except for two notable and reasonable differences as explained in the following: (i) A higher proportion of patients were assigned to MUO in Cohort-1B in comparison to Cohort-1A, which reflects the difference in the severity of obesity between the two sub-cohorts. (ii) A higher proportion of patients were assigned to HMO-U in Cohort-2, which reflects the higher average UA in this cohort (Table 1). Similar results were also observed with the two-step clustering method (Figure S2, Table S4).

Metabolic Feature Analysis With Respect to the Four Subtypes
As shown in Table 4

Comorbidity Analysis With Respect to the Four Subtypes
Comorbidity analyses are shown in Figure 4 and Table 5 (see also Figure S4). A summary of the clinical characteristics and suggested treatments with respect to the four subgroups is shown in Table 6.

DISCUSSION
We leveraged machine learning to identify a refined classification of patients with obesity from multiple hospitals. The classification yielded four metabolically distinct clusters (i.e., MHO, HMO-U, HMO-I, and LMO), which showed a high degree of agreement/reproducibility among the four independent cohorts. This multidimensional classification provides an enhanced capacity over traditional healthy/unhealthy obesity and BMI categorizations to reflect the complexity and heterogeneity of metabolic disorders, thereby having the potential to enable more precise prevention, diagnosis, and therapy planning. To the best of our knowledge, this is the first study to apply unsupervised machine learning to common clinical variables to refine metabolic classification of obesity in a multi-center setting. The algorithms for unsupervised clustering of data are critical for a machine learning study. We independently used two mature algorithms, i.e., k-means and two-step, and observed highly similar classification results, indicating that our data clustering is relatively robust. The assignment of patients in the verification cohorts to the main model led to high Jaccard similarity coefficients (range 0.815-0.934), indicating that our classification is stable when tested on the multiple cohorts; a Jaccard similarity greater than 0.750 is considered to indicate stable clustering (21).
We used three common clinical variables for classification. While this is a relatively coarse granularity for clustering, the three variables reflect important dimensions (i.e., metabolism, hormone, as well as inflammation and oxidation) in characterizing obesity progression and are critical in providing important interpretation of etiopathogenesis to guide therapies. Four obesity subgroups were yielded from the three variables, and our analyses have revealed clinical insights associated with each subgroup to help us better understand obesity and guide clinical treatment planning. If more clinical variables are used for classification, more refined clusters may be identified. However, we emphasize the importance of applicability and generalizability of an obesity classification model: more variables for classification could lead to overfitting and may reduce applicability to patients without complete clinical variables. This study using three variables showed promising generalizability and, in future work, further evaluation of different numbers of clinical variables on substantially larger cohorts is warranted. In addition, our study used routinely acquired clinical variables for classification, which can enable a broader utility of such classification models. This differs from previous studies that used lifestyle data (14,16) or data acquired using specific research devices (e.g., hypothalamic blood flow, gastric emptying rate, and energy expenditure) (15). The limited body of previous work (14-16) also did not consider the important information on metabolic abnormality.
The four subgroups in our study have important implications for treatment. MHO showed a relatively healthy metabolic status and hormone balance; these patients should be motivated to achieve a normal weight for long-term considerations, as their risks of metabolic disorders are still higher than those of normal-weight subjects and may increase over time (22). In contrast, LMO showed the most severe central obesity due to severe hepatic insulin resistance (23), decompensated insulin secretion, and resultant poor metabolism (diabetes, dyslipidemia, and carotid lipid deposition). These patients may be more vulnerable to atherosclerosis and cardiometabolic diseases (24). Suggested treatments may include management of glucolipid metabolism and restoration of pancreatic β-cell function. HMO-U showed the worst UA metabolism but still relatively healthy glucolipid metabolism. For these patients, UA regulation may be an effective therapy, but note that overtreatment may attenuate the benefits of antioxidation by UA (25). HMO-I showed the worst hepatic and peripheral insulin sensitivity but overcompensated insulin secretion, which to some extent balanced the glucolipid metabolism. Severe insulin resistance may have resulted in the highest incidence of acanthosis nigricans (26) and PCOS (27). Therapies may be directed at relieving hyperinsulinemia in these patients.
Our study has some limitations. First, all patients were Chinese and were recruited in China. The applicability of our classification models to patients of other ethnicities requires further evaluation. Second, since this is a multi-center retrospective study, there may be noticeable differences in measurements and lab tests across different institutions. Third, the data imputation for patients with missing OGTT time point data may have introduced inaccurate estimates, although consistent classification results were observed when using the imputed data. Finally, we acknowledge that this is a proof-of-concept study of using machine learning to explore refined subtype classification of obesity. In future work, further analysis using pooled data from the multi-center cohorts with a random data split for training and testing/verification may further evaluate the model's performance. The more important follow-up research that we are planning is to validate the clinical value of the identified subtypes in a prospective setting, that is, to evaluate the treatment and adverse effects of both surgical and non-surgical therapies with respect to the four obesity subtypes. This study provides feasibility data and premises to design future clinical evaluation studies.
In summary, this multi-center retrospective study identified a refined classification of obesity subtypes by mining the clinical characteristics using a machine learning approach. The four subtypes appeared to be consistent across four independent patient cohorts. This proof-of-concept study provided evidence that precise diagnosis of obesity is feasible, which has great potential to guide therapeutic planning and decisions for different subtypes of obesity. Prospective studies are warranted to further evaluate the findings of this study.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Ethical Committee of Shanghai Tenth People's Hospital. The patients/participants provided their written informed consent to participate in this study.