Prediction model for gestational diabetes mellitus using the XG Boost machine learning algorithm

Objective: To develop an extreme gradient boosting (XG Boost) machine learning (ML) model for predicting gestational diabetes mellitus (GDM) and to compare it with a model built using traditional logistic regression (LR).
Methods: A case–control study was carried out among pregnant women, who were assigned to either the training set (recruited from August 2019 to November 2019) or the testing set (recruited in August 2020). We applied the XG Boost ML approach to identify the best set of predictors from 33 candidate variables. Model performance was assessed in terms of discrimination, using the area under the receiver operating characteristic (ROC) curve (AUC), and calibration, using the Hosmer–Lemeshow (HL) test and calibration plots. Decision curve analysis (DCA) was used to evaluate the clinical utility of each model.
Results: A total of 735 and 190 pregnant women were included in the training and testing sets, respectively. The XG Boost ML model, which included 20 predictors, achieved an AUC of 0.946 and a predictive accuracy of 0.875, whereas the traditional LR model, which included four predictors, achieved an AUC of 0.752 and a predictive accuracy of 0.786. The HL test and calibration plots showed that both models were well calibrated. DCA indicated that treating only the women whom the XG Boost ML model predicts to be at risk of GDM confers a net benefit compared with treating all women or treating none.
Conclusions: The XG Boost ML model showed better discrimination than the traditional LR model, and both models were well calibrated.


Introduction
Gestational diabetes mellitus (GDM) is the most common metabolic complication of pregnancy and is classed as a mild form of diabetes. It is characterized by hyperglycemia and is normally diagnosed at 24-28 weeks' gestation (1). The global prevalence of hyperglycemia during pregnancy is approximately 15.8%, and over 80% of these cases are due to GDM (2). With economic growth and the transition to a more sedentary lifestyle, the prevalence of GDM among Chinese women continues to increase, ranging from 14.8% to 24.24% (3)(4)(5). China has also progressively loosened its fertility restrictions, most recently replacing the two-child policy with the three-child policy, and these changes have increased the proportion of pregnant women of advanced maternal age. The rising prevalence of GDM can therefore be attributed mainly to this growing number of older pregnant women.
Hyperglycemia has both short- and long-term consequences that significantly affect the health of pregnant women and their offspring. In mothers, several studies have reported that GDM is associated with adverse pregnancy outcomes, including preeclampsia and the need for cesarean delivery, as well as with type 2 diabetes and cardiovascular disease after delivery (6). In offspring, GDM is associated with a higher prevalence of macrosomia, shoulder dystocia, birth trauma, and stillbirth and, in later life, with obesity and metabolic syndrome (7). According to the Developmental Origins of Health and Disease framework, exposure to intrauterine hyperglycemia before GDM screening at 24-28 weeks' gestation is associated with abnormal fetal growth and development (8), including smaller fetal size at 24 weeks' gestation, increased abdominal circumference growth rates (9), and hyperinsulinemia (6). Lifestyle interventions during early pregnancy can reduce the risk of GDM by 18%-62% (10,11), but they are not effective if initiated later in pregnancy (12). A delayed diagnosis of GDM in the second or third trimester therefore leaves only a narrow window for effective intervention. It is thus imperative to establish a prediction model that identifies women at risk of GDM so that intervention can begin before the condition is diagnosed at 24-28 weeks' gestation.
There is accumulating evidence that models based on multiple risk factors have better predictive ability (9). Machine learning (ML) algorithms, a branch of artificial intelligence, can model high-dimensional sets of predictors in relatively small datasets while limiting overfitting, and they have a powerful self-learning ability to find complex relationships between predictors (9,13). Demographic characteristics and clinical features are major predictors of GDM, and combining them with biomarkers improves the predictive ability of models (14,15). In this study, we therefore develop prediction models for GDM based on demographic characteristics, clinical features, and laboratory parameters to make full use of the available variables. In addition, we compare and evaluate the performance of ML and logistic regression (LR) models to show the advantages of each.

Participants
This case-control study of pregnant women was conducted at the Shenzhen Hospital of the Southern Medical University, Shenzhen, China.
Pregnant women were eligible to participate in the study if they met all of the following inclusion criteria: (1) they were aged ≥ 18 years; (2) they had undergone all routine antenatal assessments; (3) they had taken a 75-g oral glucose tolerance test (OGTT) at 24-28 weeks' gestation; and (4) they were willing to participate in this study and to sign the informed consent form. The exclusion criteria were as follows: (1) pre-existing type 1 or type 2 diabetes; (2) a history of severe diseases, such as hypertension or heart disease; and (3) taking medications affecting insulin and blood glucose levels.

Data collection
Information on participants' demographic characteristics was collected by using a structured questionnaire. Clinical features and laboratory parameters in the first trimester were collected from the hospital's electronic medical record system (EMRS).

Diagnosis of GDM
GDM was diagnosed at 24-28 weeks' gestation when any one of the 75-g OGTT values met or exceeded the following thresholds: 5.1 mmol/L at 0 h (fasting), 10.0 mmol/L at 1 h, or 8.5 mmol/L at 2 h, in accordance with the 2010 recommendations of the International Association of Diabetes and Pregnancy Study Groups (IADPSG) Consensus Panel.
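Because the IADPSG rule is an "any one value" criterion, it reduces to a single comparison per time point. The sketch below is an illustration in Python (the study itself did not publish code), and the function and constant names are hypothetical:

```python
# Hypothetical helper illustrating the IADPSG (2010) 75-g OGTT criteria.
# Cutoffs are plasma glucose values in mmol/L.
IADPSG_CUTOFFS_MMOL_L = {
    "fasting": 5.1,  # 0 h
    "1h": 10.0,
    "2h": 8.5,
}


def is_gdm(fasting: float, one_hour: float, two_hour: float) -> bool:
    """Return True if ANY OGTT value meets or exceeds its IADPSG cutoff."""
    values = {"fasting": fasting, "1h": one_hour, "2h": two_hour}
    return any(values[key] >= cutoff for key, cutoff in IADPSG_CUTOFFS_MMOL_L.items())


print(is_gdm(4.6, 8.9, 8.5))  # True: the 2-h value equals its cutoff
print(is_gdm(4.6, 8.9, 7.2))  # False: all values are below their cutoffs
```

Note that "meets or exceeds" maps to `>=`, so a value exactly at a cutoff counts as abnormal.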

Statistical analysis
Statistical analyses were performed using IBM® SPSS® Statistics version 26.0 (IBM Corporation, Armonk, NY, USA). Continuous variables were expressed as means and standard deviations and compared between the two groups using Student's t-test (for normally distributed variables). Categorical variables were described as frequencies (percentages) and compared using the chi-squared test. A p-value of less than 0.05 was considered statistically significant. The results of these tests, clinically relevant findings, and previous literature were used to preliminarily screen the variables for potentially meaningful predictors of GDM. Multiple imputation was used to handle missing data and avoid selection bias. The LR prediction model was built in R (The R Foundation, Vienna, Austria) using the rms package, and the XG Boost ML model was built using the R packages xgboost, xgboostExplainer, and mlr.
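The study ran its univariate tests in SPSS. Purely as an illustration, both tests can be sketched with the Python standard library; the function names are hypothetical, the chi-squared version handles a 2x2 table only and omits Yates' continuity correction (the text does not say whether one was applied), and its p-value relies on the identity that, for 1 degree of freedom, P(X > x) = erfc(sqrt(x/2)). The t-test function returns only the statistic and degrees of freedom, since a p-value would need a t-distribution CDF (for example, `scipy.stats.t.sf`).

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Student's (pooled-variance) two-sample t statistic and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, n_a + n_b - 2


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value); with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical 2x2 counts and toy continuous values, not study data.
stat, p = chi2_2x2(20, 80, 10, 90)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
t, df = student_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"t = {t:.2f}, df = {df}")
```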

Prediction models
For the LR model, variables with a p-value of < 0.05 in the univariate analysis, together with variables identified in previous literature and other clinically meaningful variables, were entered into a stepwise LR analysis. ML can capture novel or complex combinations of multidomain variables, weight variable importance, and reduce overfitting (16). Therefore, all variables from the univariate analysis were incorporated into the XG Boost ML model.
Each GDM model was trained on the training set, with the optimal hyperparameters selected using 10-fold cross-validation, and was then validated on the testing set.
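The tuning step can be pictured as a grid search scored by 10-fold cross-validation within the training set, with the testing set held back for final validation. The authors used the R packages named above; this Python sketch is not their API. `fit_and_score` is a hypothetical caller-supplied function (for example, one that fits an XGBoost model and returns the validation AUC), and although the example keys borrow XGBoost parameter names (`max_depth`, `eta`), the grid values are illustrative because the search space is not reported.

```python
import itertools
import random
from statistics import mean


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def tune_by_cv(n, grid, fit_and_score, k=10, seed=0):
    """Grid search scored by k-fold cross-validation on the TRAINING set only.

    fit_and_score(params, train_idx, valid_idx) is a hypothetical, caller-supplied
    function that fits a model on train_idx and returns a score on valid_idx.
    Returns the best parameter dict and its mean validation score.
    """
    folds = k_fold_indices(n, k, seed)
    keys = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = []
        for i, valid_idx in enumerate(folds):
            # Every fold except the i-th is used for fitting.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(fit_and_score(params, train_idx, valid_idx))
        score = mean(scores)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy scorer standing in for "fit XGBoost, return validation AUC".
def toy_score(params, train_idx, valid_idx):
    return -abs(params["max_depth"] - 4) - abs(params["eta"] - 0.1)


grid = {"max_depth": [2, 4, 6], "eta": [0.05, 0.1, 0.3]}
print(tune_by_cv(735, grid, toy_score))
```

Selecting hyperparameters on the mean validation score, rather than on the testing set, keeps the testing-set AUC an honest estimate of performance on new women.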

Model evaluation
The discrimination of each model was assessed using receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC). Calibration was evaluated using calibration plots and the Hosmer-Lemeshow (HL) test. Decision curve analysis (DCA) was used to evaluate the clinical utility of the models.
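The AUC has a convenient probabilistic reading: it is the probability that a randomly chosen woman with GDM receives a higher predicted risk than a randomly chosen woman without GDM (the Mann-Whitney interpretation), with ties counted as one-half. A minimal, unoptimized sketch with a hypothetical function name; its cost is quadratic in the number of cases and controls, which is acceptable for a few hundred women:

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random case > score of a random control); ties count 1/2."""
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    controls = [s for y, s in zip(y_true, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical labels (1 = GDM) and predicted risks.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```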

Participant characteristics
In total, 925 pregnant women were included in this study (735 in the training set and 190 in the testing set), and 33 candidate variables were collected for each woman. Table 1 shows the univariate analysis of the demographic characteristics, clinical features, and laboratory parameters of participants with GDM (cases) and without GDM (controls) in the training set. Compared with participants without GDM, participants with GDM were significantly older, had a higher prepregnancy body mass index (BMI) and mean arterial pressure (MAP), and had a longer average time since their last pregnancy. The proportions of women with previous GDM and with a family history of diabetes mellitus were also significantly higher in the GDM group, and women in this group were markedly younger at menarche (all p < 0.05). Laboratory parameters, including platelet count, white blood cell count, urinary glucose and ketone levels, alanine aminotransferase, thyroid hormone T3, fasting plasma glucose, and glycated hemoglobin (HbA1c), were also higher in women with GDM than in controls. Table 2 compares the demographic characteristics, clinical features, and laboratory parameters of participants in the training and testing sets; the two sets were consistent for most variables.

Predictors of models
Four predictors (previous GDM, age, HbA1c level, and MAP) were used to construct the LR prediction model (Table 3). Twenty predictors were ultimately included in the XG Boost ML model. Figure 1 shows the relative importance of these 20 variables in the XG Boost ML prediction model for GDM.

Accuracy of prediction models
In the training set, the AUC of the stepwise LR prediction model for GDM was 0.752, whereas that of the XG Boost ML model was 0.946 (Figures 2 and 3, respectively). The accuracy of the two models in the training set was 0.786 and 0.875, respectively. The XG Boost ML model had higher specificity than the traditional LR model in both the training and testing sets, but lower sensitivity (Table 4).
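Accuracy, sensitivity, and specificity all depend on the probability cutoff used to turn predicted risks into yes/no predictions, which is how one model can have higher specificity yet lower sensitivity than another. The cutoff used in the study is not stated, so the 0.5 default below is an assumption, and the function name is hypothetical:

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a given probability cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_gdm = p >= threshold
        if predicted_gdm and y == 1:
            tp += 1
        elif predicted_gdm:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate among women with GDM
        "specificity": tn / (tn + fp),  # true-negative rate among women without GDM
    }


# Hypothetical labels (1 = GDM) and predicted risks.
y = [1, 1, 0, 0, 0]
risk = [0.9, 0.3, 0.2, 0.6, 0.1]
print(classification_metrics(y, risk))        # cutoff 0.5
print(classification_metrics(y, risk, 0.15))  # lower cutoff: sensitivity up, specificity down
```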

Calibration of different models
The calibration plots (Figures 4-7) show agreement between the predicted probabilities and the observed outcomes. In the LR model, the HL test p-values were 0.288 and 0.402 for the training and testing sets, respectively; in the XG Boost ML model, they were 0.831 and 0.556, respectively.
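The HL test sorts women by predicted risk, splits them into groups (conventionally 10), compares observed and expected GDM counts in each group, and refers the summed statistic to a chi-squared distribution with (groups - 2) degrees of freedom; a large p-value means there is no evidence of miscalibration. The stdlib sketch below uses hypothetical function names. Grouping conventions differ between software packages, so results may not match the R implementation the authors used exactly, and the closed-form p-value here is valid only for an even number of degrees of freedom (10 groups give 8).

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with EVEN df.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow statistic and p-value using ~equal-size groups by predicted risk."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)  # observed GDM cases in the group
        expected = sum(p for p, _ in chunk)  # expected cases = sum of predicted risks
        mean_p = expected / n_g
        # Assumes 0 < mean_p < 1 in every group.
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)
```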

Clinical use
The DCA results for the two models are presented in Figures 8 and 9. Compared with treating all women or treating none, the LR model provided a net benefit at threshold probabilities of 6%-63% and 87%-90%, whereas the XG Boost ML model provided a positive net benefit at threshold probabilities between 5% and 92%.
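In DCA, the net benefit at a threshold probability p_t is TP/n - (FP/n) x p_t/(1 - p_t): true positives are credited, and false positives are penalized by the odds of the threshold. "Treat none" always has a net benefit of zero, and "treat all" follows from classifying every woman as positive. A minimal sketch with hypothetical function names and toy data:

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating women whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating everyone (the 'treat all' reference line)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical labels (1 = GDM) and predicted risks; 'treat none' is always 0.
y = [1, 1, 0, 0]
risk = [0.9, 0.6, 0.4, 0.1]
for pt in (0.2, 0.5):
    print(pt, net_benefit(y, risk, pt), net_benefit_treat_all(y, pt))
```

A model is useful at a given threshold when its curve lies above both the "treat all" line and zero, which is how threshold ranges such as those reported above are read off a decision curve.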

Discussion
Early screening and prediction of the likelihood that a pregnant woman will develop GDM are imperative for the prevention and treatment of this condition (17). Comparing the two models, we found that the XG Boost ML model had better discrimination, with an AUC as high as 0.946. Our results are concordant with a previous study showing that ML algorithms can be more accurate than traditional LR methods (18). The HL test showed that the observed probabilities were largely consistent with the predicted probabilities, implying that both models were well calibrated.
Evidence indicates that, in the absence of overfitting, a prediction model with more predictors has better predictive ability than a model with fewer predictors (19). Similarly, in our study, the XG Boost ML model with 20 predictors had higher predictive accuracy than the LR model with four predictors. Furthermore, linear models such as LR make the linear contribution of each variable to GDM explicit, which facilitates clinical implementation, whereas XG Boost ML models integrate multiple factors by boosting, weight the importance of each factor, assess the complex non-linear relationships between them, and clearly show the relative contribution of each variable to GDM (18).
A recent related study indicated that hematologic and biochemical parameters measured during routine antenatal examinations can be used in ML models to predict GDM (20); however, the relative importance of each variable was not weighed. In this study, we have shown that it is possible to quantify the relative contribution of individual risk factors to GDM. Another related study (18) developed an ML prediction model in a large population and weighed the importance of risk factors, but it did not explore biomarkers in early pregnancy, which our study did.
In both models, previous GDM was the most established predictor; the LR analysis showed that pregnant women with previous GDM had 7.8-fold higher odds of developing GDM (OR = 7.822; p < 0.05). Other modeling studies (9,21) have shown that previous GDM increases the risk of GDM in a subsequent pregnancy 13.7- to 21.1-fold (p < 0.05). One review also found that GDM in a previous pregnancy is the strongest risk factor for GDM, with reported recurrence rates of up to 84% (22). In addition to previous GDM, age, HbA1c level, and MAP were identified as independent factors for GDM in the LR analysis. Age and HbA1c level have previously been strongly associated with an elevated risk of GDM (17,21). With increasing age, the fertility and organ function of pregnant women decline, and insulin sensitivity and pancreatic β-cell function decrease, which in turn leads to insulin resistance (IR) and an increased risk of hyperglycemia. HbA1c, an established risk factor, reflects the average blood glucose level over the previous 2-3 months, can indicate the severity of GDM, and is significantly related to the degree of IR (23). A previous study revealed that HbA1c level is a reliable predictor of GDM (OR = 3.11; p < 0.05) and that HbA1c levels are elevated in women with GDM, although still within the normal range (24), which is consistent with our results. MAP was calculated as one-third of systolic blood pressure (SBP) plus two-thirds of diastolic blood pressure (DBP), both of which are considered predictors of GDM (18,25,26). MAP may predict GDM because IR is involved in the pathogenesis of both gestational hypertension (GH) and GDM; MAP reflects the severity of GH and may therefore also be related to the development of GDM (27).
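Two quantities in this paragraph lend themselves to a quick calculation: MAP as defined above, and the difference between an odds ratio (what LR reports, such as OR = 7.822 for previous GDM) and a ratio of probabilities. The sketch uses hypothetical function names and made-up counts, not study data; it shows that when the outcome is common, the OR lies further from 1 than the risk ratio.

```python
def mean_arterial_pressure(sbp, dbp):
    """MAP (mmHg) = one-third of SBP plus two-thirds of DBP."""
    return sbp / 3 + 2 * dbp / 3


def odds_ratio(a, b, c, d):
    """OR for [[exposed cases a, exposed non-cases b], [unexposed cases c, unexposed non-cases d]]."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Ratio of outcome probabilities for the same 2x2 layout."""
    return (a / (a + b)) / (c / (c + d))


print(round(mean_arterial_pressure(120, 80), 1))  # 93.3
# Hypothetical counts with a common outcome: the OR (6) is double the risk ratio (about 3).
print(odds_ratio(30, 20, 10, 40), risk_ratio(30, 20, 10, 40))
```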
The remaining 16 predictors, comprising prepregnancy BMI and 15 laboratory parameters routinely measured during antenatal assessment, were also included in the XG Boost ML model. Although prepregnancy BMI is considered an established predictor of GDM (28), it had the lowest predictive ability in our model, probably because overweight and obesity were uncommon in our sample (affecting approximately 11.7% and 14.7% of women in the training and testing sets, respectively). Another explanation is that the relationship between BMI and GDM is complex: women with GDM and a high BMI tend to have IR, whereas women with GDM and a low BMI tend to have defective insulin secretion (29).

FIGURE 1
The relative importance of the 20 variables included in the XG Boost ML model for GDM in the training set. BMI, body mass index; GDM, gestational diabetes mellitus; HbA1c, glycated hemoglobin; XG Boost ML, extreme gradient boosting (XG) machine learning (ML).

FIGURE 2
The AUC of the prediction model for GDM by stepwise LR. AUC, area under the receiver operating characteristic curve; GDM, gestational diabetes mellitus; LR, logistic regression.

FIGURE 3
The AUC of the prediction model for GDM by XG Boost ML. AUC, area under the receiver operating characteristic curve; GDM, gestational diabetes mellitus; XG Boost ML, extreme gradient boosting (XG) machine learning (ML).
Existing studies have identified several laboratory parameters as independent predictors of GDM, such as glycemic markers (e.g., fasting glucose and HbA1c levels), alanine aminotransferase (ALT) levels, and thyroid function (levels of the thyroid hormones T3 and T4) (9,18,20); all of these are clinically available in the first trimester of pregnancy. The link between these variables and GDM may be explained by the fact that hyperglycemia can alter the body's hemodynamics and that these variables reflect inflammatory and immune responses that are highly associated with IR (30). Prior research has identified several potential blood biomarkers, such as platelet count, white blood cell count, and red blood cell count, that are positively correlated with the development of GDM (30). Consistent with a previous study (9), high T3 and low T4 levels were identified as predictors of GDM in our study, supporting a close relationship between thyroid function and GDM. ALT and aspartate aminotransferase (AST), as markers of hepatocellular damage, were also examined as predictors of GDM in our study. The pathogenesis of GDM is linked with IR, which may in turn be related to mild ALT and AST elevations (15,31). In summary, these laboratory parameters support the hypothesis that routine blood tests in pregnancy are conducive to GDM screening.

Limitations
This study has several limitations. First, the sample size was limited. Second, external validation was temporal only, using a later cohort from the same single center, which limits the extrapolation of our findings to other settings. Third, complete data were not available for all laboratory parameters, and multiple ML models were not compared. Finally, variables such as clinical features and laboratory parameters were based on retrospective data from the EMRS, which may have introduced unavoidable selection bias. Further multicenter prospective studies based on large, population-based samples should be carried out to update and validate the models.

FIGURE 4
The calibration plots of the training set by LR. LR, logistic regression.

FIGURE 5
The calibration plots of the testing set by LR. LR, logistic regression.

Conclusion
In conclusion, an LR model with four predictors and an XG Boost ML model with 20 predictors were successfully built to predict GDM. Compared with traditional LR, XG Boost ML improved the discrimination of the GDM prediction model and made full use of more predictors. Common laboratory parameters from routine antenatal assessments can be used to predict the likelihood of pregnant women developing GDM.

Data availability statement
The datasets presented in this article are not readily available because they belong to the hospital. Requests to access the datasets should be directed to XH, 731538045@qq.com.

FIGURE 6
The calibration plots of the training set by XG Boost ML. XG Boost ML, extreme gradient boosting (XG) machine learning (ML).

FIGURE 7
The calibration plots of the testing set by XG Boost ML. XG Boost ML, extreme gradient boosting (XG) machine learning (ML).

FIGURE 8
The DCA of the model using LR. DCA, decision curve analysis; LR, logistic regression.

FIGURE 9
The DCA of the model using XG Boost ML. DCA, decision curve analysis; XG Boost ML, extreme gradient boosting (XG) machine learning (ML).

Ethics statement
This study was approved by the corresponding Hospital Ethics Committee (No.: NYSZYYEC20200032). The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions
XH and XiaolH contributed to the conception and design of the study. XH organized the database. XH and YY performed the statistical analysis. XH wrote the first draft of the manuscript. XH, XiaolH, YY, and JW wrote sections of the manuscript. All authors contributed to the article and approved the submitted version.