Genetic variants of ERBB4 gene and risk of gestational diabetes mellitus: a susceptibility and diagnostic nomogram study

Introduction Gestational diabetes (GDM) is one of the common complications of female pregnancy, which seriously affects the health of mothers and their offspring. So far, the etiology has not been fully clarified. Methods A case-control study was conducted to clarify the relationship between Erb-b2 receptor tyrosine kinase 4 (ERBB4) functional tag genetic variants (rs1595064, rs1595065, rs1595066 and rs6719645) and the risk of GDM. Associations between variants and GDM risk were evaluated with the odds ratios (ORs) and their corresponding 95% confidence intervals (CIs). Subsequently, the false-positive reporting probability (FPRP), multi-factor dimension reduction (MDR) and bioinformatics analysis were adopted to confirm the significant associations. A nomogram model was constructed to predict the risk of GDM. Results Association analysis demonstrated that rs1595066 TT genotype performed a protective effect on GDM risk among all subjects (TT vs. CC: adjusted OR = 0.60, 95% CI = 0.38 - 0.94, P = 0.026; TT vs. CC/CT: adjusted OR = 0.61, 95% CI = 0.40 - 0.95, P = 0.027). Meanwhile, stratified analysis showed that rs1595066 TT can also reduce the GDM risk in age > 30.09 years old, pre-pregnancy BMI > 22.23 Kg/m2, SBP ≤ 110.08 mmHg, etc subgroups. Interactions between rs1595066 and DBP (P interaction = 0.01), FPG (P interaction < 0.001) and HbA1c (P interaction < 0.001) were detected. The FPRP analysis confirmed that association between rs1595066 and GDM risk in subjects of FPG < 4.79 mmol/L (P = 0.199) is true. The MDR analysis showed that rs1595066 was the best single locus model while the 4-loci model was the best multiple factors model to predict GDM risk. Functional prediction revealed that rs1595066 may disturb the stability of miRNA-mRNA binding. The predictive nomogram model has a well consistence and acceptable discriminative ability with a diagnosed AUC of 0.813. Discussion ERBB4 variants can change an individual’s susceptibility to GDM via the interaction of gene-gene, gene-environment and changes in the regulatory effects of miRNAs on ERBB4 expression.


Introduction
Gestational diabetes mellitus (GDM) is one of the most common metabolic complications in women during pregnancy, affecting approximately 1%-14% of pregnant women worldwide (1,2).GDM typically manifests in the second and third trimesters of pregnancy.In China, the incidence of GDM is approximately 14.8% (3).GDM may cause serious complications in mothers and infants, such as gestational hypertension, polyhydramnios, spontaneous abortion, preterm birth, respiratory distress syndrome, small or large for gestational age, fetal macrosomia, and even stillbirth (4)(5)(6).GDM poses a threat to pregnant women and their offspring.
The main risk factors for GDM are often older age at pregnancy, obesity, family history of type 2 diabetes mellitus (T2DM), history of GDM, and previous poor obstetric history.Studies also support a similar pathological mechanism of insulin resistance and b-cell dysfunction in GDM and T2DM (7).However, the specific underlying causes of GDM have not been fully elucidated.Previous studies have shown that the incidence of GDM is directly proportional to the prevalence of T2DM in different ethnic groups.GDM is particularly common among certain female populations, with Asian, Native American, Hispanic, and African American women at a significantly higher risk of developing GDM than non-Hispanic Caucasians (2).Having a family history of diabetes or past history of GDM is an independent risk factor for pregnancies developing GDM, while offspring of GDM patients have a significantly higher risk of developing T2DM (8)(9)(10).The genetic background of an individual plays a crucial role in the development of GDM.
Single-nucleotide polymorphisms (SNPs) are DNA sequence variations caused by a single nucleotide either through conversion or transversion.To the best of our knowledge, SNPs that are significantly associated with diseases can exist at any location in the genome.SNPs located in different functional regions of genes may affect the function of genomic DNA components, mRNA levels produced by transcription, protein translation, and even cause changes in biological traits.SNPs contain crucial information that determines the genetic susceptibility to complex human diseases and have been widely applied in the screening of high-risk populations and disease risk prediction models (11).Currently, a series of GDM susceptibility genes and susceptible SNPs have been identified by candidate gene studies and genome-wide association studies (GWAS) (12)(13)(14)(15)(16).These SNPs either affect disease risk or are significantly associated with genetic susceptibility to GDM through interactions with age, pre-pregnancy body mass index (pre-BMI), blood sugar, and blood lipids.Therefore, it is feasible to construct a predictive model based on significantly associated SNPs and key clinical variables to predict the risk of GDM.
The nomogram model constructed based on the logistic regression algorithm can score the value level of each influencing factor involved according to the size of the regression coefficient and finally read the probability of corresponding outcome events by calculating the total score of each individual (17).The Nomogram model can intuitively express the quantitative relationship between independent risk factors in the model and visualize it in a column chart format, effectively predicting the risk of individual GDM occurrence and providing an important reference for personalized GDM prevention and control.In addition, the evaluation of the effectiveness of building a model is an important step in evaluating the fit and applicability of risk prediction models and serves as a basis for further improving the construction and selection of the best model.The evaluation of nomogram prediction models is mainly reflected through indicators such as discrimination and calibration (18).In this study, a logistic regression model was used to evaluate the predictive ability of the model by applying the area under the receiver operating characteristic curve (AUC) to the subjects.
In this study, four variants (rs1595064, rs1595065, rs1595066, and rs6719645) located at the ERBB4 gene were genotyped in 554 GDM patients and 641 normal pregnant women, to explore the genetic susceptibility of ERBB4 variants to GDM.In addition, we constructed a nomogram predictive model based on the identified significantly associated SNPs and key clinical variables for predicting GDM in the early stages of pregnancy.

Subject selection
All subjects were recruited from the Affiliated Hospital of Guilin Medical University in September 2014 to April 2016, including 554 GDM cases with a mean age of 31.55 ± 4.76 years old and 641 healthy pregnancies aged 28.83 ± 4.13 years old.GDM was identified through the diagnosis criteria proposed by the International Association of Diabetes and Pregnancy Study Group (IADPSG) based on at least one changed threshold in following 75 g oral glucose tolerance test (OGTT): fasting plasma glucose (FPG) (≥5.1 mmol/L); 1 hour (1 h) blood glucose (≥10.0 mmol/L); 2 h blood glucose (≥8.5 mmol/L).Included volunteers were required to fulfill the following requirements: have lived in Guilin area for more than 2 years, have no mutual family relationship and is a singleton pregnancy this time.Pregnancies that have developed endocrine diseases, serious systemic diseases, history of pre-pregnancy type 1 or type 2 diabetes, long-term use of drugs affecting glucose metabolism, or other pregnancy complications before pregnancy will be excluded.The screening flowchart of the subjects is shown in Figure 1.The Ethics Committee of Guilin Medical University approved this study and all participants provided informed consent.

DNA extraction, candidate variants selection, and genotyping
Genomic DNA was extracted from EDTA-treated whole blood using a DNA extraction kit (Aidlab Biotechnologies Co., Ltd, China) and stored at −80°C prior to PCR.
The Sequenom MassARRAY platform was used for candidate variants genotyping.The PCR master mix was composed of 1 ml template DNA (20 ng/ml-100 ng/ml), 1.850 ml ddH 2 O, 0.625 ml of 1.25× PCR buffer (15 mmol/L MgCl 2 ), 0.325 ml of 25 mmol/L MgCl 2 , 0.1 ml of 25 mmol/L dNTP mix, 1 ml of 0.5 mmol/L primer mix, and 0.1 ml of 5 U/ml HotStar Taq polymerase.The reaction was conducted at 94°C for 15 min, followed by 45 cycles at 94°C for 20 s, 56°C for 30 s, and 72°C for 1 min, with a final incubation at 72°C for 3 min.The following steps were shrimp alkali enzyme purification (SAP) reaction, single-base extension reaction, resin purification, and chip sampling, respectively.Original data and genotyping plots were obtained using TYPER 4.0.
However, owing to the unsuccessful genotyping of rs3748962, it will no longer be analyzed subsequently, as shown in Figure 2.

Statistical analysis
Metrological data that conform to a normal distribution are expressed as mean ± standard deviation (x ± sd), and Student's ttests were used for comparison between the two groups, while nonnormally distributed data are represented by the median (interquartile range, IQR), and non-parametric tests were used for comparison between the two groups.The chi-square (c 2 ) test was used to compare of categorical variables, and the c 2 goodness-of-fit method was used to test whether the genotype frequency distribution followed by the Hardy-Weinberg equilibrium Flowchart of recruitment of subjects in a case-control study.(HWE).Odds ratios (ORs) and their corresponding 95% confidence intervals (CIs) were calculated using binary logistic regression to evaluate the associations between variants and GDM risk.
Stratified analysis was also performed to detect the correlation between positive SNP and GDM risk in different subgroups based on the mean value of the variables.Data analysis was performed using IBM SPSS Statistics 28 for Windows (IBM Corp., Armonk, NY, USA), and two-sided test P<0.05 was considered statistically significant.
To account for chance associations, false-positive report probability (FPRP) analysis was used to assess false-positive association findings (30).The FPRP cutoff value of 0.2 and a prior probability level of 0.1 were preset to detect an OR of 2 or 0.5, which is most likely to be associated with genotypes.Only P FPRP values<0.2,were considered true.
Multifactor dimension reduction (MDR) software (version 3.0.2) was adopted to investigate the interactions between variants with a preset 100-fold cross-validation, the best testing-balanced accuracy (TBA), and cross-validation consistency (CVC) (31).As p r e d i c t e d b y t h e S N P i n f o W e b S e r v e r ( h t t p s : / / manticore.niehs.nih.gov/snpinfo/snpfunc.html) (29), three loci of the ERBB4 gene (rs1595064, rs1595065, and rs1595066) were located at the miRNA binding sites; thus, an appropriate functional analysis was conducted.
The nomogram model can read the probability of the corresponding outcome events of an individual with a total score composed of scores incorporating factors according to the logistic regression coefficient (32,33).Based on the scoring of the key clinical traits and interacting ERBB4 variants, a predictive nomogram model was constructed to assess GDM risk in Selection and genotyping of ERBB4 variants.(A) ERBB4 variant selection using the LD Tag SNP selection tool with a r 2 >0.8, (B-E) Genotyping plots of ERBB4 variants using the Sequenom MassARRAY platform.
pregnant women.The discriminant ability of the predictive model was evaluated by drawing a receiver operating characteristic (ROC) curve and calculating the area under the curve (AUC).A calibration curve was applied to assess the accuracy of the predictive model and 1,000 bootstrap resamples were performed for internal validation.
In addition, decision curve analysis (DCA) was conducted to determine the clinical utility of the model by quantifying the net benefit at different threshold probabilities.

The characteristic of subjects
The clinical and biochemical data of the patients are presented in Table 1.Compared to healthy controls, the levels of GDM cases were higher in age, pre-BMI, and the levels of SBP, DBP, TG, HbA1c, and FPG, 75 g OGTT at 1 h, and 2 h blood glucose (P<0.001).

Association of functional variants and GDM risk
After adjusting for age and pre-BMI, the logistic regression analysis demonstrated that, compared to the CC genotype, rs1595066 TT could significantly decrease GDM risk by 40% (adjusted OR = 0.60, 95% CI = 0.38-0.94,P = 0.026).In the recessive model, compared to the CC/CT genotypes, the TT genotype also had a protective effect on individual susceptibility to GDM (adjusted OR = 0.61, 95% CI = 0.40-0.95,P = 0.027).However, there was no significant association between the rs1595064, rs1595065 and rs6719645 and GDM risk observed, as shown in Table 2.

FPRP analysis
Under the preset threshold value of 0.2 and a prior probability of 0.1, the FPRP analysis was used to evaluate the noteworthy associations between rs1595066 and GDM risk.However, the only association between rs1595066 and GDM risk in subjects with FPG ≤4.79 mmol/L seems to be genuine (P FPRP = 0.199), and the other positive associations observed may be obtained by chance and should be accepted cautiously, as shown in Table 4.

Functional analysis by computer tool
As rs1595066 is a 3′ UTR variant located at the microRNA binding site, functional analysis by SNP function prediction (FuncPred) in the SNPinfo Web Server was performed in our study.The predictive results indicated that ERBB4 was regulated by a series of miRNAs (Table 6).The minimum free energy of hybridization (MFE) was altered in hsa-miR-221, hsa-miR-520g, and hsa-miR-520h when the allele of the mRNA forward sequence changed from C to T. Meanwhile, it seems that allelic status can disrupt miRNA binding sites, causing miRNAs (hsa-miR-548l, hsa-miR-106a, hsa-miR-17, and hsa-miR-20a) that bind to specific sequences of ERBB4 to no longer bind.The above findings revealed that rs1595066 may change individuals′ susceptibility to GDM by disturbing miRNA binding to ERBB4 mRNA.

Construction and validation of the nomogram prediction model
This study used DBP, FPG, HbA1c, and the recessive model of rs1595066 as independent predictive factors to construct a nomogram for GDM prediction (Figure 3A).Based on the scores of the four factors, the total score of the prediction model was obtained, and a high total score indicated a high risk of developing GDM.As shown in Figure 3B, the calibration curve was validated to be almost coincident with the ideal line, indicating a good consistency between the predicted probability and the actual observed probability with a P-value of 0.837.The area under the ROC curve (AUC) was 0.813, with a sensitivity of 80.5% and a specificity of 69.8%, demonstrating a well-accepted predictive and discriminative performance (Figure 3C).As shown in DCA (Figure 3D), when the risk threshold was between 0.12 and 0.82, that predictive nomogram model will provide higher clinical net benefit compared with the "treat all" or "treat none" strategies.

Discussion
GDM can cause adverse health events in both mothers and babies in the short-and long-term.It is an independent risk factor for the long-term risk of T2DM, metabolic syndrome, cardiovascular disease, multiple tumors, and kidney disease in mothers.Offspring may increase the risk of adverse health consequences, such as T2DM, obesity, impaired neurodevelopmental outcomes, and even eye diseases.Studies have confirmed that GDM and T2DM are both multifactorial diseases and share some epidemiological risk factors, such as obesity, insulin resistance, impairment of b-cell function, race, medical history, and family history (34,35).GDM is affected by numerous environmental exposure factors, susceptibility genes, genetic variations, and even their gene-gene and geneenvironment interaction effects, and/or effect modifications.
Currently, candidate gene studies and GWAS have identified many genetic susceptibility variants for GDM.A GWAS by Cho et al. investigated numerous SNPs known to be associated with an increased risk of GDM in Koreans (36).Our previous candidate gene association studies have also confirmed that angiotensinconverting enzyme 2 (ACE-2), retinoic acid X receptor-g (RXR-g), CDK5 regulatory subunit-associated protein 1-like 1 (CDKAL1), melatonin receptor 1 B (MTNR1B), and transcription factor 7    Establishment and validation of nomogram model for predicting GDM risk.(A) A nomogram model constructed using FPG, HbA1c, DBP, and rs1595066, based on binary logistic regression.The value of each variable was scored on a point scale from 0 to 100, after which the scores for each variable were added.That sum is located on the total points axis, which enables us to predict the probability of GDM risk; (B) the calibration curve was evaluated by Hosmer-Lemeshow goodness-of-fit testing with a p-value of 0.837; (C) the area under the receiver operating characteristic curve (ROC) and the scoring system identifying GDM; (D) the decision curve analysis (DCA) with higher clinical bet benefit in probability threshold in 0.12 to 0.82.
analog-2 (TCF7L2), which either affect GDM risk or are significantly associated with genetic susceptibility to GDM through interactions with age, pre-BMI, and blood lipids (12)(13)(14)(15).ERBB4 activates multiple downstream signaling pathways by binding to specific ligands, participating in the regulation of insulin sensitivity and glucose intake, and maintaining energy balance in the body (20-22).Studies have shown that the ERBB4 variant rs7588550 can be significantly associated with the risk of type 1 and type 2 diabetic nephropathy by affecting the ERBB4 expression (23,24).However, there have been no studies on the relationship between ERBB4 variants and GDM risk.
In the present study, four functional tag SNPs were genotyped to explore the genetic susceptibility of ERBB4 variants to GDM.Significant effects of rs1595066 on the risk of GDM were found in the Guilin population of China, especially in subjects over 30.09 years old, with BMI ≥22.23 Kg/m 2 , SBP higher than 110.08 mmHg, etc.This suggests that ERBB4 variants could also alter an individual's susceptibility to GDM by modulating key physiological and biochemical variables within the body.In addition, MDR analysis also confirmed that rs1595066 is significantly associated with the risk of GDM and is the best one-factor model for predicting GDM risk.Meanwhile, a complex gene-gene combination effect was also detected, and all four studied SNPs made the best multi-loci model for predicting GDM risk.This finding indicates that personal differences suffering in GDM may be affected by SNP-SNP or SNP-environment interactions.Based on this, the nomogram model established by the environmental factors and genetic variants (FPG, DBP, HAb1c, and rs1595066) demonstrated good calibration and discrimination ability, and DCA also showed promisingly clinical application value.This predictive model will be helpful for individuals to prevent GDM early in pregnancy.
MicroRNAs (miRNAs) can bind to specific sites in the 3′-UTR of target mRNA, playing a vital role in gene expression regulation, while SNPs of target gene miRNA-binding sites may alter the binding affinity of potential miRNAs to target mRNA (37).Thus, changes in the stability of miRNA-mRNA combinations are likely to have a profound impact on the transcriptional and post-transcriptional regulation of gene expression (38,39).Studies have suggested that the PTPRD gene miR-450a binding site SNP rs56407701 can significantly affect the susceptibility of Chinese Han women to GDM during pregnancy by regulating PTPRD expression (40), while the miR-323b-5p-binding site rs1063192 in the CDKN2B gene was significantly associated with GDM.Further analysis showed that subjects with the CC genotype exhibited increased glucose levels at 1 h and 3 h, higher insulin levels at 3 h during an OGTT, as well as lower TC and LDL-c levels compared with TT genotype carriers.In the present study, from the in silico analysis, we can see that rs1595066 is located at ERBB4 gene miRNA binding sites, and seems to have the effect of creating a new or disrupting an existing binding site, or leading to significant changes in miRNA-mRNA binding minimum free energy (MFE) under different alleles (41).Thus, the miRNA-binding site SNP may play multiple roles in regulating the body's genetic susceptibility to complex human diseases.
This study had some limitations.First, this was a hospital-based case-control study.Therefore, there will inevitably be a bias in research object selection and retrospective data bias.Second, potential confounding factors of GDM, such as smoking status, poor obstetrics, malnutrition, and socioeconomic factors, were not assessed.They are likely to interfere with the objectivity of the associations and strengths between the studied ERBB4 variants and GDM onset.Third, although a relatively large sample study design was adopted in this study, the very low frequency of genotypes tested in the studied variants may still limit the statistical performance, especially in subgroup analysis.Finally, this study only explored the potential biological functions of positively associated variants using bioinformatics tools, but molecular experimental research has not been conducted.

Conclusion
ERBB4 variants, such as rs1595066, were significantly associated with the onset of GDM.Interactions between variants or variantsclinical variables and the binding stability of miRNAs to target ERBB4 gene sequence changing may be the mechanisms by which ERBB4 gene variants affect the susceptibility of pregnant women to GDM.This nomogram model will be useful for the early prevention of GDM.

TABLE 1
Demographic and clinical characteristics in cases and controls (mean ± sd).

TABLE 2
Association analysis between ERBB4 SNPs and GDM risk.
a, Genotype distribution difference tested by c 2 ; b, Adjusted for age, BMI in logistic regression models.The symbol Rs # : Denotes variant variables.Bold values indicate that the differences are statistically significant.

TABLE 3
Stratified analysis of rs1595066 and GDM risk., Variables' stratification was based on their mean value, respectively; b, Adjusted for age, pre-BMI in logistic regression models; c, Test for multiplicative interaction obtained from logistic regression models.Bold values indicate that the differences are statistically significant. a

TABLE 4
FPRP analysis for the positive associations of rs1595066 and GDM risk.

TABLE 5
MDR analysis for ERBB4 SNPs to predict GDM risk.

TABLE 6
Prediction for miRNA binding sites and altered MFE by rs1595066.