
ORIGINAL RESEARCH article

Front. Endocrinol., 18 July 2025

Sec. Reproduction

Volume 16 - 2025 | https://doi.org/10.3389/fendo.2025.1544724

This article is part of the Research Topic: Lifestyle and Environmental Factors and Human Fertility

Machine learning algorithm based on combined clinical indicators for the prediction of infertility and pregnancy loss

Rui Zhang, Yuanbing Guo, Xiaonan Zhai, Juan Wang, Xiaoyan Hao, Liu Yang, Lei Zhou, Jiawei Gao, Jiayun Liu*
  • Department of Clinical Laboratory Medicine, Xijing Hospital, Fourth Military Medical University, Xi’an, China

Background and objectives: Diagnosis and treatment of infertility and pregnancy loss are complicated by various factors. We aimed to develop a simpler, more efficient system for diagnosing infertility and pregnancy loss.

Methods: This study included 333 female patients with infertility and 319 female patients with pregnancy loss, as well as 327 healthy individuals for modeling; 1264 female patients with infertility and 1030 female patients with pregnancy loss, as well as 1059 healthy individuals for validating the models. The average age and basic information were matched between the groups. Three methods were used for screening 100+ clinical indicators, and five machine learning algorithms were used to develop and evaluate diagnostic models based on the most relevant indicators.

Results: Multivariate analysis revealed significant differences in several factors between the patients and the control group. 25-hydroxy vitamin D3 (25OHVD3) was the factor exhibiting the most prominent difference, and most patients presented deficiency in the levels of this vitamin. 25OHVD3 is associated with blood lipids, hormones, thyroid function, human papillomavirus infection, hepatitis B infection, sedimentation rate, renal function, coagulation function, and amino acids in patients with infertility. The model for infertility diagnosis included eleven factors and exhibited area under the curve (AUC), sensitivity, and specificity values higher than 0.958, 86.52%, and 91.23%, respectively. The model for potential pregnancy loss was also developed using five machine learning algorithms and was based on 7 indicators. According to the results obtained from the testing set, the sensitivity was higher than 92.02%, the specificity was higher than 95.18%, the accuracy was higher than 94.34%, and the AUC was higher than 0.972.

Conclusion: The simplicity, good diagnostic performance, and high sensitivity of the models presented here may facilitate early detection, treatment, and prevention of infertility and pregnancy loss.

1 Introduction

Infertility is a condition defined as being unable to conceive after having regular unprotected sex for at least one year (1). Previous reports have indicated that 17% of women and 9.4% of men in the USA have used an infertility service (2). Infertility not only affects the mental health of the afflicted individual but is also a potential risk factor for many cancers (3). The causes of female infertility are varied, including physiological, psychological, behavioral, and genetic factors (4). The process of diagnosing female infertility is therefore complex and time-consuming (5). The first step is to evaluate the medical history of the patient and to perform a physical examination, followed by laboratory tests (6). The function of the ovaries, fallopian tubes, and uterus is then evaluated, and the clinician finally makes a diagnosis by combining all of this information and relying on his or her experience (7). Counting the roughly 1 year of trying to conceive, it takes at least 1–2 years before a diagnosis of infertility is confirmed in the hospital. Therefore, early diagnosis of the condition is essential to shorten the time to successful conception. While imaging plays an indispensable role in assessing anatomic abnormalities, tubal obstruction, and ovarian reserve, it is not suitable for large-scale screening (8). Moreover, the diagnosis and treatment of infertility are complicated by various factors that must be considered (9). Therefore, there is a need to establish a simpler clinical screening index to be used for early prevention and intervention in cases of female infertility.

Pregnancy loss is defined as the natural termination of a pregnancy prior to fetal viability, and it includes spontaneous, missed, and incomplete abortions, as well as a molar pregnancy (10). Pregnancy loss is a common problem among women of childbearing age, with a reported incidence of 10-30% out of all detected pregnancies (11). A failed pregnancy can occur at any stage of pregnancy for a variety of reasons and may be associated with key physiological changes in the embryo, the uterine environment, and the ovaries (12, 13). Women with a history of pregnancy loss have higher rates of psychological conditions and chronic diseases (14). Although most cases of miscarriage are sporadic, some couples experience repeated miscarriages, which is a challenging situation to be addressed clinically (15). The causes of sporadic and recurrent miscarriage are multiple, but the risk of adverse pregnancy outcomes in almost all cases is influenced by prior obstetric history (16). Ultrasound is critical in the assessment and management of pregnancies of unknown location and can be of help in the differential diagnosis of early miscarriage, pregnancy of unknown location, and ectopic pregnancy (17). However, it cannot be used to assess the causes of unexplained pregnancy loss. The diagnostic criteria, treatment, preventive measures, and prediction methods for recurrent pregnancy loss vary globally as there is no international consensus on a definition (11, 18). Although the effects on couples are well documented, pregnancy loss is an understudied disorder with no precise diagnostic model or obvious treatment. Therefore, it is important to develop a precise and simple model to predict pregnancy loss in advance.

In general, the risk of recurrence of sporadic early pregnancy loss is low (approximately 12% to 14%) (15). However, pregnancy outcomes affect the proportion of recurrent miscarriages (19). Women affected by pregnancy loss have a 60% to 70% chance of a successful pregnancy; therefore, pregnancy loss is not the same as infertility (20). However, how to distinguish between these two conditions has rarely been addressed in the literature.

The current application of machine learning (ML) in healthcare highlights the potential to enhance disease diagnosis and clinical care, thus achieving early warning, improving patient outcomes, and increasing the diagnostic efficiency of clinicians (21, 22). The detection of laboratory indicators has made a significant contribution to disease diagnosis, and their synergy with ML algorithms can provide superior diagnostic accuracy and reduce false positives (23). However, its adoption in clinical practice for the diagnosis of infertility and pregnancy loss has not yet been realized, and the evaluation of ML-based diagnostic technologies in terms of infertility and pregnancy loss outcomes remains an ongoing endeavor.

The current definition of infertility acknowledges the importance of the total amount of time during which the patient has sought to become pregnant and the negative impact of age. To reduce the time to intervention and improve prognosis, the present study aimed to establish a simple and efficient method that provides early warning of infertility and potential pregnancy loss. We also systematically investigated the effect of 25-hydroxy vitamin D3 (25OHVD3) on infertility and pregnancy loss and whether the combination of 25OHVD3 and other clinical indicators can aid in the diagnosis of these conditions.

2 Materials and methods

2.1 Sample collection

In this study, we collected data from female patients who visited Xijing Hospital (Xi’an, China) from January 1, 2022, to June 1, 2023. All patients underwent medical history evaluation and physical examination, as well as clinical laboratory and ultrasound tests. All the included patients were diagnosed by gynecologists and infertility specialists and had a clear diagnosis that followed the appropriate guidelines (24–26). The participants were divided into two groups (333 patients diagnosed with infertility and 319 patients diagnosed with pregnancy loss). In addition, a third group (control) of 327 age-matched healthy women was included in the study. The first group of patients included cases of infertility related to conditions in the fallopian tubes, cervix, uterus, and ovaries, as well as cases in which the cause of infertility was unknown. The second group included patients with a history of abortion or ectopic pregnancy but who had not been diagnosed with infertility. The inclusion and exclusion criteria, as well as a flow chart for patient recruitment, are shown in Figure 1. To assess the validity of our model, we also collected data from 2,294 female patients treated at the Fertility and Infertility Center of Xijing Hospital from January 1, 2015, to January 1, 2022, with a clear diagnosis of infertility (n = 1,264) or pregnancy loss (n = 1,030), as well as from 1,059 age-matched healthy women. The research protocol was approved by the Ethics Committee of Xijing Hospital (KY20212027-C-1), and informed consent was obtained via telephone interviews because of the retrospective nature of the study.

Figure 1
Flowchart comparing outpatients of the fertility counseling department from January 2022 to June 2023. Panel A details 1,534 cases while Panel B describes 10,803 cases. Both panels separate patients into groups: those with infertility, history of pregnancy loss, and healthy controls. Each group has specific inclusion and exclusion criteria, with subdivisions for primary and secondary infertility. Panel B uses larger numbers for similar categories, reflecting broader data collection, and corresponds to verifying clinical models.

Figure 1. Patient inclusion and exclusion criteria, and flowchart for recruitment. (A) Flowchart for recruiting patients to establish diagnostic models. (B) Flowchart for recruiting patients to verify the diagnostic models.

2.2 Data collection

Serum levels of 25OHVD3 and 25-hydroxy vitamin D2 (25OHVD2) were analyzed using high performance liquid chromatography-mass spectrometry (HPLC-MS/MS). All the laboratory tests were analyzed by the Clinical Laboratory Department of Xijing Hospital, and the results were stored in the Laboratory Information System (LIS). All consultation and basic information were obtained from the Hospital Information System, LIS, and follow-up telephone interviews. All the data were collected by more than three independent technicians and statisticians in accordance with the standardization requirements. All electronic data involved in this study were stored on independent USB drives. The USB flash drive and paper document information were stored in a confidential cabinet, which had dual locks and the keys were independently kept by two fixed individuals. USB flash drives could only be used on designated, protected, and confidential computers. The confidential cabinet and confidential computer were managed and protected by dedicated personnel to ensure that the data and information of participants were not leaked. The data included basic patient information, demographic information, physical examination results, diagnosis, infertility period, smoking status, alcohol consumption, and other information (Table 1).

Table 1

Table 1. Patient characteristics.

2.3 Sample pretreatment for 25OHVD2 and 25OHVD3 detection

For HPLC-MS/MS detection of 25OHVD2 and 25OHVD3, 500 μL of internal standard solution was added to 100 μL of serum, following which the homogeneous solution was shaken and mixed for 1 min and centrifuged at 15,000 rpm for 10 min. The supernatant was then transferred and dried under N2. For the derivatization reaction, a 4-phenyl-1,2,4-triazoline-3,5-dione solution was added to the dried sample and incubated at 25°C for 30 min. The derivatization solution was dried under N2, following which 50 μL of methanol was added, mixed for 1 min, and centrifuged at 15,000 rpm for 10 min at 25°C. The supernatant was used for HPLC-MS/MS detection.

2.4 HPLC-MS/MS analysis of 25OHVD2 and 25OHVD3

In this study, 25OHVD2 and 25OHVD3 levels were analyzed using an HPLC-MS/MS system comprising an Agilent 1200 HPLC system (Agilent, Waldbronn, Germany) and an API 3200 QTRAP MS/MS system (Sciex, Darmstadt, Germany). Mobile phase A consisted of an aqueous solution containing 1% formic acid and 1% ammonium formate, while phase B consisted of a methanol solution containing 1% formic acid and 1% ammonium formate. The optimized gradient elution was operated at a flow rate of 0.6 mL/min: 0 to 0.1 min, 70% B; 0.1 to 0.6 min, 70–95% B; 0.6 to 3.1 min, 95% B; and 3.1 to 4.0 min, 95–70% B. An autosampler was set to inject 20 μL at each step. MS data were acquired via electrospray ionization (ESI) in positive ion mode, and the remaining parameters were as follows: scan type, multiple reaction monitoring (MRM); monitored ion pairs, 25(OH)VD2: 619.3/298.3, d3-25(OH)VD2: 622.3/301.3, 25(OH)VD3: 607.3/298.3, d3-25(OH)VD3: 610.3/301.3; ion spray voltage, 5.5 kV; ion source temperature, 600°C; curtain gas (CUR), 40.0 psi; nebulizer gas (GS1), 55.0 psi; declustering potential (DP), 40 V; entrance potential (EP), 4.0 V; collision energy (CE), 27; and collision cell exit potential (CXP), 3.0.
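The gradient program described above can be read as a piecewise function of time. The following is a minimal Python sketch, assuming the pump ramps linearly between the stated breakpoints; the name `percent_b` and the table layout are illustrative and are not part of the instrument method.

```python
# Gradient breakpoints (time in min, % mobile phase B) taken from the text.
# Assumption: the composition changes linearly between consecutive breakpoints.
GRADIENT = [(0.0, 70.0), (0.1, 70.0), (0.6, 95.0), (3.1, 95.0), (4.0, 70.0)]


def percent_b(t_min):
    """Return the % of mobile phase B at time t_min (minutes)."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-4.0 min gradient program")
    # Walk through consecutive breakpoint pairs and interpolate in the
    # first segment that contains t_min.
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole range")


if __name__ == "__main__":
    for t in (0.0, 0.35, 2.0, 4.0):
        print(f"t = {t:.2f} min -> {percent_b(t):.1f}% B")
```

Under this assumption, the composition halfway through the 0.1 to 0.6 min ramp is midway between 70% and 95% B.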

2.5 Feature selection and establishment of the diagnostic models

The more than 100 clinical indicators are listed in Supplementary Table S1. All missing values were imputed with mean values. The data were not normalized. Spearman correlation, recursive feature elimination (RFE), and mutual information (MI) were selected as the feature selection methods for the diagnostic models. Spearman correlation analysis is used to assess the monotonic relationship between two continuous or ordered variables (27). It is used to characterize the correlation between two variables that have ordinal or distributional characteristics that cannot be described in terms of mean and standard deviation. MI is a metric that quantifies the dependence and relationship between two variables and represents the amount of information provided by one random variable about the other (28). RFE is the main representative of wrapper feature selection, which brings classification algorithms into the process of feature selection to eliminate redundancy between features and output the best combination of features (29). To establish effective diagnostic models, the thirty indicators with the highest contribution values were screened with each of these three methods and cross-checked. The indicators common to all three methods were used to build the model. Because the selected features vary with the number of samples and the screening method, we used multiple ML algorithms and metrics to simultaneously establish, validate, and evaluate our diagnostic models. Gaussian naive Bayes (GNB), K-nearest neighbors (KNN), decision tree (DT), logistic regression (LR), and eXtreme gradient boosting (XGBoost) were used to develop and evaluate diagnostic models based on the common indicators. Model performance was assessed using ten-fold cross-validation (train: 9, test: 1, random allocation). External datasets were used for independent validation to assess the generalizability of the models.
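The screening logic described above (mean imputation, a top-k list per method, and the indicators common to all methods) can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library. The helper names and the scores are hypothetical and are not values from the study, and each method is assumed to yield one importance score per indicator, with higher meaning more relevant.

```python
from statistics import mean


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def top_k(scores, k):
    """Return the names of the k indicators with the highest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def consensus(score_tables, k):
    """Return the indicators ranked in the top k by every selection method."""
    selected = [top_k(scores, k) for scores in score_tables.values()]
    return set.intersection(*selected)


# Hypothetical importance scores for illustration only.
scores = {
    "spearman": {"25OHVD3": 0.61, "HDL": 0.40, "TG": 0.35, "ALT": 0.05},
    "rfe":      {"25OHVD3": 0.90, "HDL": 0.70, "TG": 0.20, "ALT": 0.60},
    "mi":       {"25OHVD3": 0.30, "HDL": 0.25, "TG": 0.22, "ALT": 0.01},
}

if __name__ == "__main__":
    print(impute_mean([1.0, None, 3.0]))
    print(sorted(consensus(scores, k=3)))
```

In the study the cut-off was thirty indicators per method. Here `k=3` keeps the toy example readable.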

2.6 Statistical analysis

SPSS 23.0 (IBM, Armonk, NY, USA) was used for data analysis. Quantitative data are expressed as mean ± standard deviation. Group comparisons were made using the chi-square test for categorical variables and one-way analysis of variance (ANOVA) or the Kruskal-Wallis test for continuous variables. Multiple comparisons were adjusted using the Bonferroni correction to reduce the risk of Type I error, and statistical significance was set at p<0.05/N, where N represents the number of comparisons. R Software (version 3.6.2, R Statistical Computing Project) was used for data visualization. The Python language was used for indicator screening and for diagnostic model building and evaluation. Orthogonal partial least squares discriminant analysis (OPLS-DA) was performed to determine overall differences between the groups. Risk factors for infertility and pregnancy loss were evaluated using logistic regression analysis. The diagnostic performance of the model was analyzed using receiver operating characteristic (ROC) curves.
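As a worked illustration of the Bonferroni threshold p<0.05/N, the short Python sketch below assumes three pairwise comparisons among three groups. The function names and p-values are hypothetical and are not results from the study.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold: alpha divided by N."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def significant(p_values, alpha=0.05):
    """Flag each p-value that falls below the Bonferroni-adjusted threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


if __name__ == "__main__":
    # Hypothetical p-values for three pairwise group comparisons.
    p_values = [0.001, 0.02, 0.04]
    print(round(bonferroni_threshold(0.05, len(p_values)), 4))
    print(significant(p_values))
```

With N = 3, the threshold is about 0.0167, so a p-value of 0.02 that would pass an unadjusted 0.05 cut-off is no longer considered significant.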

3 Results

3.1 Patient characteristics

As shown in Figure 1, a total of 1534 patients were initially included in this study for modeling, and 360 were initially excluded according to the exclusion criteria. A total of 522 women were initially included in the healthy control group, with 195 excluded according to the telephone follow-up results, and the remaining 327 were finally enrolled. The average ages of patients with infertility (n=333), those with pregnancy loss (n=319), and healthy individuals (n=327) were 30.40 ± 14.72, 30.71 ± 14.10, and 31.15 ± 14.83 years, respectively, with no significant differences among the groups. In the infertility group, 75.08% of the patients had primary infertility and 24.92% had secondary infertility. In the secondary infertility group, 10.85% of the patients had been pregnant once, 48.19% had been pregnant twice, and 40.96% had been pregnant three times or more. In the pregnancy loss group, 25.71% of the patients had experienced an abortion once, 52.98% had experienced it twice, and 21.31% had experienced it three times or more. The main characteristics of each group are presented in Figure 1A and Table 1. Drinking and smoking were more common in the infertility group than in the control group. The clinical indicators of each group are presented in Supplementary Table S1. The basic information and the inclusion and exclusion criteria for the 2294 patients and 1059 healthy individuals included in the validation set are shown in Figure 1B.

3.2 Distribution of measured indicators in each group

As shown in Supplementary Figure S1A, the overall difference in 100+ clinical indicators clearly distinguished the infertility group from the normal control group. Among female patients, significant differences in the levels of the following indicators were observed between the group of patients with infertility and the control group: 25OHVD2, 25OHVD3, prothrombin time (PT), luteinizing hormone (LH), erythrocyte sedimentation rate (ESR), hyaline cast (Hy. CAST), high-density lipoprotein (HDL-C), thrombin time (TT), anticardiolipin antibodies (ACA), creatinine (CRE), homocysteine (HCY), mucous strands (MUCUS), NonSEC, mean corpuscular hemoglobin concentration (MCHC), triglyceride (TG), progesterone (PROG), estradiol (E2), urea (BUN), urinary epithelial cells (EC), gamma-glutamyl transferase (GGT), aspartate aminotransferase (AST), red blood cells (RBC), cystatin C (CysC), and pathocast (Path CAST) (VIP>1.00; Supplementary Figure S1B). Notably, 25OHVD3 was the indicator exhibiting the biggest differences between these two groups.

Similarly, an overall difference of 100+ clinical markers clearly distinguished the pregnancy loss group from the normal control group (Supplementary Figure S1C). Fourteen clinical indicators, including anti-thyroid peroxidase antibody (TPOAb), monocytes (MONO), neutrophilic granulocytes (NEUT), eosinophils (EO), human papillomavirus 59 (HPV59), red blood cell distribution width (RDW), Path CAST, red blood cell specific volume (HCT), free thyroxine 4 (FT4), human papillomavirus 81 (HPV 81), urine potential of hydrogen (UPH), albumin (ALB), basophils (BASO), and alanine aminotransferase (ALT), were significantly different between the group of patients with pregnancy loss and the control group (Supplementary Figure S1D).

3.3 Distribution of 25OHVD levels in each group

Levels above 30, 20–30, and below 20 ng/mL are regarded as sufficient, insufficient, and deficient, respectively (30). The study participants were divided into three groups according to their 25OHVD levels. The percentage of patients included in each group is shown in Figure 2. Deficiency, insufficiency, and sufficiency in 25OHVD were observed in 75.68%, 18.32%, and 6.01% of the patients with pregnancy loss, respectively (Figure 2A). Among the patients with infertility, the rates of 25OHVD deficiency, insufficiency, and sufficiency were 81.19%, 14.42%, and 4.39%, respectively (Figure 2B). Remarkably, although 85% of the patients with infertility, 75% of the patients with pregnancy loss, and 61% of the healthy women in the control group had received vitamin D supplementation (Figure 2C), the 25OHVD3 concentrations in the first two groups, compared using ANOVA, were much lower than those observed in healthy individuals (Figure 2D). Although we did not find a dose-response relationship between vitamin D level categories and infertility risk, 48.84% of the patients with infertility and 25OHVD3 <20 ng/mL chose in vitro fertilization, of whom only 4.07% successfully became pregnant (Figure 2E). In contrast, 57.32% of the patients with infertility and 25OHVD3 >20 ng/mL chose in vitro fertilization, of whom 12.77% successfully became pregnant (Figure 2F).
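The three-way grouping by 25OHVD level can be expressed as a small Python function. This is a sketch under one stated assumption: values of exactly 20 or 30 ng/mL are assigned to the middle category, since the text does not say how boundary values were handled. The function names and sample values are hypothetical.

```python
from collections import Counter


def vitamin_d_status(level_ng_ml):
    """Classify a serum 25OHVD level (ng/mL) using the cut-offs in the text."""
    if level_ng_ml < 0:
        raise ValueError("concentration cannot be negative")
    if level_ng_ml < 20:
        return "deficient"
    # Assumption: the 20-30 ng/mL range is inclusive at both ends.
    if level_ng_ml <= 30:
        return "insufficient"
    return "sufficient"


def status_percentages(levels):
    """Percentage of samples falling into each status category."""
    counts = Counter(vitamin_d_status(v) for v in levels)
    return {status: 100.0 * n / len(levels) for status, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical measurements for illustration only.
    sample = [12.4, 18.9, 25.0, 31.2]
    print(status_percentages(sample))
```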

Figure 2
(A) Pie chart showing 25-(OH)D deficiency: 75.67%, insufficiency: 18.32%, sufficiency: 6.01%. (B) Pie chart with deficiency: 81.19%, insufficiency: 14.42%, sufficiency: 4.39%. (C) Bar chart of vitamin D supplementation, higher in controls. (D) Scatter plot of 25(OH)D3 levels, higher in controls, with significant differences. (E) Pie chart with 4.07% success rate in a group of 123. (F) Pie chart with 12.77% success rate in a group of 47.

Figure 2. Serum 25OHVD3 levels. (A) Percentages of deficiency, insufficiency, and sufficiency in 25OHVD among the patients with infertility. (B) Percentages of deficiency, insufficiency, and sufficiency in 25OHVD among the patients with pregnancy loss. (C) Vitamin D supplementation in each group. (D) Vitamin D concentration in the different study groups. *, p < 0.05; ****, p < 0.0001. (E) Successful pregnancy rate of patients with infertility (25OHVD3 <20 ng/mL) who chose in vitro fertilization. (F) Successful pregnancy rate of patients with infertility (25OHVD3 >20 ng/mL) who chose in vitro fertilization.

3.4 Model development and diagnostic performance

Eleven indicators were selected via the three methods (Spearman, RFE, and MI) as candidates for the infertility diagnosis model: high-density lipoprotein (HDL), TG, 25OHVD3, PT, ACA, 25OHVD, HCY, urine bacterial count (BACT), TPOAb, E2, and hepatitis B core antibody (Anti-HBc) (Figure 3A). Five ML algorithms were used to establish and evaluate the model based on these 11 indicators. For the training set, the sensitivity was higher than 86.52%, the specificity was higher than 91.23%, the accuracy was higher than 89.70%, and the area under the ROC curve (AUC) was higher than 0.958 (Figure 3B; Table 2). For the testing set, the sensitivity was higher than 81.81%, the specificity was higher than 88.08%, the accuracy was higher than 84.70%, and the AUC was higher than 0.928 (Figure 3C; Table 2). For the validation set, the sensitivity was higher than 74.81%, the specificity was higher than 84.00%, the accuracy was higher than 79.00%, and the AUC was higher than 0.825 (Figure 3D; Table 2). XGBoost performed best among the five ML algorithms across the training, test, and validation sets. The learning curve illustrates the impact of the number of training samples on model performance (Figure 3E). The results indicated that the ML algorithm used in this study did not exhibit overfitting or underfitting. The model had essentially reached a performance plateau and did not require additional data for further training. The SHAP analysis of the eleven indicators in the diagnostic model revealed that 25OHVD and 25OHVD3 contributed the most, followed by E2, Anti-HBc, TPOAb, HCY, TG, HDL, BACT, ACA, and PT (Figure 3F).
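For readers less familiar with the reported metrics, the Python sketch below shows how sensitivity, specificity, accuracy, and AUC follow from true labels and model outputs. It is a generic illustration with hypothetical data, not the authors' evaluation code. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count one half), and both classes are assumed to be present.

```python
def classification_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels (1 = patient, 0 = control) and predicted scores.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Sensitivity, specificity, and accuracy depend on the chosen cut-off, whereas the AUC summarizes ranking quality over all cut-offs.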

Figure 3
(A) A Venn diagram showing overlap between RFE, Spearman, and MI feature selection methods. (B) Heatmap of feature correlation with colors from light to dark blue indicating lower to higher correlation. (C) ROC curve comparing five models, labeled from highest to lowest AUC: XgBoost, DT, LR, GNB, KNN. (D) Another ROC curve with the same models and AUC scores. (E) Line chart showing AUC scores for training and cross-validation across various training examples, with red and green lines. (F) Bar chart showing feature importance using SHAP values, with 25OHD having the highest impact.

Figure 3. (A) Venn diagrams for the MI, RFE, and Spearman methods used to screen candidates for differentiating patients with infertility from healthy individuals. Heatmap of candidate indicators for this differentiation. (B) Receiver operating characteristic (ROC) curve for the training set in the model for infertility diagnosis. (C) ROC curve for the test set in the model for infertility diagnosis. (D) ROC curve for the validation set in the model for infertility diagnosis. (E) Learning curve for the training set in the model for infertility diagnosis. (F) SHAP analysis of the 11 indicators in the model for infertility diagnosis.

Table 2

Table 2. Evaluation of the model for infertility and control, pregnancy loss and control, infertility and pregnancy loss.

Seven indicators were selected via the three methods as candidates for the pregnancy loss model: EC, TPOAb, HDL, testosterone (TESTO), 25OHVD3, PT, and 25OHVD (Figure 4A). Five ML algorithms were used to evaluate the model based on these seven indicators. For the training set, the sensitivity was higher than 92.02%, the specificity was higher than 95.18%, the accuracy was higher than 94.34%, and the AUC was higher than 0.972 (Figure 4B; Table 2). For the testing set, the sensitivity was higher than 90.78%, the specificity was higher than 88.02%, the accuracy was higher than 92.88%, and the AUC was higher than 0.948 (Figure 4C; Table 2). For the validation set, the sensitivity was higher than 87.25%, the specificity was higher than 86.88%, the accuracy was higher than 90.65%, and the AUC was higher than 0.900 (Figure 4D; Table 2). XGBoost performed best among the five ML algorithms across the training, test, and validation sets. The learning curve indicated that the ML algorithm used in this study did not exhibit overfitting or underfitting (Figure 4E). The model had essentially reached a performance plateau and did not require additional data for further training. The SHAP analysis of the seven indicators in the diagnostic model revealed that 25OHVD3 and 25OHVD contributed the most, followed by TESTO, HDL, PT, TPOAb, and EC (Figure 4F).

Figure 4
(A) A Venn diagram showing feature overlaps among RFE, Spearman, and MI methods. (B) Heatmap of feature correlations among 25OHVD3, 25OHVD, FT, HDL, TPO-Ab, EC, and TESTO. (C, D) ROC curves for different models: XgBoost, DT, LR, GNB, and KNN showing sensitivity vs. 1-specificity. (E) Line graph showing training and cross-validation accuracies versus training examples. (F) Bar chart illustrating mean SHAP values for features like 25OHVD3, 25OHVD, TESTO, HDL-C, PT, TPO-Ab, and EC.

Figure 4. (A) Venn diagrams for the MI, RFE, and Spearman methods used to screen candidates for differentiating patients with pregnancy loss from healthy individuals. Heatmap of candidate indicators for this differentiation. (B) ROC curve for the training set in the model for the prediction of pregnancy loss. (C) ROC curve for the test set in the model for the prediction of pregnancy loss. (D) ROC curve for the validation set in the model for the prediction of pregnancy loss. (E) Learning curve for the training set in the model for the prediction of pregnancy loss. (F) SHAP analysis of the 7 indicators in the model for the prediction of pregnancy loss.

In addition, we tried to develop a model capable of distinguishing between patients with infertility and those with predicted pregnancy loss. Eight indicators (E2, LDL, Non SEC, BU, ACA, Antiβ2-G1, FSH, LH) were selected via the three methods as candidates for this model (Figure 5A). Five ML algorithms were used to evaluate the diagnostic model based on these eight indicators. For the training set, the sensitivity was higher than 54.05%, the specificity was higher than 80.60%, the accuracy was higher than 75.00%, and the AUC was higher than 0.767 (Figure 5B; Table 2). For the testing set, the sensitivity was higher than 52.58%, the specificity was higher than 80.10%, the accuracy was higher than 70.56%, and the AUC was higher than 0.806 (Figure 5C; Table 2). For the validation set, the sensitivity was higher than 52.93%, the specificity was higher than 79.92%, the accuracy was higher than 67.97%, and the AUC was higher than 0.761 (Figure 5D; Table 2). XGBoost performed best among the five ML algorithms across the training, test, and validation sets. The learning curve indicated that the ML algorithm used in this study did not exhibit overfitting or underfitting (Figure 5E). The model had essentially reached a performance plateau and did not require additional data for further training. The SHAP analysis of the eight indicators in the diagnostic model revealed that LDL and Non SEC contributed the most, followed by FSH, LH, ACA, BU, Antiβ2-G1, and E2 (Figure 5F).

Figure 5
(A) Venn diagram showing overlap between RFE, Spearman, and MI methods for feature selection with various numbers in each section. (B) Heatmap displaying correlation coefficients among several variables, with color-coded intensities. (C, D) ROC curves for different machine learning models, illustrating sensitivity vs. one minus specificity with different AUC values. (E) Line plot showing AUC performance as a function of training examples, contrasting training and cross-validation. (F) Bar chart of features ranked by SHAP values, indicating their average impact on model output, with LDL-C and Non SEC as prominent factors.

Figure 5. (A) Venn diagrams for the MI, RFE, and Spearman methods used to screen candidates for differentiating patients with infertility from those with pregnancy loss. Heatmap of candidate indicators for this differentiation. (B) ROC curve for the training set in the model for the differentiation of patients with infertility and with pregnancy loss. (C) ROC curve for the test set in the model for the differentiation of patients with infertility and with pregnancy loss. (D) ROC curve for the validation set in the model for the differentiation of patients with infertility and with pregnancy loss. (E) Learning curve for the training set in the model for the differentiation of patients with infertility and with pregnancy loss. (F) SHAP analysis of the 8 indicators in the model for the differentiation of patients with infertility and with pregnancy loss.

3.5 Significantly different indicators in infertility risk factor assessment

To identify potential indicators of infertility risk, we used a binary logistic regression analysis to assess the relationship between these indicators and infertility. As shown in Supplementary Table S2, ESR60M, HDL, PT, 25OHVD3, LH, TT, CysC, ACA, HCY, CRE, MCHC, GGT, and MUCUS were statistically significant risk factors for female infertility.

25OHVD3 exhibited the most marked difference in cases of infertility. To investigate its role in the occurrence and development of infertility, we also looked at its correlation with a variety of clinical indicators. Intriguingly, 25OHVD3 was correlated with HPV31, HPV35, HPV26, ESR, thyroglobulin (TgZ), T4, E2, TG, Anti-HBc, HCY, BASO, CRE, and PROG (Figure 6A). 25OHVD2 was also correlated with HPV45, HPV55, HPV56, platelet (PLT), mean corpuscular hemoglobin (MCH), and prolactin (PRL) (Figure 6A). In patients with pregnancy loss, 25OHVD3 was not significantly associated with any of these clinical markers (Figure 6B). 25OHVD2 was significantly correlated with E2 and PROG (Figure 6B).

Figure 6
Two correlation matrices labeled A and B display the relationships between various health-related variables. Both matrices use a color gradient from blue to red to indicate correlation strength and direction, with numerical values annotated. The significance level is set at 0.05. A color bar on the right provides a reference for correlation values, ranging from negative one to positive one.

Figure 6. (A) Correlation graph of 25OHVD2, 25OHVD3, and 25OHVD in infertility with other clinical indicators. (B) Correlation graph of 25OHVD2, 25OHVD3, and 25OHVD in pregnancy loss with other clinical indicators.
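The correlation coefficients in matrices like those in Figure 6 can be illustrated with a rank-based calculation. This section does not state which coefficient was used for Figure 6; the sketch below assumes Spearman's rho, since Spearman correlation was one of the screening methods in this study. All values are hypothetical and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired measurements for six patients.
vitamin_d3 = [12.1, 18.4, 9.7, 25.3, 15.0, 30.2]
estradiol = [40.0, 55.0, 38.0, 80.0, 62.0, 75.0]
rho = spearman_rho(vitamin_d3, estradiol)
print(f"Spearman rho = {rho:.3f}")
```

A significance test at the 0.05 level would be applied to each coefficient before it is reported as a correlation; that step is omitted here.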

4 Discussion

Many couples expecting to become pregnant struggle with infertility, the risk of which is reported to be equal for male and female patients (31). The causes of infertility are multiple (32). Previous reports have indicated that the causes of infertility are unknown in approximately 30% of infertile couples (33). The age of the female partner is among the factors that have been associated with unexplained infertility (34). Given the multitude of factors that must be considered, developing a system that can aid in the efficient, early, and accurate diagnosis of infertility remains both necessary and clinically challenging (35).

Our analyses identified positive correlations between the levels of BUN, ACA, HCY, MCHC, GGT, EC, TG, E2, CEA, and AST and female infertility, while negative correlations were observed between those of 25OHVD3, ESR60M, HDL-C, PT, TT, LH, CysC, CRE, MUCUS, and globulin and female infertility. Our results further indicate that HPV infection, abnormal coagulation function, thyroid dysfunction, abnormal blood lipid metabolism, 25OHVD deficiency or insufficiency, anemia, and abnormal liver function were risk factors for miscarriage and infertility. Over the past decade, our understanding of the benefits of vitamin D has improved significantly, particularly with regard to its non-skeletal functions (36). The vitamin D receptor (VDR) is expressed in most organs, suggesting that the roles of vitamin D extend beyond its functions in regulating calcium homeostasis and bone health (37). Numerous studies have reported associations between poor vitamin D status and cancer, allergies, immune disorders, cardiovascular metabolic diseases, irritable bowel syndrome, autism, muscle function, and brain function (36, 38). Given the numerous reports on the effects of vitamin D on other systems in the body, recent research has also focused on its role in human fertility (39). Vitamin D (cholecalciferol) has no biological activity; it must be activated by 25-hydroxylation in the liver, which converts cholecalciferol to the main circulating metabolite, 25-hydroxyvitamin D (25OHVD) (40, 41). Renal 1α-hydroxylase then converts 25OHVD to an active metabolite, 1,25(OH)2D, which binds to and activates VDR (42). The best method for assessing vitamin D status is to measure the serum concentration of 25OHVD, as it has a longer circulating half-life and higher serum concentration than 1,25(OH)2D (43). The role of immunity in infertility and miscarriage has been demonstrated (44). 
Vitamin D signaling can regulate a variety of immune responses by regulating the differentiation and cell cycle of T cells, B cells, neutrophils, dendritic cells, and other immune cells (45, 46). In addition, the enzyme CYP27B1, which produces the hormonal form of vitamin D, 1,25(OH)2D, is expressed throughout the immune system. Notably, CYP27B1 expression in immune cells is independent of calcium homeostatic inputs (46). The importance of 1,25(OH)2D signaling in the regulation of the immune system is further emphasized by the numerous signaling pathways that control CYP27B1 expression in various immune cell types (47). In addition, vitamin D activates autophagy in a variety of cell types, including keratinocytes, hepatocytes, and endothelial cells, in response to cellular injury and oxidative stress (48). Therefore, we hypothesize that vitamin D signaling could affect infertility and miscarriage by modulating immunity to influence the number of mature oocytes and the rate of blastocyst formation. In addition, Kinuta et al. found that VDR-deficient mutant mice exhibited significant gonadal dysfunction, leading to hypergonadotropic hypogonadism and decreased ovarian aromatase activity (49). Therefore, vitamin D is an important factor for full gonadal function. While research regarding the relationship between vitamin D and fertility has yielded promising results, vitamin D deficiency has been associated with numerous diseases, meaning that its specificity for disease diagnosis remains poor (50). However, to the best of our knowledge, few studies have focused on whether the combination of 25OHVD and other clinical indicators can be useful in the diagnosis of infertility.

25OHVD3 was not only one of the indicators that showed the most marked difference in cases of infertility, but was also identified by all three of the screening methods used. To investigate its role in the occurrence and development of infertility, we also examined its correlation with a variety of clinical indicators. Intriguingly, 25OHVD3 was associated with blood lipids, hormones, thyroid function, HPV infection, hepatitis B infection, sedimentation rate, renal function, coagulation function, and amino acids. However, in patients with pregnancy loss, although 25OHVD3 was also one of the most prominent indicators and was also identified by all three of the methods used, the correlations with HPV infection, coagulation function, platelets, thyroid function, and other indicators disappeared. These results suggest that 25OHVD3 has a unique role in infertility, and its pathogenesis remains to be studied. The pathogenesis of infertility and that of pregnancy loss are indeed different, but the nature of the differences needs to be further studied.
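The idea of an indicator being "identified by all three of the methods" amounts to taking the intersection of the feature sets returned by MI, RFE, and Spearman screening, which is what the Venn diagrams depict. The sketch below shows that bookkeeping. The feature lists are hypothetical placeholders chosen for illustration and are not the sets actually selected in this study.

```python
from collections import Counter

# Hypothetical outputs of the three screening methods (placeholders only).
selected = {
    "MI": {"25OHVD3", "HDL", "LH", "PT"},
    "RFE": {"25OHVD3", "HDL", "CysC", "TT"},
    "Spearman": {"25OHVD3", "LH", "HDL", "GGT"},
}

# Indicators chosen by every method (the central region of the Venn diagram).
consensus = set.intersection(*selected.values())

# How many methods chose each indicator, for ranking partial agreement.
votes = Counter(name for features in selected.values() for name in features)

print(sorted(consensus))
print(votes.most_common(3))
```

Requiring agreement across methods is a conservative choice: it trades a smaller candidate list for less dependence on the assumptions of any single screening method.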

Some authors have also attempted to develop new diagnostic methods for infertility. Cheng et al. established a cardiometabolic index (CMI) for diagnosing infertility (AUC=0.60, 95%CI: 0.56-0.65); the improved CMI index combined with BMI had a better predictive effect on infertility (AUC=0.722, 95%CI: 0.676-0.767) (51). Jiang et al. studied the plasma exosomes of 75 patients with polycystic ovary syndrome (PCOS) and used miR-126-3p, miR-146a-5p, miR-20b-5p, miR-106a-5p, and miR-18a-3p to distinguish PCOS patients from control individuals. The AUC of the ROC curve was 0.781 (52). However, there is still room for improvement in terms of infertility diagnosis. Since our results demonstrate that 25OHVD3 plays a role in the development of infertility, diagnosis based on multiple factors may have greater clinical significance. Accordingly, we observed excellent diagnostic performance for the eleven factors included in our model for the diagnosis of female infertility, with AUC, sensitivity, and specificity values higher than 0.958, 86.52%, and 91.23%, respectively. In addition, we developed a diagnostic model to distinguish between infertility and pregnancy loss. Although the sensitivity of the models established with GNB, KNN, and DT in the validation set was not high (52.93%, 55.14%, and 69.30%, respectively), the sensitivity of the models established with LR and XGBoost in the validation set was 76.05% and 85.32%, respectively, indicating that the LR and XGBoost models can be used to distinguish infertility from pregnancy loss. To the best of our knowledge, there are currently few diagnostic models for distinguishing infertility from pregnancy loss, and our results can help fill this gap. Moreover, the sensitivity and specificity of our model were markedly higher than those estimated for routine parameters and for most models that have been reported so far. 
The models we have developed are relatively simple, as the data can be obtained via routine laboratory analyses. Moreover, the present results may aid in the development of new indices for the diagnosis, treatment, and prevention of infertility. In addition, the models are suitable for use in large-scale screening to provide early warning of infertility, which can help ensure that patients do not miss the window of opportunity for treatment. Despite the advantages mentioned above, the performance of the model for discriminating between infertility and pregnancy loss is low. This may be due to overlapping clinical manifestations or similar pathologic mechanisms in the two conditions, such as hormonal disorders and thyroid abnormalities. Introducing other, more discriminating indicators may be one way to improve the diagnostic model, which requires further research.
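For readers comparing the figures quoted above, the sketch below shows how sensitivity, specificity, accuracy, and AUC are conventionally computed for a binary classifier. It is a minimal illustration using hypothetical counts and scores, not the study's data or the authors' code, and the function names are illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true cases that are detected
    specificity = tn / (tn + fp)   # share of non-cases correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as one half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical confusion-matrix counts for a held-out set.
sens, spec, acc = diagnostic_metrics(tp=85, fn=15, tn=90, fp=10)

# Hypothetical predicted probabilities for positive and negative cases.
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f} AUC={auc:.3f}")
```

Sensitivity and specificity depend on the probability threshold chosen for classification, whereas AUC summarizes performance across all thresholds, which is why a model can combine a high AUC with a modest sensitivity at a given cut-off.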

5 Limitations

This study has some limitations. Firstly, we investigated the medical records from a single hospital located in one of the more developed cities in western China. Most of the patients and healthy individuals came from urban areas. Due to the longer treatment time, most of the patients who came to our hospital for treatment and physical examination had better living conditions and higher education levels. Therefore, the data inevitably have a certain bias. Secondly, vitamin D is affected by factors that include dietary intake and sunlight exposure, among others. Although our study considered daily intake, the levels of vitamin D also fluctuate seasonally, and therefore this potential indicator would need to be validated in different populations and during different seasons, something that was beyond our current means. Lastly, since population lifestyles are largely influenced by the region considered, and more than 99% of the participants in this study were from Western China, caution must be exercised while extrapolating the results to other regions of the world. In our future research, we will expand the number of enrolled participants and cooperate with multiple hospitals in China to verify the effect of our diagnostic model.

6 Conclusion

We sought to determine whether combining 25OHVD3 with other clinical indicators could increase its value in the diagnosis of infertility and pregnancy loss. Our results demonstrated that 25OHVD3 was the factor exhibiting the most marked difference between patients with infertility and the control group, and between patients affected by pregnancy loss and the control group. 25OHVD3 has a role in the occurrence and development of infertility. Both of the models we developed using five machine learning algorithms exhibited superior performance. These models are advantageous in that they are relatively simple, as the data can be obtained via routine laboratory analyses. Ultimately, the good performance and high sensitivity of the models presented here may facilitate early detection of infertility and pregnancy loss, in turn enabling timely diagnosis and treatment within the optimal reproductive window. Despite our promising findings, further studies involving larger populations are required to verify the practicality of our models and whether they can yield a clear clinical benefit, as well as the most appropriate methods for stratifying candidate patients.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Ethics Committee of Xijing Hospital (KY20212027-C-1), and informed consent was obtained via telephone interviews because of the retrospective nature of the study. The studies were conducted in accordance with the local legislation and institutional requirements.

Author contributions

RZ: Formal Analysis, Writing – original draft. YG: Resources, Writing – review & editing. XZ: Resources, Writing – review & editing. JW: Writing – review & editing, Resources. XH: Data curation, Resources, Writing – review & editing. LY: Resources, Validation, Writing – review & editing. LZ: Investigation, Writing – review & editing. JG: Data curation, Writing – review & editing. JL: Funding acquisition, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This study was funded by the Shaanxi Province Innovation Capability Supporting Program (Jiayun Liu, 2021LXZX3-01) and the Key Research and Development Plan of Shaanxi Province (Jiayun Liu, 2021ZDLSF06-06).

Acknowledgments

The authors would like to thank the technical staff members who assisted with this project.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fendo.2025.1544724/full#supplementary-material

References

1. Vander Borght M and Wyns C. Fertility and infertility: definition and epidemiology. Clin Biochem. (2018) 62:2–10. doi: 10.1016/j.clinbiochem.2018.03.012

2. Lai JD, Fantus RJ, Cohen AJ, Wan V, Hudnall MT, Pham M, et al. Unmet financial burden of infertility care and the impact of state insurance mandates in the United States: analysis from a popular crowdfunding platform. Fertility sterility. (2021) 116:1119–25. doi: 10.1016/j.fertnstert.2021.05.111

3. Hanson B, Johnstone E, Dorais J, Silver B, Peterson CM, and Hotaling J. Female infertility, infertility-associated diagnoses, and comorbidities: A review. J assisted Reprod Genet. (2017) 34:167–77. doi: 10.1007/s10815-016-0836-8

4. Driscoll MA, Davis MC, Aiken LS, Yeung EW, Sterling EW, Vanderhoof V, et al. Psychosocial vulnerability, resilience resources, and coping with infertility: A longitudinal model of adjustment to primary ovarian insufficiency. Ann Behav medicine: Publ Soc Behav Med. (2016) 50:272–84. doi: 10.1007/s12160-015-9750-z

5. Kessler LM, Craig BM, Plosker SM, Reed DR, and Quinn GP. Infertility evaluation and treatment among women in the United States. Fertility sterility. (2013) 100:1025–32. doi: 10.1016/j.fertnstert.2013.05.040

6. Szamatowicz M and Szamatowicz J. Proven and unproven methods for diagnosis and treatment of infertility. Adv Med Sci. (2020) 65:93–6. doi: 10.1016/j.advms.2019.12.008

7. Penzias A, Azziz R, Bendikson K, Falcone T, Hansen K, Hill M, et al. Obesity and reproduction: A committee opinion. Fertility sterility. (2021) 116:1266–85. doi: 10.1016/j.fertnstert.2021.08.018

8. He Q, Zhou Y, Zhou W, Mao C, Kang Q, Pan Y, et al. Nomogram Incorporating Ultrasonic Markers Of endometrial Receptivity to Determine the Embryo-Endometrial Synchrony after in Vitro Fertilization. Front Endocrinol. (2022) 13:973306. doi: 10.3389/fendo.2022.973306

9. Brugh VM 3rd, Nudell DM, and Lipshultz LI. What the urologist should know about the female infertility evaluation. Urologic Clinics North America. (2002) 29:983–92. doi: 10.1016/s0094-0143(02)00087-3

10. Nijjar S, Jauniaux E, and Jurkovic D. Definition and diagnosis of cesarean scar ectopic pregnancies. Best Pract Res Clin obstetrics gynaecology. (2023) 89:102360. doi: 10.1016/j.bpobgyn.2023.102360

11. The Lancet. Miscarriage: worldwide reform of care is needed. Lancet (London England). (2021) 397:1597. doi: 10.1016/s0140-6736(21)00954-5

12. Regan L and Rai R. Thrombophilia and pregnancy loss. J Reprod Immunol. (2002) 55:163–80. doi: 10.1016/s0165-0378(01)00144-9

13. Giakoumelou S, Wheelhouse N, Cuschieri K, Entrican G, Howie SE, and Horne AW. The role of infection in miscarriage. Hum Reprod Update. (2016) 22:116–33. doi: 10.1093/humupd/dmv041

14. Kersting A and Wagner B. Complicated grief after perinatal loss. Dialogues Clin Neurosci. (2012) 14:187–94. doi: 10.31887/DCNS.2012.14.2/akersting

15. Tise CG and Byers HM. Genetics of recurrent pregnancy loss: A review. Curr Opin obstetrics gynecology. (2021) 33:106–11. doi: 10.1097/gco.0000000000000695

16. Schummers L, Oveisi N, Ohtsuka MS, Hutcheon JA, Ahrens KA, Liauw J, et al. Early pregnancy loss incidence in high-income settings: A protocol for a systematic review and meta-analysis. Systematic Rev. (2021) 10:274. doi: 10.1186/s13643-021-01815-1

17. Scibetta EW and Han CS. Ultrasound in early pregnancy: viability, unknown locations, and ectopic pregnancies. Obstetrics gynecology Clinics North America. (2019) 46:783–95. doi: 10.1016/j.ogc.2019.07.013

18. Brady PC. New evidence to guide ectopic pregnancy diagnosis and management. Obstetrical gynecological survey. (2017) 72:618–25. doi: 10.1097/ogx.0000000000000492

19. Dimitriadis E, Menkhorst E, Saito S, Kutteh WH, and Brosens JJ. Recurrent pregnancy loss. Nat Rev Dis Primers. (2020) 6:98. doi: 10.1038/s41572-020-00228-z

20. de Bennetot M, Rabischong B, Aublet-Cuvelier B, Belard F, Fernandez H, Bouyer J, et al. Fertility after tubal ectopic pregnancy: results of a population-based study. Fertil Steril. (2012) 98:1271–6. doi: 10.1016/j.fertnstert.2012.06.041

21. Charu V, Liang JW, Mannalithara A, Kwong A, Tian L, and Kim WR. Benchmarking clinical risk prediction algorithms with ensemble machine learning for the noninvasive diagnosis of liver fibrosis in nafld. Hepatol (Baltimore Md). (2024) 80:1184–95. doi: 10.1097/hep.0000000000000908

22. Uwimana A, Gnecco G, and Riccaboni M. Artificial intelligence for breast cancer detection and its health technology assessment: A scoping review. Comput Biol Med. (2024) 184:109391. doi: 10.1016/j.compbiomed.2024.109391

23. Huang YC, Liu TC, and Lu CJ. Establishing a machine learning dementia progression prediction model with multiple integrated data. BMC Med Res Method. (2024) 24:288. doi: 10.1186/s12874-024-02411-2

24. Penzias A, Azziz R, Bendikson K, Cedars M, Falcone T, Hansen K, et al. Fertility evaluation of infertile women: A committee opinion. Fertility sterility. (2021) 116:1255–65. doi: 10.1016/j.fertnstert.2021.08.038

25. Practice Committee of the American Society for Reproductive Medicine. Definitions of infertility and recurrent pregnancy loss: A committee opinion. Fertil Steril. (2020) 113:533–5. doi: 10.1016/j.fertnstert.2019.11.025

26. Pfeifer S, Butts S, Dumesic D, Fossum G, Gracia C, Barbera AL, et al. Diagnostic evaluation of the infertile female: A committee opinion. Fertility sterility. (2015) 103:e44–50. doi: 10.1016/j.fertnstert.2015.03.019

27. Liu Y, Mo W, Wang H, Shao Z, Zeng Y, and Bi J. Feature selection and risk prediction for diabetic patients with ketoacidosis based on mimic-iv. Front Endocrinol. (2024) 15:1344277. doi: 10.3389/fendo.2024.1344277

28. Kim SH, Park SY, Seo H, and Woo J. Feature selection integrating shapley values and mutual information in reinforcement learning: an application in the prediction of post-operative outcomes in patients with end-stage renal disease. Comput Methods programs biomedicine. (2024) 257:108416. doi: 10.1016/j.cmpb.2024.108416

29. Deng F, Zhao L, Yu N, Lin Y, and Zhang L. Union with recursive feature elimination: A feature selection framework to improve the classification performance of multicategory causes of death in colorectal cancer. Lab investigation; J Tech Methods Pathol. (2024) 104:100320. doi: 10.1016/j.labinv.2023.100320

30. Rizzoli R. Vitamin D supplementation: upper limit for safety revisited? Aging Clin Exp Res. (2021) 33:19–24. doi: 10.1007/s40520-020-01678-x

31. Agarwal A, Baskaran S, Parekh N, Cho CL, Henkel R, Vij S, et al. Male infertility. Lancet (London England). (2021) 397:319–33. doi: 10.1016/s0140-6736(20)32667-2

32. Boutari C, Pappas PD, Mintziori G, Nigdelis MP, Athanasiadis L, Goulis DG, et al. The effect of underweight on female and male reproduction. Metabolism: Clin Exp. (2020) 107:154229. doi: 10.1016/j.metabol.2020.154229

33. Collée J, Mawet M, Tebache L, Nisolle M, and Brichant G. Polycystic ovarian syndrome and infertility: overview and insights of the putative treatments. Gynecological endocrinology: Off J Int Soc Gynecological Endocrinol. (2021) 37:869–74. doi: 10.1080/09513590.2021.1958310

34. Wall DJ, Reinhold C, Akin EA, Ascher SM, Brook OR, Dassel M, et al. Acr appropriateness criteria® Female infertility. J Am Coll Radiology: JACR. (2020) 17:S113–s24. doi: 10.1016/j.jacr.2020.01.018

35. Duffy JMN, Bhattacharya S, Bhattacharya S, Bofill M, Collura B, Curtis C, et al. Standardizing definitions and reporting guidelines for the infertility core outcome set: an international consensus development study. Fertility sterility. (2021) 115:201–12. doi: 10.1016/j.fertnstert.2020.11.013

36. Charoenngam N and Holick MF. Immunologic effects of vitamin D on human health and disease. Nutrients. (2020) 12:2097. doi: 10.3390/nu12072097

37. Giustina A, Bilezikian JP, Adler RA, Banfi G, Bikle DD, Binkley NC, et al. Consensus statement on vitamin D status assessment and supplementation: whys, whens, and hows. Endocrine Rev. (2024) 45:625–54. doi: 10.1210/endrev/bnae009

38. Ismailova A and White JH. Vitamin D, infections and immunity. Rev endocrine Metab Disord. (2022) 23:265–77. doi: 10.1007/s11154-021-09679-5

39. Banks N, Sun F, Krawetz SA, Coward RM, Masson P, Smith JF, et al. Male vitamin D status and male factor infertility. Fertility sterility. (2021) 116:973–9. doi: 10.1016/j.fertnstert.2021.06.035

40. Putri Susilo AF, Syam HH, Bayuaji H, Rachmawati A, Halim B, Permadi W, et al. Free 25(Oh)D3 levels in follicular ovarian fluid top-quality embryos are higher than non-top-quality embryos in the normoresponders group. Sci Rep. (2024) 14:29023. doi: 10.1038/s41598-024-71769-6

41. Yang H, Lu Y, Zhao L, He Y, He Y, and Chen D. The mediating role of serum 25-hydroxyvitamin D on the association between reduced sensitivity to thyroid hormones and periodontitis in chinese euthyroid adults. Front Endocrinol. (2024) 15:1456217. doi: 10.3389/fendo.2024.1456217

42. Carlberg C, Raczyk M, and Zawrotna N. Vitamin D: A master example of nutrigenomics. Redox Biol. (2023) 62:102695. doi: 10.1016/j.redox.2023.102695

43. Jain SK, Justin Margret J, Abrams SA, Levine SN, and Bhusal K. The impact of vitamin D and L-cysteine co-supplementation on upregulating glutathione and vitamin D-metabolizing genes and in the treatment of circulating 25-hydroxy vitamin D deficiency. Nutrients. (2024) 16:2004. doi: 10.3390/nu16132004

44. Ikemoto Y, Kuroda K, Nakagawa K, Ochiai A, Ozaki R, Murakami K, et al. Vitamin D regulates maternal T-helper cytokine production in infertile women. Nutrients. (2018) 10:902. doi: 10.3390/nu10070902

45. Wu L, Kwak-Kim J, Zhang R, Li Q, Lu FT, Zhang Y, et al. Vitamin D level affects ivf outcome partially mediated via th/tc cell ratio. Am J Reprod Immunol (New York NY: 1989). (2018) 80:e13050. doi: 10.1111/aji.13050

46. Bikle DD, Patzek S, and Wang Y. Physiologic and pathophysiologic roles of extra renal cyp27b1: case report and review. Bone Rep. (2018) 8:255–67. doi: 10.1016/j.bonr.2018.02.004

47. Artusa P and White JH. Vitamin D and its analogs in immune system regulation. Pharmacol Rev. (2025) 77:100032. doi: 10.1016/j.pharmr.2024.100032

48. Bhutia SK. Vitamin D in autophagy signaling for health and diseases: insights on potential mechanisms and future perspectives. J Nutr Biochem. (2022) 99:108841. doi: 10.1016/j.jnutbio.2021.108841

49. Kinuta K, Tanaka H, Moriwake T, Aya K, Kato S, and Seino Y. Vitamin D is an important factor in estrogen biosynthesis of both female and male gonads. Endocrinology. (2000) 141:1317–24. doi: 10.1210/endo.141.4.7403

50. Huang HY, Lin TW, Hong ZX, and Lim LM. Vitamin D and diabetic kidney disease. Int J Mol Sci. (2023) 24:3751. doi: 10.3390/ijms24043751

51. Cheng H, He X, and Jin X. The relationship between cardiometabolic index and infertility in american adults: A population-based study. Front Endocrinol. (2024) 15:1424033. doi: 10.3389/fendo.2024.1424033

52. Jiang X, Li J, Zhang B, Hu J, Ma J, Cui L, et al. Differential expression profile of plasma exosomal micrornas in women with polycystic ovary syndrome. Fertility sterility. (2021) 115:782–92. doi: 10.1016/j.fertnstert.2020.08.019

Keywords: infertility, pregnancy loss, machine learning, 25OHVD3, diagnosis

Citation: Zhang R, Guo Y, Zhai X, Wang J, Hao X, Yang L, Zhou L, Gao J and Liu J (2025) Machine learning algorithm based on combined clinical indicators for the prediction of infertility and pregnancy loss. Front. Endocrinol. 16:1544724. doi: 10.3389/fendo.2025.1544724

Received: 13 December 2024; Accepted: 26 June 2025;
Published: 18 July 2025.

Edited by:

Richard Ivell, University of Nottingham, United Kingdom

Reviewed by:

Marta Méndez, Hospital Clinic of Barcelona, Spain
Xiushan Feng, Fujian Medical University, China

Copyright © 2025 Zhang, Guo, Zhai, Wang, Hao, Yang, Zhou, Gao and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jiayun Liu, jiayun@fmmu.edu.cn
