Risk prediction model based on machine learning for predicting miscarriage among pregnant patients with immune abnormalities

Introduction: Patients whose pregnancies are complicated by immune abnormalities are known to be at higher risk of adverse pregnancy outcomes. Traditional pregnancy risk management systems predict adverse pregnancy outcomes poorly in such patients and have many limitations in clinical application. In this study, we used machine learning to screen high-risk factors for miscarriage and develop a miscarriage risk prediction model for patients with immune-abnormal pregnancies. This model aims to provide an adjunctive tool for the clinical identification of patients at high risk of miscarriage and to allow for active intervention to reduce adverse pregnancy outcomes. Methods: Data on patients with immune-abnormal pregnancies attending Sichuan Provincial People’s Hospital were collected from electronic medical records (EMR). The data were divided into a training set and a test set in an 8:2 ratio. The performance of a traditional pregnancy risk assessment tool in clinical application was evaluated for comparison. This analysis involved assessing the cost-benefit of clinical treatment, evaluating the model's performance, and determining its economic value. Data sampling methods, feature screening, and machine learning algorithms were used to develop predictive models. These models were internally validated using 10-fold cross-validation on the training set and externally validated using bootstrapping on the test set. Model performance was assessed by the area under the receiver operating characteristic curve (AUC). Based on the best parameters, a predictive model for miscarriage risk was developed, and the SHapley Additive exPlanations (SHAP) method was used to assess feature contributions in the best model. Results: A total of 565 patients with immune-abnormal pregnancies were included in this study.
Twenty-eight risk warning models were developed, and the predictive model constructed using XGBoost demonstrated the best performance with an AUC of 0.9209. The SHAP analysis of the best model highlighted the total number of medications, as well as the use of aspirin and low molecular weight heparin, as significant influencing factors. The implementation of the pregnancy risk scoring rules resulted in accuracy, precision, and F1 scores of 0.3009, 0.1663, and 0.2852, respectively. The economic evaluation showed a saving of ¥7,485,865.7 due to the model. Conclusion: The predictive model developed in this study performed well in estimating the risk of miscarriage in patients with immune-abnormal pregnancies. The findings of the model interpretation identified the total number of medications and the use of other medications during pregnancy as key factors in the early warning model for miscarriage risk. This provides an important basis for early risk assessment and intervention in immune-abnormal pregnancies. The predictive model developed in this study demonstrated better risk prediction performance than the Pregnancy Risk Management System (PRMS) and also demonstrated economic value. Therefore, miscarriage risk prediction in patients with immune-abnormal pregnancies may be the most cost-effective management method.


Introduction
Miscarriage is one of the most common pregnancy complications in obstetrics and gynaecology. In China, termination of pregnancy at less than 28 weeks of gestation with a foetus weighing less than 1,000 g is still defined as miscarriage (Writing Group Of Chinese Expert, 2020), whereas ESHRE defines miscarriage as pregnancy loss before 24 weeks of gestation (Bender et al., 2018). The maintenance and progression of pregnancy is a complex process governed by multiple developmental factors (Tasadduq et al., 2021). Pregnancy is associated with mechanisms that regulate the immune response at the maternal-fetal interface, and when pregnancy is combined with autoimmune abnormalities, the recurrence of autoimmune disease in some patients is associated with a significant incidence of adverse pregnancy outcomes (Ford and Schust, 2009; Robinson, 2014). Women of childbearing age are predisposed to autoimmune diseases (Chinese Society of Rheumatology of the Chinese Medical Association, National Clinical Research Center for Dermatologic and Immunologic Diseases, Chinese Systemic Lupus Erythematosus Treatment and Research Group, 2020). Antiphospholipid antibodies (aPLs) are a group of autoantibodies that target phospholipids and/or phospholipid-binding proteins as antigens. The aPLs in the diagnostic criteria include lupus anticoagulant (LA), anti-cardiolipin antibody (aCL), and anti-β2 glycoprotein 1 (β2-GP1) antibody (Tektonidou et al., 2019). A retrospective study revealed approximately 9% aPL positivity in patients with autoimmune diseases who experienced pregnancy loss (Han et al., 2017). In patients with autoimmune diseases and autoantibody abnormalities, these abnormalities often increase the risk of adverse pregnancy outcomes (Carp et al., 2012). Systemic lupus erythematosus (SLE) is an autoimmune-mediated, diffuse connective tissue disease highlighted by immune inflammation (Chinese Medical Association Rheumatology Branch, 2010). Antiphospholipid syndrome (APS) is a non-inflammatory autoimmune disease characterised by arterial and venous thrombosis, morbid pregnancy (early miscarriage and stillbirth in mid-to-late pregnancy), thrombocytopenia, and the presence of aPLs in the serum, which may be present singly or in combination (Chinese Medical Association Rheumatology Branch, 2011). Despite exceptions such as SLE or APS, the impact of immune abnormalities on reproductive health has been largely overlooked in clinical practice and research. Consequently, early and precise identification of patients at risk of miscarriage is crucial, necessitating timely intervention. Personalized medicine holds the potential to revolutionize the standard of care, shifting from generic guidelines to computational models based on individual patient data. This approach aims to develop more convenient diagnostic tools for women with immunologically abnormal pregnancies, accurately identify high-risk patients, and proactively intervene to support them in carrying their pregnancies to full term.
Currently, there are various criteria for assessing pregnancy risk in obstetrics. Scholars in England created the Obstetric Early Warning Score tool (Singh et al., 2012). In 2017, the National Health and Family Planning Commission of China issued the Norms for the Assessment and Management of Maternal Pregnancy Risks (National Health and Family Planning Commission of the People's Republic of China, 2017). Pregnancy risk management systems and early warning scoring systems have commonly been used to assess risk in pregnant patients (Subbe et al., 2001; Jing et al., 2016; Coomarasamy et al., 2021). However, these criteria have some limitations. For instance, they often fail to provide early warning of risk in early pregnancy. Moreover, the pregnancy risk management system relies on a risk classification system rather than a point system, which is hindered by individual differences and the subjective influence of the evaluator. This has led to insufficient clinical application, staffing and training challenges, and non-uniform development of healthcare (Practice Committee of the American Society for Reproductive Medicine, 2012; Ping et al., 2016; Zhang et al., 2023). The relationships between clinical characteristics and biomarkers in extensive studies are intricate and highly heterogeneous, making it challenging for clinicians to predict miscarriage risk using standardized scores. Consequently, clinical practice still lacks evidence that quantifies the association between risk factors and outcomes, which hinders the creation of miscarriage risk prediction tools that not only generate risk predictions but also provide interpretable rules to support clinicians' understanding of the resulting risk pathways. Such tools could lead to improved diagnosis, treatment choices, and overall health system efficiencies.
In recent years, machine learning has been increasingly used to predict pregnancy outcomes in expectant mothers. By modelling information based on causal and/or statistical data, machine learning can potentially unveil hidden dependencies between environmental factors and diseases within extensive datasets (Bratic et al., 2018). Our systematic search of PubMed for studies on machine learning applications in predicting pregnancy outcomes yielded 28 relevant studies. These studies used machine learning algorithms such as Logistic Regression (LR), Artificial Neural Network (ANN), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM) to predict pregnancy outcomes (Lakshmi, 2016; Malacova et al., 2020; Khatibi et al., 2021; Vaulet et al., 2022). While these models incorporated various factors, including age, maternal history, gestational age, BMI, biomarkers, and immunological factors, several challenges and limitations persist. Firstly, some prediction models had suboptimal Area Under the Curve (AUC) values, indicating only moderate predictive performance. Secondly, certain studies accounted only for single-factor variations, neglecting the potential impact of immune factors, underlying patient conditions, and other relevant variables on pregnancy outcomes. Additionally, some prediction models were excessively complex in their operational steps and consumed substantial resources, and although statistically acceptable, their results were challenging to interpret for clinical application. The interpretability of machine learning results and their seamless implementation in clinical practice therefore remain crucial. Despite its significant advantages, the development of pregnancy outcome prediction models using advanced machine learning algorithms is still relatively uncommon. This presents a new frontier for health professionals and policymakers, emphasizing the need for computational methods based on large patient datasets to advance the field.
The study aimed to develop a predictive model for assessing the risk of miscarriage in patients with immunologically abnormal pregnancies, to pinpoint high-risk factors contributing to pregnancy loss, and to investigate the impact of fundamental patient characteristics, biomarkers, medication regimens, and underlying disease characteristics on pregnancy outcomes. By stratifying outpatients with a high risk of miscarriage, early warning can aid in the clinical management of patients. Moreover, identifying major risk factors can support clinicians in making informed medical decisions and implementing proactive pharmacological interventions, crucial for preventing or reducing the likelihood of miscarriage. Additionally, for low-risk patients, this can help minimize unnecessary treatment costs and shorten the duration of treatment, which is of utmost importance.

Data acquisition
The prediction modelling was conducted at Sichuan Provincial People's Hospital using data from October 2018 to October 2022, and we included all patients with complete demographic and clinical data from the Obstetrics and Rheumatology departments during this period. A total of 1,668 samples were initially available. Medical data of these patients were retrospectively extracted from the electronic medical record (EMR) system, where they had been generated and stored during diagnostic and laboratory tests. Face-to-face interviews were conducted with participating patients. Participants were selected based on the following criteria: 1) aged 18-45 years; 2) a history of rheumatic immune disease or abnormal autoantibody values; 3) on regular medication as prescribed by the doctor; and 4) complete records of the course of treatment and pregnancy outcome available. Patients were excluded if 1) treatment did not proceed as originally planned; 2) serious adverse reactions occurred during treatment; or 3) treatment records or pregnancy outcomes could not be obtained. Miscarriage was defined as spontaneous abortion, that is, unintended termination of pregnancy resulting in foetal death before 20 weeks of gestation. Ethical approval was obtained from the Ethics Committee of Sichuan Provincial People's Hospital (Approval #2023-264). A detailed inclusion-exclusion flow chart is shown in Figure 1.

Research process of traditional evaluation methods
Biomarkers associated with the disease in previous studies were scored. Before scoring, a detailed history, physical examination, and diagnostic screening programme were required. Scoring followed the Maternal Pregnancy Risk Assessment Scale proposed by the National Health and Family Planning Commission of the People's Republic of China (National Health and Family Planning Commission of the People's Republic of China, 2017). The patient's review included an assessment of rheumatological disease characteristics, and values were assigned for each parameter. A base score of 25% was given for yellow-risk-only entries, 50% for orange-risk and/or yellow-risk entries, and 75% for red-risk entries. Details of the extra points given for each entry are shown in Table 1.

Cost-effectiveness analysis
The economic costs of a patient's illness comprise direct and indirect costs. Direct costs include direct medical costs and direct non-medical costs: 1) direct medical costs cover outpatient and emergency room costs, inpatient costs, and retail drug costs; 2) direct non-medical costs cover other costs incurred by the patient and their companion, such as travel and nutritional costs for visits to the doctor. Details of the outpatient and inpatient costs of 76 patients with immunologically abnormal pregnancies and 68 patients with miscarriages were retrieved from the information department, and the economic evaluation used the average of these costs. Transportation and nutritional costs were assessed by telephone follow-up. Indirect costs include the loss of labour productivity due to short-term and long-term disability and premature death, and the cost of lost labour for those accompanying patients. The labour force population is defined statistically as the population in the 18-50 age group. The number of days of short-term and long-term incapacity and of lost labour of companions is calculated for the current year only. Under the human capital method, the indirect economic burden refers to the economic impact on women resulting from pregnancy, which we evaluated based on productivity. According to the Chengdu Municipal Bureau of Statistics, the city's GDP per capita in 2022 was RMB 98,100 (Chengdu Daily, 2022). According to the Regulations on Population Planning and Maternity in Sichuan Province, maternity leave lasts 158 days (Sichuan Daily, 2022). The methodology accounts for the different productivity levels of each age group by assigning a weighting, with a productivity weighting of 0.75 for ages 15 to 49 (Jia et al., 1999). The approximate indirect cost of pregnancy is RMB 31,848.90. Details of the economic evaluation are shown in Tables 2, 3.
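The indirect cost figure can be reproduced from the three quantities quoted above. The following is a minimal sketch of the human capital calculation; the 365-day denominator and the function name are assumptions made for illustration, although with them the result matches the reported RMB 31,848.90.

```python
# Values quoted in the text.
GDP_PER_CAPITA_RMB = 98_100   # Chengdu GDP per capita, 2022
MATERNITY_LEAVE_DAYS = 158    # Sichuan provincial maternity leave
PRODUCTIVITY_WEIGHT = 0.75    # productivity weighting for ages 15 to 49
DAYS_PER_YEAR = 365           # assumption: calendar-year denominator


def indirect_cost(gdp_per_capita, days_lost, weight, days_per_year=DAYS_PER_YEAR):
    """Human capital method: daily output x days lost x productivity weight."""
    return gdp_per_capita / days_per_year * days_lost * weight


cost = indirect_cost(GDP_PER_CAPITA_RMB, MATERNITY_LEAVE_DAYS, PRODUCTIVITY_WEIGHT)
print(f"Indirect cost per pregnancy: RMB {cost:,.2f}")
```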

Model building process

Data pre-processing
Data pre-screening consisted of three steps: 1) removing variables with more than 90% missing data; 2) removing variables in which a single value accounts for more than 90% of observations; and 3) removing columns with a coefficient of variation less than 0.01. Any variable meeting any of the above criteria was considered uninformative and excluded.
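The three pre-screening rules can be sketched as a single column filter. This is an illustrative sketch using only the standard library, not the study's code: the column names and data are hypothetical, the second rule is read as "one value dominates the column", and a column with mean zero is kept because its coefficient of variation is undefined.

```python
from collections import Counter
from statistics import mean, pstdev


def keep_variable(values, max_missing=0.90, max_dominant=0.90, min_cv=0.01):
    """Return True if a column passes all three pre-screening rules.

    `values` is a list of numbers, with None marking a missing entry.
    """
    n = len(values)
    observed = [v for v in values if v is not None]

    # Rule 1: drop if more than 90% of entries are missing.
    if n == 0 or (n - len(observed)) / n > max_missing:
        return False

    # Rule 2 (assumed reading): drop if one value makes up more than 90%
    # of the observed entries.
    most_common_count = Counter(observed).most_common(1)[0][1]
    if most_common_count / len(observed) > max_dominant:
        return False

    # Rule 3: drop if the coefficient of variation (SD / |mean|) is below 0.01.
    mu = mean(observed)
    if mu != 0 and pstdev(observed) / abs(mu) < min_cv:
        return False
    return True


# Hypothetical columns with 20 rows each.
columns = {
    "age": list(range(22, 42)),
    "mostly_missing": [1.0] + [None] * 19,
    "rare_flag": [1] + [0] * 19,
    "near_constant": [100.0, 100.1] * 10,
}
retained = [name for name, col in columns.items() if keep_variable(col)]
print(retained)
```

In this toy input only `age` survives: the other three columns each fail exactly one of the rules.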

Data partition and dataset building
We utilized 80% of the data for training the model through random splitting, reserving the remaining 20% for testing the model's performance. Inevitably, missing data occurred in practice. In cases where suspicious or missing data, including multiple missing values, were identified in the patient's clinical characteristics section, the patient was contacted by telephone to rectify or supplement the information. To mitigate the adverse effects of data imbalance on prediction performance, we employed two data sampling methods. These included the Synthetic Minority Over-Sampling Technique (SMOTE), which artificially generates new samples from underrepresented categories through interpolation, and the Support Vector Machine Synthetic Minority Over-Sampling Technique (SVMSMOTE), which utilizes Support Vector Machines (SVMs) to identify the samples used for generating new samples.
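To make the splitting and interpolation steps concrete, the sketch below shows a random 80/20 split and a simplified SMOTE-style generator in plain Python. It is an assumption-laden illustration: the source does not state which implementation was used, the helper names and toy points are hypothetical, and the SVMSMOTE variant is not shown. The generator needs at least two minority samples.

```python
import math
import random


def split_indices(n, test_fraction=0.2, seed=0):
    """Randomly split row indices into (train, test) by the given fraction."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


train_idx, test_idx = split_indices(565)  # 565 patients in the study
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]  # hypothetical points
new_points = smote(minority, n_new=6, k=3)
print(len(train_idx), len(test_idx), len(new_points))
```

Each synthetic point lies on a line segment between two real minority samples, so oversampling adds no values outside the range already observed in that class.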
Feature selection was carried out using two methods. Firstly, the Least Absolute Shrinkage and Selection Operator (Lasso), a linear regression with L1 regularization, was used to penalize variables and discard unimportant ones (those with coefficients close to zero), thereby evaluating variable importance. Secondly, Ridge regression (Ridge), a linear regression with L2 regularization, was used to shrink the model coefficients and thereby address overfitting. Variable importance was assessed based on the output of Lasso and Ridge (variable importance score), with a high score indicating that the variable improves prediction accuracy.

Model development
By combining these two sampling methods and two feature selection techniques, we derived four datasets from the training set. Subsequently, we applied seven machine learning algorithms to each dataset, resulting in a total of 28 models. The algorithms were Logistic Regression (LR), Random Forest (RF), Content-Based Recommendations (CB), Support Vector Classifier (SVC), Multilayer Perceptron (MLP), Extreme Gradient Boosting (XGB), and K-Nearest Neighbour (KNN). These algorithms, well suited to binary classification, were trained and applied to develop predictive models. Ensemble algorithms have consistently demonstrated greater effectiveness and stability in predictive modelling than individual classification methods. Details regarding the parameters of the models developed using different algorithms are presented in Table 4.
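The count of 28 models follows from crossing two samplers, two feature selectors, and seven algorithms. A small sketch of that grid is given below, using the abbreviations that appear in the text and in the Figure 3 note; the `SAMPLER_SELECTOR_ALGORITHM` naming pattern is inferred from labels such as SMO_RD_XGB.

```python
from itertools import product

SAMPLERS = ["SMO", "SSMO"]   # SMOTE, SVMSMOTE
SELECTORS = ["LA", "RD"]     # Lasso, Ridge
ALGORITHMS = ["LR", "RF", "CB", "SVC", "MLP", "XGB", "KNN"]

# One model name per (sampler, selector, algorithm) combination.
model_names = ["_".join(combo) for combo in product(SAMPLERS, SELECTORS, ALGORITHMS)]

print(len(model_names))
print("SMO_RD_XGB" in model_names)
```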

Model explanation
SHAP, a Python package for model interpretation, was used to interpret the output of the machine learning models. Inspired by cooperative game theory, SHAP constructs an additive explanatory model that treats all features as "contributors." For each sample, the model generates a prediction value, and the SHAP value of a feature represents that feature's contribution to the prediction for that sample. The impact of each variable on the predictive model was assessed through SHapley Additive exPlanations (SHAP).
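The additive idea behind SHAP can be illustrated by computing exact Shapley values for a tiny model. The sketch below averages each feature's marginal contribution over all feature orderings; the toy risk function, its coefficients, and the all-zero baseline are hypothetical and are not the fitted model from this study. The SHAP package itself relies on faster algorithms than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample, averaged over feature orderings."""
    n = len(x)
    contrib = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]  # switch feature i from baseline to sample value
            value = predict(current)
            contrib[i] += value - previous
            previous = value
    return [c / len(orderings) for c in contrib]


def toy_risk(features):
    """Hypothetical risk score with one interaction term (illustration only)."""
    n_meds, aspirin, lmwh = features
    return 0.10 + 0.05 * n_meds - 0.08 * aspirin - 0.06 * lmwh + 0.01 * n_meds * aspirin


sample = [4, 1, 1]     # 4 medications, aspirin used, LMWH used
baseline = [0, 0, 0]   # reference patient with no medication
phi = shapley_values(toy_risk, sample, baseline)
print([round(v, 4) for v in phi])
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that lets SHAP values be read as a decomposition of a single prediction.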

Model evaluation
The model was trained on the training set to minimize the loss function. Internal validation was carried out using 10-fold cross-validation across the 28 models, with the metrics collected over 10 independent replicates. Subsequently, external validation was conducted using the test set. The model's prediction performance was assessed using various metrics, including the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, F1 score, Brier score, specificity, and area under the precision-recall curve (AUPRC). The best performing model was chosen as the predictive model. Furthermore, a multifactor analysis was executed to elucidate the combined contribution of different variables, sampling methods, screening methods, and machine learning algorithms. The details of the modelling process are depicted in Figure 2.
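Two of the listed metrics, AUC and the Brier score, are easy to state directly from their definitions. The sketch below is a plain-Python illustration with hypothetical labels and probabilities; it is not the evaluation code used in the study, which the text attributes to Sklearn.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = miscarriage) and predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(round(auc, 4), round(brier, 4))
```

AUC depends only on how the cases are ranked, whereas the Brier score also penalizes poorly calibrated probabilities, which is why the two are reported side by side.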

Sample size validation
The AUC of the best model was used to evaluate the impact of sample size on model performance. The training set was partitioned into 10 sub-samples, with one sub-sample serving as validation data and the remaining nine used for training. Cross-validation was repeated 10 times, each time with a different sub-sample held out, and the results were averaged or otherwise combined to determine the optimal sample size.
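The partitioning described above can be sketched as a 10-fold index generator. This is a minimal illustration only: the training-set size of 452 is an assumption derived from 80% of the 565 patients, and the function name is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal sub-samples
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


splits = list(k_fold_indices(452, k=10))  # assumed training-set size
for fold_number, (train_idx, val_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(val_idx))
```

A model would be fitted on `train_idx` and scored on `val_idx` in each round, and the ten scores averaged.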

Statistical analysis
Continuous variables were presented as means and standard deviations, while categorical variables were presented as frequencies and percentages. Statistical analysis was conducted using Stats in Python 3.8, and model development was carried out using Sklearn in Python 3.8.

Population demographics
Our study encompassed 565 patients, with 50 explanatory variables selected. These variables comprised four basic characteristic items, 12 pregnancy disorder items, 19 abnormal antibody value items, and 14 pregnancy medication items. Among the patients, adverse pregnancy outcomes were observed in 90 cases (15.93%), with 11 (12.22%) experiencing biochemical pregnancies, 63 (70.00%) … The mean age of the patients was 30.1 ± 4.1 years, and the average number of previous abortions was 1.4 ± 1.3. Additionally, 371 patients (65.7%) were found to have autoimmune diseases.
Detailed patient demographic and clinical information, serving as independent variables, and pregnancy outcomes, acting as the dependent variable, are presented in Table 5.

Implementation results of traditional methods
For pregnancy risk scoring, a risk threshold of >25% was defined, resulting in 76 True Positives (TP), 94 True Negatives (TN), 381 False Positives (FP), and 14 False Negatives (FN). The accuracy, precision, and F1 score of the traditional method were 0.3009, 0.1663, and 0.2852, respectively.
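The reported metrics can be checked against the confusion-matrix counts using the standard definitions, as sketched below. With these counts the accuracy and precision reproduce the reported 0.3009 and 0.1663. The F1 score under the standard definition comes out near 0.278 rather than the reported 0.2852, a difference the source does not explain.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts reported for the traditional pregnancy risk scoring method.
traditional = classification_metrics(tp=76, tn=94, fp=381, fn=14)
for name, value in traditional.items():
    print(f"{name}: {value:.4f}")
```

The pattern of high recall with very low specificity shows that the >25% threshold flags most patients as high risk, which is what drives the low accuracy.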

Economic evaluation
The actual cost and the model cost were each calculated by formula, and the model value was obtained as the actual cost minus the model cost. The model value is ¥7,485,865.7. This indicates that pregnancy risk prediction is still likely to be the most cost-effective management method.
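The model value can be checked from the actual cost and model cost reported alongside the cost formulas. The sketch below only subtracts the two reported totals and confirms that the model's confusion-matrix counts add up to the 565 patients; it does not reproduce the per-patient cost terms, which are given in the source tables.

```python
ACTUAL_COST = 29_523_520.8   # reported actual cost (RMB)
MODEL_COST = 22_037_655.1    # reported model cost (RMB)

# Model value = actual cost - model cost.
model_value = ACTUAL_COST - MODEL_COST

# Confusion-matrix counts reported for the model in the cost calculation.
confusion = {"TN": 250, "FN": 55, "TP": 228, "FP": 32}

print(f"Model value: RMB {model_value:,.1f}")
print(f"Patients covered: {sum(confusion.values())}")
```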

Dataset pre-screening
Following data pre-processing criteria screening, 35 variables were removed, leaving 15 variables retained for analysis. These retained variables encompassed age, history of previous miscarriages, rheumatological-immune disease comorbidities, number of underlying medical conditions, number of pregnancy complications, antinuclear antibodies, anti-SSA antibodies, anti-RO52 antibodies, total number of medications during pregnancy, hydroxychloroquine, glucocorticosteroids, aspirin, low-molecular-weight heparin, progesterone, and number of other medications used.

The main flow chart of the study.
Frontiers in Pharmacology frontiersin.org

Model evaluation
A total of 28 models underwent validation in the test set, serving as external validation, and model performance metrics were generated. The top four models were identified based on their AUC values. The best-performing model (Model 1) was achieved by employing SMOTE as the sampling method, Ridge as the feature filtering method, and XGBoost as the machine learning algorithm. This model exhibited AUC, AUPRC, Accuracy, Precision, Recall, and F1 score values of 0.9209, 0.9395, 0.8469, 0.8778, 0.8061, and 0.8404, respectively. Performance details of the top models are presented in Figure 3.

Model explanation
During external validation, the predictive models were evaluated based on their SHAP values, as depicted in Figure 4A. The wider the blue area, the greater the influence of the variable on the result. The most influential features were found to be the total number of medications used, the number of other medications used, the use of aspirin during pregnancy, and the use of low molecular weight heparin during pregnancy. Figure 4B represents the values of these characteristics on a spectrum, showing the calculated SHAP values for each characteristic in every sample. The variables are ranked in descending order by aggregating the SHAP values for each sample. For instance, a higher value for the total number of medications administered corresponds to a lower SHAP value.

Sample size assessment
The adequacy of the sample size was assessed using the resampling bootstrapping method, with the results displayed in Figure 5. As the size of the sample data in the model increases from small to large, a noticeable upward trend is observed in the AUC value. However, when the sample size falls within the range of 30%-60%, the curve exhibits fluctuations. Once the sample size reaches 60%, the curve tends to flatten. These findings suggest that expanding the sample size may impact the prediction model's performance, and the model's performance could potentially be enhanced with the addition of new samples.

Principal findings
A total of 565 patients with immunologically abnormal pregnancies were included in this study. Utilizing two data sampling methods and two feature screening methods, four datasets were acquired. Subsequently, a total of 28 machine learning models were developed employing seven machine learning algorithms. The best model exhibited an AUC of 0.9209, Accuracy of 0.8469, Precision of 0.8778, Recall of 0.8061, F1 score of 0.8404, and AUPRC of 0.9395. Retrospective validation indicated the model's overall clinical performance to be commendable. Compared to traditional maternal pregnancy risk assessment, the predictive model demonstrated enhanced performance in forecasting the likelihood of miscarriage in patients with immunologically abnormal pregnancies. Furthermore, it proved to be more user-friendly and practical for clinical implementation. An economic assessment revealed cost savings of RMB 7.48 million post-implementation, signifying the model's economic value. The study suggested that risk prediction of miscarriage might be the most cost-effective management approach. Unlike previous studies that primarily focused on live birth or miscarriage probabilities, this research introduced a tool for predicting the likelihood of miscarriage in patients with immunologically abnormal pregnancies. This tool enables clinicians to dynamically assess a patient's risk of miscarriage during different gestational periods, empowering them to tailor treatment based on individualized high-risk predictive factors. This personalized approach can potentially mitigate the risk of miscarriage while offering some economic relief.
In this study, we found that the total number of medications used during pregnancy was positively associated with miscarriage and pregnancy complications. Conversely, the use of aspirin and low molecular weight heparin was associated with a reduced risk of miscarriage. Previous studies have highlighted that comorbidities during pregnancy, such as pre-eclampsia, severe vomiting, abnormal thyroid function, cholestasis, gestational diabetes mellitus, lupus nephritis, and high platelet counts, are linked to an increased risk of miscarriage (Smyth et al., 2010; Zhang et al., 2017; Turgut et al., 2022). Consequently, a higher number of comorbidities during pregnancy typically correlates with a greater risk of miscarriage. Although we acknowledge that a higher number of comorbidities in pregnancy is riskier in practical terms, our study came to the opposite conclusion, with the number of pregnancy complications negatively correlating with the risk of miscarriage. After discussion with clinical experts, we consider that patients with more pregnancy comorbidities received more treatment and attention during subsequent pregnancies, more frequent pregnancy follow-up, and more timely medication monitoring, which may have somewhat altered the course of pregnancy outcomes. In addition, the interactions between the diseases we analysed and the limited sample size of the study may have led to potential overfitting, which could have biased the prediction of the outcome. Previous studies have demonstrated a connection between the breakdown of autoimmune disease tolerance and alterations in reproductive health, thereby impacting the clinical wellbeing of patients (Nielsen and Christiansen, 2005; Bowman et al., 2015). In our current study, intergroup evaluation showed no significant differences in indicators and outcomes between pregnant patients with autoimmune disease and those with abnormal autoantibodies only. Consequently, we inferred that the number of medications administered was positively associated solely with the activity of autoimmune disease in the patient. A greater number of medications signified a heightened autoimmune state during pregnancy and an increased likelihood of miscarriage.

FIGURE 4
Variable contribution to the model by SHAP value. (A) Contribution of each feature value in one sample. (B) Summary of SHAP value of each variable. Note: "Nomu" represents the total number of medications used in pregnancy, "Nod" represents the total number of other medications, "Apc" represents aspirin use in pregnancy, "Lmwh" represents low molecular weight heparin use in pregnancy, "Gcs" represents glucocorticoid use in pregnancy, "Prog" represents progesterone use in pregnancy, and "Hcq" represents hydroxychloroquine use in pregnancy.

FIGURE 5
The impact of sample data size on model performances (mean ± SD).
Huang et al. (2021) used a machine learning algorithm to combine reproductive immune parameters and classify patients with recurrent pregnancy loss (RPL) into different risk categories, aiming to create a model for predicting pregnancy outcomes at various gestational periods based on genetic markers or common indicators. However, there was considerable variation in the performance of the prediction models. Shi et al. (2022) applied adaptive simulation modelling algorithms to clinical data from patients with recurrent spontaneous abortion (RSA), vitamin D levels, and thyroid function to explore optimal parameters and sub-features during support vector machine (SVM) evolution. However, that study had a relatively small sample size (n = 136). Compared with our present study, Shi et al. reported superior predictive performance, with an accuracy of 92.998%, Matthews correlation coefficient (MCC) of 0.92425, sensitivity of 93.286%, and specificity of 93.064%. Nevertheless, our study encompassed a larger sample size, employed a wider array of algorithms, and conducted retrospective and external validation of the model. Additionally, the ensemble algorithm used in our study aggregated the outputs of the five best models in the training model (evaluated based on the area under the curve, AUC) using the voting principle, resulting in improved predictive model performance.
In studies about the prediction of miscarriage risk in women with immunologically abnormal pregnancies, various methods have been employed to investigate high-risk factors in pregnant women. However, there are still shortcomings in clinical practice, model performance evaluation, and the practical application of prediction tools, including application complexity (Bruno et al., 2020; Benner et al., 2022; Huang et al., 2022; Macrohon et al., 2022; Hao et al., 2023; Luo and Zhou, 2023). Additionally, there is a lack of studies conducting retrospective or prospective validation of pregnancy risk prediction models and evaluating the economic aspects of these models. The findings indicate that the model holds significant clinical value. The prediction model in this study was developed based on prior research, considering the practical implementation in each healthcare institution, patient cooperation, and the scientific validity of the model's predictive outcomes, thereby enhancing healthcare efficiency.
In conclusion, the ultimate aim of this study is to predict the risk of miscarriage in patients with immunoabnormal pregnancies and to assist clinicians in evaluating that risk. However, the model may not apply to the general population: both the study population and the model features focus on patients with immunoabnormal pregnancies, parameters characteristic of the general population did not meet the screening conditions and were excluded during feature screening, and in the subsequent data fitting the data of the general population would be far from those of the immunoabnormal pregnancy population. In addition, the features we included initially and the subsequent adjustment of parameters were designed to predict risk, prompting clinicians to judge the risk of a patient's pregnancy and to choose whether or not to intervene, and were not designed to predict the effectiveness of the intervention. Furthermore, for patients with immunoabnormal pregnancies assessed as high risk by this model, clinicians can intervene with aggressive pharmacological treatment and enhanced monitoring (on the one hand, monitoring for foetal developmental abnormalities for early detection and treatment, and on the other hand, monitoring for adverse drug reactions to ensure the efficacy and safety of the medication), which can reduce the risk of miscarriage and thus save costs.

Limitations
This study has several limitations that need to be considered. Firstly, the entire dataset was obtained from a single medical center, which may limit how representative the patient data are. In addition, because this was a retrospective study with a relatively large patient base whose residences were dispersed, and most patients were not followed up in the hospital for a long period, the timing of medication intervention and discontinuation was difficult to record in detail and accurately, which may have biased the model performance. Therefore, further external validation using multicenter research data is necessary before clinical implementation.

Conclusion
This study represents a significant milestone in using a model to identify patients at high and low risk of miscarriage during the treatment of immunoabnormal pregnancies. The model is both suitable and easy to apply across various healthcare settings. We are in the process of developing a system integrated with a multidisciplinary immunopregnancy clinic to provide clinicians with a more convenient and precise risk assessment tool. This approach aims to enhance access, in a clinical setting, to accurate predictive models built on valuable predictors. Ultimately, this will assist clinicians in assessing the risk of patients with immune-related pregnancies and initiating timely preventive treatment, thereby contributing to an improved prognosis for these patients.

$$\text{Actual cost} = (\text{Number of abortions} \times \text{Cost of abortions}) + (\text{Number of live births} \times \text{Cost of live births})$$

$$\text{Model cost} = \ldots\,(\text{prevention cost2} + \text{abortion cost}) + FP \times (\text{prevention cost1} + \text{live birth cost})$$

$$\text{Model value} = \text{Actual cost} - \text{Model cost}$$

Actual cost is 29,523,520.8. Based on the model, the values obtained are TN = 250, FN = 55, TP = 228, and FP = 32. The model cost calculated from these values is 22,037,655.1.
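The arithmetic behind the reported figures can be checked with a short script. The following is a minimal sketch rather than the authors' code: it uses only the totals and confusion-matrix counts stated in the text, and it assumes that the positive class is miscarriage. The function names `model_value` and `confusion_metrics` are hypothetical, and the currency unit is not given in this section.

```python
# Figures copied from the text; the currency unit is not stated in this section.
ACTUAL_COST = 29_523_520.8
MODEL_COST = 22_037_655.1

# Confusion-matrix counts reported for the model.
# Assumption: the positive class is "miscarriage".
TN, FN, TP, FP = 250, 55, 228, 32


def model_value(actual_cost: float, model_cost: float) -> float:
    """Model value = actual cost - model cost, as defined in the text."""
    return actual_cost - model_cost


def confusion_metrics(tn: int, fn: int, tp: int, fp: int) -> dict:
    """Standard metrics derived from the four confusion-matrix counts."""
    total = tn + fn + tp + fp
    return {
        "n": total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / total,
    }


if __name__ == "__main__":
    print(f"Model value: {model_value(ACTUAL_COST, MODEL_COST):,.1f}")
    for name, value in confusion_metrics(TN, FN, TP, FP).items():
        print(f"{name}: {value:.4f}" if name != "n" else f"{name}: {value}")
```

With the stated totals, the model value evaluates to 7,485,865.7, and the four counts sum to 565, which matches the number of patients reported for the study.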

FIGURE 3
Summary of model performance: (A) Area Under the Curve (AUC) results for the top five models. (B) Precision-Recall (P-R) results for the top five models. (C) Decision Curve Analysis (DCA) results for the top five models. (D) Calibration curve results for the top five models. (E) Detailed performance metrics for the top five models. Note: each model name lists the sampling method, the feature selection method, and the machine learning algorithm, in that order. SMO_RD_XGB: SMOTE sampling, Ridge feature selection, XGBoost model. SSMO_RD_MLP: SVMSMOTE sampling, Ridge feature selection, multilayer perceptron (MLP) model. SMO_RD_LR: SMOTE sampling, Ridge feature selection, logistic regression model. SMO_LA_LR: SMOTE sampling, Lasso feature selection, logistic regression model. SSMO_LA_LR: SVMSMOTE sampling, Lasso feature selection, logistic regression model.

TABLE 1
Risk assessment rules.

TABLE 2
Details of the cost of each type of disease.

TABLE 4
The detailed information of 4 datasets.

TABLE 5
The detailed information of participants.

Note: Marked variables are important variables for modeling. The table presents descriptive statistics of basic patient information. Continuous variables are summarized as mean ± SD, median, minimum, and maximum; dichotomous variables as the percentages of Yes and No; multivalued ordinal variables, which in this study relate only to antibody value abnormalities, as the percentage in each category.