Development and Validation of Multi-Stage Prediction Models for Pre-eclampsia: A Retrospective Cohort Study on Chinese Women

Objective The aim of this study was to develop multistage prediction models for pre-eclampsia (PE) covering almost the entire pregnancy period based on routine antenatal measurements and to propose a risk screening strategy.
Methods This was a retrospective cohort study that included 20582 women with singleton pregnancies whose last menstruation fell between January 1, 2013 and December 31, 2019. Of the 20582 women, 717 (3.48%) developed pre-eclampsia, including 46 (0.22%) with early-onset pre-eclampsia and 119 (0.58%) with preterm pre-eclampsia. We randomly divided the dataset into a training set (N = 15665), a test set (N = 3917), and a validation set (N = 1000). The Least Absolute Shrinkage and Selection Operator (LASSO) was used to select variables from demographic characteristics, blood pressure, routine blood examination and biochemical tests. Logistic regression was used to develop prediction models for eight periods: 5–10 weeks, 11–13 weeks, 14–18 weeks, 19–23 weeks, 24–27 weeks, 28–31 weeks, 32–35 weeks, and 36–39 weeks of gestation. We calculated the area under the receiver operating characteristic curve (AUROC) on the test set and validated the screening strategy on the validation set.
Results We found that uric acid tested from 5–10 weeks of gestation, platelets tested at 18–23 and 24–31 weeks of gestation, and alkaline phosphatase tested at 28–31, 32–35 and 36–39 weeks of gestation further improved the prediction performance of the models. The AUROC of the optimal prediction models on the test set gradually increased from 0.71 at 5–10 weeks to 0.80 at 24–27 weeks, and then gradually increased to 0.95 at 36–39 weeks of gestation. At a sensitivity of 0.98, our screening strategy can identify about 94.8% of women who will develop pre-eclampsia and reduce by about 40% the number of healthy women to be screened by 28–31 weeks of pregnancy.
Conclusion We developed multistage prediction models and a risk screening strategy whose biomarkers are part of routine tests and incur no extra cost. The prediction window was advanced to 5–10 weeks, which allows time for aspirin and other interventions in groups at high risk of PE.


INTRODUCTION
Pre-eclampsia is a pregnancy-related syndrome defined as new-onset hypertension at or after 20 weeks of gestation, accompanied by proteinuria or other organ damage (1). The worldwide incidence of pre-eclampsia is 0.2-9.2% (2). A study of 112,386 pregnant women in China showed that the incidence of pre-eclampsia in China was approximately 2.87% in 2011 (3). Every year, approximately 76,000 women and 500,000 infants die from pre-eclampsia worldwide. Pre-eclampsia can have adverse effects on pregnant women, for instance, damage to the liver and kidneys (4). If left untreated, it can lead to pulmonary edema, eclampsia, brain damage, and even maternal death (5)(6)(7). Women affected by pre-eclampsia and their children are at increased risk of long-term cardiovascular and chronic diseases, including chronic hypertension, stroke, metabolic syndrome, and cognitive impairment (8)(9)(10)(11)(12)(13)(14)(15)(16).
In the first trimester, pharmacologic interventions (e.g., aspirin) for high-risk pregnant women can reduce the risk of early-onset and preterm pre-eclampsia (17,18). During the second or third trimester, intensive monitoring and selection of the appropriate time of delivery can reduce the incidence of adverse perinatal outcomes (19). Early identification of high-risk groups allows interventions to be started in advance. Therefore, it is of great significance to develop risk prediction models for pre-eclampsia.
However, prediction models for pre-eclampsia have been developed mainly in developed countries (20). In recent years, several studies have developed prediction models for pre-eclampsia based on the Chinese population (21)(22)(23)(24)(25)(26). These studies were mainly based on hospital electronic medical record systems and carried out in eastern China (three in Shanghai and one in Tianjin), screening pregnant women in the first and second trimesters. However, they primarily focused on specific high-risk groups or used expensive biomarkers beyond the scope of routine testing, which limits their generalizability. In addition, these studies did not consider pre-eclampsia subtypes (23)(24)(25)(26), were subject to possible bias from the variable selection process (23,24,26), or had an insufficient number of outcome events (21)(22)(23)(25).
In this study, based on 20582 pregnant women in Beijing, we aimed to develop multistage prediction models covering almost the entire pregnancy period by selecting valuable predictors from routine antenatal measurements, and a risk screening strategy based on the optimal models.

Study Population
This study used the Peking University Retrospective Birth Cohort in Tongzhou, which is based on the hospital information system of Tongzhou Maternal and Child Health Care Hospital of Beijing. The cohort included women with singleton pregnancies who had prenatal care, delivery and outcome records, whose last menstruation fell between 1 January 2013 and 31 December 2019, and who delivered at no less than 28 weeks of gestation. We further restricted the cohort to each woman's latest delivery record and to women with at least one, two and two antenatal examination records in the first, second, and third trimester, respectively. We excluded women who used assisted reproductive technology or who had systemic lupus erythematosus, chronic hypertension, or gestational hypertension without pre-eclampsia. We also excluded women who lacked blood pressure measurements, routine blood examination and biochemical test records at 5-10 weeks of gestation. Finally, we included 20,582 women in our study. The inclusion and exclusion criteria are shown in Figure 1.
The studies involving human participants were reviewed and approved by the Institutional Review Board of Peking University Health Science Center (No. IRB00001052-21023).

Maternal Characteristics, Medical History and Biomarkers
We extracted maternal characteristics, medical history, blood pressure measurements, routine blood examination and biochemical tests from the electronic data system of Tongzhou Maternal and Child Health Care Hospital of Beijing. Maternal characteristics included maternal height (cm), pre-pregnancy weight (kg), pre-pregnancy BMI, pre-gestational diabetes mellitus, ethnicity, parity, gravidity, abortion history, family history of hypertension, family history of diabetes, maternal age, and husband's age. Blood pressure records included systolic blood pressure (SBP) and diastolic blood pressure (DBP) measurements, from which we calculated mean arterial pressure (MAP) as MAP = (SBP + 2×DBP)/3. Routine blood examination and biochemical tests covered 48 biomarkers, including hemoglobin, mean red blood cell volume, platelets, red blood cells, white blood cells, alanine aminotransferase, aspartate aminotransferase, urea nitrogen, and calcium. All biomarkers are listed in Supplementary Tables 1a,b.
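As a small illustration of the MAP formula above, the following Python sketch computes MAP from one blood pressure reading. The function name and the example reading are assumptions for illustration only; the study itself performed all calculations in R.

```python
def mean_arterial_pressure(sbp: float, dbp: float) -> float:
    """Return MAP = (SBP + 2 x DBP) / 3, with both pressures in mmHg."""
    return (sbp + 2.0 * dbp) / 3.0


# Hypothetical reading of 120/80 mmHg
print(round(mean_arterial_pressure(120, 80), 1))  # 93.3
```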

Maternal Outcomes
Based on the Guidelines on Diagnoses and Treatments of Hypertensive Disorders in Pregnancy by the Chinese Society of Obstetrics and Gynecology, pre-eclampsia was defined as hypertension first appearing after 20 weeks of pregnancy (systolic blood pressure ≥140 mmHg and/or diastolic blood pressure ≥90 mmHg), with proteinuria or with involvement of any of the following organs or systems: heart, lung, liver, kidney and other important organs; abnormal changes in the blood, digestive or nervous system; or involvement of the placenta and fetus, among others (27). According to the time of diagnosis, we divided pre-eclampsia into early-onset pre-eclampsia (<34+0 weeks of gestation), preterm pre-eclampsia (<37+0 weeks of gestation), late-onset pre-eclampsia (≥34+0 weeks of gestation) and term pre-eclampsia (≥37+0 weeks of gestation).
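The subtype thresholds above can be sketched in Python as follows. This is only an illustration of the cutoffs at 34+0 and 37+0 weeks; the function name and the representation of gestational age as completed weeks plus days are assumptions. Note that the subtypes overlap by definition: every early-onset case is also preterm, and every term case is also late-onset.

```python
def classify_pre_eclampsia(weeks: int, days: int = 0) -> dict:
    """Label a pre-eclampsia diagnosis by gestational age at diagnosis."""
    total_days = weeks * 7 + days
    return {
        "early_onset": total_days < 34 * 7,   # < 34+0 weeks
        "late_onset": total_days >= 34 * 7,   # >= 34+0 weeks
        "preterm": total_days < 37 * 7,       # < 37+0 weeks
        "term": total_days >= 37 * 7,         # >= 37+0 weeks
    }


# A diagnosis at 35+2 weeks is late-onset and preterm at the same time.
print(classify_pre_eclampsia(35, 2))
```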

Preprocessing Variables and Imputing Missing Values
Maternal height, pre-pregnancy weight, pre-pregnancy BMI, maternal age and husband's age were used as continuous variables. We coded pre-gestational diabetes mellitus, family history of hypertension, abortion history, and family history of diabetes as yes or no, and ethnicity as Han nationality or other nationalities. Parity was coded as primiparous or multiparous, and gravidity as first pregnancy, second pregnancy, or third and subsequent pregnancies. We divided the pregnancy season into spring, summer, autumn and winter according to the verified date of the last menstruation.
We also kept blood pressure, routine blood and biochemical tests as continuous variables. If a pregnant woman had more than one measurement record during a given period, we took the average of these records. Therefore, disregarding missing values, each pregnant woman had eight records of mean arterial pressure and of each biomarker throughout pregnancy, one for each of the eight time periods. In particular, pregnant women rarely had biochemical tests in the second trimester, so biochemical test values at 14-18 weeks, 19-23 weeks, and 24-27 weeks were marked as missing.
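A minimal Python sketch of the per-period averaging described above is given here. It assumes that each measurement is stored as a pair of completed gestational week and value; the names `PERIODS`, `period_index` and `average_by_period` are hypothetical and do not come from the study's code.

```python
from statistics import mean

# The eight gestational periods (inclusive week bounds) used in the study.
PERIODS = [(5, 10), (11, 13), (14, 18), (19, 23),
           (24, 27), (28, 31), (32, 35), (36, 39)]


def period_index(week):
    """Return the index of the period containing this week, or None."""
    for i, (low, high) in enumerate(PERIODS):
        if low <= week <= high:
            return i
    return None


def average_by_period(measurements):
    """Average repeated measurements within each period.

    measurements: list of (gestational_week, value) pairs.
    Returns eight values; a period without any record is None (missing).
    """
    buckets = [[] for _ in PERIODS]
    for week, value in measurements:
        idx = period_index(week)
        if idx is not None:
            buckets[idx].append(value)
    return [mean(b) if b else None for b in buckets]


# Hypothetical MAP readings: two in 5-10 weeks, one in 11-13, one in 28-31.
print(average_by_period([(8, 80.0), (9, 84.0), (12, 82.0), (30, 90.0)]))
# [82.0, 82.0, None, None, None, 90.0, None, None]
```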
The missing rate of demographic characteristic variables was <2%. We used the median to impute missing values of continuous variables and the mode to impute missing values of categorical variables. For mean arterial pressure, routine blood examination and biochemical tests (except in the second trimester), missing values were imputed with the values from the previous period. For example, missing values at 32-35 weeks of gestation were imputed with the measurements at 28-31 weeks of gestation: if a pregnant woman had a prenatal examination at 32-35 weeks but, for some reason, lacked a routine blood examination, the missing value was replaced by the value tested during 28-31 weeks. In clinical application, this method is easy for health professionals to understand and use. If a pregnant woman is unable to obtain biochemical indicators for some reason, this method can be used to quickly impute the missing value. In addition, by using only previous measurements, this method avoids reverse causality. Imputation of missing data by the last observation carried forward (LOCF) method has been used in many clinical antidepressant trials and other high-level clinical studies (29)(30)(31).
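The imputation rules above can be sketched in Python as follows. This is an illustrative sketch under the assumption that missing values are represented by `None`; the function names are hypothetical. A missing baseline value stays missing in `locf`, which matches the study design only because women without 5-10 week records were excluded.

```python
from statistics import median, mode


def locf(values):
    """Last observation carried forward over a woman's period values."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def impute_median(values):
    """Fill missing values of a continuous variable with the observed median."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Fill missing values of a categorical variable with the observed mode."""
    fill = mode(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


# Hypothetical platelet values over five periods, with gaps.
print(locf([210.0, None, 198.0, None, None]))
# [210.0, 210.0, 198.0, 198.0, 198.0]
```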

Development and Validation of Prediction Model
We sampled from women with and without pre-eclampsia to form a training set (N = 15665), a test set (N = 3917) and a validation set (N = 1000). The incidence of pre-eclampsia was almost the same in each data set (3.5%). We developed prediction models on the training set and verified them on the test set. Based on the results from the training set and test set, a screening strategy was formed and tested on the validation set. We performed univariate analysis of demographic characteristics to compare each group with the control group (women without any type of pre-eclampsia), for example, women with early-onset pre-eclampsia versus women without any type of pre-eclampsia. We used the chi-square test for categorical variables (Fisher's exact test when the count in any cell was fewer than five) and the Kruskal-Wallis rank sum test for continuous variables.
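One way to obtain data sets with nearly equal incidence is to split cases and non-cases separately, as in the Python sketch below. This is an assumed implementation of the sampling step, not the authors' code: the function name, the rounding rule, and the fixed seed are all illustrative choices.

```python
import random


def stratified_split(outcomes, sizes, seed=0):
    """Split indices into named sets while preserving the outcome rate.

    outcomes: list of 0/1 labels (1 = developed pre-eclampsia).
    sizes: dict mapping set name to target size; the sizes should sum
           to len(outcomes). The last set receives the rounding remainder.
    Returns a dict mapping set name to a list of indices.
    """
    rng = random.Random(seed)
    total = sum(sizes.values())
    names = list(sizes)
    split = {name: [] for name in names}
    for label in (0, 1):
        idx = [i for i, y in enumerate(outcomes) if y == label]
        rng.shuffle(idx)
        start = 0
        for name in names[:-1]:
            take = round(len(idx) * sizes[name] / total)
            split[name].extend(idx[start:start + take])
            start += take
        split[names[-1]].extend(idx[start:])
    return split


# Toy cohort: 10 cases among 100 women, split 70/20/10.
outcomes = [1] * 10 + [0] * 90
parts = stratified_split(outcomes, {"training": 70, "test": 20, "validation": 10})
print({k: (len(v), sum(outcomes[i] for i in v)) for k, v in parts.items()})
# {'training': (70, 7), 'test': (20, 2), 'validation': (10, 1)}
```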
We developed prediction models using logistic regression on the imputed training set and calculated the area under the receiver operating characteristic curve (AUROC) on the test set to measure predictive ability. We used the LASSO technique to select the variables with the highest and second-highest priority for the model, according to the order in which their coefficients entered the model along the solution path.
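For readers unfamiliar with the AUROC, the Python sketch below computes it directly from its rank interpretation: the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case, with ties counted as one half. The function name and the toy data are assumptions; the pairwise loop is chosen for clarity rather than speed.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (case, non-case) pairs ranked correctly."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(cases) * len(controls))


# Toy example: three of the four case/non-case pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```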
The Least Absolute Shrinkage and Selection Operator (LASSO) is a shrinkage technique used for variable selection. It has been used for variable selection and to avoid overfitting in the development of pre-eclampsia prediction models in previous studies (25,32). LASSO is described below; for more details, we refer readers to related materials (33,34).
The objective of LASSO is to find the coefficients that minimize the loss function:

$$-\frac{1}{n}\sum_{i=1}^{n} l(y_i \mid X_i) + \lambda \sum_{j=1}^{N} |\beta_j|$$

where $X_i = (x_{i1}, x_{i2}, \ldots, x_{iN})$ is the vector of observed variables of subject $i$, $n$ is the number of subjects, and $y_i \in \{0, 1\}$ is the outcome of subject $i$: $y_i = 1$ if subject $i$ developed pre-eclampsia and $y_i = 0$ if subject $i$ did not develop pre-eclampsia. $|\beta_j|$ is the absolute value of the coefficient of the variable $x_{ij}$. $\lambda$ is a tuning parameter: when it is large enough, all coefficients are zero, and the coefficients gradually become non-zero as $\lambda$ decreases. $l(y_i \mid X_i)$ is the log-likelihood function of the logistic regression, given by the following equation:

$$l(y_i \mid X_i) = y_i(\beta_0 + X_i\beta) - \log\left(1 + e^{\beta_0 + X_i\beta}\right)$$

where $\beta_0$ is the intercept and $\beta = (\beta_1, \ldots, \beta_N)^{T}$ is the vector of coefficients.

We determined a basic variable set, which was forced into the models, and then, in four stages, selected in order the variables that improved the AUROC on the test set from the demographic variables, mean arterial pressure and biochemical markers. We developed prediction models for each period (5-10 weeks, 11-13 weeks, 14-18 weeks, 19-23 weeks, 24-27 weeks, 28-31 weeks, 32-35 weeks and 36-39 weeks). The four stages of developing models for each period are described as follows.

Stage 1
As reported in previous studies, pre-pregnancy BMI, pre-gestational diabetes mellitus (PGDM), parity, family history of hypertension and maternal age are highly associated with pre-eclampsia; thus, these five variables were forced into the models and denoted as "basic variables" (1,35,36). Then, we added other demographic variables and medical history to the model, including maternal height, pre-pregnancy weight (kg), ethnicity, gravidity, abortion history, husband's age (years), and pregnancy season. Last, we selected the set of variables with the highest AUROC, denoted as Stage1-Optimal Variables.

Stage 2
Based on Stage1-Optimal Variables, we added MAPs measured at baseline (5-10 weeks) and in the current period and then tested whether MAPs improved predictive ability. For example, for women who had prenatal examinations during 28-31 weeks, we added MAPs measured at 5-10 weeks and 28-31 weeks to the model. The set of variables with the highest AUROC was denoted as Stage2-Optimal Variables.

Stage 3
Based on Stage2-Optimal Variables, we added biomarkers measured at baseline (5-10 weeks) and in the current period. Using LASSO, we selected the set of variables with the highest AUROC, denoted as Stage3-Optimal Variables. As mentioned before, we did not add biochemical tests measured in the second trimester, because they were rarely performed in the hospital.

Stage 4
For the current period, we further added all MAPs measured in previous periods to Stage2-Optimal Variables and all biomarkers measured in previous periods to Stage3-Optimal Variables to check whether these additional variables improved the AUROC. We also excluded women who had used aspirin during pregnancy and conducted a sensitivity analysis by redeveloping the optimal prediction models, to test whether the valuable biomarkers we found still improved the predictive ability of the models (37).
After developing the best prediction model at each stage, we selected risk cutoff values at which the sensitivity of the prediction models reached 0.95, 0.96, 0.97, 0.98, and 0.99 in each period. Then, we screened the population in the validation set at the different sensitivities to form the best screening strategy.
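The choice of a risk cutoff for a target sensitivity can be sketched in Python as follows. This is an assumed procedure rather than the authors' code: it returns the largest cutoff at which at least the target fraction of cases is flagged, and it treats a score equal to the cutoff as high risk. Both function names are hypothetical.

```python
import math


def cutoff_for_sensitivity(labels, scores, target):
    """Largest cutoff such that flagging score >= cutoff reaches the target sensitivity."""
    case_scores = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    if not case_scores:
        raise ValueError("no cases in the data")
    # Number of cases that must be flagged; the small epsilon guards
    # against floating-point noise in target * number of cases.
    needed = max(1, math.ceil(target * len(case_scores) - 1e-9))
    return case_scores[needed - 1]


def sensitivity(labels, scores, cutoff):
    """Fraction of cases whose score is at or above the cutoff."""
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    return sum(s >= cutoff for s in case_scores) / len(case_scores)


# Toy data: four cases and two non-cases with hypothetical risk scores.
labels = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.3, 0.05, 0.2, 0.01]
cut = cutoff_for_sensitivity(labels, scores, 0.75)
print(cut, sensitivity(labels, scores, cut))  # 0.3 0.75
```

A lower cutoff raises sensitivity but also keeps more women without pre-eclampsia in the high-risk group, which is the trade-off behind comparing the five sensitivity levels.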
We used R software (38) for all calculations. The "glmnet" package in R was used to fit logistic regressions via LASSO (39). Details on how to run it in R are available in the vignettes of the "glmnet" package.

Frontiers in Public Health | www.frontiersin.org

Comparison of Characteristics Between Women With and Without Pre-eclampsia
Table 1 shows the demographic characteristics of women with and without pre-eclampsia; P-values were calculated with women without any pre-eclampsia as the reference group. Compared with those without pre-eclampsia, women with pre-eclampsia had higher body weight and body mass index (BMI), and had a higher proportion of pre-gestational diabetes, primiparity, first pregnancy, family history of hypertension, family history of diabetes, pregnancy in spring and winter, and gestational diabetes. The differences between the two groups were statistically significant. Similar results were seen in women with early-onset and preterm pre-eclampsia. However, there was no obvious difference in the proportion of pre-gestational diabetes mellitus, family history of hypertension, family history of diabetes, or pregnancy season between women with and without early-onset pre-eclampsia, or between women with and without preterm pre-eclampsia.
For early-onset and preterm pre-eclampsia, AUROC gradually increased from about 0.73 to 0.89 and 0.80 at 28-31 weeks of gestation, respectively. Generally, based on the Stage 1-Optimal Variables, the addition of MAPs increased AUROC for all pre-eclampsia and its subtypes, and the AUROC was higher for early-onset and preterm pre-eclampsia than for late-onset and term pre-eclampsia. The Stage 2-Optimal Variables included Stage 1-Optimal Variables and MAPs measured at baseline (5-10 weeks) and at the current prenatal examination period.

Results of Stage 3
As can be seen in Tables 2A,B, based on Stage 2-Optimal Variables, we found that the AUROC for all types of pre-eclampsia was slightly improved by adding uric acid tested at 5-10 weeks and 11-13 weeks of gestation. At 18-23 weeks and 24-27 weeks of gestation, the addition of platelets tested at baseline and in the current period improved the AUROC for all types of pre-eclampsia (0.79-0.86). We also found that adding uric acid and alkaline phosphatase tested in the current period at 28-31 weeks, 32-35 weeks and 36-39 weeks of pregnancy significantly improved the predictive ability for all types of pre-eclampsia, especially for pre-eclampsia occurring after 31 weeks (including late-onset and term pre-eclampsia). For all pre-eclampsia, the AUROC increased from approximately 0.78 to 0.86 (28-

Results of Stage 4
As can be seen in Supplementary Tables 3a,b, adding the MAPs measured in all previous periods to Stage 2-Optimal Variables and adding biomarkers tested in all previous periods to Stage 3-Optimal Variables barely improved the AUROC. To prevent overfitting, we chose Stage 3-Optimal Variables to develop our final prediction models.

Results of Sensitivity Analysis
There were 104 and 28 women using aspirin in the training set (N = 15665) and test set (N = 3917), respectively. For the optimal prediction models in each stage, the results were very similar in the datasets with and without women who had used aspirin, as shown in Supplementary Tables 4a-e. Although a few women had used aspirin during pregnancy, this was unlikely to affect our results.

Best Prediction Model in Each Period
The best prediction models in each period are shown below. The risk score for pre-eclampsia of each woman is calculated as $\mathrm{odds}/(1 + \mathrm{odds})$, where $\mathrm{odds} = e^{Y}$.
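The conversion from model output to risk score can be sketched in Python as follows. The sketch assumes that Y is the linear predictor of the logistic regression model for the period in question (intercept plus the weighted sum of predictors); the function name is hypothetical and no model coefficients are implied.

```python
import math


def risk_score(linear_predictor: float) -> float:
    """Convert the linear predictor Y into a risk score odds / (1 + odds)."""
    odds = math.exp(linear_predictor)
    return odds / (1.0 + odds)


# Y = 0 corresponds to even odds, hence a risk score of 0.5.
print(risk_score(0.0))  # 0.5
```

Because the risk score increases monotonically with Y, applying a cutoff to the risk score is equivalent to applying a cutoff to Y itself.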

Screening Strategy
The screening strategy is shown in Table 3 and Figure 2. We selected 0.01, 0.011, 0.008, 0.009, 0.01, 0.003, 0.001, and 0.002 as the risk cutoff values for the eight periods so that the sensitivity reached 0.98 (Table 3). Figure 2 shows the screening results on the validation set using the optimal prediction models at a sensitivity of 0.98. At 5-10 weeks of gestation, 1000 women participated in prenatal examination. Risk scores were calculated by the optimal prediction models for each pregnant woman participating in prenatal examination. Pregnant women exceeding the risk cutoff value were classified as the high-risk group, and their risk was assessed again in the next prenatal examination period; otherwise, they were classified as the low-risk group and were not assessed in subsequent prenatal examinations. According to this method, after the prenatal examination at 14-18 weeks of pregnancy, 796 women were classified into the high-risk group, of whom 35 would develop pre-eclampsia, and 204 women were classified into the low-risk group, of whom none would develop pre-eclampsia. The number of women who needed to continue screening decreased by about 20%. After the prenatal examination at 32-35 weeks of gestation, 556 women were classified into the high-risk group, of whom 28 would develop pre-eclampsia; 434 women were classified into the low-risk group, of whom 3 would develop pre-eclampsia. The number of women who needed to continue screening decreased by about 45%.
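The sequential rule described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function name, the data layout (one risk score per woman per period) and the toy scores are assumptions, and a score equal to the cutoff is treated as high risk. Only the first two cutoffs reported above are used in the example.

```python
def sequential_screening(risk_by_period, cutoffs):
    """Apply period-specific cutoffs in sequence.

    risk_by_period: dict mapping a woman's id to her list of risk scores,
                    one per period, in chronological order.
    cutoffs: list of risk cutoff values, one per period.
    Returns a list with the set of women still classified as high risk
    after each period; women who fall below a cutoff are not re-assessed.
    """
    remaining = set(risk_by_period)
    history = []
    for period, cutoff in enumerate(cutoffs):
        remaining = {w for w in remaining if risk_by_period[w][period] >= cutoff}
        history.append(set(remaining))
    return history


# Three hypothetical women screened over the first two periods.
risks = {"a": [0.02, 0.03], "b": [0.005, 0.5], "c": [0.02, 0.001]}
result = sequential_screening(risks, [0.01, 0.011])
print([sorted(s) for s in result])  # [['a', 'c'], ['a']]
```

In this toy example woman "b" leaves screening after the first period, so her high second-period score is never used, which illustrates why a high sensitivity is required at every step.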

Summary
In this study, we established multi-stage pre-eclampsia risk prediction models throughout pregnancy based on 20582 pregnant women in China. We sequentially selected valuable variables from demographic characteristics, mean arterial pressure, routine blood examination, and biochemical biomarkers for the prediction models. We found that uric acid tested from 5-10 weeks of gestation, platelets tested at 18-23 and 24-31 weeks of gestation, and alkaline phosphatase tested at 28-31, 32-35 and 36-39 weeks of gestation further improved the prediction performance of the models. The AUROC of the optimal prediction models on the test set gradually increased from 0.71 at 5-10 weeks to 0.80 at 24-27 weeks, and then gradually increased to 0.95 at 36-39 weeks of gestation. Based on the optimal prediction models, we established a multi-stage screening strategy starting from 5-10 weeks of pregnancy, which can identify about 94.3% of women who will develop pre-eclampsia and reduce by about 40% the number of healthy women to be screened by 28-31 weeks of pregnancy.

Comparison With Previous Studies
Some studies used demographic characteristics and medical history to build prediction models. The AUROCs of the prediction models developed by Wright et al. for all pre-eclampsia, preterm pre-eclampsia, and early-onset pre-eclampsia were 0.76, 0.79, and 0.81, respectively, which were higher than the 0.68, 0.74, and 0.73 in our study (40). Similar results were seen in the models of Poon et al. (41). This may be because more high-risk groups were included in these two studies,   (44).
We found that uric acid tested from 5-10 weeks of gestation, platelets tested at 18-23 and 24-31 weeks of gestation, and alkaline phosphatase tested at 28-31, 32-35 and 36-39 weeks of gestation further improved the prediction performance of the models. There are potential explanations for why uric acid, platelets, and alkaline phosphatase improve prediction performance. Platelets and uric acid have predictive value for pre-eclampsia. A systematic review of 69 studies showed that pregnant women with pre-eclampsia had a higher mean platelet volume than normal pregnant women (45). Placenta-related diseases such as pre-eclampsia are associated with excessive activation of maternal platelets (46). Therefore, the detection of platelet function and over-activation has a certain predictive value for pre-eclampsia (47). There is a strong association between uric acid and pre-eclampsia (48). Increased blood pressure can lead to organ damage, such as liver or kidney damage. Renal injury may increase the level of serum uric acid, and then increase the risk of pre-eclampsia (49,50). It was reported that pregnant women with subsequent pre-eclampsia had elevated uric acid levels as early as 10 weeks of gestation (51), which is broadly consistent with the findings of this study. For alkaline phosphatase, some studies showed that its level was higher in women with pre-eclampsia than in normal women (52,53). Alkaline phosphatase may be related to pre-eclampsia, but its predictive value needs further study.

Comparison With Chinese Population-Based Research
In recent years, several studies have developed prediction models for pre-eclampsia based on the Chinese population (21)(22)(23)(24)(25)(26). These studies were mainly based on hospital electronic medical record systems and mainly carried out in eastern China (three in Shanghai, one in Tianjin and one in Shanxi), screening pregnant women in the first and second trimesters. Two studies predicted early-onset pre-eclampsia, with detection rates between 40.7% and 73.2% at a false positive rate of 10%. Four studies predicted all pre-eclampsia (without differentiating subtypes), with AUROCs from 0.86 to 0.98.
However, the populations and predictors used in these studies were quite different. Jiang et al. used demographic characteristics and biochemical markers (a complete blood count, serum albumin, serum uric acid, 24-h urinary protein, antinuclear antibodies, anti-double-stranded DNA antibody, antiphospholipid antibodies, etc.) to achieve an AUROC of 0.975 for pregnant women with systemic lupus erythematosus (23). Chen et al. predicted the risk of early-onset pre-eclampsia for women with twin pregnancies, where the AUROC was 0.82 (95% CI: 0.76-0.88) and the detection rate was 40.7% (false positive rate 10%) (22). Wang et al. developed a prediction model for pre-eclampsia using 31 blood flow related parameters of the uterus and placenta, such as vascularization index (VI), blood flow index (FI) and vascularization blood flow index (VFI), and the AUROC reached 0.877 (25). However, these studies targeted specific high-risk groups or used expensive indicators outside the scope of routine testing, which may hinder wider adoption. In addition, these studies have other limitations: they did not consider pre-eclampsia subtypes (23)(24)(25)(26), were subject to possible bias from the variable screening process (23,24,26), or had an insufficient number of outcome events (21)(22)(23)(25).

Compared With the Screening Strategies of Previous Studies
Wallis et al. proposed a multi-stage screening strategy based on British women: the first screening was conducted at < 18 weeks of gestation, and the second to fifth screenings were conducted using mean arterial pressure at 20, 25, 28, and 31 weeks of gestation. Both we and Wallis et al. tested the screening strategy on the same number of pregnant women. Compared with the screening strategy proposed by Wallis et al., we advanced the prediction window to as early as 5-10 weeks. After completing the screening at 28-31 weeks, we reduced the number of women to be screened by about 45%, while Wallis et al. reduced it by about 35%. In particular, we extended the screening period to 36-39 weeks, which may effectively screen for term pre-eclampsia and even later pre-eclampsia. Although the two strategies target different populations and it will be necessary to compare them in the same population in the future, our screening strategy has the potential for improvement.

Advantages, Clinical Value and Limitations of the Study
Our study has several advantages. First, our sample size and number of outcomes were larger than those in previous prediction models based on the Chinese population. Second, compared with the stepwise regression used in many studies, we used LASSO for variable selection, which is less prone to overfitting, and we tested our model on the randomly divided test set and validation set. Third, we found that, on the basis of demographic characteristics, medical history, and mean arterial pressure, the addition of uric acid improved the predictive ability of the model from the first trimester, and the addition of platelets and alkaline phosphatase improved the predictive ability of the model in the second and third trimester, respectively. We did not test the predictive value of uric acid in the second trimester, because pregnant women rarely had biochemical tests in this period. We suggest that pregnant women in the second trimester have additional uric acid tests in the future. In particular, at 28-31 weeks, 32-35 weeks and 36-39 weeks of pregnancy, the AUROC of the prediction model for all pre-eclampsia reached 0.86, 0.89 and 0.95, respectively, which is valuable for predicting late-onset and term pre-eclampsia. Fourth, to our knowledge, our study is the first to develop multi-stage prediction models and propose a screening scheme based on Chinese women. The screening time covers almost the entire pregnancy period: from as early as 5-10 weeks to as late as 36-39 weeks of pregnancy.
Our study has several points of clinical value. First, we found that uric acid, platelets and alkaline phosphatase improved the predictive ability of the model; these are part of routine tests and incur no extra cost. Second, a screening system has been developed for the Chinese population and policy environment, and the prediction window has been advanced to 5-10 weeks, which allows time for aspirin and other interventions in high-risk groups. There is strong evidence that aspirin is the only drug that prevents pre-eclampsia (54). Some guidelines recommend that women with high or moderate clinical risk factors use aspirin starting in the first or second trimester (1,35,36,55). ACOG recommends starting aspirin between 12 and 28 weeks of gestation, ideally before 16 weeks, and ISSHP also states that it is ideally started before 16 weeks (1,55). A clinical trial involving 14361 women suggested that low-dose aspirin starting at 6-13 weeks of gestation could reduce the risk of preterm pre-eclampsia and perinatal mortality in low-income and middle-income countries (56). That study advanced the recommended time of aspirin use to 6 weeks of gestation. Third, for women identified as high risk during the second or third trimester who have missed the best window for aspirin, clinicians could use intensive surveillance or hospitalization and carefully select the best delivery time. Some previous studies developed prediction models in the second or third trimester (43)(57)(58)(59)(60). It is useful to develop multi-stage prediction models covering the first, second, and third trimesters. Fourth, in the third trimester of pregnancy, the prediction of late-onset and term pre-eclampsia was improved by adding uric acid and alkaline phosphatase. Although late-onset and term pre-eclampsia are less hazardous than early-onset and preterm pre-eclampsia, the former two subtypes have a higher incidence.
Doctors can treat pre-eclampsia by delivery; however, the early birth of newborns may still be detrimental to their future growth.
However, the study has several limitations. First, although the study had a large sample size and a large number of outcomes, the number of early-onset pre-eclampsia cases was small, because the incidence of early-onset pre-eclampsia was particularly low. We used LASSO for variable selection, so overfitting was less likely. Second, there were missing values in biomarkers, but this was in line with the actual clinical situation. We used a simple imputation method, that is, using the previously tested values to impute the missing values. This method is easy for health professionals to use. Third, we did not conduct external validation, but we conducted internal validation on the test set and validation set by randomly dividing the data set. Moreover, we used LASSO to select variables, which reduced the possibility of overfitting. In the future, our model awaits validation on other data sets.

DATA AVAILABILITY STATEMENT
The data analyzed in this study are subject to the following licenses/restrictions: the data that support the findings of this study are available from Tongzhou Maternal and Child Health Care Hospital of Beijing, but restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of Tongzhou Maternal and Child Health Care Hospital of Beijing. Requests to access these datasets should be directed to Haijun Wang, whjun@pku.edu.cn.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board of Peking University Health Science Center (No. IRB00001052-21023). Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.