Non-invasive prediction of preeclampsia using the maternal plasma cell-free DNA profile and clinical risk factors

Background Preeclampsia (PE) is a pregnancy complication defined by new onset hypertension and proteinuria or other maternal organ damage after 20 weeks of gestation. Although non-invasive prenatal testing (NIPT) has been widely used to detect fetal chromosomal abnormalities during pregnancy, its performance in combination with maternal risk factors to screen for PE has not been extensively validated. Our aim was to develop and validate classifiers that predict early- or late-onset PE using the maternal plasma cell-free DNA (cfDNA) profile and clinical risk factors. Methods We retrospectively collected and analyzed NIPT data of 2,727 pregnant women aged 24–45 years from four hospitals in China, which had previously been used to screen for fetal aneuploidy at 12 + 0 ~ 22 + 6 weeks of gestation. According to the diagnostic criteria for PE and the time of diagnosis (34 weeks of gestation), a total of 143 early-, 580 late-onset PE samples and 2,004 healthy controls were included. The wilcoxon rank sum test was used to identify the cfDNA profile for PE prediction. The Fisher’s exact test and Mann–Whitney U-test were used to compare categorical and continuous variables of clinical risk factors between PE samples and healthy controls, respectively. Machine learning methods were performed to develop and validate PE classifiers based on the cfDNA profile and clinical risk factors. Results By using NIPT data to analyze cfDNA coverages in promoter regions, we found the cfDNA profile, which was differential cfDNA coverages in gene promoter regions between PE and healthy controls, could be used to predict early- and late-onset PE. Maternal age, body mass index, parity, past medical histories and method of conception were significantly differential between PE and healthy pregnant women. With a false positive rate of 10%, the classifiers based on the combination of the cfDNA profile and clinical risk factors predicted early- and late-onset PE in four datasets with an average accuracy of 89 and 80% and an average sensitivity of 63 and 48%, respectively. Conclusion Incorporating cfDNA profiles in classifiers might reduce performance variations in PE models based only on clinical risk factors, potentially expanding the application of NIPT in PE screening in the future.


Introduction
Preeclampsia (PE) is a pregnancy complication defined by new onset hypertension and proteinuria or other maternal organ damage (e.g., kidney, liver or brain) after 20 weeks of gestation (1)(2)(3)(4).The overall prevalence of PE is about 2-8% (2,5).PE and its related complications of hypertensive disorders are the second leading cause of maternal mortality (6,7).Depending on the time of diagnosis (34 weeks of gestation), PE is divided into early-and late-onset PE (8)(9)(10).As the disease progresses, the main treatment for PE remains the termination of pregnancy.However, premature delivery can lead to adverse consequences for the fetus.Previously clinical trials showed using low-dose aspirin in early pregnancy may reduce the risk of PE in pregnant women at high risk (11).Therefore, there is a need for an early and accurate PE prediction method, especially for early-onset PE.
To date, the history of PE, chronic hypertension and diabetes mellitus, older age, higher body mass index (BMI) and in vitro fertilization (IVF) are known to increase the risk of PE in pregnant women (2,12,13).However, models based on these clinical risk factors alone usually showed low accuracy or high false positives (14, 15).The Fetal Medicine Foundation (FMF) has proposed a combined predictive model incorporating mean arterial pressure (MAP), uterine artery pulsatility index (UtA-PI), serum placental growth factor (PLGF) and pregnancy-associated plasma protein-A (PAPPA) for the evaluation of PE risk in the first trimester (2,(16)(17)(18)(19).With a false positive rate (FPR) of 10%, the detection rates were 60.7 and 21.3% for preterm and term PE in mainland China, respectively.The demanding and complex measurement technique of UtA-PI makes the method difficult to be widely used, so a more general and convenient method is needed to predict pregnant women at high risk of PE.
Cell-free DNA (cfDNA) was first discovered in 1948 (20).It is mainly derived from apoptotic cells in healthy or diseased individuals.In pregnant women, about 10-15% of cfDNA originates from placental trophoblasts (21).Transcriptionally active gene promoters are nucleosome-depleted regions in the placenta.Naked DNA fragments are more susceptible to degradation as they enter the circulatory system.Lower levels of cfDNA derived from these promoters are exhibited.Thus, the abundance of cfDNA in maternal plasma could reflect gene expression in the placenta.Abnormal gene expression in the placenta might cause placental dysfunction.Therefore, cfDNA in the plasma of pregnant women might have the potential to be used as a biomarker for the diagnosis and prediction of diseases caused by placental dysfunction during pregnancy.Recently, studies have determined the promoter profiling, the concentration and DNA methylation of maternal plasma cfDNA can be used to predict pregnancy complications (22)(23)(24)(25)(26)(27)(28).However, it remains unknown whether the non-invasive prenatal testing (NIPT) data used to screen for fetal chromosomal abnormalities could also be used for PE prediction.
Therefore, we aimed to uncover cfDNA biomarkers with predictive value for PE by analyzing NIPT data.We evaluated whether the predictive performances of PE classifiers based on cfDNA coverages of gene promoter regions and clinical risk factors could be validated in internal and external validation datasets and showed reasonable accuracies, which might extend the use of NIPT for PE screening in the future.

Participants
In the study, we retrospectively collected NIPT data of 2,727 pregnant women aged 24-45 years who underwent NIPT at 12 + 0 ~ 22 + 6 weeks of gestation at four hospitals in China from 2019 to 2021, including Zhuhai Center for Maternal and Child Health Care, Shenzhen Baoan Women's and Children's Hospital, Changsha Hospital for Maternal and Child Health Care and Jiangmen Central Hospital.This study was approved by the Ethics Committees of Beijing Genomics Institute (BGI) and these four hospitals (BGI-IRB 22026, LLSC-2022-01-04-04-KS, 2022030 and [2022]02), and all the participants provided their informed consent at the time of NIPT.
PE patients were diagnosed according to the following criteria: maternal systolic blood pressure is ≥140 mmHg and/or diastolic blood

cfDNA extraction, library preparation and sequencing
Five milliliters of maternal peripheral blood were collected into Streck Cell Free DNA BCT ® blood collection tubes (Streck, La Vista, NE, United States).Two hundred microliters of maternal plasma were used for cfDNA extraction, library construction and sequencing as described previously (29).Briefly, end repairing and adding a non-template dA tail to cfDNA were carried out and DNA amplification products were obtained by polymerase chain reaction (PCR).Then, the amplified double-stranded DNA was thermally denatured to single-strand DNA, cyclized and made into DNA nanoballs (DNBs).Finally, the DNBs were loaded onto chips and sequenced on the BGISEQ-500 sequencing platform (BGI, Shenzhen, China).The average depth of coverage is 0.1X.

Low-coverage whole-genome sequencing data processing
Raw reads were aligned to the hg38 human reference using BWA aln (30).After extracting uniquely mapped reads based on Sequence Alignment/Map (SAM) tags, removing PCR duplicates was performed using SAMTOOLS (31).

Analysis of the cfDNA profile
To identify the cfDNA profile in NIPT data that could stably predict pregnant women at high risk of early-or late-onset PE in different hospitals, we analyzed maternal plasma cfDNA coverages at primary transcription start sites (pTSSs), called pTSS coverage (Supplementary Figure 1A).For each transcript downloaded from GENCODE (v37, 32), we used the pTSS as a promoter region, which is a 2 kb region centered on the transcription start site.Read coverages at pTSSs were calculated from aligned BAM files using BEDtools (ver.2.29.2, 33) and normalized using the reads per kilobase per million mapped reads (RPKM) method.Then, we calculated the multiple of the median (MoM) value of each pTSS coverage by dividing the raw value by the median value of control samples in each dataset.

pTSS coverage
The number of mapped reads on the TSS The t i i = ×10 9 o otal number of mapped reads × 2000

Pathway enrichment analysis
The Kyoto Encyclopedia of Genes and Genomes (KEGG) and WikiPathways analyses were performed using g: Profiler (34).The p < 0.05 by computing multiple testing correction was considered statistically significant.

Experimental design
We followed the workflow encompassing sample allocation, classifier construction, and classifier evaluation.We first calculated pTSS coverages of 2,727 NIPT data as described above and performed the principal components analysis (PCA) using pTSS coverages.Samples whose the mean values of the first and second principal components differed within three standard deviations were retained.The PCA excluded 3 early-onset PE, 22 late-onset PE and 15 control samples (Figure 1; Supplementary Figure S1B).Then, we chose 899 samples from the Zhuhai Center for Maternal and Child Health Care as an external validation set.The remaining 1,788 NIPT data were used as the discovery dataset to identify cfDNA coverages at gene promoters regions and clinical risk factors relevant to PE prediction.In the discovery stage, the differential analysis of pTSS coverages was performed 1,000 times, each time using 70% samples randomly sampled from the discovery dataset.The wilcoxon rank sum test was used to identify significantly differential pTSS coverages between PE samples and healthy controls.The pTSS coverage was considered as a candidate cfDNA biomarker if the p-value (the upper limit of 95% CI for 1,000 times) < 0.05 and had the same trend of change.For clinical characteristics, MoM values of maternal age and BMI were calculated.The parity, past medical history and method of conception were discrete values.The past medical history was the sum of number of histories of chronic hypertension, PE, systemic lupus erythematosus (SLE) and antiphospholipid syndrome, on a scale of 0-4.The method of conception was defined as 1 for IVF and 0 for natural pregnancy.To retain the independence of different hospitals, the samples from the same hospital were assigned to a single dataset.In the classifier construction stage, machine learning methods (logistic regression, LR and random forest, RF) were performed to construct PE classifiers using samples from Jiangmen Central Hospital.Receiver operating characteristic (ROC) analysis was used to evaluate the performance of each classifier.In the evaluation stage, classifiers were validated in two internal validation datasets (800 samples from Shenzhen Baoan Women's and Children's Hospital, 584 samples from Changsha Hospital for Maternal and Child Health Care) and an external validation dataset (899 samples from Zhuhai Center for Maternal and Child Health Care).The 95% confidence intervals of area under curves (AUCs) of the training set and validation sets were calculated from 1,000 bootstrap samples.The sample size of each random sampling was equal to the size of the original dataset.

Statistical analyses
The Fisher's exact test was used to compare categorical variables between PE samples and healthy controls, such as gravidity, parity, past medical history and method of conception.The Mann-Whitney U-test was used to compare continuous variables, such as maternal age and BMI.The DeLong's test was used to compare AUCs.p < 0.05 was considered statistically significant.

Clinical characteristics for study cohorts
We retrospectively collected NIPT data of pregnant women at 12 + 0 ~ 22 + 6 weeks of gestation from 2019 to 2021 at four hospitals The workflow for developing early-and late-onset PE classifiers.We collected 2,727 NIPT data from pregnant women at 12 + 0 ~ 22 + 6 weeks of gestation from four hospitals.The cfDNA profiles which were cfDNA coverages at gene promoter regions were identified.Before the construction and evaluation of classifiers for PE prediction, 40 samples were removed based on the PCA analysis.The remaining 2,687 NIPT data were used as the training and three validation datasets.Samples from the same hospital were assigned to a dataset.In the discovery stage, clinical risk factors and the cfDNA profile used to develop PE classifiers were screened.In the classifiers construction stage, samples from Jiangmen Center Hospital were used to build classifiers by the machine learning method.In the evaluation stage, the classifiers were further validated in three validation datasets.in China.A total of 2,727 individuals with complete clinical information met the enrollment criteria (Figure 1).This study included four datasets: a training dataset, two internal validation datasets and one external dataset.The demographic and pregnancy details of pregnant women in the training and validation datasets were shown in Table 1 and Supplementary Table S1.As expected, pregnant women with older maternal age, higher BMI, past medical histories (chronic hypertension, PE, SLE or antiphospholipid syndrome) or IVF were more likely to develop early-or late-onset PE in at least two datasets.By contrast, parous women had a decreased risk of developing LPE in three datasets (Table 1; Supplementary Table S1).
The maternal age, BMI, past medical histories, parity and IVF were subsequently used to construct classifiers for predicting early-and late-onset PE.

Characterization of differential cfDNA profiles between PE and control samples
To find promoters with stably differential cfDNA coverages between PE and control samples, we repeated the differential analysis 1,000 times.Each time, 70% of the samples in training and internal validation datasets were randomly sampled and used.In total, pTSS coverages of 117 and 137 promoters were significantly higher and lower in early-onset PE samples compared to healthy controls, respectively (Figure 2A).Similarly, there were 266 and 344 promoters with significantly differential higher and lower pTSS coverages in lateonset PE samples compared to healthy controls (Figure 2B).Of these, pTSS coverages of CACNB2 and NRF1 gene promoters were lower in both early-and late-onset PE samples.
Then, we performed the KEGG pathway and WikiPathways enrichment analyses using genes with significantly differential cfDNA coverages in promoters between early-or late-onset PE and control samples.The results of pathway analyses showed that these genes were enriched in multiple pathways associated with PE (Figures 2C,D).For early-onset PE, genes with significantly differential cfDNA coverages in promoters enriched in "Hedgehog signaling pathway, " "Hippo signaling pathway, " "AMPK signaling pathway, " "Apelin signaling pathway, " "Autophagy, " "Oxytocin signaling pathway, " "Wnt signaling pathway" and "VEGFA-VEGFR2 signaling." For late-onset PE, the results of pathways enriched in "HIF-1 signaling pathway, " "Ras signaling pathway, " "PI3K-Akt signaling pathway" and "MAPK signaling pathway".

Development and validation of classifiers based on the cfDNA profile and clinical risk factors
For early-onset PE prediction, the cfDNA profile and clinical risk factors were combined to construct the early-onset PE classifier (C EPE ).The cfDNA profile included pTSS coverages of FOSL2, CAMKK2, CCND1, ITPR1, PRKACB and WNT7B genes, which played roles in least three PE-associated pathways, in addition to pTSS coverages of CACNB2 and NRF1 genes (Supplementary Figure S2).The clinical risk factors were maternal age, BMI, parity, past medical histories and method of conception.Similarly, the cfDNA profile, which was the ratio of pTSS coverages of FLT3LG and EGF genes   (Supplementary Figure S3), and the same clinical risk factors as C EPE were used to construct the late-onset PE classifier (C LPE ).
In the training dataset, the C EPE and C LPE based on the logistic regression (LR) or random forest (RF) model achieved accuracies of 0.87 and 0.90, respectively (Table 2).The AUCs of C EPE and C LPE were 87 and 96% (Figures 3A,B).To further evaluate the accuracies of classifiers for predicting early-and late-onset PE, we validated C EPE and C LPE in two internal and one external validation datasets (Figure 1).The C EPE exhibited AUCs ranging from 80 to 90% in three datasets (Figure 3A).For C LPE , the three validation datasets showed AUCs of 76, 74 and 72%, respectively (Figure 3B).With a false positive rate (FPR) of 10%, the average detection rates for predicting early-and late-onset PE were 63 and 48%, and the average PPVs were 33 and 55% in four datasets, respectively (Tables 2, 3).
In addition, classifiers based on the combination of the cfDNA profile and clinical risk factors showed higher AUC values in predicting early-or late-onset PE than based on clinical risk factors or the cfDNA profile alone in training and validation datasets (Tables 2, 3).Some of these reached statistical significance by DeLong's test.These results showed the combination of the cfDNA profile and clinical risk factors enhanced performances of earlyand late-onset PE prediction.

Discussion
In this study, we analyzed cfDNA coverages in gene promoter regions using NIPT data, namely pTSS coverage, and clinical risk factors from 2,727 pregnant women.We found that the combination of differential pTSS coverages and clinical risk factors could effectively distinguish PE patients from normal controls by machine learning methods.The early-and late-onset PE classifiers (C EPE and C LPE ) showed good performances for early-and late-onset PE prediction in four datasets (average AUCs of 0.84 and 0.80), with average accuracies of 89 and 80% and average detection rates of 63 and 48% at a 10% false positive rate.These results showed the C EPE and C LPE outperformed the FMF's competing risk model in predicting PE for the Chinese (19).
Clinical risk factors in C EPE and C LPE included maternal age, BMI, parity, past medical histories and method of conception, exhibited significant differences in early-or late-onset PE samples compared to healthy controls.The relationship between these factors and the risk of PE had been previously evaluated (2,12,13).An effective method for disease prediction requires high accuracy, low cost, and easy application.While predicting PE using basic clinical information may be more accessible and cost-effective for clinicians, our investigation revealed variations in the performance of PE models based solely on clinical risk factors across different hospitals.The AUCs for early-and Differential pTSS coverages between early-or late-onset PE and controls.Volcano plots showed promoters with differential cfDNA coverages between early-onset PE and healthy samples (A) or between late-onset PE and healthy samples (B).The red, blue, and grey dots indicated pTSS coverages were significantly higher, lower, and non-significant in early-or late-onset PE compared to control samples, respectively.Barplots showed the results of pathway enrichment analysis based on genes with differential pTSS coverages between early-onset PE and healthy samples (C) or between late-onset PE and healthy samples (D).KEGG, Kyoto Encyclopedia of Genes and Genomes; WP, WikiPathways.2, 3).This shortcoming was mitigated by incorporating the cfDNA profile obtained through NIPT data.Although the addition of cfDNA profiles in external validation increased the AUC by only 1% compared to predictions based on clinical risk factors, the AUC for the early-onset PE classifier exceeded 0.79 in each hospital.Herein, we demonstrated NIPT data could be applied to predict pregnant women at high risk for PE.The predictive test that is a by-product of the NIPT would be inexpensive and could provide valuable supplementary clinical information (unrelated to aneuploidy screening).Even if its diagnostic performance is not excellent, it could assist clinicians in selecting pregnant women who need reassessment using additional techniques, especially for those without clinical risk factors.Therefore, our study has the potential to expand the value of NIPT for screening patients with PE.While in some countries, NIPT is currently restricted to high-risk populations, the expanding utilization of NIPT data might have the potential to prompt initiatives in certain countries to provide financial support for NIPT through healthcare insurance or public health programs.This would ensure that a greater number of mothers can benefit from the advantages provided by NIPT.Moreover, as the cost of sequencing decreases in the future, the sequencing depth of NIPT continues to increase.We believe that the prediction effect of the NIPT data on PE will be further improved.
The cfDNA profile in C EPE model contained eight genes, of which pTSS coverages of FOSL2, CAMKK2, CCND1, ITPR1, PRKACB, WNT7B, CACNB2 and NRF1 genes were stably differential between pregnant women with early-onset PE and healthy controls.CAMKK2, CCND1, ITPR1, PRKACB and WNT7B genes were enriched in more than two pathways associated with PE, including "Hedgehog signaling pathway" (35), "Hippo signaling pathway" (36), "AMPK signaling pathway" (37), "Apelin signaling pathway" (38), "Autophagy" (39), "Oxytocin signaling pathway" (40), "Wnt signaling pathway" (41) and "VEGFA-VEGFR2 signaling" (42).The AMPK signaling pathway has been repeatedly reported to be associated with PE based on omics data (37,43).FOSL2 is a transcription factor in the progression of angiogenesis.Down-regulated expression of FOSL2 gene in PE placentas could cause placental vascular dysfunction, which is consistent with higher cfDNA coverage at its pTSS (44).The cfDNA profile in C LPE contained pTSS coverages of FLT3LG and EGF genes.These two genes are enriched in "Ras signaling pathway, " "PI3K-Akt signaling pathway" and "MAPK signaling pathway" (45,46).Of these, the PI3K-Akt and MAPK signaling pathways are critical for trophoblast function and implicated in underlying causes of preeclampsia, such as dysfunction of the placental endothelial nitric oxide synthase.The EGF family regulates the development of trophoblast and plays a role in trophoblast cell invasion.The EGF levels in plasma and serum have been verified to be significantly decreased (47,48).
The strength of this study lies in the development and validation of PE classifiers within a larger population compared to other studies, albeit being a retrospective case-control study.The established earlyand late-onset PE classifiers were validated in three datasets which means changes of these differential pTSS coverages and clinical risk factors are more common in patients with PE.Our study has some limitations.First, the C EPE and C LPE were constructed and validated from case-control cohorts of four hospitals.Although we retained the population differences between hospitals, the number of hospitals in this study was relatively small.It is necessary to validate these classifiers in a larger number of hospitals in the future.Second, this study is performed in China in specific maternal and environmental characteristics.It is imperative to validate these classifiers in different populations before advocating its use as a universal screening method.The classifiers established by integrating cfDNA profiles could mitigate the performance variations observed in PE models based on clinical risk factors alone.This could potentially broaden the application of NIPT in PE screening in the future.

5 )
Data were shown as median values (interquartile range) of clinical characteristics or numbers (percentages) of samples.*, ** and *** represented the different significance of p < 0.05, p < 0.01 and p < 0.001 between early-onset PE and control samples, respectively.EPE, early-onset preeclampsia; BMI, body mass index; PMH, past medical history; GA, gestational age; IVF, in vitro fertilization.

Yu et al. 10 .
3389/fmed.2024.1254467Frontiers in Medicine frontiersin.orglate-onset PE models based on clinical risk factors in three validation datasets ranged from 0.70 to 0.79 and from 0.68 to 0.74, respectively (Tables The data were presented in terms of area under curves (AUCs), accuracies, sensitivities, specificities, positive predictive values (PPVs), and negative predictive values (NPVs) of the final classifiers.The 95% confidence intervals of AUCs for the training and validation sets were computed based on 1,000 bootstrap samples.*Represented the AUC of the classifier based on clinical risk factors and the cfDNA profile was greater than based on clinical risk factors or the cfDNA profile alone by DeLong's test.*, ** and *** represented the different significance of p < 0.05, p < 0.01 and p < 0.001.Acc, accuracy; Sen, sensitivity; Spe, specificity; CI, confidence interval.

FIGURE 3
FIGURE 3 Performances of classifiers in predicting PE.Receiver operating characteristic (ROC) curves were drawn to evaluate performances of the early (A) and late-onset PE classifiers (B) based on the cfDNA profile and clinical risk factors in four datasets.TS, training dataset; IVS, internal validation dataset; EVS, external validation dataset.
(8)(9)(10)fter 20 weeks of gestation with the proteinuria level of ≥0.3 g/24 h or dipstick urine test ≥1+.Early-and late-onset PE were classified based on the time of PE diagnosis (34 weeks of gestation)(8)(9)(10). Pregnant women with PE were excluded if they had malignant neoplastic diseases or fetal chromosomal abnormalities.

TABLE 1
Demographic and clinical characteristics of early-onset PE patients and healthy controls in the training and validation datasets.

TABLE 2
The performance of the logistic regression model for predicting early-onset PE in the training and three validation datasets.The data were presented in terms of area under curves (AUCs), accuracies, sensitivities, specificities, positive predictive values (PPVs), and negative predictive values (NPVs) of the final classifiers.The 95% confidence intervals of AUCs for the training and validation sets were computed based on 1,000 bootstrap samples.*Represented the AUC of the classifier based on clinical risk factors and the cfDNA profile was greater than based on clinical risk factors or the cfDNA profile alone by DeLong's test.
*, ** and *** represented the different significance of p < 0.05, p < 0.01 and p < 0.001.Acc, accuracy; Sen, sensitivity; Spe, specificity; CI, confidence interval.10.3389/fmed.2024.1254467Frontiers in Medicine 08 frontiersin.orgThird,NIPT data we used single-end reads.It lacks information on the length of cfDNA fragments.A study verified that pregnant women with late-onset PE had a significantly higher cfDNA fragment size distribution compared to controls (25).The potential of combined this cfDNA signal with pTSS coverages for PE prediction needs to be further explored.Fourth, the accuracy of the C LPE was lower than that of the C EPE .The reason might be that there is higher heterogeneity in the pathogenesis of late-onset PE.In the future, we hope to develop classification models for different subtypes of late-onset PE in larger samples.

TABLE 3
The performance of the random forest model for predicting late-onset PE in the training and three validation datasets.