Prediction of postpartum hemorrhage (PPH) using machine learning algorithms in a Kenyan population

Introduction: Postpartum hemorrhage (PPH) is a significant cause of maternal mortality worldwide, particularly in low- and middle-income countries. It is essential to develop effective prediction models to identify women at risk of PPH and implement appropriate interventions to reduce maternal morbidity and mortality. This study aims to predict the occurrence of postpartum hemorrhage using machine learning models based on antenatal, intrapartum, and postnatal visit data obtained from the Kenya Antenatal and Postnatal Care Research Collective cohort.

Method: Four machine learning models (logistic regression, naïve Bayes, decision tree, and random forest) were constructed using 67% of the data for training (1,056/1,576). The training data were further split into 67% for model building and 33% for cross-validation. Once the models were built, the remaining 33% (520/1,576), held out as independent test data, was used for external validation to confirm the models' performance. Models were fine-tuned using feature selection through the extra trees classifier technique. Model performance was assessed using accuracy, sensitivity, and area under the curve (AUC) of the receiver operating characteristics (ROC) curve.

Result: The naïve Bayes model performed best with 0.95 accuracy, 0.97 specificity, and 0.76 AUC. Seven factors (anemia, limited prenatal care, hemoglobin concentrations, signs of pallor at intrapartum, intrapartum systolic blood pressure, intrapartum diastolic blood pressure, and intrapartum respiratory rate) were associated with PPH prediction in a Kenyan population.

Discussion: This study demonstrates the potential of machine learning models in predicting PPH in the Kenyan population. Future studies with larger datasets and more PPH cases should be conducted to improve the prediction performance of machine learning models. Such prediction algorithms would help to construct a personalized obstetric path for each pregnant patient, improve resource allocation, and reduce maternal mortality and morbidity.

Despite aggressive governmental efforts over the last 15 years to reduce maternal mortality in Kenya, including the implementation of a reproductive health voucher program in 2006 (8) and the provision of free maternity treatments in government facilities in 2013 (9), progress has been slow. The maternal mortality ratio (MMR) in Kenya is high (342 per 100,000 live births) compared to the current global MMR of 211 per 100,000 live births (4). A reduction in maternal mortality to a target of less than 70 maternal deaths per 100,000 live births is one of the United Nations' Sustainable Development Goals for 2030 (10). The ability to identify patients at risk of PPH and associated complications reliably, accurately, and early in pregnancy would be an important step towards achieving this aspirational goal.
In contrast to traditional general-purpose predictive algorithms, which merely transform input data into an output based on predetermined rules, artificial intelligence (AI) systems can generate new rules and patterns by analyzing both input and output data. A recent systematic review found three PPH risk prediction models with promising clinical applications (11), but AI approaches have not yet been thoroughly evaluated in obstetrics (12). Moreover, the predictive models developed thus far used populations from the United Kingdom (13), South Korea (14), and China (15), and focused on PPH in the setting of cesarean delivery. Predictive modeling for PPH in a general obstetric population in low- and middle-income countries (LMICs) has not previously been reported.
The aim of this study was to construct and validate machine learning models to predict PPH in a general obstetric population using data from the Kenya Antenatal and Postnatal Care Research Collective (ARC) cohort. The long-term goal is to allow obstetricians to identify patients at high risk of PPH and guide clinical decision-making.

Study population
Data from the Maternal and Newborn Health (MNH) monitoring report collected by ARC were used for the development and validation of PPH prediction models. Briefly, MNH monitoring report data were collected as part of a prospective longitudinal study for pregnancy risk stratification innovation and measurement alliance at multiple LMIC sites, including in Ghana, Kenya, Zambia, and Pakistan. For this study, we used the antenatal, intrapartum, and postnatal visit data from the Kenya site collected between August 2020 and February 2022. The inclusion criteria were women with documented antenatal, intrapartum, and postnatal visits, including delivery outcome and reported PPH outcome. As part of the maternal labor and delivery outcome documentation, PPH outcomes were reported by healthcare staff within 24 h of delivery. Data were collected during home visits by healthcare staff as well as during healthcare facility visits, including antenatal clinic (ANC) visits, the intrapartum visit within 24 h of delivery, and postnatal care (PNC) visits.

Experimental design
Demographic and clinical information collected during the ANC and intrapartum (delivery) visits was used to build the predictive models. The data were randomly divided into a training (67%) and an independent testing (33%) dataset. The ratio of PPH to non-PPH cases was kept similar in the training and testing datasets. The training dataset was again randomly divided into 67% and 33%, and the smaller dataset was used for cross-validation. The definition and classification of PPH in the literature are variable, depending on such factors as the estimated blood loss (EBL) (>500 ml or >1,000 ml), type of delivery (vaginal vs. cesarean), and timing of hemorrhage (early vs. late) (16)(17)(18). In this study, we used the outcome of interest (presence or absence of PPH) as reported by healthcare staff while collecting maternal labor and delivery outcomes during the delivery visit. This was generally determined using the criteria of EBL > 500 ml after vaginal delivery and >1,000 ml after cesarean delivery. All were early PPH cases, as they were documented at or shortly after delivery.
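The two-level stratified split described in this section can be sketched as follows. This is a minimal illustration using only the Python standard library, not the study's actual code: the helper `stratified_split`, the seeds, and the synthetic label vector are assumptions made for the example, and only the cohort size (1,576), the number of PPH cases (40), and the 67%/33% ratios come from the text.

```python
import random


def stratified_split(labels, test_fraction, seed):
    """Split indices into (train, test), sampling each class separately
    so that the class ratio stays similar in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Synthetic labels with the reported class counts: 40 PPH, 1,536 non-PPH.
labels = [1] * 40 + [0] * 1536

# First level: training vs. independent testing dataset.
train_idx, test_idx = stratified_split(labels, test_fraction=0.33, seed=0)

# Second level: split the training portion into model-building and
# validation subsets (positions are relative to train_idx).
train_labels = [labels[i] for i in train_idx]
build_pos, valid_pos = stratified_split(train_labels, test_fraction=0.33, seed=1)
build_idx = [train_idx[i] for i in build_pos]
valid_idx = [train_idx[i] for i in valid_pos]


def pph_rate(indices):
    """Fraction of PPH cases among the given subject indices."""
    return sum(labels[i] for i in indices) / len(indices)


print("train/test sizes:", len(train_idx), len(test_idx))
print("build/validation sizes:", len(build_idx), len(valid_idx))
print("PPH rate train/test:", round(pph_rate(train_idx), 4), round(pph_rate(test_idx), 4))
```

With these counts the sketch reproduces the reported 1,056/520 partition. Sampling per class is what keeps the handful of PPH cases from landing disproportionately in one partition.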

Feature engineering and machine learning
Data were collected across four ANC visits at 0-17 weeks, 18-25 weeks, 26-33 weeks, and ≥34 weeks of gestational age. Socio-demographic data (such as age and height) were collected during the first ANC visit, whereas clinical data (such as hemoglobin level, blood pressure, and proteinuria) were collected at each ANC visit. Overall, around 700 features consisting of categorical, numerical, and date/time variables were collected from each study subject across all pregnancy visits.

To develop predictive algorithms for PPH, most of the features collected after the onset of labor (with the exception of the presence or absence of PPH) and all of the features collected at the PNC visit were removed from the analysis. The exclusion of such features was intentional to facilitate early prediction of PPH. This approach aims to optimize resource allocation, particularly in the low-resource healthcare settings prevalent in LMICs. When two columns or features were highly correlated with each other as identified by data science techniques, only one of them was retained to avoid multicollinearity. Missing values were handled in one of two ways: for categorical variables, a new category "others" was created; for numerical variables, missing values were imputed using the Generative Adversarial Imputation Nets (GAIN) methodology (19).

Data for each study subject were captured at different gestational ages: 47% of women had their first ANC visit at 0-13 weeks of gestation, whereas 53% had their first ANC visit at 14-18 weeks of gestation. This is common in the healthcare setting and results in data that are heterogeneous and irregularly sampled at multiple time points. To address this issue, we employed the FIDDLE (Flexible Data-Driven Pipeline) framework (20) and transformed our features into two categories: time-invariant and time-variant features. Time-invariant features are those that were collected only once and typically do not vary (such as maternal age), whereas time-variant features were collected at different time points and vary over the course of the study (such as blood pressure).
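To make the distinction between time-invariant and time-variant features concrete, the sketch below flattens one subject's irregularly timed visits into one column per feature and ANC window. It is an illustration of the general idea only and does not use or reproduce the FIDDLE package: the function names, the column naming scheme, the choice to let a later visit overwrite an earlier one in the same window, and the example subject are all assumptions. Only the four gestational-age windows come from the text.

```python
# Gestational-age windows (in weeks) for the four ANC visits, as described
# in the text; None means the window has no upper bound.
ANC_WINDOWS = [("anc1", 0, 17), ("anc2", 18, 25), ("anc3", 26, 33), ("anc4", 34, None)]


def window_for(week):
    """Return the ANC window label for a gestational age in completed weeks."""
    for name, low, high in ANC_WINDOWS:
        if week >= low and (high is None or week <= high):
            return name
    raise ValueError(f"invalid gestational week: {week}")


def flatten_subject(invariant, visits, variant_features):
    """Build one flat row: time-invariant values are copied as-is, and each
    time-variant feature gets one column per ANC window."""
    row = dict(invariant)
    for feature in variant_features:
        for name, _, _ in ANC_WINDOWS:
            row[f"{feature}_{name}"] = None  # None marks "not measured in this window"
    for visit in visits:
        name = window_for(visit["week"])
        for feature in variant_features:
            if feature in visit:
                # Assumption: a later visit in the same window overwrites an earlier one.
                row[f"{feature}_{name}"] = visit[feature]
    return row


# Hypothetical subject with three visits at irregular gestational ages.
invariant = {"maternal_age": 27}
visits = [
    {"week": 12, "systolic_bp": 110, "hemoglobin": 11.8},
    {"week": 28, "systolic_bp": 118},
    {"week": 36, "systolic_bp": 124, "hemoglobin": 10.4},
]

row = flatten_subject(invariant, visits, ["systolic_bp", "hemoglobin"])
for column, value in row.items():
    print(column, value)
```

Each time-variant feature expands into several window-specific variables, which is why the number of model variables can be considerably larger than the number of source features. The `None` entries are the gaps that a downstream imputation step would have to handle.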

Model performance and comparison
First, we trained four machine learning models (logistic regression, naïve Bayes, decision tree, and random forest) to predict PPH outcome using all of the features. Internal validation was performed using k-fold cross-validation. Thereafter, secondary models were built using limited sets of features selected through the extra trees classifier, thereby making them more relevant to the clinical setting. To compare the models and evaluate their performance, accuracy, sensitivity, and area under the curve (AUC) of the receiver operating characteristics (ROC) curve were calculated using model predictions on the independent testing dataset.
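The performance metrics named in this section can be computed directly from labels and predicted scores. The sketch below is a standard-library illustration of the standard definitions; the function names, the 0.5 decision threshold, and the ten toy observations are assumptions for the example and are not study data. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives that were not flagged."""
    return tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (not study data): 2 positives among 10 observations.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.3, 0.8, 0.2, 0.1, 0.4, 0.05, 0.15, 0.25, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(tp, tn, fp, fn))
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason to report them together.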

Model interpretation
To estimate the relative relevance of each feature, Shapley values were calculated using the Python library SHAP (SHapley Additive exPlanations) (21). To understand the importance of each feature towards predicting PPH, the mean absolute SHAP values, computed across individuals in the training dataset, were plotted.
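The quantity summarized here, the mean absolute Shapley value per feature, can be illustrated with an exact calculation on a tiny model. The sketch below does not use the SHAP library and is not the study's analysis: it enumerates all feature subsets, which is only feasible for a handful of features, and the scoring function `toy_risk`, its coefficients, the feature names, and the four example rows are hypothetical. Features outside a subset are filled in from background rows, which is one common way to define the value function.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values of each feature for one instance x.

    The value of a feature subset is the average model output when those
    features are fixed to x and the others are taken from each background row.
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for b in background:
            mixed = [x[j] if j in subset else b[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_risk(x):
    """Hypothetical risk score with one interaction term (not the study model)."""
    pallor, anemia, high_sbp = x
    return 0.5 * pallor + 0.3 * anemia + 0.2 * pallor * high_sbp


feature_names = ["pallor", "anemia", "high_sbp"]
rows = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [0, 0, 0]]

# Shapley values per individual, then the mean absolute value per feature.
per_row = [shapley_values(toy_risk, r, rows) for r in rows]
mean_abs = [sum(abs(p[i]) for p in per_row) / len(per_row) for i in range(len(feature_names))]

for name, score in zip(feature_names, mean_abs):
    print(name, round(score, 4))
```

For each individual, the Shapley values sum to the difference between that individual's model output and the average output over the background rows. Averaging absolute values across individuals gives a global ranking that ignores the direction of each effect.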

Results
A total of 2,550 women were included in the Kenya maternal cohort. Of those, women were excluded if there was no reported delivery outcome (n = 924, 36.2%) or no report of the presence or absence of PPH (n = 50, 2.0%), leaving 1,576 women (61.8%) in the final analysis (Figure 1). Among these 1,576 women, 40 (2.5%) were reported to have PPH. Table 1 presents the comparison of demographic and clinical characteristics between the PPH and non-PPH groups, revealing no significant differences between the two groups. The mean age was 28.5 and 26.4 years for the PPH and non-PPH groups, respectively.
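The cohort flow figures above can be checked with a few lines of arithmetic. The snippet uses only the counts reported in the text; the variable names and the helper `pct` are introduced here for illustration.

```python
# Counts as reported in the text.
enrolled = 2550
no_delivery_outcome = 924
no_pph_report = 50
pph_cases = 40

analyzed = enrolled - no_delivery_outcome - no_pph_report


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("analyzed:", analyzed)
print("excluded, no delivery outcome (%):", pct(no_delivery_outcome, enrolled))
print("excluded, no PPH report (%):", pct(no_pph_report, enrolled))
print("analyzed (%):", pct(analyzed, enrolled))
print("PPH rate among analyzed (%):", pct(pph_cases, analyzed))
```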
FIGURE 1
Flow chart of Kenya maternal cohort.

Table note: Data are given as n (%) or mean ± SD. PPH, postpartum hemorrhage.

A total of 58 candidate features were derived from the overall 707 features and used to develop the PPH prediction models. The features included, amongst others, maternal age, gestational age, hemoglobin levels, systolic and diastolic blood pressure, and respiratory rate. These 58 features were transformed into 264 variables based on the time at which the variables were captured. Using these variables, logistic regression, naïve Bayes, decision tree, and random forest models were built. Due to high class imbalance (40 PPH cases and 1,536 non-PPH cases), the logistic regression model did not perform well, with an AUC of 0.51 on the testing dataset. The remaining three models performed marginally better (Table 2). To further improve the performance of the models, the top 10 variables were selected using the extra trees classifier technique. Of these top 10 variables, a few were associated with the following features collected in the intrapartum period: systolic blood pressure, diastolic blood pressure, respiratory rate, hemoglobin levels, and signs of pallor. The remaining variables were related to features such as the time of the third ANC visit, anemia diagnosis in the first and second trimesters, and fetal heart rate in the third trimester. Compared to the baseline models, the performance of all the models improved after training on these 10 selected variables. The naïve Bayes model performed significantly better than the other models when compared across the majority of performance metrics, with 0.76 AUC, 0.95 accuracy, and 0.97 specificity (Table 3). The comparison of ROC curves across the four models after training is shown in Figure 2.
As the naïve Bayes model performed best, we performed SHAP analysis using this model to investigate the impact of individual variables on PPH prediction. The critical features associated with a high risk of PPH included: (1) signs of pallor documented during the intrapartum visit, (2) a diagnosis of anemia made at any time during the pregnancy (defined as hemoglobin levels less than 11 g/dl), (3) limited prenatal care (defined as the third ANC visit occurring within 11 weeks of delivery), (4) elevated diastolic blood pressure at the intrapartum visit (greater than 85 mmHg), and (5) elevated systolic blood pressure at the intrapartum visit (greater than 123 mmHg). In contrast, elevated hemoglobin concentrations at the intrapartum visit (greater than 13 g/dl) and rapid respiratory rate (more than 20 breaths per minute at the intrapartum visit) were associated with a lower predicted risk of PPH (Figure 3).

Main findings
In this study, we demonstrated that machine learning can be employed to predict PPH using routine clinical data collected at antenatal and intrapartum visits. The ability to accurately identify patients at high risk of PPH is important for stratifying their care and mitigating the consequences of this dangerous condition. The association of clinical features such as blood pressure, respiratory rate, hemoglobin levels, and anemia with the development of PPH is consistent with published data (6,22). The association with anemia is particularly significant since, in addition to predisposing women to PPH, it also limits tolerance to blood loss (18). Our final model did not include many previously identified risk factors such as a history of PPH in a prior pregnancy, route of delivery, and multiple gestations. This could be because of the low incidence of such conditions in the study population or because the drivers and risk factors for PPH may be different in an LMIC population such as that in Kenya as compared to a Western population.
In this cohort, we also found an association between ANC visits and risk of PPH. More limited prenatal care (defined as the third ANC visit occurring within 11 weeks of delivery) was associated with an increased risk of PPH. This underlines the importance of prenatal care and timely ANC visits in LMICs. Such insights can help inform policies that address unfavorable social determinants of health and remove barriers that prevent some women from accessing care and adhering to recommended antenatal visit schedules. This finding further suggests that targeted interventions to improve access and visit compliance may reduce maternal mortality and morbidity due to PPH in LMICs.
In our study, we made notable observations regarding the performance of different classifiers on an independent test dataset. Specifically, the naïve Bayes classifier exhibited superior accuracy compared to random forest and decision tree, which experienced a significant decline in accuracy. This decline may indicate a potential issue of overfitting in the latter classifiers (23,24). In contrast, a recent study in an Iranian population found that machine learning models such as random forest and decision tree provided improved performance in predicting PPH (25). Westcott et al. (26) also reported better performance of boosted decision trees for prediction of PPH in a United States population. Additionally, we observed suboptimal performance of logistic regression when compared to naïve Bayes. This can be attributed to the relatively small sample size of the PPH population. Previous studies have consistently demonstrated that generative classifiers such as naïve Bayes tend to outperform discriminative classifiers like logistic regression when trained on limited sample sizes (27,28).
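To illustrate what "generative" means in this comparison, the sketch below implements a small Gaussian naïve Bayes classifier with the Python standard library. It is an assumption-laden illustration rather than the study's model: the text does not state which naïve Bayes variant was used, so the Gaussian form, the variance floor `var_floor`, the function names, and the eight synthetic observations (hemoglobin in g/dl and systolic blood pressure in mmHg) are all chosen for the example. The classifier only needs a prior, a mean, and a variance per class and feature, which is part of why such models can be fitted from few positive cases.

```python
import math


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Estimate, per class: prior, per-feature mean, per-feature variance."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + var_floor
            for j in range(n_features)
        ]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict_proba(model, x):
    """Posterior class probabilities, treating features as independent
    given the class (the naive Bayes assumption)."""
    log_joint = {}
    for label, (prior, means, variances) in model.items():
        total = math.log(prior)
        for value, mean, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        log_joint[label] = total
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_joint.values())
    unnormalized = {k: math.exp(v - peak) for k, v in log_joint.items()}
    z = sum(unnormalized.values())
    return {k: v / z for k, v in unnormalized.items()}


# Synthetic observations (not study data): [hemoglobin g/dl, systolic BP mmHg].
X = [
    [12.5, 110], [13.0, 115], [12.8, 108], [13.4, 112], [12.1, 118],
    [9.8, 130], [10.2, 127], [9.5, 134],
]
y = [0, 0, 0, 0, 0, 1, 1, 1]

nb_model = fit_gaussian_nb(X, y)
probs_low_hb = predict_proba(nb_model, [9.9, 129])
probs_high_hb = predict_proba(nb_model, [13.1, 111])
print("P(class 1 | low hemoglobin, high BP):", round(probs_low_hb[1], 3))
print("P(class 1 | high hemoglobin, normal BP):", round(probs_high_hb[1], 3))
```

A discriminative model such as logistic regression instead fits the decision boundary directly, which typically requires more positive examples to estimate reliably.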

Clinical implications
These findings suggest that routine clinical and demographic data can be used to identify women at high risk for adverse pregnancy events. At present, healthcare providers predominantly rely on clinical expertise and nonspecific risk factors to identify high-risk pregnancies. Further, the International Federation of Gynecology and Obstetrics (FIGO) developed a PPH care pathway to integrate WHO PPH guidelines in Sub-Saharan African countries (29). These guidelines are primarily derived from expert opinion and clinical consensus and lack the ability to offer personalized risk predictions. By incorporating such predictive models into clinical practice, healthcare professionals will be able to personalize and stratify the care of women at high risk for PPH. For such cases, targeted interventions may include treating anemia, administering prophylactic medication (often referred to as active management of the third stage of labor), ensuring the availability of blood products during delivery, and/or non-clinical interventions such as providing free transportation to the clinic or home visits in resource-limited settings. The efficacy of these intervention modalities should be tested against existing practice before introducing them into routine clinical practice. This can be accomplished with impact studies such as cluster randomized trials (30,31).

Strengths and limitations
To our knowledge, this is the first study to predict PPH in a population from Sub-Saharan Africa using a machine learning approach. This "proof of concept" study demonstrates that, even in LMICs with limited resources and a lack of standardized electronic health records, such predictive machine learning techniques can be used to identify women at high risk of serious adverse pregnancy events, such as PPH, which is a major cause of maternal mortality and morbidity. Additional studies are needed to prospectively validate our model and to determine whether it is generalizable to other Sub-Saharan African populations. Limitations of this study include a small sample size with only 40 PPH cases. Moreover, our cohort had a PPH rate of 2.5%, which is significantly lower than the reported 10.5% prevalence of PPH in this region (5). Because the PPH outcome was reported within 24 h of delivery, late PPH cases were excluded, which could account for the lower PPH rate observed in the present study population. Underreporting of cases could be another contributing factor to the observed low PPH rate. Data quality, with a high proportion of missing data, is another limitation of this study. Given the current state of clinical documentation in LMICs, it is possible that certain important factors were not captured and therefore could not be incorporated into the model. Future studies including larger datasets with more cases of PPH would allow for the careful examination and inclusion of additional variables into the prediction models for improved prediction performance.

Conclusions
Seven factors (anemia, limited prenatal care, hemoglobin concentrations, signs of pallor at intrapartum, intrapartum systolic blood pressure, intrapartum diastolic blood pressure, and intrapartum respiratory rate) were associated with PPH prediction in a Kenyan population. These findings provide an opportunity to explore machine learning approaches to identify patients at high risk of PPH in resource-constrained settings. Use of such predictive models to identify and stratify women at high risk of PPH and other adverse pregnancy events could bring us one step closer to designing a personalized obstetric journey for each pregnant patient, improving resource allocation, and ultimately reducing maternal mortality and morbidity.

Data availability statement
As the raw data supporting the conclusions of this article contain patient-level information, they are available upon request to the authors.

Ethics statement
The studies involving human participants were reviewed and approved by the Kenya Medical Research Institute (KEMRI) and the Centers for Disease Control and Prevention (CDC). The patients/participants provided their written informed consent to participate in this study.

Author contributions
SSh conceived and designed the study. BT, JW, SK and GO were involved in primary data acquisition, data cleaning, and data management. SSa, SR and NN were involved in data cleaning, data analysis, interpretation of data and model building. SSh, MT and SG supervised the study. MT wrote the manuscript. EN and RR contributed to manuscript discussion and critical review. DO and VA were involved in manuscript review and data interpretation. All authors contributed to the article and approved the submitted version.

Funding
This study was funded by the Bill & Melinda Gates Foundation grant (INV038099) as a part of Antenatal risk stratification AI support work.

Conflict of interest
SS, SR and NN are employees of CognitiveCare Inc.'s wholly owned subsidiary. SYS, SG and MT are founding team members and employees of CognitiveCare Inc. CognitiveCare Inc. has a patent pending for a maternal and infant health intelligence and cognitive insight (MIHIC) system and score to predict the risk of maternal, fetal and infant morbidity and mortality.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.