Risk prediction of bronchopulmonary dysplasia in preterm infants by the nomogram model

Backgrounds and Aims Bronchopulmonary dysplasia (BPD) has serious immediate and long-term sequelae and carries substantial morbidity and mortality. The objective of this study was to develop a predictive model of BPD for premature infants using clinical maternal and neonatal parameters. Methods This single-center retrospective study enrolled 237 premature infants with gestational age less than 32 weeks. Demographic, clinical and laboratory parameters were collected. Univariate logistic regression analysis was carried out to screen the potential risk factors of BPD. Multivariate and LASSO logistic regression analyses were performed to further select variables for the establishment of nomogram models. The discrimination of the models was assessed by the C-index. The Hosmer-Lemeshow (HL) test was used to assess calibration. Results Multivariate analysis identified maternal age, delivery option, neonatal weight and age, invasive ventilation, and hemoglobin as risk predictors. LASSO analysis selected delivery option, neonatal weight and age, invasive ventilation, hemoglobin and albumin as risk predictors. Both the multivariate (AUC = 0.9051; HL P = 0.6920; C-index = 0.910) and LASSO (AUC = 0.8935; HL P = 0.7796; C-index = 0.899) based nomograms exhibited good discrimination and calibration, as confirmed with the validation dataset. Conclusions The probability of BPD in a premature infant could be effectively predicted by the nomogram model based on clinical maternal and neonatal parameters. However, the model requires external validation using larger samples from multiple medical centers.


Introduction
Bronchopulmonary dysplasia (BPD) is a common chronic lung disease in preterm infants that leads to long-term complications such as cardiopulmonary dysfunction and growth retardation (1,2). In particular, moderate and severe BPD is a major cause of death and neurodevelopmental disability in preterm infants (3,4). Although the survival rate of preterm infants has increased significantly with the widespread use of antenatal steroids, exogenous pulmonary surfactant replacement therapy, and advances in neonatal intensive care, the incidence of BPD has not decreased dramatically (5)(6)(7). Moreover, BPD has serious immediate and long-term sequelae and carries substantial morbidity and mortality (8)(9)(10). Therefore, the prevention of BPD has become a major clinical concern.
The pathogenesis of BPD is complex, with the underlying cause being impaired development of the immature lung in response to inflammation, hyperoxia, and other damaging factors (11, 12). Prenatal and neonatal factors have been associated with BPD (13,14). Prenatal factors include the lack of antenatal steroid therapy, chorioamnionitis, and maternal hypertension (13,14). Neonatal factors include gestational age, birth weight, and postdelivery resuscitation (13,14). Independent prenatal risk factors may include oligohydramnios, male gender, and intrauterine growth restriction (14, 15). Postnatal risk factors appear to include the duration of exposure to mechanical ventilation, nosocomial pneumonia, and the need for an FiO2 of more than 0.30 in the delivery room (14, 15). Identifying and assessing the risk of BPD early is therefore important for preventing and controlling its occurrence.
Logistic regression analysis is commonly used to select risk variables (16). It extends the techniques of multiple regression analysis to research situations in which the outcome variable is categorical (17). The least absolute shrinkage and selection operator (LASSO) regression is a shrinkage and variable selection method for regression models, which has been used to determine the variables of ischemic stroke, Alzheimer's disease, COVID-19, and lymph node metastasis (18)(19)(20)(21)(22). A nomogram can present the risk of an outcome graphically, and the corresponding mathematical equation quantifies the impact of the risk factors associated with a disease (23, 24). Many studies have identified risk factors by LASSO regression or multivariate logistic regression and attempted to visualize the incidence and probability of BPD (25, 26). In this study, we sought to screen risk predictors using both methods, multivariate logistic regression analysis and LASSO logistic regression analysis. The probability of BPD in a premature infant was visualized, and the two models were compared by evaluating their discrimination performance.

Eligibility criteria
Premature infants with gestational age less than 32 weeks who had been admitted to the Neonatal Intensive Care Unit of Linyi Central Hospital were eligible. A total of 237 premature infants were enrolled, and demographic, clinical and laboratory data were collected. The study protocol was reviewed and approved by the Research Ethics Commission of Linyi Central Hospital (No. LCH-LW-202208).
In this study, 36 cases were excluded because of death within 28 days after birth (n = 16), maternal mental retardation (n = 1), maternal psychiatric abnormality (n = 1), aggravation (n = 6), and transfer (n = 12). The number of premature infants who met the inclusion criteria was 237, and the cases were divided into a training set (n = 189) and a validation set (n = 48). The training set was used to select potential risk factors by univariate logistic regression analysis followed by multivariable logistic regression analysis and LASSO regression analysis. The nomogram models were then generated and validated using data from the validation cohort. There were no significant differences in demographic, clinical and laboratory results between the training dataset and the validation dataset (Table 1).

Definition of BPD
BPD was defined as a categorical variable: no BPD and BPD (mild, moderate and severe BPD) according to the BPD criteria of the National Institute of Child Health and Human Development (NICHD) (27). In this study, premature infants born with gestational age less than 32 weeks were diagnosed with BPD if they required oxygen support above 21% fraction of inspired oxygen (FiO2) for at least 28 days, with severity assessed at 36 weeks postmenstrual age. Mild BPD was defined as not receiving supplemental oxygen; moderate BPD as receiving oxygen at less than 30% FiO2; and severe BPD as receiving oxygen support at more than 30% FiO2 or needing positive-pressure ventilation or nasal continuous positive airway pressure.
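The grading rule above can be written as a small decision function. This is only an illustrative sketch: the function and parameter names are hypothetical, `oxygen_days` is assumed to count days with FiO2 above 0.21, and an FiO2 of exactly 0.30 is treated as severe because the text does not say how that boundary is handled.

```python
def classify_bpd(oxygen_days: int, fio2_at_36w: float, positive_pressure: bool) -> str:
    """Grade BPD from the criteria described in the text (hypothetical helper).

    oxygen_days       -- days with FiO2 > 0.21 (assumed meaning)
    fio2_at_36w       -- FiO2 as a fraction at 36 weeks postmenstrual age
    positive_pressure -- positive-pressure ventilation or nasal CPAP at 36 weeks
    """
    if oxygen_days < 28:
        return "no BPD"
    # Assumption: FiO2 of exactly 0.30 is graded as severe.
    if positive_pressure or fio2_at_36w >= 0.30:
        return "severe"
    if fio2_at_36w > 0.21:
        return "moderate"
    return "mild"  # breathing room air at the 36-week assessment
```

For example, `classify_bpd(35, 0.25, False)` falls in the moderate category under these assumptions.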

Data collection, filtering, and imputation
In this study, we collected potential risk factors, including demographic, clinical, and laboratory information. Variables with a proportion of missing values greater than or equal to 20% in each cohort were removed. The remaining missing data were imputed using the random forest technique. For data preprocessing, the values of height and weight of puerperal women, prothrombin time, indirect bilirubin, creatine kinase MB, and creatinine were winsorized. The values of white blood cell count, activated partial thromboplastin time, glutamic pyruvic transaminase, creatine kinase MB, and blood urea nitrogen were log-transformed, and the values of indirect bilirubin and CO2CP were square-root transformed. The categorical variables included birth order, gestational age, Apgar scores (1 min and 5 min), duration of ventilation (more than 1 week), neonatal respiratory distress syndrome (NRDS), plateletcrit, and fibrinogen. The data from 2019, 2021, and 2022 were used as the training set, and the data from 2020 were used as the validation set. The sample size in each group meets the ratio requirement of 7:3. Due to the limited sample size, the data set was randomly split into a training set and a validation set at a ratio of 8:2.
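The preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the winsorizing quantiles (5% and 95%), the random seed, and all function names are assumptions, since the text does not report them, and the random forest imputation step is omitted.

```python
import math
import random

def winsorize(values, lower_q=0.05, upper_q=0.95):
    """Clamp values to empirical quantiles (assumed cut-offs, nearest-rank)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_q * (n - 1))]
    hi = ordered[int(upper_q * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

def log_transform(values):
    """Natural-log transform for right-skewed, strictly positive values."""
    return [math.log(v) for v in values]

def sqrt_transform(values):
    """Square-root transform for non-negative values."""
    return [math.sqrt(v) for v in values]

def random_split(n_rows, train_fraction=0.8, seed=0):
    """Shuffle row indices and split them into training and validation sets."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = int(train_fraction * n_rows)
    return idx[:cut], idx[cut:]

# With 237 infants, an 8:2 split gives 189 training and 48 validation rows.
train_idx, valid_idx = random_split(237)
```

Flooring the 8:2 cut point reproduces the training and validation set sizes reported in the eligibility section.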

Logistic regression analysis and nomogram model development
The candidate risk factors were initially analyzed by univariate logistic regression analysis. Multivariate logistic regression analysis was performed for risk factor selection based on forward stepwise selection with a significance level alpha of 0.05, and the selected

Model evaluation
The predictive performance of model 1 and model 2 was evaluated using ROC curves and calibration curves with data from the internal validation cohort and external validation cohort. The calibration of the nomogram was assessed with the Hosmer-Lemeshow test. Harrell's C-index was used to measure the discrimination performance of the model. Model evaluation was carried out in R.
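To make the two evaluation measures concrete, the sketch below computes a C-index for a binary outcome (equivalent to the area under the ROC curve) and a Hosmer-Lemeshow statistic. It is written in Python for illustration only; the study itself used R, the function names are hypothetical, and the ten-group default and the closed-form chi-square tail (valid only for even degrees of freedom) are assumptions. Predicted probabilities are assumed to lie strictly between 0 and 1.

```python
import math

def c_index(y_true, y_prob):
    """Share of (BPD, non-BPD) pairs ranked correctly; ties count as 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return (chi-square statistic, degrees of freedom) over risk-sorted groups."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed events in the group
        expected = sum(p for p, _ in chunk)   # sum of predicted probabilities
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat, groups - 2

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

A large P value from `chi2_sf_even_df(stat, df)` indicates no detectable difference between predicted and observed event counts, which is how the nonsignificant HL results in this study are read.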

Results
Univariable logistic regression analysis revealed that maternal age, caesarean delivery, gestational age, birth weight, 1 min Apgar scores (<8), 5 min Apgar scores (<8), postnatal asphyxia, intensive ventilation (IV), duration of ventilation (>1 week), NRDS Grade III-IV, pulmonary surfactant (PS) application, PS + budesonide, hemoglobin, prothrombin time, albumin, and GLBI may be associated with BPD. Multivariable logistic regression confirmed that advanced maternal age, gestational age less than 29 weeks, and duration of ventilation more than 1 week may be causative factors or indicators for BPD (Table 2). Caesarean delivery, high birth weight, and increased hemoglobin level may decrease the risk of BPD in preterm infants.
The LASSO logistic regression model was fitted using the R package glmnet. The optimal λ value was determined by 10-fold cross-validation. The two dotted lines in Figure 1A mark two values, lambda.min and lambda.1se. Lambda.min is the λ value that gives the minimum mean cross-validated value of the target parameter. Lambda.1se is the largest λ value whose cross-validated value lies within one standard error of that minimum, and it therefore yields the most compact model. The red dots represent the mean value of the target parameter, shown with its confidence interval. The curves in Figure 1B are the trajectories of the coefficients of each independent variable. As the value of lambda increases, the number of independent variables entering the model decreases. When lambda.1se was selected, the variables delivery, weight and age of premature infants, intensive ventilation more than 7 days, hemoglobin, and albumin were included in the risk prediction model 2.
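The choice between lambda.min and lambda.1se can be illustrated with a short sketch. It assumes the cross-validation has already produced, for each candidate λ, a mean error and its standard error; the function name and the numbers in the example are hypothetical and are not taken from the study.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from per-lambda cross-validation summaries."""
    # lambda.min: the candidate with the smallest mean cross-validated error.
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    lambda_min = lambdas[best]
    # lambda.1se: the largest lambda whose mean error stays within one
    # standard error of the minimum, i.e. the sparsest acceptable model.
    threshold = cv_mean[best] + cv_se[best]
    lambda_1se = max(l for l, m in zip(lambdas, cv_mean) if m <= threshold)
    return lambda_min, lambda_1se

# Hypothetical cross-validation summary for five candidate penalties.
lam_min, lam_1se = select_lambdas(
    lambdas=[0.001, 0.01, 0.05, 0.1, 0.5],
    cv_mean=[0.80, 0.70, 0.72, 0.78, 1.10],
    cv_se=[0.05, 0.04, 0.04, 0.05, 0.06],
)
```

Because a larger penalty shrinks more coefficients to zero, lambda.1se trades a small loss in cross-validated fit for fewer predictors, which suits a nomogram intended for bedside use.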
Prediction models 1 and 2 are presented as nomograms (Figures 2A,B). For model 1, the formula for calculating the probability of BPD was: Logit(P|BPD) = ln(P/(1 − P)) = 5.690 + 0.126*MAge − 1.943*Delivery − 3.226*GWeight + 1.223*GAge + 1.308*intensive ventilation − 0.033*hemoglobin. For model 2, Logit(P|BPD) = ln(P/(1 − P)) = 11.2260 − 1.5403*Delivery − 3.1594*GWeight + 1.1190*GAge + 1.2411*intensive ventilation − 0.0269*hemoglobin − 0.0701*albumin, where MAge is maternal age, and GWeight and GAge are the birth weight and age of premature infants, respectively. The calibration curve of nomogram model 1 for the probability of BPD demonstrated good agreement between prediction and observation in the training cohort and validation cohort (Figures 3A,B). The Hosmer-Lemeshow test (HL) yielded a nonsignificant statistic in the training cohort (P = 0.6823) and in the validation cohort (P = 0.6920). The calibration curve of nomogram model 2 for the probability of BPD demonstrated good agreement between prediction and observation in the training cohort and validation cohort (Figures 4A,B). The Hosmer-Lemeshow test (HL) yielded a nonsignificant statistic ...
FIGURE 2 Nomogram models estimating the probability of bronchopulmonary dysplasia (BPD) in a premature infant. (A) Model 1 incorporates maternal age (MAge), delivery mode, the birth weight and age of premature infants, application of intensive ventilation (IV), and hemoglobin (HGB) level; (B) model 2 includes delivery mode, the birth weight and age of premature infants, application of intensive ventilation (IV), hemoglobin level (HGB), and albumin level (ALBI).
Gao et al. 10.3389/fped.2023.1117142 Frontiers in Pediatrics
... showed indistinct net benefits across a wide range of threshold probability in the training cohort, indicating both model 1 and model 2 possess clinical usefulness (Figure 6).
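The two logit equations reported in this section can be turned into a probability with the inverse logit, P = 1/(1 + exp(−Logit)). The sketch below uses the published coefficients; the dictionary keys and function name are hypothetical, and the coding and units of each predictor (for example, whether Delivery and GAge are 0/1 indicators, and the units of weight, hemoglobin and albumin) are not given in the text and must be taken from the original nomogram.

```python
import math

# Coefficients as reported for model 1 (stepwise) and model 2 (LASSO).
MODEL_1 = {"intercept": 5.690, "MAge": 0.126, "Delivery": -1.943,
           "GWeight": -3.226, "GAge": 1.223, "IV": 1.308, "HGB": -0.033}
MODEL_2 = {"intercept": 11.2260, "Delivery": -1.5403, "GWeight": -3.1594,
           "GAge": 1.1190, "IV": 1.2411, "HGB": -0.0269, "ALB": -0.0701}

def bpd_probability(coefs, predictors):
    """Apply the inverse logit to the linear predictor of a fitted model."""
    logit = coefs["intercept"] + sum(
        coefs[name] * predictors[name] for name in coefs if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical infant; the variable coding here is an assumption for illustration.
infant = {"MAge": 30, "Delivery": 1, "GWeight": 1.2, "GAge": 1, "IV": 1, "HGB": 150}
p_model_1 = bpd_probability(MODEL_1, infant)
```

The negative coefficients for delivery, birth weight and hemoglobin mean that, with the other predictors held fixed, higher values of these variables lower the predicted probability.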

Discussion
Analysis of the 189 neonates in the training set showed that the risk of BPD may be associated with advanced maternal age, caesarean delivery, gestational age less than 29 weeks, birth weight, 1 min Apgar scores (less than 8), 5 min Apgar scores (less than 8), postnatal asphyxia, intensive ventilation application, duration of ventilation (more than 1 week), NRDS grade III-IV, PS application, PS and budesonide application, decreased hemoglobin, increased prothrombin time, and decreased albumin and GLBI. Risk factors identified by multivariate and LASSO logistic regression analysis were considered early predictors for the development of the risk prediction models for BPD. Both stepwise and LASSO analysis selected delivery, birth weight, birth age, intensive ventilation application, and hemoglobin level as predictors. The LASSO-based model achieved discrimination comparable to that of the stepwise model (validation AUC 0.8935 vs. 0.9051).
The maternal parameters were analyzed to select risk factors of BPD. Among them, advanced maternal age and mode of delivery were the main factors associated with BPD. In China and in developed countries, the trend toward delayed childbearing shows no signs of diminishing, although preterm newborns born to women of increasing maternal age are reported to have multiple adverse birth outcomes such as BPD (28). For very preterm infants, increasing maternal age is not significantly associated with neonatal mortality or major morbidity (29). Instead, younger maternal age may increase the risk of severe intraventricular hemorrhage in very preterm infants (29). Delivery by caesarean section remained significantly associated with a decreased occurrence of BPD. It is conceivable that preterm infants born by caesarean section were exposed to postnatal antibiotics, preventing multiple respiratory disorders and lung injury (2).
This study analyzed various antenatal, perinatal, and postnatal factors that may contribute to the development of BPD. Preterm infants born at <29 weeks of gestational age had an increased incidence of BPD. Birth weight was significantly lower in the BPD group than in the non-BPD group. Although intensive ventilation and prolonged ventilation can cause lung injury and are risk factors for BPD, lung-protective ventilation is still an important strategy for the current clinical resuscitation of critically ill preterm neonates. Because differentiation of alveolar type 2 cells occurs fairly late in gestation, incomplete differentiation causes a lack of pulmonary surfactant (30). In our data, pulmonary surfactant replacement therapy and budesonide application appeared to be associated with an increased incidence of BPD. However, contrary findings were reported with the introduction of the routine application of pulmonary surfactant and budesonide (31)(32)(33)(34). This discrepancy may arise because our study was retrospective and non-randomized. Premature infants receiving endotracheal application of pulmonary surfactant and budesonide mainly had lower body weight and younger gestational age than those in other studies, which may explain the higher incidence of BPD.
BPD severity is strongly associated with Apgar scores at 1 min and 5 min (35, 36). Premature infants diagnosed with BPD had lower 5 min Apgar scores than those without BPD. Postnatal asphyxia was linked to the development of BPD, as evidenced by our analysis and other studies (37)(38)(39)(40). In our preterm infants, NRDS, especially NRDS grade III-IV, was an important cause of BPD. Laboratory examination revealed that premature infants with BPD showed increased prothrombin time and decreased levels of hemoglobin, albumin and globulin. An early decrease in fetal hemoglobin, indicating a reduction of endogenous blood components, has been proposed as a predictor of BPD development (41). The levels of albumin and globulin have been considered biomarkers associated with BPD in other studies (42).
Multivariate and LASSO logistic regression analyses were carried out to select the predictors for BPD. It is worth noting that, compared with the stepwise method, LASSO selected albumin as a predictor instead of maternal age. Both stepwise and LASSO-based nomogram models exhibited good discrimination. In the validation dataset, model 1 (stepwise) had a favorable discrimination performance with an AUC of 0.9051, compared to 0.8935 for the LASSO logistic regression model 2. Previous retrospective analyses used risk factors such as birth weight, gestational age, and gender to generate risk scoring systems, with sensitivities of 65%-90.3% and specificities of 77.8%-88% (36, 43). The strengths of our models included a more favorable discrimination performance, the comparison of two predictor selection methods (stepwise and LASSO logistic regression analysis), and validation of the risk factors on a separate validation dataset. The established prediction models could be used to predict the probability of BPD for premature infants by using the scoring formulae that were proposed based on maternal age, delivery mode, the birth weight and age of premature infants, application of intensive ventilation, hemoglobin level, and albumin level. The established models may help clinicians diagnose the disease early, plan therapy, and estimate prognosis. However, our study was limited by selection bias and a relatively small sample of BPD cases from a single medical center.
FIGURE 5 Comparison of the predictive sensitivity and specificity of model 1 and model 2 using the training data (A) and validation data (B). Model 1 (red line); model 2 (blue line).
FIGURE 6 Decision curve analysis (DCA) curves for model 1 and model 2. The black line represents the hypothesis that no premature infants have BPD; the gray line represents the hypothesis that all premature infants have BPD.

Conclusions
This study of risk scoring for BPD supports that the probability of BPD in premature infants could be predicted from maternal age, delivery options, birth weight, birth age, invasive ventilation, hemoglobin and albumin, as identified by stepwise and LASSO logistic regression analysis. The developed nomogram models 1 and 2 provide risk predictors for BPD and may help explain potential harms to premature infants. However, larger samples from multiple medical centers are required for external validation.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Ethics statement
The studies involving human participants were reviewed and approved by the Research Ethics Commission of Linyi Central Hospital. Written informed consent from the participants' legal guardian/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.