Comparing different scoring systems for predicting mortality risk in preterm infants: a systematic review and network meta-analysis

Background This study aimed to compare the predictive values of eight scoring systems (Neonatal Critical Illness Score [NCIS], Neonatal Therapeutical Intervention Score System [NTISS], Clinical Risk Index for Babies [CRIB], Clinical Risk Index for Babies II [CRIB-II], Score for Neonatal Acute Physiology Perinatal Extension [SNAPPE], Score for Neonatal Acute Physiology Perinatal Extension II [SNAPPE-II], Score for Neonatal Acute Physiology [SNAP], and Score for Neonatal Acute Physiology II [SNAP-II]) for the mortality risk among preterm infants. Methods The Embase, PubMed, Chinese Biomedical Database, Web of Science, and Cochrane Library databases were searched to collect studies that compared different scoring systems in predicting the mortality risk in preterm infants from database inception to March 2023. Literature screening, data extraction, and bias risk assessment were independently conducted by two researchers. Subsequently, the random-effects model was used for the network meta-analysis. Results A total of 19 articles were included, comprising 14,377 preterm infants and 8 scoring systems. Compared to CRIB-II, NCIS, NTISS, SNAP-II, and SNAPPE-II, CRIB demonstrated better predictive efficiency for preterm infant mortality risk (P < 0.05). Relative to CRIB, CRIB-II, and SNAPPE, SNAP-II had worse predictive efficiency for preterm infant mortality risk (P < 0.05). The surface under the cumulative ranking curve of the eight scoring systems was as follows: CRIB (0.980) > SNAPPE (0.718) >SNAP (0.534) >CRIB-II (0.525) >NTISS (0.478) >NCIS (0.422) >SNAPPE-II (0.298) >SNAP-II (0.046). Conclusion The CRIB scoring system showed the highest accuracy in predicting preterm infant mortality risk and was simple to perform. Therefore, CRIB selection can be prioritized in clinical practice. Systematic Review Registration https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=434731, PROSPERO (CRD42023434731).


Introduction
According to the World Health Organization, approximately 15 million preterm infants (gestational age <37 weeks) are born every year worldwide.The death rate of preterm infants accounts for approximately 35% of neonates and 16% of children aged <5 years.Premature birth is an important global health problem (1,2).Immature organ function and poor adaptability to the extrauterine environment in preterm infants make them prone to asphyxia, acute respiratory distress syndrome, intracranial hemorrhage, infection, and other complications that increase the risk of death.Approximately 1.27 million babies died of complications related to preterm delivery worldwide in 1990.With the progress in perinatal medicine and neonatal intensive care technology, the survival rate of preterm babies has gradually increased.However, the data showed that 0.66 million babies still died of preterm birth-related complications worldwide in 2019 (3).
Studies indicate that timely and effective disease assessment and mortality risk prediction may greatly reduce clinical mortality among preterm infants (4).The Neonatal Critical Illness Score (NCIS), Neonatal Therapeutical Intervention Score System (NTISS), Clinical Risk Index for Babies (CRIB), Clinical Risk Index for Babies II (CRIB-II), Score for Neonatal Acute Physiology Perinatal Extension (SNAPPE), Score for Neonatal Acute Physiology Perinatal Extension II (SNAPPE-II), Score for Neonatal Acute Physiology (SNAP), and Score for Neonatal Acute Physiology II (SNAP-II) are commonly used in clinics and are reliable in predicting the mortality risk of preterm infants.To date, no reviews have explored the strengths and weaknesses of the different scoring systems (5).Therefore, this study aimed to compare the predictive efficiency of the above critical illness scoring systems through a network meta-analysis and estimate the rank order of each scoring system.This study provides reference for clinical decision-making and as such may further reduce the clinical mortality of preterm infants.

Methods
The study protocol for this systematic review and meta-analysis was prospectively registered with PROSPERO (CRD42023434731).S1.Similar search threads were used for all other databases.Reference lists of relevant literature were reviewed to identify as many available studies as possible.

Selection criteria
The inclusion criteria were as follows: (1) retrospective or prospective cohort study, (2) research participants were preterm infants (gestational age <37 weeks), and (3) the predictive value of any two or more scoring systems among the NCIS, SNAP, SNAP-II, SNAPPE, SNAPPE-II, CRIB, CRIB-II, and NTISS on the mortality risk in preterm infants were compared.The exclusion criteria were: (1) repeatedly published literature; (2) literature reviews, case reports, conference abstracts, and gray literature; and (3) studies with incomplete data.

Screening process and data extraction
Two researchers (ZLZ and ZYS) independently screened the articles, extracted the data, and cross-checked the results.Any discrepancies were resolved through discussions with a third researcher (XWL).First, the titles and abstracts were reviewed for relevance, and then two researchers independently read the full texts of all potential articles to determine the final studies for inclusion.The following data were extracted: title, author, country, sample size, gestational age, birth weight, study type, scoring system, and outcome indicator.

Risk of bias assessment
Two researchers (ZLZ and ZYS) independently assessed the quality of the included studies and cross-checked the results.Differences were resolved by discussion or by asking a third researcher (XWL) to assist in the judgment.The risk of bias was assessed using the Newcastle-Ottawa Scale (NOS) based on three categories: selection, comparability, and outcome (6).Studies awarded six or more stars were classified as high quality.

Statistical analysis
Data were compiled and analyzed using Markov chain Monte Carlo simulation chains of R software (version 4.2.1) based on a Bayesian framework according to the PRISMA NMA instruction manual.Stata software (version 16.0) was used to draw the network and funnel plots, and verify the results of network meta analysis.The area under the curve (AUC) and their respective 95% confidence interval (CI) or standard error (SE) were used to combine the data.The I 2 test estimated heterogeneity between studies, where by I 2 < 50% was considered slight heterogeneity and I 2 ≥ 50% was considered high heterogeneity.Due to potential differences between studies, we decided to use a random-effects model rather than a fixed-effects model to analyze the data, employing four Markov chains, 10,000 burn-in iterations, and 30,000 simulation iterations.To quantify and demonstrate the agreement between indirect and direct comparisons, we used the nodal method calculated according to the instructions in Stata.The consistency test was passed if the P-value was > 0.05.The potential scale-reduced factor (PSRF) was used to determine model convergence.When the PSRF value was <1.05, the convergence of the iterative effect was good.To rank the predictive value of each scoring system, we used the surface under the cumulative ranking (SUCRA) and mean ranks.The SUCRA values ranged from 0 to 1, and a larger SUCRA value implied a higher predictive value.Small-scale studies could lead to publication bias in NMA, for which we created network funnel plots and checked them visually using symmetry criteria.

FIGURE 1
Flowchart of the study selection process.

Study characteristics
The 19 articles included in the study were published between 1994 and 2022, and comprised 14,377 preterm infants and 8 scoring systems: NCIS, SNAP, SNAP-II, SNAPPE, SNAPPE-II, CRIB, CRIB-II, and NTISS.Their main characteristics are shown in Table 1.

Risk of bias of included studies
Quality evaluation results showed that seven studies (8,11,13,14,18,23,25) did not describe the determination of exposure in detail; four studies (16,17,20,21) did not describe the follow-up time; and nine studies (7, 9-11, 14, 18, 19, 23, 25) did not address the confounding factors.All included studies had a total quality score ≥6 points, suggesting that the quality of the included studies was high (Table 1).

Network plot
We described network diagrams for different scoring systems, In the presented network diagrams, each node represents a different scoring systems, and the lines connecting the nodes represent a direct comparison between scoring systems.The size

Model convergence
The Bayesian model constructed in this study showed that the fluctuation of the Markov chain was small, the iterative trajectory tended to be stable, and the PSRF value is close to 1, which indicates that the model converged better and could effectively predict the data.And the prediction results of R software and Stata software are consistent.

Assessment of heterogeneity and consistency
A heterogeneity test was performed on the original studies that directly compared each scoring system, and the results showed that the heterogeneity between CRIB and CRIB-II (I 2 = 58.3%)was high.There was slight heterogeneity between SNAPPE-II and SNAP-II (I 2 = 43.3%)and CRIB-II (I 2 = 18.2%); other studies had no heterogeneity.The node-split method was used to test the consistency.There was no significant difference between the indirect and direct comparison results of each scoring system (P > 0.05), indicating that the effects of consistency between the studies were acceptable.

Network meta-analysis results
The network meta-analysis showed that compared with CRIB-II, NCIS, NTISS, SNAP-Ⅱ, and SNAPPE-II, CRIB was more effective in predicting the mortality risk in preterm infants (P < 0.05).Relative to CRIB, CRIB-II, and SNAPPE, SNAP-Ⅱ was worse in predicting the mortality risk in preterm infants (P < 0.05).There were no statistically significant differences between the other scoring systems (P > 0.05), as shown in Table 2.

Publication bias test
The included literature was assessed for publication bias, and a corrected funnel plot was drawn, which is shown in Figure 4.The studies were evenly distributed on both sides of the center line of the funnel plot, which was symmetrical, and no significant publication bias was found.SUCRA curve of predictive value of the eight scoring systems.
Funnel plot of the network meta-analysis.

Discussion
The present study compared the predictive values of critical illness scoring systems for preterm infant mortality risk using a network meta-analysis.A total of 19 articles involving 8 scoring systems were included in this study.According to the SUCRA ranking results, CRIB had the best predictive performance (SUCRA = 0.980).This study suggests that CRIB may be the most suitable scoring system for predicting the risk of preterm infant death.
The CRIB was formulated by Cockburn et al. ( 26) in 1993 and is suitable for preterm infants with a gestational age <31 weeks or a birth weight ≤1,500 g.The system includes six scoring items: gestational age, birth weight, congenital malformations, maximum base excess, and maximum and minimum fractions of inspired oxygen.Within 12 h after birth, the medical staff spend 5-10 min completing the assessment.Studies have shown that gestational age and birth weight are independent risk factors for preterm infant mortality, and with younger gestational age and lower birth weight comes a higher mortality rate (27).Additionally, the fraction of inspired oxygen (FiO 2 ) reflects the maturity of lung development in preterm infants at birth.The imperfect development of lung function can lead to a lack of pulmonary surfactant and a lag in alveolar and pulmonary vascular development; therefore, it is necessary to increase the oxygen concentration FiO 2 and provide alternative therapy to improve the blood gas index and oxygenation state.At the same time, the imperfect development of lung function makes infants vulnerable to invasion by many pathogenic factors, such as respiratory distress syndrome, bronchopulmonary dysplasia, infectious pneumonia, respiratory failure, and other complications (15,28).Maximum base excess is an index used to assess the metabolic acid-base imbalance in the body, which can effectively reflect the steady state of the internal environment and is a necessary condition for normal life activities of the body.The primary increase or decrease of H + and HCO3-in preterm infants can cause acidosis or alkalosis.In severe cases, organ dysfunction, acute renal insufficiency, and shock, which are serious threats to life, may occur (7,29).It can be observed that the six-item index of CRIB is an important evaluation index of clinical outcomes of preterm infants and has a strong predictive ability for the risk of death in preterm infants.Saravi et al. (30) compared the three scoring systems, CRIB, SNAP, and SNAPPE, in a retrospective analysis and found that CRIB had the highest predictive value for the mortality risk before discharge in preterm infants with birth weight ≤1,500 g, which was similar to the results of the present study.Nonetheless, CRIB has some limitations.For example, the maximum and minimum fractions of inspired oxygen in the evaluation index are easily influenced by ventilation mode and preventive application of pulmonary surfactant, and oxygen inhalation concentration values are determined by clinicians, leading to a lack of objectivity (15).Parry et al. (31) simplified and revised CRIB to obtain CRIB-II in 2003; the scoring indicators include birth weight, gestational age, temperature, sex, and maximum base excess, which are also suitable for preterm infants with a gestational age <31 weeks or a birth weight ≤1,500 g.However, compared with CRIB, the predictive efficacy of CRIB-II for the mortality risk in preterm infants is reduced (SUCRA = 0.525) (4).Although gestational age and birth weight are independent predictors of preterm infant mortality risk, CRIB's assessment of congenital malformations and lung maturity can further enhance its ability to predict mortality risk in preterm infants.SNAP, formulated by Richardson et al. (32) in 1993, includes 28 scoring indexes, such as heart rate within 24 h after birth, blood pressure, oxygenation index, arterial partial pressure of oxygen (PaO 2 ), creatinine, and blood urea nitrogen, and is suitable for newborns of various birth weights.In the same year, Richardson et al. (33) studied 1,621 newborns at three neonatal intensive care units in the United States.The results confirmed that birth weight, 5-min Apgar score, and small for gestational age were independent predictors of neonatal mortality risk.These three perinatal indicators were added to SNAP and developed into SNAPPE, which made the scoring system more comprehensive and effective.In 2001, Richardson et al. (34) simplified and corrected the scoring indicators based on SNAP and SNAPPE to obtain SNAP-II and SNAPPE-II.The SNAP-II scoring indicators were simplified to six indicators closely related to the risk of neonatal death.The lowest temperature, serum pH, PaO 2 /FiO 2 ratio, multiple seizures, urine output, and lowest blood pressure were observed in the first 12 h after admission.SNAPPE-II was formed by including three perinatal indicators based on SNAP-II.The results of this present study showed that SNAPPE had a good predictive effect on mortality risk in preterm infants (SUCRA = 0.718).The possible reasons for this are as follows: the SNAPPE score is comprehensive, and the indicators involve various organ systems, including more clinical and laboratory parameters, which can better reflect the clinical status of preterm infants.Gestational age and birth weight can be used to evaluate the growth and development of preterm infants.The 5-min Apgar score can be used to preliminarily assess asphyxia and disease severity in preterm infants.The respiratory frequency and oxygenation index can reflect pulmonary oxygenation and ventilation function in preterm infants.Blood urea nitrogen and creatinine levels can be used to diagnose renal function diseases, bilirubin is an essential index for evaluating abnormal liver function, and blood sodium and potassium are mainly used to maintain intracellular and extracellular osmotic pressure and acid-base balance, reflecting the homeostasis of the internal environment.Therefore, a perfect scoring index makes the SNAPPE evaluation more accurate; however, the disadvantage is that data collection is cumbersome, and the acquisition of some items depends on laboratory inspection, which increases the difficulty of evaluation.The results of this present study also showed that compared with SNAP and SNAP-II, SNAPPE and SNAPPE-II could predict the mortality risk in preterm infants more accurately after adding the three perinatal indicators of birth weight, 5-min Apgar score, and small for gestational age.The results further confirmed that it is feasible to improve the accuracy of prediction by adding reliable scoring items, but it is also necessary to fully consider realistic conditions and operability and prioritize the selection of more sensitive and simple indicators to optimize the scoring system (11, 35).
The NCIS was formulated by the Chinese Medical Association in 2001 and is used to evaluate the condition of critically ill newborns and predict the risk of death.A total of 11 scoring indicators were used, as follows: heart rate, blood pressure, respiratory frequency, PaO 2 , pH, sodium, potassium, creatinine, blood urea nitrogen, hematocrit, and gastrointestinal condition (36).The results of the present study showed that the ability of NCIS to predict the mortality risk of preterm infants was limited (SUCRA = 0.422), and it could not accurately assess the mortality risk in preterm infants, which is consistent with the findings of previous studies (30,37,38).Such findings may be related to the fact that the NCIS score is mainly formulated with reference to full-term infants, and its evaluation index is not targeted at predicting the condition of preterm infants (36).Additionally, some NCIS scoring indicators may not be sufficiently accurate.For example, PaO 2 /FiO 2 can more accurately reflect the respiratory function status of infants than PaO 2 , while cystatin-C and neutrophil gelatinase-associated lipocalin can more accurately reflect dynamic changes in renal function in infants than blood urea nitrogen and creatinine (10).Yang et al. (7) retrospectively analyzed 192 cases of extremely low birth weight infants and found that compared with CRIB, CRIB-II, SNAP-II, and SNAPPE-II, NCIS had the weakest correlation with birth weight and gestational age, which may be one of the reasons for its low predictive value.The NCIS scoring index must be further optimized to meet clinical needs.
NTISS, formulated by Gray et al. (39) in 1992, assesses the risk of neonatal illness and death based on treatment intensity, and includes 62 indicators, such as cardiopulmonary resuscitation, mechanical ventilation, extracorporeal membrane oxygenation, minor operation, major operation, dialysis, red blood cell transfusion, antibiotics, and insulin levels.The results of the current study revealed that the NTISS's ability to predict the mortality risk in preterm infants was weak (SUCRA = 0.478) and it was unable to objectively reflect the actual situation, which is consistent with the results of Oygur et al. (40).NTISS score indicators are based on treatment intensity to assess the mortality risk of preterm infants; however, therapeutic intervention measures may be affected by individual differences, hospitalization time, illness changes, medical equipment, and technical level and cannot fully reflect the real situation of infants (5).
Collectively, the present findings indicate that compared with other scoring systems, CRIB was more sensitive in predicting mortality risk in preterm infants, and its scoring items were sensitive indicators reflecting the condition of preterm infants.The other scoring systems have some shortcomings.For example, the SNAP, SNAPPE, and NTISS have a certain predictive value for mortality risk in preterm infants, but they have many scoring items and strict quality control of indicators.However, the equipment and instruments in grassroot hospitals are relatively simple, laboratory detection indicators are imperfect, and neonatal intensive care unit medical staff perform complex rescue tasks and often cannot evaluate them immediately.Hence, there are significant limitations to the popularization of clinical applications.Although CRIB-II, SNAP-II, and SNAPPE-II are simplified based on the original scoring system, which makes the scoring data easier to obtain and the evaluation process simpler, the predictive efficacy of the mortality risk in preterm infants is also reduced.NCIS is mainly formulated for term infants, and some indicators are not sufficiently accurate; therefore, it needs further optimization.

Study limitations
In this study, the network meta-analysis results were reliable, but several potential limitations should be noted.First, only published Chinese and English studies were included in this study, and other studies in other languages may have been missed.Second, some scoring systems in our network metaanalysis included only a few studies and studies with greater population coverage are needed in the future.Third, the included studies from different countries may have increased clinical heterogeneity.Finally, there were differences in the causes of death and evaluation times of preterm infants in the included studies, which may have affected the results.

Conclusions
The present results showed that CRIB had the best predictive efficiency for mortality risk in preterm infants and was simple to perform and easy to collect data.Thus, CRIB is recommended as the first choice for predicting the mortality risk in preterm neonates; however, the results still need to be verified using a larger number of samples.Multicenter studies should be conducted in the future to further compare the accuracy and application of various scoring systems for predicting the mortality risk in preterm infants.
of each node and the width of the connecting lines are proportional to the number of studies.The network plot is shown in Figure2.

FIGURE 3
FIGURE 3 Two independent researchers (ZLZ and ZYS) systematically searched for original research articles in Embase, PubMed, Web of Science, Cochrane Library, and Chinese Biomedical Database, from the inception of each database to March 2023, with the languages restricted to Chinese and English.The following search terms were used: "[(premature OR preterm infant OR low birth weight OR LBW OR very low birth weight OR VLBW) AND (neonatal critical illness score OR NCIS OR neonatal therapeutical intervention score system OR NTISS OR clinical risk index for babies OR CRIB OR clinical risk index for babies Ⅱ OR CRIB-Ⅱ OR score for neonatal acute physiology perinatal extension OR SNAPPE OR score for neonatal acute physiology perinatal extension Ⅱ OR SNAPPE-Ⅱ OR score for neonatal acute physiology OR SNAP OR score for neonatal acute physiology Ⅱ OR SNAP-Ⅱ)]."The Web of Science search strategy is presented in detail in Supplementary Table

TABLE 1
Characteristics of the included studies.

TABLE 2
Results of indirect comparisons of the scoring systems.