Risk stratification in adult and pediatric pulmonary arterial hypertension: A systematic review

Introduction: Currently, risk stratification is the cornerstone of determining treatment strategy for patients with pulmonary arterial hypertension (PAH). Since the 2015 European Society of Cardiology/European Respiratory Society (ESC/ERS) guidelines for the diagnosis and treatment of pulmonary hypertension recommended risk assessment, the number of studies reporting risk stratification has considerably increased. This systematic review aims to report and compare the variables and prognostic value of the various risk stratification models for outcome prediction in adult and pediatric PAH.
Methods: A systematic search with terms related to PAH, pediatric pulmonary hypertension, and risk stratification was performed in the databases PubMed, EMBASE, and Web of Science up to June 8, 2022. Observational studies and clinical trials on risk stratification in adult and pediatric PAH were included, excluding case reports/series, guidelines, and reviews. Risk of bias was assessed using the Prediction model Risk Of Bias Assessment Tool. Data on the variables used in the models and the predictive strength of the models, expressed as the c-statistic, were extracted from eligible studies.
Results: A total of 74 studies were eligible for inclusion, with this review focusing on model development (n = 21), model validation (n = 13), and model enhancement (n = 9). The variables used most often in current risk stratification models were the non-invasive WHO functional class, 6-minute walk distance and BNP/NT-proBNP, and the invasive mean right atrial pressure, cardiac index and mixed venous oxygen saturation. C-statistics of current risk stratification models range from 0.56 to 0.83 in adults and from 0.69 to 0.78 in children (only two studies available). Risk stratification models focusing solely on echocardiographic parameters or biomarkers have also been reported.
Conclusion: Studies reporting risk stratification in pediatric PAH are scarce. This systematic review provides an overview of current data on risk stratification models and their value for guiding treatment strategies in PAH.
Systematic review registration: [https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42022316885], identifier [CRD42022316885].


Introduction
Pulmonary hypertension (PH) is a condition defined by an increased pulmonary arterial pressure. Based on pathophysiological mechanisms, clinical presentation, and hemodynamic characteristics, PH can be classified into five main groups: pulmonary arterial hypertension (PAH, group 1), PH due to left heart disease (group 2), PH due to lung disease and/or hypoxia (group 3), PH due to pulmonary artery obstructions (group 4), and PH with unclear and/or multifactorial mechanisms (group 5) (1). Each PH type can be further divided into multiple subgroups. Group 1 PAH is a progressive and eventually fatal pulmonary vascular disease. Occlusion of small pulmonary arteries leads to increased right ventricular afterload, which eventually results in right ventricular failure.
Initially, the only available treatment option for PAH was calcium channel blockers. However, calcium channel blockers proved beneficial only in a small subset of patients who responded to acute pulmonary vasodilator testing during right heart catheterization (RHC) (2). Over the last decades, various PAH-targeted therapies have become available, such as endothelin receptor antagonists, phosphodiesterase type 5 inhibitors, guanylate cyclase stimulators, prostacyclin analogues, and selective prostacyclin receptor agonists (3). With the availability of these drugs, the treatment of PAH was initially focused on preventing disease progression and prolonging patient survival. When a patient deteriorated on initial therapy, therapy was escalated to double, triple, or maximal combination therapies. These strategies led to improved patient survival, after which the focus of treatment strategies started shifting toward clinical improvement. According to current treatment algorithms, treatment decisions should be based on the mortality risk of the individual patient, estimated using clinical prognosticators, both at initiation of therapy and when evaluating treatment response (3)(4)(5). Therefore, adequate prediction of mortality risk is pivotal in the treatment of PAH patients.
To estimate patient risk status, various risk equations and risk stratification models have been established. Initially, risk equations were developed to estimate patient outcome by expressing the chance of survival as a percentage. Survival was first estimated for PAH patients in 1991, when D'Alonzo et al. (6) developed the NIH (National Institutes of Health registry) risk equation, based on the mean pulmonary arterial pressure (mPAP), mean right atrial pressure (mRAP), and cardiac index (CI). Since then, other risk equations have been developed, such as the French PAH registry equation (7), the PHC (Pulmonary Hypertension Connection) survival equation (8), and the REVEAL (Registry to Evaluate Early and Long-term Pulmonary Arterial Hypertension Disease Management) risk equation (9). From this original REVEAL risk equation, consisting of nineteen etiologic factors and parameters, the first risk stratification model was derived (10).
Currently, treatment strategies are guided by risk stratification, as proposed by the consecutive European Society of Cardiology/European Respiratory Society (ESC/ERS) guidelines for the diagnosis and treatment of PH (3,4) and the American College of Cardiology Foundation/American Heart Association (ACCF/AHA) expert consensus document on PH (11). According to these strategies, patients are categorized as having low, intermediate, or high risk for mortality, where the aim is to achieve and maintain a low-risk status. The estimated risk is based on multiple clinical, hemodynamic, and echocardiographic parameters with their own cut-off values for each risk category. A risk stratification guided treatment strategy has also been proposed for children with PAH during the World Symposium on Pulmonary Hypertension (WSPH), using the binary strata low and high risk (12, 13).
The aim of this systematic review is to provide an overview of the current risk stratification models in adult and pediatric PAH. With the growing number of risk stratification models it is crucial to assess the reliability and accuracy of these models, especially since their use in daily practice is advocated. Therefore, the two research questions addressed in this systematic review are: (1) which variables are used for risk stratification models in PAH and (2) what is the prognostic value of risk stratification models for transplant-free survival or all-cause mortality?

Methods
This review is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Scoping Reviews (PRISMA) (14). The objectives, inclusion criteria, and methods adopted in this systematic review were specified and documented in advance (PROSPERO registration number: CRD42022316885).

Eligibility criteria
Clinical trials and observational studies focused on risk stratification models both in adult (age ≥ 18 years) and pediatric (age < 18 years) PAH patients were eligible for inclusion. Pediatric patients with PH due to lung disease were also considered eligible for inclusion because of the pathological crossover between PAH and the abnormal pulmonary vascular development, seen in developmental lung diseases such as bronchopulmonary dysplasia and congenital diaphragmatic hernia. In these studies the diagnosis had to be confirmed by RHC, or echocardiography in infants with developmental lung disease, and meet the hemodynamic definitions (1). Additionally, the risk stratification model was considered a model only if it comprised at least three variables. Results and conclusions had to be supported by appropriate statistical methods with endpoints defined as transplant-free survival or all-cause mortality. Furthermore, studies had to be written in English.
Studies reporting risk stratification models in adult patients with PH group 2, 3, 4, and 5 according to the Nice 2018 classification (1), and pediatric patients with PH group 2, 4, and 5 were excluded, as well as case reports, case series, guidelines, and reviews. If fewer than three variables were used for risk stratification models, or if endpoints other than transplant-free survival or all-cause mortality were used, studies were excluded as "no risk stratification model" or "not eligible endpoint," respectively. Studies not meeting the inclusion criteria and not fitting any of the above-mentioned exclusion reasons were excluded as "other." In this review, survival or risk equations were not considered as risk stratification models.

Information sources and search strategy
Systematic literature searches were conducted in the electronic databases MEDLINE (PubMed), Embase (Elsevier), and Web of Science (Clarivate). The search strategies were developed in collaboration with an information specialist (SW). The structure of the search strategies is based on two concepts: (1) PAH, pediatric PH and (2) risk stratification.

Managing references and selection process
The results of the database searches were exported to the reference management program EndNote, version 20. In EndNote, duplicate items were determined and removed following the steps described by Bramer et al. (15). The de-duplicated results were exported to the screening program Rayyan.
Two researchers independently performed the screening in Rayyan in two steps. In the title-abstract screening, articles that were clearly not relevant were excluded. Potentially relevant articles, and articles with insufficient information in the title or abstract, that were selected by at least one of the researchers proceeded to full-text screening. In the full-text screening, the two researchers independently judged whether the selection criteria were met. Disagreements between the screeners were resolved by a third reviewer. Finally, articles that met the criteria, as agreed by the researchers, were included and divided into four classes according to the primary aim of the article: (1) model development, (2) model validation, (3) model enhancement, and (4) serial risk stratification. In accordance with the aims of the systematic review, the authors focused on the studies belonging to class 1, 2, and 3. Studies in class 4 focused on risk stratification at follow-up and/or changes in risk score or stratum between baseline and follow-up, whether or not under the influence of intervention, and were hence excluded from the current review.

Data collection
Data were extracted from the included studies using a standardized data extraction form. Extracted data included: study setting, population demographics and baseline characteristics, variables used in the risk stratification model including cut-off points and defined endpoint, statistical methodology, and the prognostic value of the model.

Analysis
To present an overview of the variables used in risk stratification models, multiple tables were produced. Each table reports the model name or basis, the used definition of risk, the number of risk strata, the number of variables, and specifies which variables are used for the model. Separate tables were created for the renowned risk stratification models (containing both development and validation), the lesser studied models, model enhancement, and pediatric risk stratification strategies.
For the evaluation of the prognostic value of the risk stratification models, the reported c-statistic was used. The c-statistic is equivalent to the area under the receiver operating characteristic curve (AUROC) and is a measure of discriminatory ability. It can be interpreted as the probability that a patient who died had a higher predicted probability of death than a patient who survived. A c-statistic of 1.0 indicates perfect discrimination, whereas a c-statistic of 0.5 indicates that the model discriminates no better than chance. Hence, the model with the higher c-statistic (or greater AUROC) is better at discriminating between survival and death (17).
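As a minimal illustration of this interpretation, the sketch below computes the c-statistic for a binary outcome by comparing every pair of one patient who died and one who survived, counting ties in predicted risk as half. The function name and the example values are hypothetical and are not taken from any included study. For time-to-event data with censoring, a censoring-aware concordance index would be needed instead.

```python
from itertools import product


def c_statistic(predicted_risk, died):
    """Probability that a patient who died had a higher predicted risk
    than a patient who survived (ties in predicted risk count as 0.5)."""
    risk_dead = [r for r, d in zip(predicted_risk, died) if d]
    risk_alive = [r for r, d in zip(predicted_risk, died) if not d]
    if not risk_dead or not risk_alive:
        raise ValueError("need at least one death and one survivor")

    concordant = 0.0
    for r_dead, r_alive in product(risk_dead, risk_alive):
        if r_dead > r_alive:
            concordant += 1.0   # pair ranked correctly
        elif r_dead == r_alive:
            concordant += 0.5   # tie: counts as half
    return concordant / (len(risk_dead) * len(risk_alive))


# Hypothetical predicted risks for six patients (1 = died, 0 = survived)
risks = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
outcome = [1, 1, 1, 0, 0, 0]
print(c_statistic(risks, outcome))
```

A model that assigns the same risk to every patient yields 0.5 under this definition, and one that ranks every death above every survivor yields 1.0.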

Risk of bias
Risk of bias (ROB) was assessed using the Prediction model Risk Of Bias Assessment Tool (PROBAST) for every study and, in the case of studies including multiple models, separately for the different risk stratification models (16). This tool consists of four domains (participants, predictors, outcome, and analysis) with a total of 20 signaling questions to assist in assessing ROB. These questions can be answered as (probably) yes, (probably) no, or no information, with "no" indicating potential bias. For model development studies, the development signaling questions were answered, and for validation studies the validation questions. In the case of studies reporting both the development of a model and the validation of this model or other models, both the development and validation signaling questions were answered for each model separately. For model enhancement studies, the development questions were answered, as well as the validation questions when the original model was also validated. Besides ROB, the applicability of the model was evaluated to determine the relevance of the participants, predictors, and outcome to the research question. ROB and applicability assessment was performed by one researcher; in case of doubt, a second researcher was consulted.

Identified studies
In Figure 1, the PRISMA flowchart for the identification of studies is shown. A total of 2,395 records were identified from the databases. After duplicate removal, 1,539 studies remained for abstract screening, of which 1,385 were excluded. Of the 154 full-text screened studies, 80 were excluded (Supplementary Table 2). The remaining 74 studies were considered eligible for inclusion (Supplementary Table 3), of which two studies involved pediatric PAH patients. No studies concerning risk stratification in pediatric PH due to lung disease were retrieved; as such, our results focus on RHC-confirmed PAH only. The 31 studies concerning serial risk stratification were disregarded, since the current study focusses solely on model development (n = 21), validation (n = 13), and enhancement (n = 9) of risk stratification models, resulting in a total of 43 studies to be discussed in this review. The main characteristics of these 43 studies are presented in Tables 1-3 for development, validation, and enhancement studies, respectively.
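The study-flow counts reported above can be reconciled with simple arithmetic. The short sketch below uses only the numbers stated in the text; the variable names are illustrative, and the number of duplicates removed is derived here rather than reported.

```python
# Counts as reported in the study-selection paragraph
records_identified = 2395
after_deduplication = 1539
excluded_on_abstract = 1385
excluded_on_full_text = 80
serial_risk_stratification = 31
development, validation, enhancement = 21, 13, 9

# Derived quantities
duplicates_removed = records_identified - after_deduplication  # not reported in the text
full_text_screened = after_deduplication - excluded_on_abstract
eligible = full_text_screened - excluded_on_full_text
discussed = eligible - serial_risk_stratification

print(full_text_screened, eligible, discussed)
print(discussed == development + validation + enhancement)
```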

Variables in risk stratification
We have identified multiple risk stratification models, such as the REVEAL risk calculator and the ESC/ERS 2015 guidelines-based COMPERA, SPAHR, FPHR invasive and non-invasive models, and other abbreviated versions of the ESC/ERS 2015 guidelines. In Table 4 an overview of the variables used for these risk stratification models is given, along with the total number of variables used in each model, the risk definition, and the number of strata.
The first REVEAL risk calculator was developed by Benza et al. (10) in 2012 and consisted of twelve variables: WHO Functional Class (WHO-FC), 6-minute walk distance (6MWD), N-terminal-pro brain natriuretic peptide (NT-proBNP) or brain natriuretic peptide (BNP), pericardial effusion, mRAP, pulmonary vascular resistance (PVR), WHO group 1 subgroup, male older than 60 years of age, renal insufficiency, systolic blood pressure (SBP), heart rate (HR), and percentage predicted carbon monoxide lung diffusing capacity (DLCO). Points are assigned to every variable, with its weight based on the results of the multivariable Cox proportional hazard model. [...] (24)(25)(26)(27)(28)(29)(30)(31)(32)(33). The variables used in these models are also shown in Table 4. From this table it can be observed that the most often used variables in risk stratification models are WHO-FC, 6MWD, NT-proBNP, mRAP, CI, and mixed venous oxygen saturation (SvO2).
Several studies have explored the enhancement of the above-mentioned risk stratification models by adding one or more imaging or biomarker variables to the models, such as the right ventricular end-systolic volume index (34), estimated [...] In Table 5 an overview of the model enhancement studies with the variables is presented. Additionally, others have tried to create risk stratification models based solely on echocardiographic parameters (37, 38) or biomarkers (39, 40) (Table 6). For example, Ghio et al. (38) used the echocardiographic parameters tricuspid annular plane systolic excursion (TAPSE), degree of tricuspid regurgitation (TR), and a marker of systemic venous congestion represented by inferior vena cava diameter. Yogeswaran et al. (40) developed a model with the biomarkers γ-glutamyl transferase (GGT), aspartate aminotransferase/alanine aminotransferase (AST/ALT) ratio, and neutrophil-to-lymphocyte ratio (NLR). A different approach for developing a risk stratification model was shown by Haddad et al. (41). They attempted to model the data architecture by creating a network graph. This graph shows the connectivity of every parameter with the other parameters and identified NT-proBNP as the most central (important) parameter.

Prognostic value of risk stratification models
The prognostic value of the REVEAL risk scores in different studies and populations is shown in Figure 2 by a forest plot of the c-statistic with its 95% confidence interval (95% CI). The c-statistic was found to range from 0.70 to 0.75 for the REVEAL risk score calculator and from 0.65 to 0.74 for the REVEAL 2.0 calculator. REVEAL Lite 2 had a c-statistic of 0.70.
Several enhancement studies reported an increase in c-statistic upon the addition of an imaging or serum biomarker to a previously described model (Figure 4).

Risk of bias
The PROBAST results of the ROB analysis are presented in Table 8. The ROB for the domains participants, predictors, and outcome was low for almost every study. However, many studies were judged as having a high ROB in the analysis domain, causing an overall high ROB for nearly all studies. To differentiate between studies scoring poorly on one or two signaling questions and those failing on nearly all aspects of the analysis, the judgment is marked with one, two, or three asterisks. These asterisks correspond to one to three, four to six, and seven or more negatively answered questions ("no" or "no information"), respectively, out of nine for development studies and out of six for validation studies. There was low concern regarding applicability of models for participants, predictors, and outcome.
Caption: C-statistic (95% CI) of biomarker and eigenvector centrality risk stratification models. NT-proBNP, N-terminal-pro brain natriuretic peptide. $ Derivation cohort, validation cohort, * cohort includes PAH and chronic thromboembolic PH patients.

Discussion
In this systematic review we identified twenty different risk stratification models that have been proposed for adult PAH and only two for pediatric PAH. The REVEAL risk calculators are the most frequently validated models in the literature, followed by the COMPERA model and the FPHR invasive and non-invasive models. For the enhancement of existing risk stratification models, the FPHR invasive method and REVEAL 2.0 calculator have been studied most frequently. The non-invasive WHO-FC, 6MWD, and BNP/NT-proBNP, and the invasive mRAP, CI, and SvO2 were found to be the variables most often used for the risk stratification of PAH. Reported c-statistics representing model predictive strength range from 0.56 to 0.83 in adults. Studies enhancing models by adding new variables report improvement of model strength.
Most risk stratification models include the non-invasive variables of WHO-FC, 6MWD, and BNP/NT-proBNP. The inclusion of these parameters in risk stratification may be due to the extensive studies on the prognostic value of these parameters, and stresses their important prognostic abilities in adult PAH patients. Based on the comparable predictive strength of non-invasive models and models including invasive parameters reported in three studies (19, 45, 46), a fully non-invasive risk stratification may be feasible. However, data supporting fully non-invasive risk stratification models are still scarce. Therefore, it may still be too early to set aside the invasive parameters included in most risk stratification models.
In the identified risk stratification models, different methods are used to combine cut-off scores of individual variables to determine the overall risk status. The three main definitions of risk are (1) the number of low risk criteria, (2) risk category based on an average score, and (3) risk category based on the total sum of the score. Furthermore, the risk stratification models can use weighted or unweighted variables. Risk stratification models using the number of low risk criteria (e.g., FPHR invasive and non-invasive method) or an average score (e.g., COMPERA and SPAHR models) do not take the weight of the variables into account when determining risk. This may lead to an underestimation or overestimation of patient risk. The REVEAL risk calculators were the only models found to consider weighted values for individual variables in the calculation of risk. Variables that showed at least a twofold increase in hazard for mortality according to the multivariable Cox proportional hazard model were assigned two points, whereas variables with a lower hazard received one point (10). This inclusion of variable weight in the risk estimation does appear to have an effect on the discriminatory ability of the model. C-statistics found in studies using the REVEAL risk calculators were generally higher (0.70-0.75) than those reported for COMPERA and FPHR models (0.62-0.69). These findings may favor the use of weighted risk scores instead of averages or number of low or high risk criteria in further development of risk stratification models.
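The three definitions of risk can be contrasted in a small sketch. Everything in it is a simplifying assumption for illustration: the per-variable grades, the weights, the rounding rule for the average score, and the rule that a variable adds its weight whenever it is not at low risk. None of these reproduce the actual cut-offs or point tables of the FPHR, COMPERA, SPAHR, or REVEAL models.

```python
# Hypothetical grades per variable: 1 = low, 2 = intermediate, 3 = high risk.
patient_grades = {"WHO-FC": 2, "6MWD": 1, "NT-proBNP": 3, "mRAP": 1}
# Hypothetical weights: points added when a variable is not at low risk.
weights = {"WHO-FC": 1, "6MWD": 1, "NT-proBNP": 2, "mRAP": 1}


def count_low_risk_criteria(grades):
    """Definition 1: number of variables that meet the low-risk criterion."""
    return sum(1 for grade in grades.values() if grade == 1)


def average_score_category(grades):
    """Definition 2: unweighted mean grade, rounded to the nearest integer
    (halves round up), mapped to a risk category."""
    mean_grade = sum(grades.values()) / len(grades)
    rounded = int(mean_grade + 0.5)
    return {1: "low", 2: "intermediate", 3: "high"}[rounded]


def weighted_sum_score(grades, variable_weights):
    """Definition 3: total score in which every variable that is not at
    low risk contributes its own weight."""
    return sum(variable_weights[name] for name, grade in grades.items() if grade > 1)


print(count_low_risk_criteria(patient_grades))
print(average_score_category(patient_grades))
print(weighted_sum_score(patient_grades, weights))
```

In this sketch, only the third definition changes when the weight of a single variable changes; the first two treat all variables as equally important.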
Overall, the c-statistic of most studies was found to range between 0.6 and 0.8. Considering that a c-statistic of 0.5 indicates a poor prediction and 1.0 a perfect prediction, we may consider the current risk stratification models to have a moderate predictive ability. Whether this is sufficient to rely on for optimal treatment strategies can be debated. In the recently released 2022 ESC/ERS guidelines for the diagnosis and treatment of PH (4), the four-strata COMPERA 2.0 model of Hoeper et al. (22) using WHO-FC, 6MWD, and BNP/NT-proBNP is recommended for risk stratification at follow-up to guide treatment strategies in adult patients with PAH. The c-statistic for 1-year mortality of this four-strata model was reported to be 0.67 at baseline and 0.73 at follow-up in an external validation study by Boucly et al. (42). Given this, we advocate striving to improve current risk stratification models. A possible approach for improving risk stratification models may be the addition of new parameters. The increase of the c-statistic in all enhancement studies, except for the addition of arterial carbon dioxide partial pressure to the FPHR non-invasive model (43), shows that the predictive strength of risk stratification models can be improved by adding imaging or serum biomarkers. Of all the enhancement studies, the addition of the right ventricular end-systolic volume index seems most promising (34). Prospective and external validation studies are needed to further establish the predictive value of enhanced models.
Table footnote: WHO-FC, WHO functional class; 6MWD, 6-minute walk distance; NT-proBNP, N-terminal-pro brain natriuretic peptide; RA area, right atrial area; mRAP, mean right atrial pressure; CI, cardiac index; SvO2, mixed venous oxygen saturation; PVR, pulmonary vascular resistance; BP, blood pressure; DLCO, carbon monoxide lung diffusing capacity; ST2, soluble suppressor of tumorigenicity-2; TAPSE, tricuspid annular plane systolic excursion; BMI, body mass index; mPAP/mSAP, mean pulmonary arterial pressure/mean systemic arterial pressure ratio; PVRI, indexed PVR.
Caption: C-statistic (95% CI) of risk stratification models used in pediatric PAH. ST2, soluble suppressor of tumorigenicity-2; SvO2, mixed venous oxygen saturation; RA area, right atrial area.
Furthermore, the use of risk stratification is not restricted to estimating risk at diagnosis or initiation of therapy. Serial risk stratification every 3-6 months is also proposed, so that follow-up risk estimates can be used to evaluate treatment response and to identify the need to escalate therapy (47). Recent reports show that risk stratification may discriminate outcome better at first follow-up RHC than at baseline (48), and that changes in risk status are predictive of survival (49). Moreover, the addition of serial changes in NT-proBNP or right heart reverse remodeling (a combination of three echocardiographic parameters) increased the c-statistic of the eigenvector centrality model of Haddad et al. (41) (from 0.81 to 0.85) and of the REVEAL 2.0 calculator (from 0.69 to 0.87) (50), respectively. As such, the strength of risk stratification models may lie in serial assessments.
Data regarding the use of risk stratification models in pediatric PAH are extremely scarce. In this review, only two pediatric PAH studies were found, one based on the variables recommended by the WSPH 2013 pediatric task force and one based on the REVEAL 2.0 calculator. Nonetheless, risk stratification to guide treatment strategies is currently recommended also in the pediatric population. The updated guideline of the European Pediatric Pulmonary Vascular Disease Network for the diagnosis and treatment of pediatric pulmonary hypertension presents a risk score sheet for pediatric PH based solely on expert opinion (51). However, this score has not yet been validated, and the guideline states that it is not clear which cut-offs should be used for the risk stratification variables. For this reason, Haarman et al. (45) used cut-off values derived from separate prognosticator studies in children with PAH. Considering the reference class problem, which dictates that the prediction for the individual patient depends on the reference class the patient is assigned to, it is recommended to develop a risk stratification model with variable cut-offs and weights designed specifically for the pediatric population.
Nearly all studies included in this systematic review were judged to have a high ROB based on their analysis. This can be explained by a closer look at the analysis domain of PROBAST (16), the tool that was used to rate ROB. First, according to PROBAST, the number of events (death or death + transplant) per variable should be higher than 10 for development studies, and for validation studies at least 100 participants with the outcome are required. These criteria were met by only approximately half of the included adult studies. Since pediatric PAH is a rare disease, none of the included pediatric studies met the criteria for the number of participants, which shows the limited applicability of PROBAST in a rare disease. Second, if continuous variables were dichotomized or categorized for the development of a model, according to PROBAST the model could have a high ROB. However, categorization forms the basis of risk stratification, and thus many model development studies were rated as having a high ROB. For validation studies, categorization of continuous variables was allowed if the cut-offs were similar to the original model. A third aspect in PROBAST is the inclusion of all enrolled participants in the analysis and the appropriate handling of missing data, since excluding patients with missing data may cause selection bias. In addition, selection bias is a realistic risk in registry studies, since there are nearly always missing data because the data were not collected according to a protocol or for the research question at hand. Fourth, multiple studies did not report a c-statistic or AUROC, whereas PROBAST demands reporting of both calibration and discrimination measures. Information on model overfitting and optimism in model performance was also often not described. Finally, the weights of the variables in the final model had to correspond to the results from the reported multivariable analysis. As discussed earlier, models defining risk by the number of low risk criteria or based on an average score do not take the weight of a variable into account, and thus these studies were also at high ROB.
Table footnote: SvO2, mixed venous oxygen saturation; RA area, right atrial area; PaCO2, arterial carbon dioxide partial pressure; ISWD, incremental shuttle walk distance; RVESVi, right ventricular end-systolic volume index; NT-proBNP, N-terminal-pro brain natriuretic peptide; TAPSE/TRV, tricuspid annular plane systolic excursion/tricuspid regurgitation velocity ratio; TAPSE/sPAP, tricuspid annular plane systolic excursion/systolic pulmonary artery pressure ratio; eGFR, estimated glomerular filtration rate. + indicates low ROB/low concern regarding applicability; - indicates high ROB/high concern regarding applicability; *1-3, **4-6, ***≥ 7 negatively answered questions.
This study has several limitations. Not all included studies reported a c-statistic, which may have caused a bias in the judgment of the prognostic value of the models. The patients included in more recent studies were receiving treatment according to the risk stratification-based treatment algorithms. This may have influenced the outcome of those patients, which could have affected the prognostic value of the risk stratification models in these studies. No meta-analysis was performed, limiting direct conclusions on which model performs best. In order to keep focus, the studies concerning serial risk stratification were disregarded, limiting the ability to discuss the value of serial follow-up risk stratification.
For future research, it is recommended to perform prospective validation studies of the risk stratification models, since only retrospective studies of risk stratification currently exist. Studies developing new models or validating existing models should consider including both calibration and discrimination measures, as both are needed to thoroughly describe the performance of the model. Furthermore, an individual patient data systematic review is recommended to define which risk stratification model has the best performance.

Conclusion
This systematic review contributes to our current knowledge on risk stratification in PAH and emphasizes the very limited presence of studies reporting risk stratification in pediatric PAH. The variables found to be used most frequently in risk stratification models are WHO-FC, 6MWD, NT-proBNP (or BNP), mRAP, CI, and SvO2. The prognostic value of current risk stratification models is moderate to good, at best, and may be improved by adding new imaging and serum biomarkers, using weighted risk stratification variables, and adding changes in clinical parameters at serial risk stratification during follow-up. Moreover, there is a need for prospective validation of risk stratification models, and more research into risk stratification for pediatric PAH has to be pursued.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.