Incremental prognostic value of lung ultrasound on contemporary heart failure risk scores

Introduction: Over the last decades, several scores have been developed to aid clinicians in assessing prognosis in patients with heart failure (HF) based on clinical data, medications and, ultimately, biomarkers. Lung ultrasound (LUS) has emerged as a promising prognostic tool for patients when assessed at discharge after a HF hospitalization. We hypothesized that contemporary HF risk scores can be improved upon by the inclusion of the number of B-lines detected by LUS at discharge to predict death, urgent visit, or HF readmission at 6-month follow-up.
Methods: We evaluated the discrimination improvement of adding the number of B-lines to 4 contemporary HF risk scores (Get with the Guidelines [GWTG], MAGGIC, Redin-SCORE, and BCN Bio-HF) by comparing the change in the area under the receiver operating characteristic curve (AUC), the net reclassification index (NRI), and the integrated discrimination improvement (IDI). The study population comprised the 123 patients enrolled in the LUS-HF trial, and the analyses were adjusted for the intervention.
Results: The AUC of the GWTG score increased from 0.682 to 0.789 (p = 0.02), resulting in an NRI of 0.608 and an IDI of 0.136 (p < 0.05). Similar results were observed when adding the number of B-lines to the MAGGIC score, with an AUC that increased from 0.705 to 0.787 (p < 0.05). This increase translated into an NRI of 0.608 and an IDI of 0.038 (p < 0.05). Regarding the Redin-SCORE at 1 month and 1 year, the AUC increased from 0.714 to 0.773 and from 0.681 to 0.757, although neither increase reached statistical significance (p = 0.08 and p = 0.06, respectively). Both IDI and NRI were significantly improved (0.093 and 0.509 in the 1-month score, p < 0.05; 0.056 and 0.111 in the 1-year score, p < 0.05). Lastly, the AUC for the BCN Bio-HF score increased from 0.733 to 0.772, which was not statistically significant, with an NRI of 0.363 (p = 0.06) and an IDI of 0.092 (p < 0.05).
Conclusion: Adding the results of LUS evaluated at discharge improved the predictive value of most of the contemporary HF risk scores. As it is a simple, fast, and non-invasive test, it may be recommended to assess prognosis at discharge in HF patients.


Introduction
Risk prediction in heart failure (HF) remains essential to identify those patients who may benefit from closer management. Since the turn of the century, several scores have been proposed and externally validated (Levy et al., 2006; Peterson et al., 2010; Pocock et al., 2013; Lupón et al., 2014; Álvarez-García et al., 2015). Most can be easily calculated using demographics, laboratory, and medication data, and are available through free-access websites. Nevertheless, no predictive scale has been found incontrovertibly better than the rest (Codina et al., 2021), illustrating the complexity of risk prediction in HF.
As HF treatment has evolved dramatically during the last decades, existing prognostic scores have been continuously updated by adding emerging data from both newer therapeutic and diagnostic tools (Sinha et al., 2021). In particular, lung ultrasound (LUS) has emerged in recent years as a simple and non-invasive instrument for detecting pulmonary congestion in patients with HF. Its prognostic value has been assessed in different clinical scenarios (Coiro et al., 2015; Coiro et al., 2015; Gargani et al., 2015; Platz et al., 2016; Scali et al., 2017; Coiro et al., 2020; Rivas-Lasarte et al., 2020; Domingo et al., 2021; Gargani et al., 2021; Kobayashi et al., 2021; Mazzola et al., 2021), showing that the presence of B-lines detected by LUS is associated with an increased risk of worse outcomes.
Thus, we hypothesized that contemporary risk scores can be improved by incorporating the number of B-lines detected by LUS at HF discharge to predict death or hospital readmission at 6-month follow-up.

Study design
This is a sub-analysis including 123 patients enrolled in the LUS-HF trial, whose study design and primary results have been previously reported (Rivas-Lasarte et al., 2019). In brief, LUS-HF was a single-center, single-blind, randomized clinical trial evaluating tailored LUS-guided diuretic treatment of pulmonary congestion in patients with HF. Patients were required to be aged ≥18 years and to have been hospitalized for HF defined by shortness of breath, pulmonary congestion on X-ray, and elevated N-terminal pro B-type natriuretic peptide (NT-proBNP) values in the first 24 h of admission (cut-off values: >450 ng/L in patients aged <50 years; >900 ng/L in patients aged 50-75 years; >1800 ng/L in patients aged >75 years). Exclusion criteria included inability to attend follow-up visits, life expectancy of <6 months, haemodialysis, and the presence of severe lung disease preventing LUS interpretation. Eligible patients were randomized at discharge to either the non-LUS-guided group (control group) or the LUS-guided group (LUS group). Visits were scheduled in the HF clinic at 14, 30, 90, and 180 days after discharge. LUS was performed in both groups, but the result was only available to the treating physician in the LUS-guided arm.
The primary endpoint was a composite of urgent visit, hospitalization for worsening HF, and death at 6 months. Urgent visits for worsening HF were defined as visits to the emergency department or unscheduled visits to the HF unit as a result of signs and/or symptoms of worsening HF that required intravenous diuretic treatment or diuretic increase with a hospital stay of <24 h. Hospitalization for worsening HF was defined as a stay in hospital for >24 h mainly as a result of signs and/or symptoms of worsening HF. The reported events were reviewed by an independent panel of investigators.
The protocol was approved by the ethics committee and the study was conducted in accordance with the principles of the Declaration of Helsinki. Written informed consent was obtained from all patients prior to study participation.

Lung ultrasound protocol
According to current expert recommendations (Platz et al., 2019), LUS was recorded using a pocket ultrasound device (VScan, General Electric) with a cardiac phased array transducer at four sites in each hemithorax (mid-clavicular and mid-axillary, superior and inferior, on each side), with the transducer perpendicular to the ribs, at a 16 cm imaging depth, and with the patient in the semi-recumbent position. The number of B-lines reported was the sum of the B-lines visualized in each thoracic site. For this post-hoc analysis, the number of B-lines detected by LUS at discharge was analysed.

Contemporary HF risk scores
We selected 4 contemporary scores: Get with the Guidelines, MAGGIC, Redin-SCORE and BCN Bio-HF.

The Get With The Guidelines-Heart Failure score (GWTG-HF) (Peterson et al., 2010)
The GWTG-HF score incorporates 7 variables (age, systolic blood pressure (BP), blood urea nitrogen, heart rate, serum sodium, chronic obstructive pulmonary disease, and non-black race) to predict the risk of in-hospital mortality for patients hospitalized with HF.

The Meta-Analysis Global Group in Chronic Heart Failure score (MAGGIC) (Pocock et al., 2013)
The MAGGIC score was derived from a meta-analysis of 30 studies to predict mortality rates in patients with HF, and includes 13 predictors: age, lower ejection fraction (EF), New York Heart Association (NYHA) class, serum creatinine, diabetes, not prescribed beta-blocker, lower systolic BP, lower body mass index (BMI), time since diagnosis, current smoker, chronic obstructive pulmonary disease, male gender, and not prescribed angiotensin-converting enzyme inhibitors (ACEi) or angiotensin-receptor blockers.

The BCN Bio-HF score (Lupón et al., 2014)
The first version of the BCN Bio-HF included clinical variables, medications, conventional laboratory analytes

*Renal insufficiency refers to eGFR <60 ml/min/1.73 m2.
**Anaemia refers to haemoglobin levels of <13 g/dl in men and <12 g/dl in women.
Data are expressed as number (%), mean ± standard deviation, or median (interquartile range), as appropriate.

Statistical analysis
Continuous variables are expressed as mean (standard deviation) or as median (interquartile range), as appropriate. Differences in continuous variables were tested by analysis of variance (ANOVA), Student's t-test, or the Wilcoxon rank-sum test for independent samples. Categorical variables are presented as frequency and percentage. Differences in categorical variables were assessed by the χ2 test or by Fisher's exact test.
Discrimination, calibration and reclassification methods are recommended when evaluating candidate variables in prognostic studies (Januzzi et al., 2014). Thus, we first assessed the discriminative ability of each selected HF score to predict the occurrence of the primary endpoint at 6 months in the study population by calculating the area under the receiver operating characteristic curve (AUC). Thereafter, we analysed the incremental prognostic value of the number of B-lines at discharge on top of each score by comparing the AUC with and without LUS data, and calculated the integrated discrimination improvement (IDI) and the net reclassification improvement (NRI). Finally, we also performed decision curve analysis (DCA) to visualize the net benefit for clinical decisions. Data were analysed using STATA SE Version 15.0 (StataCorp LLC, College Station, TX, United States). A two-sided p < 0.05 was considered significant.
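To make the three metrics concrete, the following minimal Python sketch computes the AUC, the NRI and the IDI from two sets of predicted risks (score alone versus score plus B-lines). It is not the STATA code used in the study. The function names and the toy risks and outcomes are hypothetical, and the category-free (continuous) form of the NRI is assumed because the text does not state which form was used.

```python
from statistics import mean


def auc(risks, events):
    """AUC as the probability that a patient with the event has a higher
    predicted risk than a patient without it (ties count as one half)."""
    pos = [r for r, e in zip(risks, events) if e == 1]
    neg = [r for r, e in zip(risks, events) if e == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def continuous_nri(risk_old, risk_new, events):
    """Category-free NRI (assumed form): net proportion of events whose risk
    moved up plus net proportion of non-events whose risk moved down."""
    def net_upward(flag):
        pairs = [(o, n) for o, n, e in zip(risk_old, risk_new, events)
                 if e == flag]
        up = sum(1 for o, n in pairs if n > o)
        down = sum(1 for o, n in pairs if n < o)
        return (up - down) / len(pairs)
    return net_upward(1) - net_upward(0)


def idi(risk_old, risk_new, events):
    """IDI: change in the difference between the mean predicted risk of
    events and the mean predicted risk of non-events."""
    def separation(risks):
        with_event = mean(r for r, e in zip(risks, events) if e == 1)
        without_event = mean(r for r, e in zip(risks, events) if e == 0)
        return with_event - without_event
    return separation(risk_new) - separation(risk_old)


# Hypothetical toy data (not study data): 1 = composite endpoint occurred.
events = [1, 1, 1, 0, 0, 0]
risk_score_only = [0.6, 0.4, 0.5, 0.5, 0.3, 0.2]
risk_with_blines = [0.7, 0.6, 0.4, 0.4, 0.2, 0.3]

print("AUC, score alone:    ", round(auc(risk_score_only, events), 3))
print("AUC, score + B-lines:", round(auc(risk_with_blines, events), 3))
print("NRI:", round(continuous_nri(risk_score_only, risk_with_blines, events), 3))
print("IDI:", round(idi(risk_score_only, risk_with_blines, events), 3))
```

In practice the two risk vectors would come from regression models fitted with and without the number of B-lines; significance testing of the AUC difference, NRI and IDI is not covered by this sketch.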

Characteristics of the study population and number of B-lines at discharge
Clinical characteristics of the LUS-HF population and LUS data at discharge are shown in Table 1. Briefly, the median age of the patients was 70 years, most patients were male (72%) and had a reduced left ventricular ejection fraction (55%), and the prevalence of comorbidities was high. The median number of B-lines at discharge was 4, and 41 patients (33%) had ≥5 B-lines. Table 2 summarizes the discrimination, calibration, IDI, and NRI parameters of the 4 HF scores for the primary outcome, alone and in combination with the number of B-lines. Overall, the addition of the number of B-lines at discharge improved the AUC of each risk score (Figure 1). However, the incorporation of the number of B-lines only reached statistical significance for the GWTG and MAGGIC scores, but not for the Redin-SCORE at 1 month and 1 year, nor for the BCN Bio-HF.

Incremental prognostic value of LUS over contemporary heart failure risk scores
Regarding reclassification indices, both NRI and IDI showed a significant improvement after adding the number of B-lines with all scores, except for NRI with the BCN Bio-HF score. As Figure 2 shows, the calibration curves indicated good concordance. Finally, Figure 3 displays the DCA, showing that the net benefit of adding LUS data was higher than that of the score alone across all threshold probabilities, except for the GWTG score, for which the benefit applied only at event probabilities under 70%.
Frontiers in Physiology frontiersin.org
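The net benefit plotted in a decision curve can be illustrated with a short Python sketch. It uses the expression given in the Figure 3 legend, (true positives − w × false positives)/total number of patients, and assumes that w is the odds of the threshold probability, pt/(1 − pt), as is usual in decision curve analysis; the legend describes w only as a correction factor. The function names and the toy data are hypothetical and do not reproduce the study results.

```python
def net_benefit(risks, events, threshold):
    """Net benefit of classifying as high risk every patient whose
    predicted risk is at or above the threshold probability."""
    n = len(events)
    true_pos = sum(1 for r, e in zip(risks, events) if r >= threshold and e == 1)
    false_pos = sum(1 for r, e in zip(risks, events) if r >= threshold and e == 0)
    w = threshold / (1.0 - threshold)  # assumed weighting: odds of the threshold
    return (true_pos - w * false_pos) / n


def net_benefit_treat_all(events, threshold):
    """Reference strategy: every patient is classified as high risk."""
    prevalence = sum(events) / len(events)
    w = threshold / (1.0 - threshold)
    return prevalence - w * (1.0 - prevalence)


# Hypothetical toy data (not study data): 1 = composite endpoint occurred.
events = [1, 1, 1, 0, 0, 0]
risk_score_only = [0.6, 0.4, 0.5, 0.5, 0.3, 0.2]
risk_with_blines = [0.7, 0.6, 0.4, 0.4, 0.2, 0.3]

for pt in (0.3, 0.5, 0.7):
    print(
        f"threshold {pt:.1f}:",
        "score alone", round(net_benefit(risk_score_only, events, pt), 3),
        "| score + B-lines", round(net_benefit(risk_with_blines, events, pt), 3),
        "| treat all", round(net_benefit_treat_all(events, pt), 3),
    )
```

A decision curve is obtained by evaluating these functions over a grid of threshold probabilities; the model with the higher curve at the thresholds of clinical interest offers the larger net benefit.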

FIGURE 1
Comparison between the Receiver Operating Characteristic curves (ROC) for the composite endpoint at 6-month follow-up: score alone versus score + number of B-lines. ROC curves compare sensitivity versus specificity across a range of values for the ability of the score to predict the composite endpoint. Each patient is given a score with the intention that the test will be useful in predicting event occurrence and the different points on the curve correspond to the different cutpoints used to determine whether the test results are positive. Adding B-lines to GWTG, MAGGIC and REDIN-Score 1 year scores (A,B,D) makes the true positive rate higher and the false positive rate lower at all cutpoints compared with the score alone. Regarding BCN Bio-HF and REDIN-Score 1 m (C,E) adding B-lines improves both sensitivity and specificity in almost all cutpoints.
Discussion

Main findings
Our work shows that the predictive value of contemporary HF risk scores can be improved by integrating LUS.

FIGURE 3
Decision curve analysis for predicting the primary composite endpoint. Decision curve analysis illustrates the performance of the model over a range of threshold probabilities, which may be of interest to the clinician making the decision. The x axis represents the probability threshold for the composite endpoint according to the score. The y axis represents the net benefit ([true positives − w × false positives]/total number of patients): positive values indicate an improvement in the classification of patients, and w is a correction factor for the probability threshold. The upper limit is 0.32 because the incidence of readmission for HF in LUS-HF was 32%. The diagonal black line assumes that all expected patients were readmitted, 32% at 6 months. The coloured lines represent the result of applying the different scores. Adding B-lines provided a net benefit due to better classification of the patients for probabilities below 70% in the GWTG and BCN Bio-HF scores (A,E). When B-lines were incorporated into the MAGGIC score (B), a net benefit was obtained due to better classification of the patients for probabilities between 0 and 80%. Regarding the REDIN-Score at 1 year and 1 month, a net benefit was obtained across the whole probability spectrum when using LUS data (C,D).

Prognostic value of LUS over HF risk scores
Prior research (Bettencourt et al., 2004; Khanam et al., 2018; Lupón et al., 2018) has already focused on analysing the prognostic value of new clinical variables to allow better prediction, such as the incorporation of angiotensin receptor-neprilysin inhibitors (ARNI) or the effect of adding natriuretic peptides. The BCN Bio-HF score was one of the pioneers in developing an updated version integrating those variables that allowed a better risk prediction.
LUS has emerged in the last decade as a simple, fast, and non-invasive test for quantifying lung congestion. Several studies have shown that it might be a better tool for detecting subclinical pulmonary congestion than clinical assessment (Platz et al., 2016; Pellicori et al., 2019), and it has become available in an increasing number of centres with the spread of ultrasound equipment, including pocket devices.
As with NT-proBNP, it has also been reported that the presence of B-lines in HF patients is an independent prognostic factor (Coiro et al., 2015; Gargani et al., 2015; Gustafsson et al., 2015; Aras and Teerlink, 2016; Platz et al., 2016; Rivas-Lasarte et al., 2020; Domingo et al., 2021), although no study to date has analysed its prognostic value when added to the most widely used contemporary risk scores. To the best of our knowledge, this is the first study analysing whether risk scores can be improved upon by the inclusion of B-lines detected by LUS at discharge. We found that the predictive yield improved to a different degree according to the presence or absence of haemodynamic or biochemical markers of left ventricular function in the respective models.
Moreover, the number of B-lines is not only a prognostic marker but has also been shown to be a therapeutic target in HF patients, improving their prognosis when monitored during follow-up, mainly due to a reduction in HF decompensations (Rivas-Lasarte et al., 2019). As it is a dynamic marker that evolves with therapeutic measures, we hypothesize that its changes may also be of interest in predicting prognosis, although this remains to be elucidated in further studies.

Clinical implications
Risk stratification remains essential in HF to make medical decisions based on life expectancy and develop appropriate treatment plans, but the accuracy of available prognostic risk scores in patients with HF is still limited. Our study contributes to this important issue by integrating LUS into existing contemporary HF risk scores, which significantly improved their predictive value in the majority of cases.
Some variables included in the pre-existing predictive HF models are not frequently obtained in the clinical care of HF, but LUS can be performed quickly and easily at the bedside, and has already become an add-on to lung auscultation for the evaluation of pulmonary congestion.
As a semi-quantitative measure of pulmonary congestion, LUS adds new and valuable information to the scores. Due to its dynamic behaviour, it can be used as a monitoring tool, allowing reassessment of the patient's status whenever the clinical situation changes, and also as a therapeutic target. Several studies have shown that LUS-guided therapy reduces acute decompensation events during follow-up (Rivas-Lasarte et al., 2019; Araiza-Garaygordobil et al., 2020; Marini et al., 2020; Mhanna et al., 2021; Rastogi et al., 2022), which explains its rapid and wide implementation in the HF field.

Study limitations
Our study has some limitations. First, we tested prognostic scores that were specifically designed for ambulatory HF patients in a sample composed of HF patients discharged from hospital. Second, LUS-HF was designed for a 6-month follow-up, which may have led to an underestimation of the number of events, since some scores were originally designed to predict longer time points. Third, this is a retrospective (not pre-specified) analysis of the LUS-HF trial. Finally, our analysis accounts for a composite endpoint consisting of HF hospitalizations, urgent visits for worsening HF and all-cause mortality, so it may not be generalizable to prognostic risk scores specifically designed for other outcomes.
We consider our study to be hypothesis generating and acknowledge the need to test the hypothesis in other, larger HF cohorts, especially multicentre cohorts with a longer follow-up.

Conclusion
Adding the results of LUS evaluated at discharge improved the predictive value of most of the contemporary HF risk scores. As it is a simple, fast, and non-invasive test, it may be recommended to assess prognosis at discharge in HF patients.

Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Ethics statement
The studies involving human participants were reviewed and approved by the Hospital Sant Pau. The patients/participants provided their written informed consent to participate in this study.