Predictors of Diffusing Capacity in Children With Sickle Cell Disease: A Longitudinal Study

Background: Gas exchange abnormalities in Sickle Cell Disease (SCD) may represent cardiopulmonary deterioration. Identifying predictors of these abnormalities in children with SCD (C-SCD) may help us understand disease progression and make informed management decisions. Objectives: To identify pulmonary function test (PFT) estimates and biomarkers of disease severity that are associated with and predict abnormal diffusing capacity (DLCO) in C-SCD. Methods: We obtained PFT data from 51 C-SCD (median age: 12.4 years, male:female = 29:22) (115 observations) and 22 controls (median age: 11.1 years, male:female = 8:14), formulated a rank list of DLCO predictors based on a machine learning algorithm (XGBoost) or linear mixed-effect models, and compared estimated DLCO to the measured values. Finally, we evaluated the association between measured or estimated DLCO and clinical outcomes, including SCD crises, pulmonary hypertension, and nocturnal desaturation. Results: Hemoglobin-adjusted DLCO (%) and several PFT indices were diminished in C-SCD compared to controls. Both statistical approaches ranked FVC (%), neutrophils (%), and FEF25−75 (%) as the top three predictors of DLCO. XGBoost had superior performance compared to the linear model. Both measured and estimated DLCO demonstrated a significant association with SCD severity: higher DLCO, estimated by XGBoost, was associated with fewer SCD crises [beta = −0.084 (95%CI: −0.13, −0.033)] and lower TRJV [beta = −0.009 (−0.017, −0.001)], but not with nocturnal desaturation (p = 0.12). Conclusions: In this cohort of C-SCD, DLCO was associated with PFT estimates representing restrictive lung disease (FVC, TLC), airflow obstruction (FEF25−75, FEV1/FVC, R5), and inflammation (neutrophilia). We used these indices to estimate DLCO and showed its association with disease outcomes, underscoring the prediction models' clinical relevance.


INTRODUCTION
Sickle cell disease (SCD) is a hemoglobinopathy that leads to a chronic inflammatory state resulting in vasculitis, pulmonary fibrosis, and pulmonary hypertension (1). Children with SCD (C-SCD) often have impaired gas exchange (2), primarily due to chronic airway inflammation; an association between DLCO and sputum IL-6 level has been reported (2). If untreated, gas exchange abnormalities in SCD may result in chronic hypoxemia, cardiopulmonary morbidity, and poor disease outcomes (3). Chronic hypoxemia in SCD contributes to the pathophysiology of vaso-occlusive crises (VOC) and acute chest syndrome (ACS) (4), and it may also lead to pulmonary hypertension, which can impact life expectancy in this vulnerable population (5,6). Quantifying the underlying pathophysiologic changes is not feasible in routine clinical practice, and thus gas exchange impairment can be used as a prognostic indicator of disease severity in SCD (7).
The single-breath technique for estimating carbon monoxide uptake, or DLCO, is a standard gas exchange measurement technique (8). In addition to airway inflammation, hypoventilation and anemia also result in DLCO impairment (9-11). Despite its importance, very few studies have evaluated DLCO in C-SCD. Biltagi et al. (2) reported DLCO impairment in C-SCD compared to controls, and we previously reported an annual DLCO decline of 1.5% in C-SCD (12). Moreover, DLCO impairment in SCD may differ by race/ethnicity (13). However, to date, no studies have focused on the determinants of DLCO in the SCD population. Addressing that knowledge gap could provide further insight into the origins of impaired gas exchange and help prevent related morbidity.
DLCO and lung volumes decline rapidly in SCD (12,14). These declines are likely multifactorial, may be interrelated, and may have prognostic significance. For instance, the relationship between DLCO and FVC has been used to stratify mortality risk in pulmonary hypertension (15). This underscores the importance of studying the predictors of DLCO in SCD, which can lead to pulmonary parenchymal disease, impaired gas exchange, and pulmonary hypertension.
Anemia is considered the key determinant of DLCO. Subjects with low hemoglobin typically have underestimated DLCO. Therefore, for precise interpretation, DLCO should be adjusted for hemoglobin in C-SCD. An association between airflow obstruction and diffusion impairment has been described in adults (16). While gas diffusion certainly depends on ventilation, it is important to consider the association between various PFT estimates and DLCO, especially in diseased lungs such as those in SCD, which can be affected by both obstructive and restrictive disease. However, several SCD-related studies adjusted DLCO for hemoglobin rather than for ventilation (17,18). Our previous study demonstrated the use of impulse oscillometry (IOS) to measure obstructive airway disease (OAD) in C-SCD (12), although a relationship between airway resistance or reactance and DLCO has never been established in C-SCD. Thus, the association between DLCO and spirometric and IOS measures of OAD is a clinically relevant yet relatively unexplored domain. Unlike OAD, restrictive lung disease can be a late manifestation in C-SCD (19), and measures like total lung capacity (TLC) and vital capacity (VC) could be significant predictors of declining DLCO, which is more evident in older C-SCD (2).
In this study, we aim to better understand the predictors of DLCO and their relative importance. Our primary objective was to identify pulmonary function test (PFT) indices and biomarkers that are associated with and predict DLCO in C-SCD and assess their predictive accuracy. Our secondary objective was to determine if estimated DLCO (eDLCO) is associated with clinical outcomes in C-SCD, which would further emphasize the clinical relevance of DLCO.

Study Population
Through retrospective chart review of 140 C-SCD from 2010 to 2020, we identified 51 C-SCD (6-19 years) who were referred to the Penn State comprehensive Pediatric SCD clinic to see Pediatric Pulmonology, subsequently performed comprehensive PFTs (spirometry, IOS, plethysmography, and DLCO), and had pertinent laboratory data. The indications for referral included SCD-related chronic lung disease, asthma, and frequent respiratory exacerbations such as wheeze. We also identified 22 race-matched controls (African-American and Hispanic children) who required DLCO primarily to investigate dyspnea of unknown origin but had no cardiopulmonary or hematological-oncological conditions.

Predictors of Adjusted DLCO
Percent-predicted DLCO was adjusted for hemoglobin [DLCO/Hb (%)] and was calculated using sex-specific predictive equations adjusted for age (20). We selected the following potential predictors of DLCO: i. Spirometry measures such as forced vital capacity (FVC), forced expiratory volume in 1 second (FEV1), FEV1/FVC, and the forced expiratory flow between 25 and 75% of FVC (FEF25−75), and plethysmography indices such as TLC, VC, residual volume (RV), and RV/TLC. NHANES III equations were used to calculate %predicted values. ii. IOS estimates of total airway resistance (R5) and reactance (X5, Fres, and AX) were obtained and expressed as %pred using Berdel/Lechtenbörger equations (AX does not have standard reference values) (21). Subjects did not receive bronchodilator therapy for 12 h before the PFTs. iii. Laboratory biomarkers: the degree of anemia and biomarkers of hemolysis (LDH, total bilirubin, reticulocyte count) are known to be correlated with SCD-related complications.
Biomarkers of systemic diseases, including renal failure, anemia, low fetal hemoglobin (HbF) levels, and leukocytosis, have been reported as significant predictors of mortality in SCD (5). Thus, we included complete blood count (CBC) with differential, HbF, liver and renal function test results, and lactate dehydrogenase (LDH) levels in the preliminary association analyses (Table 2).
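The hemoglobin adjustment mentioned at the start of this section can be illustrated with a short sketch. It assumes the correction factors from the 2005 ATS/ERS DLCO standardization (the Cotes equations); the text cites reference (20) for the predictive equations but does not spell out the correction, so this form, the function names, and the example values are all illustrative assumptions rather than the study's actual procedure.

```python
def hb_correction_factor(hb_g_dl, age_years, sex):
    """Factor applied to predicted DLCO to account for hemoglobin (g/dL).

    Assumed form (2005 ATS/ERS standardization): males aged 15 or older use
    a constant of 10.22; children under 15 and females use 9.38.
    """
    if hb_g_dl <= 0:
        raise ValueError("hemoglobin must be positive")
    constant = 10.22 if (sex == "male" and age_years >= 15) else 9.38
    return 1.7 * hb_g_dl / (constant + hb_g_dl)


def dlco_hb_percent(measured_dlco, predicted_dlco, hb_g_dl, age_years, sex):
    """Hemoglobin-adjusted percent-predicted DLCO, i.e. DLCO/Hb (%)."""
    factor = hb_correction_factor(hb_g_dl, age_years, sex)
    return 100.0 * measured_dlco / (predicted_dlco * factor)


# Hypothetical 12-year-old with anemia (Hb 8.5 g/dL); values are made up.
unadjusted = 100.0 * 14.0 / 20.0
adjusted = dlco_hb_percent(14.0, 20.0, 8.5, 12, "female")
print(round(unadjusted, 1), round(adjusted, 1))
```

Under these assumptions, a low hemoglobin lowers the predicted value, so the adjusted percent-predicted DLCO is higher than the unadjusted one, which matches the statement that anemic subjects otherwise have underestimated DLCO.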

Indicators of Disease Severity and Clinical Outcomes
The number of ACS episodes has been reported to be associated with the risk of early death in C-SCD as early as 10 years of age (5,22). Clinical severity indicators considered in this study included the lifetime number of hospitalizations with ACS and VOC and sleep-related nocturnal desaturation (percentage of total sleep time spent with SpO2 < 90%) (23). Additionally, tricuspid regurgitation jet velocity (TRJV) of 2.5 m/s or higher, measured by echocardiography, was considered a surrogate marker of pulmonary hypertension (24).
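The two threshold-based indicators above can be computed as in the following minimal sketch. It assumes SpO2 is sampled at equal intervals across total sleep time, which the text does not specify; the function names and sample values are hypothetical.

```python
def percent_time_below(spo2_samples, threshold=90.0):
    """Percentage of equally spaced SpO2 samples that fall below the threshold."""
    if not spo2_samples:
        raise ValueError("no SpO2 samples")
    below = sum(1 for value in spo2_samples if value < threshold)
    return 100.0 * below / len(spo2_samples)


def elevated_trjv(trjv_m_per_s, cutoff=2.5):
    """True when TRJV meets the 2.5 m/s cut-off used as a surrogate marker."""
    return trjv_m_per_s >= cutoff


# Hypothetical overnight oximetry trace and echocardiography result.
trace = [96, 94, 89, 91, 88, 93, 95, 90]
print(percent_time_below(trace), elevated_trjv(2.6))
```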

Statistical Analyses
We used R (3.6.1) and SPSS (25) for data analysis. DLCO estimates falling outside three times the mean Cook's distance and two standard deviations of Studentized t-values were excluded as outliers. We compared case and control groups with Mann-Whitney U-tests and used Pearson correlations to estimate the association between potential predictors and DLCO. We added a bootstrap correction to the Pearson correlation to adjust for non-normality (26).
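One common way to bootstrap a Pearson correlation is the percentile method, sketched below using only the Python standard library. The text does not state which bootstrap variant was applied, so the percentile interval, the number of resamples, and the example data are assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns None if either variable has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def bootstrap_pearson_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval for Pearson's r."""
    point = pearson_r(x, y)
    if point is None:
        raise ValueError("correlation undefined: a variable has zero variance")
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        # Resample (x, y) pairs with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip degenerate resamples with no variance
            estimates.append(r)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


# Hypothetical FVC (%) and DLCO/Hb (%) values, not study data.
fvc = [78.0, 85.0, 92.0, 70.0, 88.0, 95.0, 81.0, 74.0, 90.0, 83.0]
dlco = [80.0, 84.0, 95.0, 75.0, 86.0, 99.0, 79.0, 82.0, 91.0, 88.0]
print(bootstrap_pearson_ci(fvc, dlco))
```

Resampling whole pairs preserves the dependence between the two variables, and the interval makes no normality assumption about either one.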

Prediction Models
We used variables with a significant association with DLCO/Hb (%) to build the prediction models using two methods: XGBoost and linear mixed-effects regression. XGBoost is a machine learning tool that can be used for regression analysis or for ranking predictors within a user-built prediction model (27), while the mixed model is useful for longitudinal data; both approaches can account for variables with repeated measures within participants. Both models were adjusted for potential confounders or effect modifiers such as age, sex, race, and hemoglobin genotype (13,25). Models were also adjusted for hydroxyurea, which increases HbF (28). Finally, models were controlled for the diagnosis of asthma (yes/no), since it is a major comorbidity in C-SCD, and for asthma medications (LABA and ICS), which can elevate PFT estimates (29). We built the XGBoost model using five-fold cross-validation (CV). Subjects were randomly divided into five equal groups; four of the five groups were used as training data and the remaining one as test data, and the process was repeated five times. Based on the results, the predictors of DLCO were selected, and the algorithm was built. Predictors were ranked based on their relative importance, determined by the "gain" measure in XGBoost and by p-values in the linear mixed model. To quantify both models' predictive accuracy, we calculated the mean absolute percentage error (MAPE) and the correlation coefficient between measured DLCO and eDLCO. MAPE values of <10% and between 10 and 20% were considered "excellent" and "good" forecasting, respectively (30). Further details on predictors and model selection are described in Supplementary Data 1.
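The fold assignment and the MAPE grading described above can be sketched as follows. The sketch splits by subject so that repeated observations from one child stay in the same fold, as the text describes; the helper names, the seed, and the subject labels are hypothetical, and no model fitting is shown.

```python
import random


def subject_folds(subject_ids, k=5, seed=0):
    """Randomly assign unique subjects to k near-equal folds."""
    unique = sorted(set(subject_ids))
    random.Random(seed).shuffle(unique)
    return [unique[i::k] for i in range(k)]


def train_test_indices(subject_ids, folds):
    """Yield (train, test) observation indices, one pair per held-out fold."""
    for held_out in folds:
        held = set(held_out)
        test = [i for i, s in enumerate(subject_ids) if s in held]
        train = [i for i, s in enumerate(subject_ids) if s not in held]
        yield train, test


def mape(measured, estimated):
    """Mean absolute percentage error, in percent."""
    errors = [abs(m - e) / abs(m) for m, e in zip(measured, estimated)]
    return 100.0 * sum(errors) / len(errors)


def forecast_grade(mape_percent):
    """Label a MAPE value using the cut-offs quoted in the text."""
    if mape_percent < 10:
        return "excellent"
    if mape_percent <= 20:
        return "good"
    return "outside the ranges quoted here"


# Hypothetical observation-level subject labels (repeated measures per child).
ids = ["s1", "s1", "s2", "s3", "s3", "s3", "s4", "s5", "s6", "s7"]
for train, test in train_test_indices(ids, subject_folds(ids)):
    print(len(train), len(test))
print(forecast_grade(mape([100.0, 80.0], [90.0, 88.0])))
```

With 51 subjects the folds cannot be exactly equal, so the slicing used here gives folds that differ in size by at most one subject.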

Association Between DLCO and Clinical Outcome Measures of SCD
To confirm the prognostic importance of DLCO, we analyzed its association with SCD clinical outcomes using linear regression, adjusted for age and sex. For the correlational analyses between the lifetime number of VOC/ACS events and DLCO, we used the median DLCO value for subjects with multiple data points. We also conducted correlation analyses between measured DLCO and other disease severity indicators, including TRJV and nocturnal desaturation. We then used both prediction models to calculate eDLCO and analyzed the association between eDLCO values and outcome measures using linear regression to cross-examine the accuracy and clinical relevance of the prediction models.
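An age- and sex-adjusted linear regression of this kind can be sketched with ordinary least squares. The example below is a minimal, dependency-free illustration on made-up data; it returns coefficients only (no standard errors or confidence intervals), and the function name, column order, and values are hypothetical rather than taken from the study.

```python
def fit_ols(rows, outcome):
    """Least-squares coefficients for outcome ~ intercept + columns of rows.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting. Returns [intercept, b1, b2, ...].
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(design, outcome))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(p):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Hypothetical rows: (median DLCO/Hb %, age in years, sex coded 0/1).
rows = [(95, 8, 0), (70, 10, 1), (80, 12, 0), (92, 14, 1),
        (60, 9, 1), (85, 16, 0), (75, 11, 1), (100, 13, 0)]
events = [2, 5, 4, 3, 6, 4, 5, 2]  # made-up lifetime ACS/VOC counts
intercept, beta_dlco, beta_age, beta_sex = fit_ols(rows, events)
print(round(beta_dlco, 3))
```

Here `beta_dlco` plays the role of the adjusted beta: the expected change in the outcome per 1% change in DLCO with age and sex held fixed.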

Validation of the Prediction Model
Leave-one-out performance (LOOP) was used to cross-validate the XGBoost model (31). Using the "LOOP" function, predicted DLCO was estimated for each observation while the remaining data were used to train the algorithm. This process was repeated 112 times (excluding 3 outliers). The forecast's strength was estimated with MAPE and the Pearson correlation coefficient between observed and predicted DLCO.
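The leave-one-out procedure can be illustrated with the sketch below. To keep it self-contained, a single-predictor least-squares line stands in for the XGBoost model, so this shows only the resampling logic and not the study's algorithm or its "LOOP" function; the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares line; a simple stand-in for the real prediction model."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_predictions(x, y):
    """Predict each observation from a model trained on all the others."""
    predictions = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(intercept + slope * x[i])
    return predictions


def mape(measured, estimated):
    """Mean absolute percentage error, in percent."""
    errors = [abs(m - e) / abs(m) for m, e in zip(measured, estimated)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical FVC (%) predictor and measured DLCO/Hb (%) values.
fvc = [78.0, 85.0, 92.0, 70.0, 88.0, 95.0, 81.0, 74.0]
dlco = [80.0, 84.0, 95.0, 75.0, 86.0, 99.0, 79.0, 82.0]
estimated = leave_one_out_predictions(fvc, dlco)
print(round(mape(dlco, estimated), 1))
```

Each prediction comes from a model that never saw the observation being predicted, so the resulting MAPE reflects out-of-sample error for that observation.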

Evaluation of DLCO Predictors
The correlations between hemoglobin-adjusted DLCO with PFT estimates, anthropometrics, and biomarkers are presented in  Table 2) (32), and thus they were not included in further analyses to prevent overadjustment bias.

Measured and Estimated DLCO vs. Outcome Measures
Measured DLCO was significantly associated with the number of lifetime VOC/ACS events and TRJV (Table 4), but not with nocturnal desaturation (p = 0.13). After adjusting for age and sex, each 1% decrease in adjusted DLCO was associated with 0.08 more lifetime ACS/VOC events and 0.009 m/s higher TRJV (Table 4). eDLCO, obtained from our predictive models, was also significantly associated with ACS/VOC events and TRJV: each 1% decrease in eDLCO was associated with 0.08-0.10 more lifetime ACS/VOC events and with 0.009-0.014 m/s higher TRJV (Table 4).

Validation of the Prediction Model
We tested the strength of the prediction model using the LOOP method. Estimated DLCO (mean ± SD) was 87.9 ± 17.18 compared to a measured DLCO of 87.79 ± 10.87, with good forecasting (MAPE of 17.3%) and a significant correlation (r = 0.40, p < 0.001) between estimated and measured values (Supplementary Figure 2).

DISCUSSION
In this study in children with SCD, we show that PFT estimates representing OAD (FEF25−75, FEV1/FVC, R5%), restrictive lung disease (FVC%, TLC%), and biomarkers of inflammation (neutrophil%) were associated with DLCO, and that models built on those six variables can calculate "estimated DLCO (eDLCO)" with precision. Moreover, we demonstrate that lower DLCO and eDLCO are significantly associated with worse clinical outcomes, including more frequent ACS/VOC events and evidence of pulmonary hypertension. These results advance our understanding of factors associated with impaired gas exchange in SCD.
Most pediatric SCD centers in the US do not offer a multidisciplinary clinic, and PFTs (including DLCO) are not always part of the standard of care in C-SCD. Clinical status can change rapidly in these children, and PFTs along with other biomarkers should be obtained, especially in children with frequent complications, to monitor clinical deterioration of the cardiorespiratory system. Considering the prognostic significance of impaired gas exchange, DLCO should be incorporated into the standard of care for C-SCD with frequent SCD crises, and in-depth clinical research on DLCO is necessary. The 2019 American Society of Hematology (ASH) guidelines recommend obtaining PFTs in SCD patients with various respiratory symptoms even if they are at their steady state (33). ASH acknowledges that the usefulness of routine PFTs is unknown because of the lack of research, and thus it does not recommend PFTs for every SCD patient. However, ASH further suggests that if PFTs are obtained, they should be comprehensive, including lung volumes and DLCO in addition to spirometry (33).
We found that C-SCD had significantly lower PFT values than their peers without SCD, consistent with previous studies that have reported impaired lung function in SCD (2,9). On the other hand, we did not find associations between biomarkers of systemic involvement and DLCO, as have been described in the adult SCD literature (17). This could be partially explained by differences in disease severity or progression in adults with SCD compared to younger populations.
OAD is a relatively early phenomenon in SCD lung involvement, and it can be measured by spirometry and IOS. We found that FEF25−75 (%) and FEV1/FVC were positively correlated with DLCO, indicating an association between OAD and impaired gas diffusion in C-SCD. One of the novel aspects of this study was our ability to examine the association between IOS estimates and DLCO. Although a relationship between IOS estimates and DLCO has never been studied in SCD, a negative correlation between airway resistance and DLCO has been reported in adults with idiopathic pulmonary fibrosis (34). With age, airway resistance increases (12) and DLCO (%) decreases in C-SCD (2); thus, the significant inverse correlation between R5 (%) and DLCO (%) may represent a parallel deterioration in gas diffusion and airway obstruction.
Restrictive airway disease is a relatively late phenomenon in youth with SCD (9). As the disease progresses, lung volumes and DLCO simultaneously decline due to recurrent inflammation and progressive pulmonary fibrosis (17,35,36). Alveolar ventilation and the rate of uptake of CO (KCO) determine gas exchange. However, SCD is pathophysiologically unique and complex, since the diseased lungs can be affected by obstructive as well as restrictive changes, and their relative impact on DLCO is unknown. The positive correlation we report between DLCO and lung volume indices such as FVC (%) and TLC (%) further indicates that gas diffusion could also be influenced by restrictive lung disease, and these associations may start in childhood, even with apparently normal lung function.
Recurrent SCD crises lead to parenchymal disease and impaired gas diffusion (2,37). Neutrophil activation generates extracellular traps and triggers endothelial activation and pro-inflammatory pathways in SCD (38), resulting in thromboembolism in the pulmonary microvasculature and triggering VOC (39). Thus, neutrophilia may indicate disease severity, and it has been recognized as a predictor of mortality in SCD (5). We found that neutrophilia (both count and percent) had an inverse correlation with DLCO, and neutrophil (%) was among the top three predictors of DLCO. Absolute neutrophil counts have been reported to have an inverse correlation with DLCO in the general population (40), but to our knowledge, this is the first report associating neutrophilia with impaired gas exchange in C-SCD.
The study has several limitations that should be acknowledged. It was a retrospective, single-center study, and thus we could not evaluate the effect of center-level practices on our results. Since an external cohort was not available, further studies will be needed to validate our findings. We lacked racial and genotypical diversity in the study population, although this is probably fairly representative of the SCD population as a whole. Most of the subjects were in their early teens and had stable lung function, and therefore we could not extrapolate to younger or older ages; the predictor rank list may differ in young children or adults with advanced SCD lung disease. We used TRJV as a surrogate marker of pulmonary hypertension. However, TRJV is a screening tool; diagnosing pulmonary hypertension requires right heart catheterization, which was beyond the scope of this analysis and would merit evaluation in future prospective studies.
At the same time, our study has several strengths. While diffusing capacity is an important biomarker of SCD lung pathology and is associated with clinical outcomes, diffusion limitation and its probable predictors have not been well studied in C-SCD. We had repeated longitudinal comprehensive PFT data for the cohort. We used two different statistical approaches; while one was more accurate than the other in estimating DLCO, both selected the same predictors, which included easy-to-obtain spirometric and laboratory values. Finally, both measured and estimated DLCO were associated with SCD clinical outcomes, and successful cross-validation of the XGBoost model further added reliability to the prediction model (30).
In conclusion, in a cohort of children with SCD, we report several markers associated with impaired gas exchange, including PFT estimates representing restrictive lung disease (FVC), obstructive airway disease (FEF25−75), and inflammation (neutrophils). DLCO was associated with SCD severity indicators, and we were able to use simple predictors to calculate eDLCO, which was significantly associated with disease outcomes. This underscores the clinical relevance of our prediction models and could help to identify children at risk.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors upon request, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Penn State College of Medicine Institutional Review Board. Written informed consent from the participants' legal guardian/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
PM: PI of the study, contributed to building the study protocol, data collection, statistical analyses and interpretation, manuscript writing, and final draft approval. VM: contributed to building the study protocol, statistical analyses and interpretation, manuscript writing, and final draft approval. AK and SS: contributed to data collection, statistical analyses and interpretation, manuscript writing, and final draft approval. EF: contributed to building the study protocol, statistical analyses, data interpretation, manuscript writing, and final draft approval. All authors contributed to the article and approved the submitted version.