ORIGINAL RESEARCH article
Sec. Pediatric Infectious Diseases
Machine learning early prediction of respiratory syncytial virus in pediatric hospitalized patients
- ¹Dascena, Inc., Houston, TX, United States
- ²Montera Inc., San Francisco, CA, United States
Respiratory syncytial virus (RSV) causes millions of infections among children in the US each year and can cause severe disease or death. Infections that are not promptly detected can cause outbreaks that put other hospitalized patients at risk. No tools besides diagnostic testing are available to rapidly and reliably predict RSV infections among hospitalized patients. We conducted a retrospective study using pediatric electronic health record (EHR) data and built a machine learning model to predict whether a patient will test positive for RSV by nucleic acid amplification test during their stay. Our model demonstrated excellent discrimination, with an area under the receiver operating characteristic curve of 0.919, a sensitivity of 0.802, and a specificity of 0.876. Our model can help clinicians identify patients who may have RSV infections rapidly and cost-effectively. Successfully integrating this model into routine pediatric inpatient care may assist efforts in patient care and infection control.
Introduction
Respiratory syncytial virus (RSV) is the most common cause of lower respiratory tract infection in children; nearly all children have been infected by the time they reach 2 years of age (1). RSV causes mild infection in most healthy children, with symptoms often including fever, nasal congestion, and mild cough (1). However, RSV may also result in severe illness requiring hospitalization. Current estimates suggest that nearly 60,000 children under 5 years of age are hospitalized with RSV annually in the United States (2). Hospitalization rates are high among infants < 6 months old, particularly for those born prematurely (3). The Centers for Disease Control and Prevention (CDC) estimates that 1–2% of RSV infections in this age group result in hospitalization (2, 4). Risk factors for severe disease include premature birth, chronic pulmonary or congenital heart disease, immunodeficiencies, and neuromuscular disorders (1, 2). Pediatric patients hospitalized with RSV may require intensive care unit (ICU) admission and mechanical ventilation, which are associated with substantial healthcare spending both during and following treatment (5).
RSV can be detected in infected children through polymerase chain reaction (PCR) testing and, less accurately, through rapid antigen testing (6, 7). Because RSV infections are extremely common in children, current guidelines from the American Academy of Pediatrics recommend against routine screening for RSV in young children presenting with respiratory infection (8), noting that a positive RSV test generally does not change the course of care for patients whose infection can be managed in an outpatient setting. However, RSV testing can be informative for patients treated in hospital settings, where it may help to identify infected patients in need of isolation to prevent outbreaks, as well as identify vulnerable patients in need of additional monitoring and supportive management (8).
The utility of machine learning algorithms (MLAs) to discriminate between COVID-19 and other viral lower respiratory infections in pediatric patients has been established in previous research (9). To guide appropriate use of RSV testing, we have developed an MLA to identify pediatric patients who have RSV upon hospital admission. We present a clinically useful MLA that uses individualized demographic and vital sign data that are routinely collected early upon hospital admission. Infections are substantially enriched among patients identified as high-risk for RSV by our MLA, which demonstrates its utility as a rapid screening tool to help clinicians more efficiently target patients for confirmatory testing and response.
Materials and methods
This study used a large, commercially available electronic health record (EHR) database that collects data from over 700 inpatient and ambulatory care sites located in the United States. The database includes clinical, claims, and other medical administrative data. Data were obtained from all emergency department and inpatient encounters for the year 2019. We included children aged five or younger who had at least one measurement of every required data input within the first 2 h of hospital admission. Patient data were de-identified in compliance with the Health Insurance Portability and Accountability Act; this study therefore does not constitute human subjects research.
For each patient encounter, only measurements available within the first 2 h of admission were used as inputs to predict RSV positivity. The model used the following inputs, all of which were required for the model to make a prediction: age, sex, systolic blood pressure (SysABP), diastolic blood pressure (DiasABP), heart rate (HR), respiratory rate (RespRate), body temperature (Temp), peripheral oxygen saturation (SpO2), height, and weight. Time-varying features (e.g., clinical measurements) were summarized by the first, mean, and last measurement within the first 2 h of admission; these three summary statistics were used as features in the model. Encounters were randomly divided into training and test sets using an 80/20 split.
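The feature construction and split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout (`age`, `sex`, and a `vitals` mapping of timestamped readings), the function names, the inclusive 2 h boundary, and the choice to treat every input other than age and sex as a time series are all assumptions made for the example.

```python
import random
from statistics import mean

WINDOW_HOURS = 2.0  # only measurements from the first 2 h of admission are used


def summarize_vital(measurements, window_hours=WINDOW_HOURS):
    """Return the first, mean, and last value of one input inside the window.

    `measurements` is a list of (hours_since_admission, value) pairs
    (hypothetical layout). Returns None when nothing was recorded in the
    window, since every input is required for a prediction.
    """
    in_window = sorted((t, v) for t, v in measurements if 0 <= t <= window_hours)
    if not in_window:
        return None
    values = [v for _, v in in_window]
    return {"first": values[0], "mean": mean(values), "last": values[-1]}


def build_features(encounter):
    """Flatten one encounter into a feature dict, or None if an input is missing."""
    features = {"age": encounter["age"], "sex": encounter["sex"]}
    for name, series in encounter["vitals"].items():
        summary = summarize_vital(series)
        if summary is None:
            return None  # encounter does not meet the inclusion criteria
        for stat, value in summary.items():
            features[f"{name}_{stat}"] = value
    return features


def train_test_split(items, test_fraction=0.2, seed=0):
    """Randomly split items into (train, test) with the given test fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

For example, heart rate readings of 150, 160, and 158 inside the window, followed by a reading at 3 h, would be summarized as first 150, mean 156, and last 158, with the late reading ignored.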
A positive result of RSV nucleic acid amplification tests (NAATs) such as a PCR test, either from a stand-alone test or as part of a respiratory disease panel, was considered the positive label for our model. NAATs are considered the clinical gold standard for diagnosis of RSV (8, 10). All other RSV tests, such as antigen tests, were disregarded by the model. As the model presented here is a binary classification model, all non-positive encounters were automatically considered negative.
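The labeling rule above can be expressed as a short function. This is a sketch under stated assumptions: the test-type codes and the `type` and `result` keys are hypothetical and do not reflect the actual database schema.

```python
NAAT_TYPES = {"naat", "pcr"}  # hypothetical codes for nucleic acid amplification tests


def label_encounter(rsv_tests):
    """Return 1 if any RSV NAAT in the encounter is positive, otherwise 0.

    `rsv_tests` is a list of dicts with hypothetical keys "type" and "result".
    Non-NAAT tests (for example antigen tests) are ignored, and any encounter
    without a positive NAAT, including untested encounters, is labeled 0.
    """
    for test in rsv_tests:
        is_naat = test["type"].lower() in NAAT_TYPES
        if is_naat and test["result"].lower() == "positive":
            return 1
    return 0
```

Under this rule, an encounter whose only positive result comes from an antigen test is labeled negative, which is the behavior the text describes for a binary classifier trained on NAAT results alone.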
The attrition process for the MLA is illustrated in Figure 1. We excluded some patients who had a positive RSV test result because we could not conclusively determine that the result came from an NAAT. We also excluded patients whose RSV test samples were collected within the first 2 h after admission. The final population consisted of 54,413 patients who were randomly split into training and test sets.
Figure 1. Inclusion criteria for training and testing datasets of patient hospital encounters for algorithm development.
Machine learning model
We used XGBoost (XGB, or extreme gradient boosting), a gradient boosted decision tree algorithm, implemented using the XGBoost library in Python (11, 12). We chose XGB because it is interpretable and performs well on imbalanced datasets (12). A grid search with cross-validation was performed to determine the optimal parameters. The parameters used in the final model are reported in Supplementary Table 1.
95% confidence intervals (CIs) were reported for model performance. For the area under the receiver-operating curve (AUROC), bootstrap sampling with replacement of prediction indices was used to generate multiple receiver-operating curves (ROC) and the area under each curve was calculated. We then reported the 5th and 95th percentile values of AUROC. For other performance metrics, 95% CIs were calculated using normal approximation. For the demographics table, Fisher’s Exact tests were performed between the positive and negative groups to obtain p-values.
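The two interval methods described above can be sketched in plain Python. The function names, the number of bootstrap replicates, and the seed are assumptions for illustration; the text does not report them. Note that a 5th to 95th percentile band, as described, covers 90% of bootstrap replicates; a 95% percentile interval would instead use the 2.5th and 97.5th percentiles, so the percentiles are left as parameters here.

```python
import math
import random


def auroc(labels, scores):
    """AUROC via the rank-sum identity; tied scores receive average ranks."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    pairs = sorted(zip(scores, labels))
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auroc_interval(labels, scores, n_boot=1000, lower=5, upper=95, seed=0):
    """Percentile interval for AUROC from resampling prediction indices."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("AUROC needs at least one positive and one negative")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        y = [labels[k] for k in idx]
        if 0 < sum(y) < n:  # skip resamples that lost one of the classes
            stats.append(auroc(y, [scores[k] for k in idx]))
    stats.sort()

    def percentile(p):
        return stats[min(n_boot - 1, int(p / 100 * n_boot))]

    return percentile(lower), percentile(upper)


def wald_interval(successes, total, z=1.96):
    """Normal-approximation (Wald) interval for a proportion such as sensitivity."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)
```

As a usage illustration, `wald_interval(158, 197)` gives an interval for a sensitivity of about 0.80 among 197 positives; the count of 158 true positives is an assumption chosen to be consistent with the reported sensitivity, not a figure taken from the study.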
Results
To develop and test our models, we used hospital records for 54,413 encounters with patients aged 5 years or younger. Prior to algorithm trigger time, no RSV diagnostic tests had been documented and no RSV tests had been performed within 2 h of admission (Figure 1). These encounters were divided into a training set with 80% (n = 43,530) and a holdout test set with 20% (n = 10,883) of encounters. We observed demographic differences between encounters with and without positive RSV tests (Table 1). RSV-positive encounters had higher proportions of patients aged 1–3 years (p < 0.001); encounters without positive RSV tests had higher proportions of patients who were aged less than 1 year (p = 0.002), were aged 4–5 years (p < 0.001), or had a preterm birth (p < 0.001). The prevalence of RSV in the holdout test set, as measured by NAAT, was 1.8% (n = 197 RSV-positive), with a test positivity rate of 18.7%.
Table 1. Demographic data of non-RSV positive and RSV positive patients with hospital encounters included in the holdout test set.
Figure 2 shows the ROC of the XGBoost model. The AUROC for our model was 0.919, indicating strong discrimination between RSV-positive and non-RSV-positive encounters in a binary classification task. Fixing the sensitivity of the model at approximately 0.80 yielded a specificity of 0.876 (Table 2).
Figure 2. Algorithm discrimination and precision in identifying hospital encounters with future positive RSV tests. The receiver-operating curve (ROC) for the XGBoost model, showing superiority to random chance (gray) in discrimination between RSV-positive and non-RSV-positive encounters.
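To illustrate what an operating point like this implies at low prevalence, the sketch below converts sensitivity, specificity, and prevalence into predictive values using Bayes' rule. The function name is hypothetical, and the derived values are back-of-the-envelope figures computed from the numbers quoted in the text, not results reported by the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Derive PPV, NPV, and enrichment from an operating point and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    # Enrichment: how much more common RSV is among flagged encounters
    # than in the test set as a whole.
    return {"ppv": ppv, "npv": npv, "enrichment": ppv / prevalence}


# Values quoted in the text: sensitivity 0.802, specificity 0.876,
# and 197 RSV-positive encounters among 10,883 in the holdout set.
result = predictive_values(0.802, 0.876, 197 / 10883)
```

With these inputs the positive predictive value comes out at roughly 0.11, about six times the 1.8% prevalence, which is one way to quantify the enrichment of infections among encounters flagged as high-risk.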
To determine which features of patient encounters most strongly influenced our model’s prediction of RSV, we generated summary plots from Shapley additive explanations (SHAP) analyses (Figure 3) (13). Our model showed a strong dependence on patient weight and age, together with systolic or diastolic blood pressure and high respiratory rate (14). These results show that our model’s ability to successfully distinguish future RSV positivity among hospitalized pediatric patients is most strongly dependent on vital signs and clinical data that are routinely and rapidly collected at patient point-of-care.
Figure 3. Shapley value plots showing the model’s dependence on specific features. Features are ranked from top to bottom by relative importance. Red dots represent relatively high values of a feature and blue dots represent relatively low values. The x-axis shows the SHAP value (impact on model output). If most of the red dots for a feature lie on the right side of the x-axis, high values of that feature (e.g., mean DiasABP in this figure) contribute substantially to a positive prediction. SysABP, systolic arterial blood pressure; DiasABP, diastolic arterial blood pressure; RespRate, respiratory rate; HR, heart rate; SpO2, oxygen saturation; Temp, body temperature.
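For readers unfamiliar with the quantity behind SHAP plots, the sketch below computes exact Shapley values by brute-force enumeration of feature coalitions. It is not the SHAP library and not the study's model: `toy_risk` is a made-up scoring function, the feature names and baseline values are hypothetical, and the enumeration is only feasible for a handful of features because its cost doubles with each one added.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance` relative to `baseline`.

    Features left out of a coalition take their baseline value.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi


def toy_risk(x):
    """Made-up risk score with one interaction term (illustration only)."""
    return (0.02 * x["resp_rate"]
            - 0.03 * x["weight_kg"]
            + 0.001 * x["resp_rate"] * x["age_months"])


instance = {"resp_rate": 50, "weight_kg": 8, "age_months": 10}
baseline = {"resp_rate": 30, "weight_kg": 12, "age_months": 24}
phi = shapley_values(toy_risk, instance, baseline)
```

The values in `phi` sum to the difference between the score at `instance` and the score at `baseline`, which is the additivity property that lets a SHAP summary plot attribute each prediction to individual features.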
Discussion
In this study, we developed an MLA to rapidly and systematically predict a positive RSV NAAT among hospitalized pediatric patients. This algorithm used inputs that are routinely collected and reported in patients’ EHRs within 2 h of admission to predict a positive NAAT for RSV later in the same admission. Our work demonstrates the utility of leveraging machine learning techniques to rapidly predict previously unidentified infections among hospitalized patients.
There are two major innovations in our study that substantially contribute to the field. First, our study focuses specifically on identifying likely RSV infections rapidly upon presentation to a hospital emergency room. This differs from previously developed MLAs for pediatric RSV infections, which have either predicted future RSV diagnosis, hospitalization, or severe progression of disease in the months to years following data collection, or identified RSV infections among pediatric patients who were already hospitalized with known symptoms of respiratory viral infection (15–17). Several of these previous algorithms were also developed using data only from preterm infants (16, 17), thereby limiting their generalizability as compared to our MLA. As a preventive tool, Heaton et al. developed an MLA to predict seasonal RSV outbreaks to allow for timely immunoprophylaxis injections for children predisposed to poor infection outcomes (18). Other studies using MLAs that predict suitable treatment courses (19) or patient outcomes (15, 20) for patients with bronchiolitis, a disease commonly caused by RSV, require a proper diagnosis prior to running the algorithm. These RSV preventive and treatment studies do not address the need for broad screening of incoming pediatric patients and rapid identification of RSV-infected patients. Our study therefore provides unprecedented utility among RSV-focused MLAs for hospital healthcare providers to improve the efficiency and accuracy of their initial care for pediatric patients.
Second, our MLA is designed to predict positive RSV tests without requiring detailed patient data that demand additional time and effort beyond the standard-of-care protocols performed early in hospitalization. This differs from previously developed risk scores or MLAs that required inputs of ICD diagnosis codes, transcriptome data, and/or documentation of specific symptoms that take additional time to collect and log in patients’ EHRs (16, 17, 21–23). The relative simplicity of our MLA indicates that integration into hospital settings would be more efficient and immediately useful to clinicians who care for pediatric inpatients.
If successfully implemented as a rapid, preliminary RSV screening system in a hospital setting, our algorithm could provide several primary services to healthcare providers caring for pediatric patients. First, it could be used to identify patients for inclusion in, or exclusion from, cohort studies or clinical trials that involve active RSV infection (24). This would save clinical researchers time and effort by substantially narrowing their scope of viral testing. Second, it could help hospital infection prevention personnel to more quickly identify infected patients who may need to be placed on additional precautions to prevent healthcare-associated transmission of RSV. Outbreaks of RSV in pediatric hospital settings are well documented and have been shown to contribute to increased patient morbidity, mortality, and complexity of care (6, 25, 26). Third, our algorithm could better inform delivery of care for infected patients by identifying them more rapidly and with greater efficiency of viral testing. Taken together, these advantages could be leveraged particularly well in tertiary care research and teaching hospitals that would benefit from an efficient alternative to established risk scores or systematic viral testing to identify infected patients.
There are several limitations to this study. First, the use of NAAT testing for RSV as a “gold standard” likely excluded many diagnoses of infection by rapid antigen detection, which may have skewed the RSV prevalence and predictive power of the MLA. Second, we did not include data on the presence or absence of respiratory symptoms that are known to be strong predictors of RSV infection (2, 8, 27), because these data were often missing from EHRs of the patients included in this study. Future work could address the first limitation by considering RSV diagnoses made by rapid antigen testing. Additionally, future studies should include the presence or absence of known symptoms of acute respiratory disease to identify patients with RSV.
Conclusion
The model we present in this study performed well in identifying RSV infections among pediatric inpatients at the time they presented to the hospital, using clinical data that are routinely collected in the first 2 h following admission. Our model demonstrates utility for clinicians who would benefit from rapidly identifying RSV infections among pediatric inpatients for purposes of infection prevention, clinical trial enrollment, or management of care. Future directions in the field include refining diagnostic algorithms by including more detailed patient data and the development of new models focused on other infectious diseases of substantial clinical concern.
Data availability statement
The original contributions presented in this study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.
Ethics statement
This research used only de-identified patient data and is therefore exempt from ethical approval, in line with local legislation.
Author contributions
All authors listed have made a substantial, direct, and intellectual contribution to the work, and approved it for publication.
Acknowledgments
We thank Nicole Zelin, MD for clinical advice and Myrna Hurtado, Ph.D. for extensive editing and revisions.
Conflict of interest
All authors were employed by Dascena, Inc. (Houston, TX, United States). QM was employed by Montera, Inc. (San Francisco, CA, United States).
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fped.2022.886212/full#supplementary-material
1. Paes BA, Mitchell I, Banerji A, Lanctôt KL, Langley JM. A decade of respiratory syncytial virus epidemiology and prophylaxis: translating evidence into everyday clinical practice. Can Respir J. (2011) 18:e10–9. doi: 10.1155/2011/493056
2. CDC. Learn about RSV in Infants and Young Children. Centers for Disease Control and Prevention. (2020). Available online at: https://www.cdc.gov/rsv/high-risk/infants-young-children.html (accessed March 15, 2021).
3. Stein RT, Bont LJ, Zar H, Polack FP, Park C, Claxton A, et al. Respiratory syncytial virus hospitalization and mortality: systematic review and meta-analysis. Pediatr Pulmonol. (2017) 52:556–69. doi: 10.1002/ppul.23570
4. Rha B, Curns AT, Lively JY, Campbell AP, Englund JA, Boom JA, et al. Respiratory syncytial virus-associated hospitalizations among young children: 2015-2016. Pediatrics. (2020) 146:e20193611. doi: 10.1542/peds.2019-3611
5. Amand C, Tong S, Kieffer A, Kyaw MH. Healthcare resource use and economic burden attributable to respiratory syncytial virus in the United States: a claims database analysis. BMC Health Serv Res. (2018) 18:294. doi: 10.1186/s12913-018-3066-1
6. Abels S, Nadal D, Stroehle A, Bossart W. Reliable detection of respiratory syncytial virus infection in children for adequate hospital infection control management. J Clin Microbiol. (2001) 39:3135–9. doi: 10.1128/JCM.39.9.3135-3139.2001
7. Allen AJ, Gonzalez-Ciscar A, Lendrem C, Suklan J, Allen K, Bell A, et al. Diagnostic and economic evaluation of a point-of-care test for respiratory syncytial virus. ERJ Open Res. (2020) 6:00018–2020. doi: 10.1183/23120541.00018-2020
8. Committee on Infectious Diseases, American Academy of Pediatrics, Kimberlin DW, Barnett ED, Lynfield R, Sawyer MH. Red Book: 2021–2024 Report of the Committee on Infectious Diseases. Itasca, IL: American Academy of Pediatrics (2021). doi: 10.1542/9781610025782
9. Nino G, Molto J, Aguilar H, Zember J, Sanchez-Jacob R, Diez CT, et al. Chest X-ray lung imaging features in pediatric COVID-19 and comparison with viral lower respiratory infections in young children. Pediatr Pulmonol. (2021) 56:3891–8. doi: 10.1002/ppul.25661
10. Miller JM, Binnicker MJ, Campbell S, Carroll KC, Chapin KC, Gilligan PH, et al. A guide to utilization of the microbiology laboratory for diagnosis of infectious diseases: 2018 update by the Infectious Diseases Society of America and the American Society for Microbiology. Clin Infect Dis. (2018) 67:e1–94. doi: 10.1093/cid/ciy381
11. XGBoost. Python Package Introduction — xgboost 1.4.0-SNAPSHOT Documentation. (2020) Available online at: https://xgboost.readthedocs.io/en/latest/python/python_intro.html. (accessed January 19, 2021).
12. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. New York, NY: (2016). p. 785–94. doi: 10.1145/2939672.2939785
14. Duarte-Dorado DM, Madero-Orostegui DS, Rodriguez-Martinez CE, Nino G. Validation of a scale to assess the severity of bronchiolitis in a population of hospitalized infants. J Asthma. (2013) 50:1056–61. doi: 10.3109/02770903.2013.834504
15. Raita Y, Camargo CA, Macias CG, Mansbach MJ, Piedra PP, Porter SC, et al. Machine learning-based prediction of acute severity in infants hospitalized for bronchiolitis: a multicenter prospective study. Sci Rep. (2020) 10:10979. doi: 10.1038/s41598-020-67629-8
16. Blanken MO, Koffijberg H, Nibbelke EE, Rovers MM, Bont L on behalf of the Dutch Rsv Neonatal Network. Prospective validation of a prognostic model for respiratory syncytial virus bronchiolitis in late preterm infants: a multicenter birth cohort study. PLoS One. (2013) 8:e59161. doi: 10.1371/journal.pone.0059161
17. Resch B, Bramreiter VS, Kurath-Koller S, Freidl T, Urlesberger B. Respiratory syncytial virus associated hospitalizations in preterm infants of 29 to 32 weeks gestational age using a risk score tool for palivizumab prophylaxis. Eur J Clin Microbiol Infect Dis. (2017) 36:1057–62. doi: 10.1007/s10096-016-2891-6
18. Heaton MJ, Ingersoll C, Berrett C, Hartman BM, Sloan C. A Bayesian approach to real-time spatiotemporal prediction systems for bronchiolitis. Spat Spatio-Temporal Epidemiol. (2021) 38:100434. doi: 10.1016/j.sste.2021.100434
19. Mateo J, Rius-Peris JM, Maraña-Pérez AI, Valiente-Armero A, Torres AM. Extreme gradient boosting machine learning method for predicting medical treatment in patients with acute bronchiolitis. Biocybern Biomed Eng. (2021) 41:792–801. doi: 10.1016/j.bbe.2021.04.015
20. Luo G, Stone BL, Nkoy FL, He S, Johnson MD. Predicting appropriate hospital admission of emergency department patients with bronchiolitis: secondary analysis. JMIR Med Inform. (2019) 7:e12591. doi: 10.2196/12591
21. Paes B, Fullarton JR, Rodgers-Gray BS, Carbonell-Estrany X. Adoption in Canada of an international risk scoring tool to predict respiratory syncytial virus hospitalization in moderate-to-late preterm infants. Curr Med Res Opin. (2021) 37:1149–53. doi: 10.1080/03007995.2021.1911974
22. Mosalli R, Abdul Moez AM, Janish M, Paes B. Value of a risk scoring tool to predict respiratory syncytial virus disease severity and need for hospitalization in term infants: predicting RSV Hospitalization in Term Infants. J Med Virol. (2015) 87:1285–91. doi: 10.1002/jmv.24189
23. Jong VL, Ahout IML, van den Ham HJ, Jans J, Zaaraoui-Boutahar F, Zomer A, et al. Transcriptome assists prognosis of disease severity in respiratory syncytial virus infected infants. Sci Rep. (2016) 6:36603. doi: 10.1038/srep36603
25. Baier C, Haid S, Beilken A, Behnert A, Wetzke M, Brown RJP, et al. Molecular characteristics and successful management of a respiratory syncytial virus outbreak among pediatric patients with hemato-oncological disease. Antimicrob Resist Infect Control. (2018) 7:21. doi: 10.1186/s13756-018-0316-2
26. Homaira N, Sheils J, Stelzer-Braid S, Lui K, Oie JL, Snelling T, et al. Respiratory syncytial virus is present in the neonatal intensive care unit: RSV in NICU. J Med Virol. (2016) 88:196–201. doi: 10.1002/jmv.24325
Keywords: respiratory syncytial virus, pediatric infection, diagnosis, machine learning, algorithm, XGBoost
Citation: Tso CF, Lam C, Calvert J and Mao Q (2022) Machine learning early prediction of respiratory syncytial virus in pediatric hospitalized patients. Front. Pediatr. 10:886212. doi: 10.3389/fped.2022.886212
Received: 28 February 2022; Accepted: 04 July 2022;
Published: 04 August 2022.
Edited by: Maurizio Aricò, Department of Pediatrics, Italy
Reviewed by: Mauricio Tomas Caballero, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina
Arturo Solis-Moya, Dr. Carlos Sáenz Herrera National Children’s Hospital, Costa Rica
Copyright © 2022 Tso, Lam, Calvert and Mao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Qingqing Mao, email@example.com
†These authors have contributed equally to this work