Pediatric Vital Sign Distribution Derived From a Multi-Centered Emergency Department Database

Background We hypothesized that current vital sign thresholds used in pediatric emergency department (ED) screening tools do not reflect observed vital signs in this population. We analyzed a large multi-centered database to develop heart rate (HR) and respiratory rate centile rankings and z-scores that could be incorporated into electronic health record ED screening tools and we compared our derived centiles to previously published centiles and Pediatric Advanced Life Support (PALS) vital sign thresholds. Methods Initial HR and respiratory rate data entered into the Cerner™ electronic health record at 169 participating hospitals’ ED over 5 years (2009 through 2013) as part of routine care were analyzed. Analysis was restricted to non-admitted children (0 to <18 years). Centile curves and z-scores were developed using generalized additive models for location, scale, and shape. A split-sample validation using two-thirds of the sample was compared with the remaining one-third. Centile values were compared with results from previous studies and guidelines. Results HR and RR centiles and z-scores were determined from ~1.2 million records. Empirical 95th centiles for HR and respiratory rate were higher than previously published results and both deviated from PALS guideline recommendations. Conclusion Heart and respiratory rate centiles derived from a large real-world non-hospitalized ED pediatric population can inform the modification of electronic and paper-based screening tools to stratify children by the degree of deviation from normal for age rather than dichotomizing children into groups having “normal” versus “abnormal” vital signs. Furthermore, these centiles also may be useful in paper-based screening tools and bedside alarm limits for children in areas other than the ED and may establish improved alarm limits for bedside monitors.

inTrODUcTiOn Vital sign thresholds are incorporated into various screening tools to help identify those children at higher risk of serious medical or surgical illness (1-4). To effectively utilize vital sign data in children, however, it may be more useful to identify the magnitude of deviation from the expected vital sign distribution, considering the child's age and location of care [e.g., emergency department (ED) versus intensive care unit (ICU)], rather than determining if the vital sign value is abnormal. Since most current scoring tools consider vital signs as dichotomous variables (i.e., "normal" or "abnormal") within relatively wide age ranges, it is not surprising that most triage and scoring tools have performed poorly even though they are significantly associated with the outcome of interest (4)(5)(6)(7). Most screening tools use vital sign thresholds that fail to consider the physiologic stress response of a child seen in the ED. Thus, the upper "normal" vital sign thresholds observed in ED patients were higher than observed in children who were hospitalized on the ward or who were ambulatory (8)(9)(10). The value of using empirically derived ED vital sign thresholds was demonstrated in a study of a pediatric ED sepsis screening tool that incorporated temperature (TMP) adjustment for heart rate (HR) and respiratory rate (RR) (11). The tool's positive predictive value was 48.7%, almost threefold better than using the consensus systemic inflammatory response syndrome (SIRS) criteria (12), with no loss of sensitivity (11).
In sepsis, early identification of children at risk is a key recommendation for optimal management (13) since early implementation of protocol-guided sepsis care decreased sepsisrelated organ dysfunction, hospital and ICU length of stay, and mortality (14)(15)(16). Clinical judgment alone misses approximately 27% of septic children seen in the ED (17). Using vital sign data with current threshold parameters is limited by the high rate of tool activation; almost 17% of febrile or hypothermic children in the ED met alert criteria, but only 2.5% of these children had severe sepsis or septic shock (17). Similarly, more than 90% of febrile children in the ED meet vital sign criteria for SIRS (5), and ~12% of all ED children triggered an alert based on tachycardia alone (4). These data suggest that current sepsis screening tools identify too many at-risk children, leading to alert fatigue (18) and reluctance to use the tool.
Recent studies empirically derived centile ranks and, in some cases, z-scores for vital sign parameters by age (9,11). The rationale for considering the vital sign parameter's z-score or centile rank rather than "normal" versus "abnormal" is based on the enhanced statistical power of the former over the latter (19,20).
We hypothesized that current pediatric HR and RR vital sign thresholds used in Pediatric Advanced Life Support (PALS) or derived from low-acuity ED patients do not accurately reflect empirically derived HR and RR centiles. To develop empirically derived thresholds that could be incorporated into ED screening tools and may inform monitor alarm limits, we analyzed a very large multi-institutional database to derive HR and RR centile ranks and z-scores stratified by age in children presenting to the ED. Ultimately, our goal is to derive HR and RR data that can be applied as continuous variables in electronic health record (EHR)-based tools to stratify children into risk groups or used as threshold limits in paper-based triage tools. Empirically derived vital sign distributions also may better determine alarm limits in different aged children to reduce alarm fatigue and may be useful to stratify children into risk groups for clinical trials.

Data source
Data in Cerner Health Facts ® (21) are extracted directly from the EHR of hospitals in which Cerner has a data use agreement. All admissions, medication orders and dispensing, laboratory orders, and specimens are date and time stamped, providing a temporal relationship between treatment patterns and clinical information. Cerner Corporation has established Health Insurance Portability and Accountability Act-compliant operating policies to establish de-identification for Health Facts ® .

Data analysis
The HR and RR distributions by age were modeled using the generalized additive models for location, scale, and shape (GAMLSS) methodology and software (22,23). GAMLSS adjusts for kurtosis and skews in the distributions and allows the generation of normalized standard centiles, or "z-scores, " and smooth centile curves by age. This process requires that data are fitted to one of several mathematical distributions (24) that approximate real-world distributions of vital sign measurements. For modeling HR, the Box-Cox power exponential (BCPE) distribution was chosen based on previous work (9,25) and goodness of fit. For modeling RR, however, BCPE was unsuitable due to the highly leptokurtic (thin, slender peaked) and heavier tailed nature of the RR distributions (as shown in Figure 1); therefore, an alternative distribution, the Box-Cox "t" (BCT) (26), was employed.
For modeling the distributions of RR, a natural logarithm transformation was required for model convergence. It was also necessary to introduce Gaussian statistical noise of up to ±2 breaths/min (with a mean value of 0) to overcome the reduced variation due to digit bias in the raw RR measurements (9) and so induce greater conformity to the BCT modeling distribution.
For each vital sign, our modeling process entailed a stepwise fitting procedure to calculate optimal age-specific fits for mean, SD, skew, and kurtosis, with an additive term that used "penalized B-splines" to create smooth centile curves (22). The process was repeated with patient age raised to various exponents [designated in GAMLSS literature as the "power parameter" (27)] between 0.01 and 1 to further optimize model fit. The methodology used for smoothing is necessarily subjective in that the modeler may choose between iterative methods (22,27) that result in varying degrees of over-fitting or under-fitting of the model to the empirical data. We chose the Schwarz Bayesian Criterion for minimizing local deviation between model and data, which resulted in acceptably smooth curves that retained a good fit to the underlying vital signs data (22). A technical specification of the GAMLSS parameters that describe our centile models is found in Data Sheet S1 in Supplementary Material. Because our original intent was to examine both raw and TMP-corrected initial vital signs, our data capture was restricted to initial encounters having HR, RR, and TMP values taken within 15 min of one another. Encounters having two or more distinct measurements recorded for the same vital sign at the same date and time were not uncommon (comprising about 7% of the total). In such cases, we selected the average value, provided that the range of simultaneous values did not exceed 10% of the largest value for HR or RR (i.e., approximately 10-20 bpm for HR or 2-5 breaths/min for RR), or 3% of the largest value for TMP (i.e., approximately 1°C), arbitrarily chosen to exclude likely erroneous outliers; if exceeded, the encounter was excluded. Records in one or more of the following categories also were similarly excluded as likely outliers: (1) extreme (likely spurious) values of HR (<30 or >300 bpm), RR (0 or ≥120 breaths/min), or TMP (<30 or >46°C); (2) encounters classified as "Trauma Center" cases; or (3) encounters where the patient had a diagnosis of a chronic heart or respiratory condition present on admission. The latter two exclusion types (collectively ~0.4% of cases) were excluded since we did not want to include children who were more likely to have very abnormal vital signs. A summary of the selection and exclusion criteria used to determine the final data set for our study is given in Data Sheet S2 in Supplementary Material with details on specific diagnostic exclusions in Data Sheet S3 in Supplementary Material. A sensitivity analysis of the effects of these exclusions on the final modeled centile results was conducted, as described below.

Model Validation
To test model reproducibility, we performed a stratified splitsample validation whereby the full data set used for each vital sign was divided into a "training"subset consisting of two-thirds of randomly selected records from each age group, and a "test" data subset consisting of the remaining one-third of records. The training subset was modeled by GAMLSS to generate centile cutoffs, and the percentage of records with vital sign values above and below these modeled cutoffs for the 95th, 99th, 5th, and 1st centiles, respectively, were compared between the training and test data subsets using exact chi-square tests. Holm (step-down Bonferroni) (28) corrections, performed separately for HR and RR, were used to adjust for the multiple testing of each metric.

sensitivity analysis
To determine the effect of excluding encounters classified as "Trauma Center, " or those where the patient had a diagnosis of a chronic heart or respiratory condition, the modeling of HR and RR distributions was repeated including these encounters. The resultant modeled centile values, rounded to the nearest whole number for HR and RR, were then compared between datasets of children with and without exclusions.

comparison to empirically Derived and guideline-Based Vital sign Thresholds
We graphically plotted the empirically derived 5th, 50th, and 95th centiles by age compared with the same centiles calculated by O'Leary et al. (10), derived from a large single-center ED population. We also show the upper and lower HR and RR thresholds for awake children recommended in the American Heart Association Pediatric Emergency Assessment, Recognition, and Stabilization, and PALS courses (29).

Data selection
The Health Facts ® data used comprises initial vital sign values and relative measurement times for HR and RR collected from ED encounters involving children (ages 0-17 years) at 169 U.S. hospitals (five were children's hospitals) for calendar years 2009 through 2013. For all encounters, only patient types classified as "Emergency" were selected, which excludes children who were hospitalized from the ED since the Health Facts database does not specifically identify those admitted patients who entered the hospital via the ED. Patient age was recorded as an integer age in months for children <2 years or in years for older children. Because ages were categorical rather than continuous, age was  converted to represent each category by its midpoint (e.g., 0-1 month becomes 0.5 months) for modeling purposes. Our preliminary analysis of patient TMPs taken in the ED found that TMP distributions varied according to patient age and route of measurement, making generalized TMP corrections of HR and RR problematic. Therefore, we analyzed HR and RR without TMP correction.

characteristics of study subjects
A total of 1,203,042 encounters were used to study the distribution of initial HR by age, with slightly fewer (1,202,984) available for an analysis of RR. Patient information and encounters included in our study for each of the vital signs examined are presented in Tables 1 and 2 and Data Sheet S4 and S5 in Supplementary Material. As expected, the sample sizes for age categories representing children <2 years were smaller than those for children 2 years or older due to the shorter age interval (1 month versus 1 year) represented by these categories ( Table 1). Table 2 lists information on patient demographics and length of stay for contributing encounters. Data Sheet S4 in Supplementary Material presents patient encounter frequencies according to the contributing hospitals' demographics. Data Sheet S5 in Supplementary Material summarizes contributing patient encounters by principal diagnosis using Clinical Classifications Software (30) categories, an Agency for Healthcare Research and Quality methodology for condensing patient ICD-9 diagnostic data into clinically meaningful groups.
Representative raw distributions (histograms) for the HR and RR (Figure 1) illustrate the positive skew (longer right tail) in the distributions of each vital sign, and the high kurtosis and heavy tails evident in the distribution of RR.

centile curves
For HR and RR (Figures 2A,B), the centile curves reveal a somewhat complex relationship of vital sign distributions with age for neonates and infants up to about two years of age, followed by smoothly decreasing vital sign values with age for all centiles, with values becoming nearly constant over the late adolescent age range.

Model Validation
The results of the split-sample validation of model result reproducibility ( Table 3) for all metrics and each centile group tested showed no significant difference in the proportion of cases identified by applying the model derived cutoffs obtained from the training data subset to the empirical data in the training and test subsets, respectively, based on either the raw or Holm-adjusted chi-square P statistic. Table 3 also shows the high general agreement between the modeled centiles and the empirical data. Ideally, the modeled centile groups ">99th" and "<1st" should each identify about 1% of the empirical data, while groups ">95th" and "<5th" should each identify about 5%. Deviations between the modeled and the empirical percentages are apparent only in the RR "<5th" and "<1st" (which identify about 3.9 and 0.7% of the respective empirical data for these groups).

centile and z-score Tables
The modeled values of HR, and RR from the 1st to 99th centiles for each age group are provided as Tables 4 and 5. Supplementary Material Data Sheet S6 and S7 in Supplementary Material present similarly formatted model results by age group for HR and RR as normalized standard centiles (z-scores) ranging from −3.0 to +3.0 SDs for HR and RR.

sensitivity analysis
Each sensitivity analysis compared modeled centiles between data sets with and without predefined exclusions for 11 centile levels and 40 age categories (as presented in Tables 4 and 5).
Our finding of only one discrepancy out of 440 combinations of centile and age for HR and four for RR-with differences of just 1 bpm for HR and 1 breath/min for each RR-shows that our choice to exclude these encounters resulted in a negligible effect on modeled centile values compared with those obtained without the exclusions. In infants, Figure 3A shows that the empirically derived 50th centile for HR was 9-16 bpm higher than the values determined by O'Leary et al. (10), and the 95th centiles for HR were 13-24 bpm higher than the O'Leary values, whereas the infant-aged PALS recommended upper and lower HR limits (29) were above and below the 95th and 5th centiles, respectively. The empirically derived 95th centile RR in infants (Figure 3B) was also higher by 7-11 breaths/min than O'Leary's data, whereas the 50th and 5th centile were similar. The PALS recommended upper RR in infants was similar to the empirically derived 95th centile range, although by 10 months of age, it steadily exceeded the empirically derived 95th centile. The lower PALS RR limit was 6-10 breaths/ min above the empirically derived 5th centile and was similar to the 50th centile by 7 months of age. In children (≥1 year), Figure 3A shows substantial discrepancy between the empirically derived 95th centile HR and O'Leary's (10) centile HR values up to around 10 years of age. The fifth centile for all three sets of data are similar, but the PALS upper HR limits are below the empirically derived 95th centile and are between the 50th and 75th centile for age up to ~6 years old (23-44 bpm lower from 1 through 6 years of age). Similarly, the empirically derived 95th centile RR (Figure 3B) is 3-6 breaths/ min higher than the PALS and O'Leary values up to 4-5 years of age.

DiscUssiOn
We hypothesized that empirically derived HR and RR distributions would deviate from PALS recommended distributions, and values derived from a low-acuity population of children seen in the ED. Ideally, an effective screening tool should balance high sensitivity to detect children at risk of deterioration while limiting too many false-positive patients leading to alert fatigue (5). It is important to recognize, however, that a vital sign-based screening tool is unlikely to identify all at-risk children. Instead, recent data show that combining an EHR-based screening tool with clinician identification, timelier joint team assessments and improved escalation of care processes results in high sensitivity and specificity for identification of severe sepsis/septic shock (31). As seen in Figures 3A,B, the utility of "normal" vital sign thresholds recommended by PALS (29), APLS (32), and other reference texts is limited since they group normal values into relatively wide age ranges, which encompass large physiologic ranges and thus a wide expected distribution of vital signs, and they are not empirically derived. The PALS upper HR thresholds are well below the empirically derived 95th centiles and are between the 50th and 75th centile for age up to ~6 years, leading to substantial over-identification of at-risk children.
Similarly, the empirically derived 95th centile RR ( Figure 3B) is higher than the PALS and O'Leary values up to 4-5 years of age. The upper PALS RR limit would over-identify many adolescents, whereas the lower PALS RR values generally exceed the empirically derived value up to 12 years of age falling 2-3 breaths/min below the empirically observed fifth centile. Using PALS criteria also would overclassify up to 50% of older infants as having an abnormally low RR ( Figure 3B).
Thus, it is not surprising that in a ward inpatient sample of over 116,000 observations, 54% of the HR and 40% of the RR vital signs observed in hospitalized children were outside textbook "normal" vital sign distributions and 38% of HR and 30% of RR observations would have resulted in increased early warning scores (9). Similarly, in an analysis of 40,356 ED visits at Colorado Children's Hospital (5), 16.3% of the children had a fever (>38.5°C) and 92.8% of these febrile children met SIRS vital sign criteria, as defined by the International Pediatric Sepsis Conference (12). Most of these children were discharged without any intervention.
Several groups have used data from EHR's to redefine the distribution of vital signs observed in children in the ED (8,10) or hospital wards (9,33). These analyses clearly show that the standard textbook and guideline distributions and thresholds for "abnormal" for vital signs do not reflect empirically observed distributions (9). However, our empirically derived centiles differ from centiles derived from a large (111,696) data set of children presenting to a single ED (10). Of note, this study (10) restricted its analysis to the lowest acuity children, whereas our analysis included all acuities, but excluded children seen in the ED who were subsequently admitted. This may better represent the distributions of vital signs seen in children brought to the ED, who are likely stressed by the ED environment, but who presumably are not critically or seriously ill and, thus, do not require hospitalization.
We believe our empirically derived centiles based on a very large, multi-centered database of ED visits by children provide evidence-based vital sign parameters to use in either EHR or paper-based early warning scores, to set monitor alarm limits and to risk-stratify children for clinical trials or epidemiologic studies. Rather than using dichotomous threshold values, which are often set by consensus, to define "normal" from "abnormal, " we believe that these data can lead to the development of more useful objective risk stratification tools.
Although using vital sign parameter z-scores or centile ranks rather than "normal" versus "abnormal" is complex, EHR systems can be programmed to analyze these data to risk-stratify children, which enhances the statistical power of analyzing continuous data rather than using a dichotomous threshold (19,20). By necessity, dichotomizing a continuous physiologic variable into two categories anticipates an unrealistic step function for risk at the threshold level. For example, if the upper limit of normal HR is set at 180 bpm, then an infant with a HR of 179 bpm is considered "normal, " even though this infant is not demonstrating the same physiologic response as the same aged infant with a HR of 120 bpm.
The potential value of using risk-stratified assessment of vital signs was seen in an ED-based study of 1,750 febrile children evaluated using seven different vital sign modeling strategies to identify children with serious bacterial infections (34). The model that worked best utilized the degree of deviation of age-adjusted HR and RR from median values, with or without TMP correction.

centile and z-scores
Although we generated smooth curves for the individual vital signs, the raw vital sign distributions for RR shown in Figure 1 are not Gaussian but rather show a strong kurtosis and right-tailed skew. The skewed RR may represent an increased prevalence of disease processes (i.e., respiratory conditions), which lead to more frequent elevated RR values in children presenting to the ED compared with a population of healthy, normal children.
We performed a stratified split-sample validation that showed ( Table 3) for each vital sign metric there was no significant difference in the proportion of cases in the training versus the test subjects.

TMP-adjusted Vital signs
Most screening tools do not TMP adjust the HR or RR. It is well known that increased TMP increases metabolic demand, thus increasing HR and RR (33,35,36). Analysis of vital sign data in hospitalized children (not ICU) found a near linear increase in HR with increasing TMP of approximately 10 beats/min for each degree Centigrade increase in TMP, although the increase appears to vary by age, with the HR increasing by 15 bpm in infants per degree Centigrade versus 8 bpm in 14-to 18-year-olds (33).
We planned to analyze the effect of TMP on HR and RR, but we were not confident in adjusting oral TMP to core TMP and did not know how to correct for axillary or temporal artery TMPs. A recent systematic review of the accuracy of peripheral versus core TMPs in adults and children (37) found few studies comparing oral to core TMP in children. Axillary and infrared tympanic TMPs can vary widely from core TMP, especially in patients who have poor perfusion. Moreover, an inpatient study conducted at Cincinnati Children's Hospital and Children's Hospital of Philadelphia observed significantly different TMP histograms between the institutions, which disappeared when one of the institutions transitioned to the same thermometer used at the other hospital, showing that different thermometers can introduce additional variation in the measured TMP (33).

limitations
Since we did not have precise ages, we used 1-month ranges (if <2 years of age) or 1-year age ranges for older children. It seems likely that the large sample size and smoothing of the data will limit the impact of including children within an entire year, especially since within any year range there is sizable normal variation based on child size and height as seen with blood pressure (38).
We chose to use extreme exclusion criteria for HR and RR. This may have resulted in the inclusion of data from children with erroneous or very abnormal vital sign values, but again the very large data set likely limits the impact of these outliers.
Further studies are needed to determine if using empirically derived vital sign thresholds would result in a lower sensitivity to identify high-risk children. Our previous analysis using empirically derived higher vital sign thresholds based on data from a single ED noted improved positive predictive value for severe sepsis/septic shock identification without affecting sensitivity (11).

cOnclUsiOn
In summary, the derived HR and RR centiles and z-scores from more than 1 million children seen in the ED of mostly adult hospitals were often different from currently derived centiles in children seen in the ED and from consensus-based PALS guideline vital sign thresholds. We believe our data provide a reliable representation of HR and RR distributions in stressed children seen in the ED who do not require hospitalization. These data could be used to create algorithms within EHR's to develop pragmatic risk scores that increase sensitivity and specificity and reduce alarm fatigue characteristic of dichotomous vital sign thresholds. We also believe these parameters may help establish improved alarm limits for bedside monitors and these empirically derived HR and RR thresholds may improve the performance of paper-based tools, such as those based on PALS vital sign thresholds. Finally, the data may better inform the creation of disease-specific screening tools.

aVailaBiliTY OF DaTa anD MaTerial
The datasets used and analyzed during the current study are available from the corresponding author, but restrictions apply to the availability of these data since they were obtained through a written agreement with Cerner Corporation and so are not publicly available. Data may be available from the author on reasonable request and with permission of Cerner Corporation.

eThics sTaTeMenT
Since we analyzed de-identified clinical data, this is not human subjects research.
aUThOr cOnTriBUTiOns RS, SG, and AZ each made substantial contributions to the con ception and design of the study. RS conducted the analyses and derived the centile curves and z-scores. RS and AZ jointly drafted and all three authors critically revised the manuscript for important intellectual content. All authors read and approved the final manuscript. acKnOWleDgMenTs