Administrative Data Is Insufficient to Identify Near-Future Critical Illness: A Population-Based Retrospective Cohort Study

Background: Prediction of future critical illness could render it practical to test interventions seeking to avoid or delay the coming event.
Objective: Identify adults having >33% probability of near-future critical illness.
Research Design: Retrospective cohort study, 2013–2015.
Subjects: Community-dwelling residents of Manitoba, Canada, aged 40–89 years.
Measures: The outcome was a near-future critical illness, defined as intensive care unit admission with invasive mechanical ventilation, or non-palliative death, occurring 30–180 days after 1 April each year. After dividing the data into training and test cohorts, we used a Classification and Regression Tree analysis to identify subgroups with ≥33% probability of the outcome. We considered 72 predictors including sociodemographics, chronic conditions, frailty, and health care utilization. Sensitivity analysis used logistic regression methods.
Results: Approximately 0.38% of each yearly cohort experienced near-future critical illness. The optimal Tree identified 2,644 mutually exclusive subgroups. Socioeconomic status was the most influential variable, followed by nursing home residency and frailty; age was sixth. In the training data, the model performed well; 41 subgroups containing 493 subjects had ≥33% members who developed the outcome. However, in the test data, those subgroups contained 429 individuals, of whom 20 (4.7%) experienced the outcome; these 20 comprised 0.98% of all subjects with the outcome. While logistic regression showed less model overfitting, it likewise failed to achieve the stated objective.
Conclusions: High-fidelity prediction of near-future critical illness among community-dwelling adults was not successful using population-based administrative data. Additional research is needed to ascertain whether the inclusion of additional types of data can achieve this goal.


APPENDIX A: CLASSIFICATION AND REGRESSION TREE (CART) METHODOLOGY
CART [1,2] is an extremely flexible type of decision tree algorithm that can take account of a very large number of independent variables, automatically allowing for arbitrarily complicated interactions among them. CART uses recursive partitioning to divide all members of a cohort into mutually exclusive subgroups, each defined by a given value, range, or category of the independent variables. The result is a ramified tree where each "terminal leaf" is one such subgroup.
An innate feature of CART is its ability to completely separate subgroups. Identifying the predictors of one subgroup does not hinder its ability to identify those for other subgroups, whether overlapping or completely separate. This is not the case for multivariable regression, where, unless numerous interaction terms are explicitly included (and these quickly become difficult to interpret), inclusion of a subgroup (e.g., nursing home residents) influences the coefficients of all other predictors.
We chose CART over other classification methods because we expected that identifying a substantial number of individuals who will develop critical illness in the near future would require finding a large number of diverse subgroups (represented by terminal leaves) in which >33% experienced the outcome. Such subgroups would require applying an eventual intervention to no more than three people to have a chance to avert or delay one episode of critical illness.
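The link between the >33% threshold and "no more than three people" is simple arithmetic, sketched below. The function name is hypothetical, and the calculation assumes only that the expected number of events among n people with event probability p is n × p; it says nothing about how effective an intervention would be.

```python
import math


def people_per_expected_event(event_probability: float) -> int:
    """Smallest whole number of people whose expected event count reaches one.

    Hypothetical helper for illustration only: with probability p per
    person, n people are expected to have n * p events, so n = ceil(1 / p).
    """
    if not 0 < event_probability <= 1:
        raise ValueError("event_probability must be in (0, 1]")
    return math.ceil(1 / event_probability)


# A subgroup just above the 33% threshold needs three people per expected event.
print(people_per_expected_event(0.34))
```

By the same arithmetic, a subgroup with a much lower event rate would require intervening on many more people per expected event.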
For a binary outcome such as ours, CART seeks a solution maximizing the "purity" of leaves, in the sense that leaves have as great a fraction of zeros or ones as possible [3]; a variety of different measures can be used as the measure of purity. CART grows trees in single steps, working on an existing node and seeking a single variable (and its splitting value) that improves the purity of the daughter nodes over the parent node that was split. Input variables may be split once, multiple times, or not at all. Conditions that terminate splitting down a tree branch are: perfect purity is achieved; the chosen maximum number of branches is reached; the chosen minimum leaf size is reached for each leaf; or the chosen threshold p-value for continued splitting is exceeded.
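As an illustration of a single splitting step, the sketch below searches one numeric variable for the threshold that most improves purity, using Gini impurity as the purity measure. It is a simplified stand-in, not the SAS Enterprise Miner procedure: it handles only 2-way splits, ignores the p-value stopping rule, and all function and parameter names are assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def gini_impurity(outcomes: Sequence[int]) -> float:
    """Gini impurity of a node holding 0/1 outcomes; 0.0 means perfectly pure."""
    if not outcomes:
        return 0.0
    p = sum(outcomes) / len(outcomes)
    return 2 * p * (1 - p)


def best_split(
    values: Sequence[float],
    outcomes: Sequence[int],
    min_leaf_size: int = 1,
) -> Optional[Tuple[float, float]]:
    """Find the split 'value < threshold' with the lowest weighted child impurity.

    Returns (threshold, weighted_child_impurity), or None when no allowed
    split is purer than the parent node.
    """
    parent_impurity = gini_impurity(outcomes)
    n = len(outcomes)
    best = None
    # Each distinct value except the smallest is a candidate threshold.
    for threshold in sorted(set(values))[1:]:
        left = [y for x, y in zip(values, outcomes) if x < threshold]
        right = [y for x, y in zip(values, outcomes) if x >= threshold]
        if len(left) < min_leaf_size or len(right) < min_leaf_size:
            continue  # respect the minimum leaf size
        weighted = (
            len(left) * gini_impurity(left) + len(right) * gini_impurity(right)
        ) / n
        if weighted < parent_impurity and (best is None or weighted < best[1]):
            best = (threshold, weighted)
    return best


# Toy data: the outcome is 1 exactly when the variable is 3 or more.
print(best_split([1, 2, 3, 4], [0, 0, 1, 1]))
```

A full tree would repeat this search over every candidate variable at every node and recurse into the daughter nodes until one of the stopping conditions is met.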
CART uses two subcohorts to train the model. We created these two portions of the Training dataset by a 60:40 random subdivision of the FY2013 and 2014 data. The first of these subcohorts is used to generate the full, maximal tree, and the second subcohort is then used to identify the optimal subtree. Identifying the single best subtree begins with the original (maximal) tree of M leaves. By removing individual leaves from the end back towards the origin ("pruning"), CART creates all subtrees with M-1, M-2, M-3, etc. leaves. For every such subtree it calculates the worth of the tree using one of a number of possible parameters. Using that measure of worth, it chooses the subtree of highest worth at each size of M, M-1, M-2, M-3, etc. leaves. Among this family of optimal subtrees of different sizes, it chooses as the final tree the one that has a higher worth than any smaller tree, but equal or higher worth than any larger tree.
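The two mechanical steps in this paragraph, the 60:40 subdivision and the final choice among the best subtrees of each size, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the fixed random seed, and the toy worth values are all hypothetical, and the pruning that produces the worth of each subtree size is not shown.

```python
import random
from typing import List, Mapping, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_60_40(records: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly subdivide records 60:40 into (tree-growing, pruning) subcohorts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 6) // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def choose_final_leaf_count(worth_by_leaf_count: Mapping[int, float]) -> int:
    """Pick the leaf count of the final tree.

    worth_by_leaf_count maps each subtree size to the worth of the best
    subtree of that size. The final tree has higher worth than any smaller
    tree and equal or higher worth than any larger tree, i.e. it is the
    smallest tree that attains the maximum worth.
    """
    best_worth = max(worth_by_leaf_count.values())
    return min(k for k, w in worth_by_leaf_count.items() if w == best_worth)


growing, pruning = split_60_40(range(100))
print(len(growing), len(pruning))

# Toy worths: sizes 3 and 4 tie for the best worth, so the smaller tree wins.
print(choose_final_leaf_count({1: 1.0, 2: 3.5, 3: 5.0, 4: 5.0, 5: 4.2}))
```

Preferring the smaller of two equally good trees is what keeps the final model from carrying leaves that add no worth in the second subcohort.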
We used Lift as the measure of tree worth [3]. Lift measures the performance of a tree at predicting events in a chosen subset of leaves, compared to the rate of events in the entire sample. For example, if the overall population rate of events is 1%, but in a given subset of leaves the rate is 20%, then the Lift for this subset is 20.
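The worked example above corresponds to the following calculation. This is a minimal sketch: the function and argument names are assumptions, and it computes Lift from raw counts rather than reproducing how SAS Enterprise Miner selects the subset of leaves.

```python
def lift(
    events_in_subset: int,
    size_of_subset: int,
    events_overall: int,
    size_overall: int,
) -> float:
    """Ratio of the event rate in a subset of leaves to the overall event rate."""
    if size_of_subset <= 0 or size_overall <= 0 or events_overall <= 0:
        raise ValueError("sizes and overall event count must be positive")
    # (events_in_subset / size_of_subset) / (events_overall / size_overall),
    # rearranged so that only one division is performed.
    return (events_in_subset * size_overall) / (size_of_subset * events_overall)


# 20% event rate in the subset versus 1% overall gives a Lift of 20.
print(lift(events_in_subset=20, size_of_subset=100,
           events_overall=100, size_overall=10_000))
```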
The CART settings we used in SAS Enterprise Miner were: (a) minimum leaf size = 10; (b) a maximum of 30 levels of branching; (c) 2-way and 3-way branching allowed; (d) Gini splitting criterion [3], along with a threshold p-value of 0.20 using both Bonferroni and depth adjustments; (e) tree worth assessed via the Lift measure [3].

APPENDIX B: EXAMPLE OF THE ABILITY OF CART TO COMBINE INPUT VARIABLES
An example of how CART can combine input variables in complex combinations is the following terminal leaf. This subgroup comprised subjects who were:
• female
• living <105 km from the closest high-intensity ICU
• SEFI ≥ -0.65
• 2 clinic visits in the 3 months prior to the Start Date
• hospital-days were: <25 in the most recent 3 months; <25 in the prior 7-12 months; ≥10 from 13-24 months prior
• ALC + rehabilitation-days were: <52 from 13-24 months prior; <42 from 7-12 months prior
• outpatient laboratory tests were: <14 outpatient laboratory tests performed in the most recent 3 months; <21 in the prior 7-12 months
• prescriptions: 12-13 different chemical classes filled in the 4-6 months prior.
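Read as a rule, this terminal leaf is a conjunction of all the listed conditions. The sketch below writes it as a single predicate; every dictionary key is a hypothetical field name invented for illustration and does not correspond to a variable name in the study data.

```python
from typing import Any, Mapping


def in_example_leaf(subject: Mapping[str, Any]) -> bool:
    """True when a subject record meets every condition of the example leaf.

    All keys are hypothetical names chosen for this sketch.
    """
    return (
        subject["sex"] == "F"
        and subject["km_to_high_intensity_icu"] < 105
        and subject["sefi"] >= -0.65
        and subject["clinic_visits_0_3m"] == 2
        # hospital-days in three look-back windows
        and subject["hospital_days_0_3m"] < 25
        and subject["hospital_days_7_12m"] < 25
        and subject["hospital_days_13_24m"] >= 10
        # ALC + rehabilitation-days
        and subject["alc_rehab_days_13_24m"] < 52
        and subject["alc_rehab_days_7_12m"] < 42
        # outpatient laboratory tests
        and subject["lab_tests_0_3m"] < 14
        and subject["lab_tests_7_12m"] < 21
        # distinct chemical classes of prescriptions filled
        and 12 <= subject["drug_classes_4_6m"] <= 13
    )
```

Because leaves are mutually exclusive, each subject satisfies the predicate of exactly one terminal leaf of the tree.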

APPENDIX C: EXPLANATION FOR AGE OF DATA REPORTED
This study was planned and funded in 2019. That plan included creating the CART model using two years of Training data (FY2013/14 and 2014/15), and evaluating its ability for high-fidelity prediction of critical illness in the Test data of subsequent years (FY2015/16, 2016/17 and 2017/18, the latest data then available). Anticipating that our hypothesis was correct, the plan included assessing how the predictive ability decays over time by serially assessing prediction for FY2015/16, then FY2016/17 and finally FY2017/18. However, contrary to our hypothesis, even prediction in the first Test year data (FY2015/16) was unsuccessful. Therefore, assessment of prediction for the two following fiscal years was judged to be futile.
While in principle it would have been possible to redo the analysis using FY2015/16 and FY2016/17 as the Training data and the final year of data (FY2017/18) as the Test year, we chose not to do this for three reasons. First, we considered it very unlikely that doing so would alter the conclusions of the presented manuscript. The other two were a confluence of (ii) analysis delays of approximately nine months over 2020 and 2021 due to the COVID-19 pandemic in Canada, and (iii) the desire of the provincial health department that the report they funded in 2019 be completed and presented without further delay.

x-C26.x, C30.x-C34.x, C37.x-C41.x, C43.x, C45.x-C58.x, C60.x-C76.x, C81.x-C85.x, C88.x, C90.x-C97.x
Trauma, injury (ICD-10-CA): S00-T35, T66-T79, V, W, or X, with diagnosis type indicating present at hospital admission
CCI, Canadian Classification of Interventions

Supplementary Table 3

Table 1. Databases and data elements used.
years (April 1-May 30); s, values omitted as representing <5 individuals or required to censor other group(s) with <5 individuals; ICU, intensive care unit; IMV, invasive mechanical ventilation; GI, gastrointestinal; ADG, Johns Hopkins Aggregated Diagnosis Group system™; ALC, alternative level of care; ATC4, fourth level of the Anatomic Therapeutic Chemical Classification system; CVD, collagen-vascular diseases
‡Distance between residence and nearest high-intensity intensive care unit
§CMA (census metropolitan area), CA (census area), MIZ (metropolitan influenced zone)
¶Timing backwards from the Start date: (A) 13-24 months prior, (B) 5-12 months prior, (C) 4-6 months prior, (D) 0-3 months prior

Supplementary Table 5. Optimal CART tree branching, among 2644 total terminal leaves.
Columns: branching level; # of terminal leaves at this level; % of all terminal leaves

level of the Anatomic Therapeutic Chemical Classification system; SEFI, socioeconomic factor index
†From the Johns Hopkins Adjusted Clinical Group® (ACG®) Case-Mix System
†Fiscal