Machine Learning-Augmented Propensity Score Analysis of Percutaneous Coronary Intervention in Over 30 Million Cancer and Non-cancer Patients

Background: It is unknown to what extent the clinical benefits of percutaneous coronary intervention (PCI) outweigh the risks and costs in patients with vs. without cancer and within each cancer type. We performed the first known nationally representative propensity score analysis of PCI mortality and cost among all eligible adult inpatients by cancer and its types. Methods: This multicenter case-control study used machine learning–augmented propensity score–adjusted multivariable regression to assess the above outcomes and disparities using the 2016 nationally representative National Inpatient Sample. Results: Of the 30,195,722 hospitalized patients, 15.43% had a malignancy, 3.84% underwent an inpatient PCI (of whom 11.07% had cancer and 0.07% had metastases), and 2.19% died inpatient. In fully adjusted analyses, PCI vs. medical management significantly reduced mortality for patients overall (among all adult inpatients regardless of cancer status) and specifically for cancer patients (OR 0.82, 95% CI 0.75–0.89; p < 0.001), mainly driven by active (vs. prior) malignancy and by head and neck and hematological malignancies. PCI also significantly reduced cancer patients' total hospitalization costs (beta USD$ −8,668.94, 95% CI −9,553.59 to −7,784.28; p < 0.001) independent of length of stay. There were no significant income or racial disparities among PCI subjects. Conclusions: Our study suggests that, among all eligible adult inpatients, PCI does not increase mortality or cost for cancer patients, while there may be particular benefit by cancer type. The presence or history of cancer should not preclude these patients from indicated cardiovascular care.


INTRODUCTION
Cardiovascular diseases and cancer are the most prevalent chronic diseases and are the leading causes of morbidity and mortality in the world; specifically, one in six deaths and an estimated total of 9.6 million deaths in 2018 were attributable to cancer (1–3). Cardiovascular diseases and several cancer types share similar modifiable risk factors: high body mass index, low fruit and vegetable intake, lack of physical activity, and tobacco and alcohol use (4–7). Cancer itself is a pro-inflammatory and hypercoagulable state that increases the risk of cardiovascular events (4,8–14). Certain primary malignancies are more likely than others to be associated with coronary artery disease (CAD), either due to shared risk factors or because their required treatments are associated with accelerated atherosclerosis (4,5,15–17). Aside from the clinical impact, the economic impact of cancer is also increasing, with the United States' annual direct medical costs (i.e., the total of all healthcare expenditures) for cancer totaling over $80 billion (7,18).
Further, cancer patients with comorbid CAD are less likely to be treated with percutaneous coronary intervention (PCI) than the general population (9,19), as they present with a higher risk of complications from PCI and increased frailty (20–24). This risk is more pronounced in specific primary malignancies (e.g., lung cancer) and with the presence of metastases (20). With improved patient survival from novel cancer treatments, as well as the parallel increase in the safety of interventional procedures, the use of PCI in patients with comorbid cancer has recently been revisited (9,20,21,25–32). The recent National Inpatient Sample (NIS) offers an opportunity to evaluate the impact of a current (with and without metastatic disease) or historical cancer diagnosis on clinical and economic outcomes (cost and length of stay). We therefore sought to conduct the first nationally representative analysis of PCI vs. no PCI among all CAD inpatients with and without cancer and among all available cancer types for mortality and cost, using machine learning-augmented propensity score analysis that included racial and income disparity analysis.

Study Design
This study is a multi-center analysis of inpatient mortality (primary endpoint) and total costs (secondary endpoint) among all eligible hospitalized adults; it assessed the association of these endpoints with PCI (yes/no) for acute coronary syndrome (ACS, including unstable angina/non-ST segment elevation myocardial infarction [UA/NSTEMI] and STEMI) and with PCI and cancer (yes/no, overall and comparatively by primary organ site). To reduce confounding bias in this non-randomized study and to facilitate result interpretation, the above endpoints were assessed in the above sub-group stratified analyses. The 2016 NIS dataset was selected for this study because it is the latest and best reflects current clinical trends in PCI use. Study inclusion criteria were all NIS hospitalizations for adults 18 years or older during 2016. This study used deidentified data and was conducted according to the ethical principles in the Declaration of Helsinki.
Subjects undergoing PCI were identified by the ICD-10 procedure codes of 00.66 (percutaneous transluminal coronary angioplasty), 36.06 [insertion of non-drug-eluting coronary artery stent(s)], or 36.07 [insertion of drug-eluting coronary artery stent(s)]. HCUP tools such as the Clinical Classification Software, which had been used prior to the NIS 2016 dataset for such purposes as classifying cancer (e.g., by primary type, current vs. historical), were not used in this study because HCUP found their beta versions to be unreliable when applied to the 2016 dataset's ICD-10 data.

Data Source
The data source for this study was the 2016 NIS for hospital discharges. The NIS is the largest all-payer inpatient dataset in the nation, sponsored by the US Department of Health and Human Services' Agency for Healthcare Research and Quality and maintained within the Healthcare Cost and Utilization Project (HCUP). The NIS began in 2004 with data collection from select hospitals and expanded in 2012 to encompass discharge data from all HCUP participating hospitals. In 2016, the NIS data coding adopted the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM). The NIS currently accounts for ∼1 in 5 discharges from all community hospitals in the United States. To reduce sampling bias, the sampling strategy has been modified in the most recent data to produce results more generalizable to all inpatient discharges in the country, and so the associated sampling weights were applied in this analysis.
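As a minimal sketch of how discharge-level sampling weights enter a national estimate, the Python snippet below computes a weighted discharge count and a weighted proportion. The records and weight values are hypothetical illustrations (chosen to mirror the roughly 1-in-5 sampling fraction described above); they are not actual NIS fields or values, and the function name is an assumption introduced here.

```python
def weighted_proportion(records):
    """Return the weighted share of records whose flag is true.

    Each record is a (flag, weight) pair; the weight is the number of
    national discharges that the sampled discharge is taken to represent.
    """
    total_weight = sum(weight for _, weight in records)
    flagged_weight = sum(weight for flag, weight in records if flag)
    return flagged_weight / total_weight


# Hypothetical sample: (underwent PCI?, discharge weight)
sample = [(True, 5.0), (False, 5.0), (False, 5.0), (False, 4.8), (True, 5.2)]

national_estimate = sum(weight for _, weight in sample)  # weighted discharge count
pci_share = weighted_proportion(sample)
print(f"Weighted discharges: {national_estimate:.1f}, PCI share: {pci_share:.3f}")
```

An unweighted share would simply count rows; the weighted version lets each sampled discharge stand in for the national discharges it represents.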

Statistical Analysis
Descriptive statistics for demographics (i.e., age, sex, race, insurance) and comorbidities were computed for the full sample. Comorbidities were selected for analysis (and identified in the dataset by their ICD-10 codes) on the basis of their clinical and/or statistical significance in similar studies in the existing literature. The comorbidities included in this study were diabetes, hypertension, peripheral vascular disease, hyperlipidemia, smoking, obesity, poor diet, stroke, congestive heart failure, cardiac arrest, myocardial infarction, cardiogenic shock, valvular disease, anemia, chronic obstructive pulmonary disease, coagulopathy, chronic kidney disease, and malignancy (overall and by primary malignancy type).
Bivariable analysis was then conducted separately according to the following: (a) inpatient mortality (yes/no); (b) PCI (yes/no) among the overall sample, stratified by metastases (yes/no) and in subgroup analyses among patients with malignancy; (c) PCI vessel number (multi- vs. single-vessel); (d) malignancy (yes/no) in subgroup analyses among patients who died with UA/NSTEMI and separately among those with STEMI; (e) length of stay by primary malignancy type; (f) total cost by primary malignancy type. For continuous variables, independent sample t-tests were performed to compare means and Wilcoxon rank sum tests were performed to compare medians. For categorical variables, Pearson chi-square tests or Fisher exact tests were performed to compare proportions.
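To illustrate the categorical comparison described above, the following is a minimal sketch of a Pearson chi-square test on a 2 x 2 table, written with only the Python standard library. The counts are hypothetical and the function name is an assumption introduced here; the study itself ran these tests in Stata.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows = PCI yes/no, columns = died yes/no
statistic, p_value = chi_square_2x2(20, 980, 45, 955)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

When any expected cell count is small, the Fisher exact test mentioned above is the usual alternative to this large-sample approximation.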
Variables found to be statistically significant in the bivariable analysis were then included in forward and backward stepwise regression to augment decision-making on which variables should be included in the final multivariable regression models. This regression analysis was conducted to assess the following outcomes: (a) inpatient mortality (by logistic regression) and (b) total hospital costs (by linear regression, adjusting for the additional variable of length of stay). The regression models separately assessed these outcomes according to the following major predictors: (a) historical or active malignancy (yes/no) and (b) primary malignancy type (brain and nervous system, head or neck, thyroid, breast, lung, esophagus, stomach, pancreas, liver or bile system, rectum or anus, colon, peritoneum, bone or connective tissue system, hematological malignancies [including Hodgkin lymphoma, Non-Hodgkin lymphoma, leukemia, and multiple myeloma], skin, uterus, cervix, ovarian, prostate, testes, bladder, and renal). Sub-group analysis without propensity score adjustment was conducted separately according to history of CAD (additionally with stratified analysis by ACS and active or prior malignancy), active malignancy, prior malignancy, presenting diagnosis of ACS, UA/NSTEMI, and STEMI. These models featured the interaction between PCI and malignancy, while adjusting for age, race, income, metastases, and mortality risk by DRG (other variables were excluded based upon the below machine learning analysis and diagnostic testing to produce the most clinically and statistically justifiable models).
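One way to write the mortality model described above is the following logistic regression with a PCI-by-malignancy interaction. This is a sketch under assumed notation, not necessarily the exact specification fitted in this study: the coefficients β and γ and the covariate vector x_i (age, race, income, metastases, and mortality risk by DRG) are symbols introduced here for illustration.

```latex
\operatorname{logit}\,\Pr(\mathrm{death}_i = 1)
  = \beta_0
  + \beta_1\,\mathrm{PCI}_i
  + \beta_2\,\mathrm{Cancer}_i
  + \beta_3\,(\mathrm{PCI}_i \times \mathrm{Cancer}_i)
  + \boldsymbol{\gamma}^{\top}\mathbf{x}_i
```

Under this parameterization, the odds ratio for PCI is exp(β_1) among patients without cancer and exp(β_1 + β_3) among patients with cancer, which is why the interaction term allows the PCI association to differ by cancer status.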
Next, machine learning-backed propensity score-adjusted multivariable regression was conducted for mortality and controlled for age, race, income, presence of metastases, and mortality risk by diagnosis-related group in addition to the likelihood of undergoing PCI and the NIS weights accounting for the cluster sample data structure. The propensity score was then created for the likelihood of undergoing PCI (the treatment), balance was confirmed among blocks, and then the propensity score was included in the final regression models as an adjusted variable. This causal inference approach (propensity score adjustment) was selected because it is a widely accepted methodology to reduce but not eliminate selection bias and the effect of confounding variables. Such competing causal inference approaches as fixed, random, and mixed effects were not appropriate (though these have the added advantage of reducing unobserved variable bias) because the dataset lacked adequate repeated hospitalizations from the same subjects. Propensity score adjustment was used rather than covariate adjustment without the propensity score to enable a more complicated propensity score model (i.e., able to test interactions and higher order terms to produce the most robust estimated probability of treatment assignment) without risking overparameterizing while still permitting diagnostic analysis of the final models to be done to confirm superior performance to simple covariate adjustment without the propensity score. Finally, propensity score adjustment rather than competing propensity score techniques was used because of its superior performance in the appropriate context (confirmed by current statistical theory and adequate diagnostic quantitative testing of the final models in cardiovascular studies) (33,34).
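The propensity score step described above (estimate the likelihood of treatment, then check covariate balance within blocks) can be sketched as follows. This is an illustration on synthetic data with a single covariate and a plain logistic model fitted by gradient ascent; it is not the study's machine learning-augmented model, and every name and parameter value in it is an assumption made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ts, steps=500, lr=0.5):
    """Fit P(t = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, t in zip(xs, ts):
            resid = t - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def mean(values):
    return sum(values) / len(values)


def covariate_gap(rows):
    """Absolute difference in mean covariate between treated and untreated rows."""
    treated = [x for x, t, _ in rows if t == 1]
    control = [x for x, t, _ in rows if t == 0]
    if not treated or not control:
        return 0.0  # gap is undefined when one group is empty
    return abs(mean(treated) - mean(control))


# Synthetic data: one standardized covariate; treatment is likelier at higher x.
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
ts = [1 if random.random() < sigmoid(-0.5 + x) else 0 for x in xs]

b0, b1 = fit_logistic(xs, ts)
scores = [sigmoid(b0 + b1 * x) for x in xs]  # estimated propensity scores

# Sort by propensity score and split into five equal blocks (quintiles).
rows = sorted(zip(xs, ts, scores), key=lambda row: row[2])
block_size = len(rows) // 5
blocks = [rows[i * block_size:(i + 1) * block_size] for i in range(5)]

overall_gap = covariate_gap(rows)
max_block_gap = max(covariate_gap(block) for block in blocks)
print(f"overall gap: {overall_gap:.2f}, largest within-block gap: {max_block_gap:.2f}")
```

If the within-block gaps are much smaller than the overall gap, treated and untreated subjects with similar scores are comparable on the covariate, and the score can then enter the outcome regression as an adjustment variable.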
The utility of this hybrid analytic approach, which integrates the traditional statistical method of frequentist-based multivariable regression (supported by propensity score-based causal inference analysis) with supervised machine learning, has been previously demonstrated: causal inference results, which are more familiar to medical science audiences, can be confirmed and replicated automatically through machine learning (and thus may accelerate real-time findings on larger high-dimensional datasets, as machine learning already increasingly does in economic sectors outside of medicine), while producing more rapid and accurate results than traditional statistics alone (35–40).
To modify the final models until optimal performance was achieved, performance was first assessed relative to results from backpropagation neural network machine learning to ensure comparability by root mean squared error and accuracy. Regression model performance was additionally assessed with the correlation matrix, area under the curve, Hosmer-Lemeshow goodness-of-fit test, Akaike and Schwarz Bayesian information criteria, variance inflation factor and tolerance for multicollinearity, and specification error testing. An academic physician-data scientist and biostatistician confirmed that the final regression models were sufficiently supported by the existing literature and clinical and statistical theory. Fully adjusted regression results were reported with 95% confidence intervals (CIs) with statistical significance set at a 2-tailed p < 0.05. Statistical analysis was performed with Stata 14.2 (StataCorp, College Station, Texas, USA), and machine learning analysis was performed with Java 9 (Oracle, Redwood Shores, California, USA).
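Three of the performance measures named above (root mean squared error, accuracy, and area under the curve) can be computed as in the sketch below, which compares two sets of predicted probabilities against the same observed outcomes. The data are hypothetical, and the labels "regression" and "network" are only placeholders for the two model types being compared.

```python
import math


def rmse(y_true, y_prob):
    """Root mean squared error between 0/1 outcomes and predicted probabilities."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true))


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases classified correctly at the given probability threshold."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def auc(y_true, y_prob):
    """Area under the ROC curve: the probability that a random positive case
    is scored above a random negative case (ties count one half)."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes and predicted probabilities from two competing models
observed = [0, 0, 0, 0, 1, 0, 1, 1]
regression_probs = [0.1, 0.2, 0.3, 0.6, 0.4, 0.2, 0.7, 0.9]
network_probs = [0.2, 0.1, 0.4, 0.5, 0.5, 0.3, 0.8, 0.8]

for name, probs in [("regression", regression_probs), ("network", network_probs)]:
    print(f"{name}: RMSE={rmse(observed, probs):.3f}, "
          f"accuracy={accuracy(observed, probs):.3f}, AUC={auc(observed, probs):.3f}")
```

Comparable values across the two models are what the comparability check described above looks for; large discrepancies would prompt further model revision.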

Overall Sample Descriptive and Bivariable Analyses
Among the 30,195,722 hospitalized patients meeting study criteria, the mean (SD) age was 57.51 (20.33) years; 17,558,812

Overall Sample Multivariable Regression Analyses by PCI
In machine learning-backed multivariable regression fully adjusted for age, race, income, metastases, and mortality risk by DRG, PCI was associated with significantly reduced odds of mortality for all adult inpatients regardless of cancer status (OR 0.77, 95%CI 0.75-0.79; p < 0.001) and specifically for cancer patients (OR 0.82, 95%CI 0.75-0.89; p < 0.001). These findings were confirmed by propensity score adjustment, and PCI also significantly reduced cancer patients' total hospital costs (beta USD$ −8,668.94, 95%CI −9,553.59 to −7,784.28; p < 0.001) independent of length of stay.
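For readers less used to logistic regression output, the sketch below shows how a reported odds ratio and its confidence interval relate to the underlying log-odds coefficient. It assumes the interval is a Wald-type interval that is symmetric on the log-odds scale, which the text above does not state; the function names are introduced here for illustration.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def coefficient_from_odds_ratio(odds_ratio, ci_low, ci_high):
    """Recover the log-odds coefficient and its approximate standard error
    from a reported odds ratio and an assumed Wald-type 95% interval."""
    beta = math.log(odds_ratio)
    std_error = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, std_error


def odds_ratio_with_ci(beta, std_error):
    """Convert a log-odds coefficient back to an odds ratio with a 95% Wald interval."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * std_error),
            math.exp(beta + Z_95 * std_error))


# Reported estimate for cancer patients: OR 0.82, 95% CI 0.75-0.89
beta, std_error = coefficient_from_odds_ratio(0.82, 0.75, 0.89)
odds_ratio, low, high = odds_ratio_with_ci(beta, std_error)
print(f"beta = {beta:.3f}, SE = {std_error:.3f}, "
      f"OR = {odds_ratio:.2f} ({low:.2f}-{high:.2f})")
```

A negative coefficient corresponds to an odds ratio below 1, and an interval that excludes 1 corresponds to statistical significance at the two-tailed 0.05 level used in this study.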

CAD and Active Cancer Sub-group Multivariable Regression Analyses by PCI
Results were similar in sub-group analysis among CAD patients and separately in prior and active cancer patients (with greater mortality reductions in patients with active [OR 0.63, 95%CI 0.56-0.71; p < 0.001] rather than prior malignancies [OR 0.72, 95%CI 0.65-0.79; p < 0.001]) (Figure 1). In the CAD sub-group with stratified analysis by ACS (UA/NSTEMI and STEMI) and active or prior malignancy, PCI vs. medical management significantly reduced mortality for all patient groups (Figure 1). Active vs. prior malignancy had mortality reductions across no ACS, UA/NSTEMI, and STEMI groups. The greatest mortality reductions among all groups were in patients with active malignancy and UA/NSTEMI (OR 0.41, 95%CI 0.26-0.65; p < 0.001) and in patients with active malignancy and STEMI (OR 0.43, 95%CI 0.31-0.59; p < 0.001).

Primary Malignancy Sub-group Multivariable Regression Analyses by PCI
In sub-group analysis by primary malignancy among those with cancer, PCI was associated with a significantly reduced odds of mortality only in patients with head and neck vs. non-head and neck cancers (OR 0.34, 95%CI 0.17-0.66; p = 0.002), Hodgkin lymphoma vs. cancers other than Hodgkin lymphoma (OR 0.35, 95%CI 0.14-0.87; p = 0.025), and leukemia vs. cancers other than leukemia (OR 0.64, 95%CI 0.48-0.86; p = 0.003) (Figure 2). PCI in cancer patients with metastatic disease was associated with reduced mortality but not significantly (OR 0.86, 95%CI 0.71-1.04; p = 0.110). Similarly, PCI also was associated with nonsignificantly reduced mortality in patients with non-solid vs. solid tumors (OR 0.85, 95%CI 0.71-1.02; p = 0.079). There were no significant disparities by income or race among PCI subjects.

DISCUSSION
This propensity score adjusted nationally representative analysis of over 30 million hospitalized adults suggests that PCI does not increase inpatient mortality (primary endpoint) or total costs (secondary endpoint) among patients with cancer, regardless of whether they had concurrent non-ACS, UA/NSTEMI, or STEMI indications (with particular primary malignancies driving more of the above associations than others). These results may support offering PCI, when deemed appropriate by clinicians, to cancer patients who have traditionally been excluded from or underrepresented in cardiovascular randomized trials (which may account for some of the current hesitation to consider such invasive treatment options more readily). The above clinical findings may thus allow more informed clinician-patient discussions about treatment options at a time when such cardio-oncology patients with both CAD and cancer represent a sizeable and growing portion of the PCI patient population nationally (as this analysis of over 1 million PCI procedures detected more than 1 in 10 being performed in such patients with both cancer and heart disease). The most common primary malignancies nationally per our study were prostate, gastrointestinal, breast, skin, lung, and hematological cancers. Prostate and skin cancers were the most common primary malignancies in which single-vessel PCI was performed, as these patients can be viewed as more favorable PCI candidates for whom clinical practice often parallels that for non-cancer patients. Conversely, patients with lung, breast, gastrointestinal, and hematological cancers are the cancer patient sub-groups in which multivessel PCI was performed at a higher proportion than single-vessel PCI, probably because of time constraints and the aim of using the available window of opportunity to achieve complete revascularization.
Also, in lung cancer patients the additional CAD burden can be explained by the higher prevalence of cardiovascular risk factors (such as smoking) and the cancer treatments that promote early atherosclerosis (including radiation therapy) (4,5,15–17). Prior studies of NIS data, as well as our analysis, have shown that PCI in the setting of lung cancer was associated with a higher risk of inpatient mortality when compared to other primary malignancies (20). The short interval to initiation of cancer treatment due to the aggressive nature of the majority of these tumors could be utilized for a more comprehensive cardiovascular risk stratification/evaluation and to optimize medical management in an attempt to minimize cardiovascular morbidity and mortality.
FIGURE 2 | Propensity score adjusted inpatient mortality odds ratio change with ACS and PCI by primary malignancy in fully adjusted multivariable regression (N = 30,195,722). Multivariable regression fully adjusted for age, race, income, metastases, and mortality risk by Diagnosis Related Group (DRG); ACS, acute coronary syndrome; NSTEMI, non-ST segment myocardial infarction; UA, unstable angina; STEMI, ST segment myocardial infarction; PCI, percutaneous coronary intervention; GI, gastrointestinal; gyn, gynecological.
VEGF inhibitors (bevacizumab, sunitinib, sorafenib, pazopanib) and novel immunotherapies can be associated with vascular toxicity, enhanced inflammation of atherosclerotic plaques, destabilization of pre-existing plaques, and promotion of plaque rupture (41–47). Our study provides an overall picture of the impact of such cancer treatments, but the lack of data granularity prohibits a more rigorous understanding of the impact of cancer treatments on CAD burden and PCI outcomes. Regardless, our results are consistent with prior studies that support the safety and efficacy of PCI in cancer patients (9,21,25,28,30).
The primary organ site and stage, including the presence or absence of metastatic disease, are the main drivers of outcomes in a cancer patient population. Metastatic patients have a higher risk of inpatient mortality, probably due to the greater extent of their oncologic disease. In our analysis, 1 in 20 cancer patients who underwent PCI had metastatic disease, and the intervention still appeared to reduce mortality.
Additionally, our analysis demonstrated that cancer patients who received PCI had total hospital costs that were ∼$8,000-9,000 lower, independent of their inpatient length of stay, clinical acuity, mortality risk (as calculated by DRGs), and other factors rigorously tested in propensity score adjustment. The inherent cost of the procedure could potentially be offset by patients' immediate symptomatic improvement and the resulting decrease in laboratory and imaging tests to identify the cause of symptoms. It appears that there could be a financial incentive for hospital systems to specifically encourage early cooperation and planning between cardiology and oncology teams regarding the timing and choice of cancer therapies and coronary revascularization decisions. Our data support the idea that cancer patients could benefit from cardiovascular evaluation and revascularization from such cardio-oncology teams.
This study does have notable limitations, which indicate the results should be interpreted cautiously. This is a nonrandomized study without longitudinal follow-up that relies upon the accuracy of ICD-10 coding by providers (e.g., for coronary artery disease burden, detailed prior cancer treatment regimens, and overall vs. cardiovascular-specific mortality), and selection bias is possible. By utilizing a large nationally representative dataset and propensity score and machine learning supported analyses with aggressive regression model performance optimization, we attempted to minimize the impact of such limitations and produce the most robust results possible on the association between PCI outcomes and cancer.

CONCLUSIONS
This nationally representative multicenter comprehensive analysis of inpatient mortality and total costs of PCI in all eligible hospitalized patients with and without cancer (including sub-group analysis by CAD, cancer by primary organ site, active vs. prior cancer, and ACS) suggests a significant and independent inpatient mortality and cost benefit for PCI vs. medical management, particularly for cancer patients. As there is a unique cancer and coronary artery disease interaction, certain cancer types have a more pronounced mortality benefit compared to others. This study also suggests that PCI was considered in cancer patients regardless of their primary malignancy type, active or prior malignancy status, and ACS status, and it did not indicate a significant increase in length of stay or cost. Our analysis may support future randomized trials to assess the safety and optimal clinical application of coronary revascularization in cardio-oncology patients with both CAD and cancer, while possibly highlighting the current utility of multi-disciplinary teams for this growing and complex patient population.

DATA AVAILABILITY STATEMENT
The data analyzed in this study is subject to the following licenses/restrictions: The dataset is available for purchase through the United States Agency for Healthcare Research and Quality Healthcare Cost and Utilization Project (HCUP). Requests to access these datasets should be directed to HCUP, HCUPDistributor@ahrq.gov.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by MD Anderson. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.