Rapid identification of inflammatory arthritis and associated adverse events following immune checkpoint therapy: a machine learning approach

Introduction Immune checkpoint inhibitor-induced inflammatory arthritis (ICI-IA) poses a major clinical challenge to ICI therapy for cancer, with 13% of cases halting ICI therapy and ICI-IA being difficult to identify for timely referral to a rheumatologist. The objective of this study was to rapidly identify ICI-IA patients in clinical data and assess associated immune-related adverse events (irAEs) and risk factors.
Methods We conducted a retrospective study of the electronic health records (EHRs) of 89 patients who developed ICI-IA out of 2451 cancer patients who received ICI therapy at Northwestern University from March 2011 to January 2021. Logistic regression and random forest machine learning models were trained on all EHR diagnoses, labs, medications, and procedures to identify ICI-IA patients and EHR codes indicating ICI-IA. Multivariate logistic regression was then used to test associations between ICI-IA and cancer type, ICI regimen, and comorbid irAEs.
Results Logistic regression and random forest models identified ICI-IA patients with accuracies of 0.79 and 0.80, respectively. Key EHR features from the random forest model included ICI-IA relevant features (joint pain, steroid prescription, rheumatoid factor tests) and features suggesting comorbid irAEs (thyroid function tests, pruritus, triamcinolone prescription). Compared to 871 adjudicated ICI patients who did not develop arthritis, ICI-IA patients had higher odds of developing cutaneous (odds ratio [OR]=2.66; 95% confidence interval [CI] 1.63-4.35), endocrine (OR=2.09; 95% CI 1.15-3.80), or gastrointestinal (OR=2.88; 95% CI 1.76-4.72) irAEs, adjusting for demographics, cancer type, and ICI regimen. Melanoma (OR=1.99; 95% CI 1.08-3.65) and renal cell carcinoma (OR=2.03; 95% CI 1.06-3.84) patients were more likely to develop ICI-IA compared to lung cancer patients. Patients on nivolumab+ipilimumab were more likely to develop ICI-IA compared to patients on pembrolizumab (OR=1.86; 95% CI 1.01-3.43).
Discussion Our machine learning models rapidly identified patients with ICI-IA in EHR data and elucidated clinical features indicative of comorbid irAEs. Patients with ICI-IA were significantly more likely to also develop cutaneous, endocrine, and gastrointestinal irAEs during their clinical course compared to ICI therapy patients without ICI-IA.

However, while ICIs are effective anti-cancer agents, checkpoint blockade is associated with development of immune-related adverse events (irAEs) that affect a wide spectrum of organ systems (14-17). IrAEs can pose a significant barrier to ICI usage. They can prevent patients from continuing ICIs, diminish patient quality of life, and in severe cases, lead to death (3,12,13,18). There are two major deficits in our current understanding of irAEs and our ability to care for patients with irAEs. First, irAEs are difficult to identify both in the patient care setting and for research studies. Outside of large clinical trials, many published studies are single-site with small cohorts identified and characterized by labor-intensive chart review (19-24). Second, we currently have a very limited ability to predict which patients will develop irAEs. Rheumatologic irAEs, including ICI-induced inflammatory arthritis (ICI-IA), epitomize both of these limitations (24-35).
ICI-IA has been recognized as an irAE for less than a decade, with case studies and small cohort descriptions first appearing in the literature in 2017 (24-26). With a reported prevalence of 1-7%, it is relatively rare (36). However, it can have a significant impact on patients, with 12-13% of cases resulting in termination of ICI therapy (34,35). Previous studies have described ICI-IA as affecting the knees or small joints, tending to be seronegative for anti-cyclic citrullinated peptide (CCP) antibodies and rheumatoid factor, and most commonly treated with corticosteroids, with escalation to disease-modifying antirheumatic drugs (DMARDs), tumor necrosis factor (TNF) inhibitors, and interleukin (IL)-6 receptor inhibitors (24-27, 29, 32, 37). Though development of ICI-IA cannot be well predicted, a previous study found melanoma and genitourinary cancer, as well as combination ICI therapy, to be associated with ICI-IA development compared to lung cancer and PD-1 monotherapy, respectively (38).
Frontiers in Immunology frontiersin.org
Prompt identification of patients with ICI-IA is essential for providing rapid referrals for effective clinical care and for performing research into the etiology of ICI-IA. However, ICI-IA is difficult to identify in clinical data, owing to its rarity, lack of a dedicated diagnosis code, and heterogeneous presentation (24-32, 35). Furthermore, ICI-IA's typically lower severity compared to other irAEs such as myocarditis or pneumonitis means that patients may be less likely to be seen by a rheumatologist for their symptoms. While numerous studies have been published on ICI-IA, cohort definition has primarily relied on manual identification of these patients by rheumatologists and/or filtering for patients who have been seen by a rheumatologist or been prescribed specific immunomodulatory drugs (26,27,29,32,34,38). As a result, cohorts have remained relatively small and single-site, and run the risk of missing many ICI-IA patients who may have had a less severe presentation or other barriers to seeing a rheumatologist. Additionally, while many studies on ICI-IA include descriptions of comorbid irAEs (26-28, 32), no previous study has tested the association of ICI-IA with these comorbid irAEs against a control cohort of ICI patients who did not develop arthritis, making it difficult to understand whether the prevalence of comorbid irAEs experienced by ICI-IA patients is statistically different from that of the rest of the ICI population.
To address these fundamental gaps in knowledge and facilitate identification of ICI-IA and associations with the development of ICI-IA, we developed an electronic health record (EHR)-based machine learning strategy to rapidly identify patients who have possible ICI-IA and discover hidden features associated with ICI-IA. With the wide adoption of EHRs in inpatient and ambulatory settings (39), EHR data and data modeling strategies present an opportunity to develop tools to identify ICI-IA, as well as uncover clinical features of ICI-IA or associated conditions that may not be immediately obvious. Many previous studies have demonstrated success in identifying conditions from EHR data, including hypertension, stroke, systemic lupus erythematosus, asthma, and leukemia (40-44). To investigate the relationship between ICI-IA and other irAEs elucidated by our machine learning model, we also fully adjudicated a control cohort of 871 patients receiving ICI therapy for all irAEs, and used this control cohort to examine the association between the development of ICI-IA and other comorbid irAEs as well as cancer type and ICI regimen.

Patient population and data source
The study population included patients seen in the Robert H. Lurie Comprehensive Cancer Center at Northwestern Medicine (NM), a large healthcare system providing inpatient, outpatient, and specialty care throughout Chicago and Northern Illinois. Data was acquired from the Northwestern Medicine Enterprise Data Warehouse (NMEDW), NM's clinical research database containing data on over 10 million patients as of October 2023. This study was governed by the Northwestern University institutional review board, protocols #STU00210502 and #STU00206779.
We retrospectively identified an ICI cohort of all patients aged 18 to 99 with a diagnosis of cancer (melanoma, renal cell carcinoma, non-small cell and small cell lung carcinoma, urothelial cancer, head and neck cancer, gastric cancer, colon cancer, liver cancer, cervical cancer, uterine cancer, breast cancer, Hodgkin's lymphoma, Merkel cell carcinoma, rectal cancer, prostate cancer, esophageal cancer, leukemia, or lymphoma) who received at least one dose of ICI therapy (pembrolizumab, nivolumab, cemiplimab, atezolizumab, avelumab, durvalumab, ipilimumab, or tremelimumab) between March 1, 2011 and January 1, 2021 (Figure 1A). Cancer was identified in the NMEDW by International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) and ICD-10-CM diagnosis codes. ICIs were identified in the NMEDW by a regular expression search for the generic and brand names in the medication data table (Supplementary Table 1). Sex, race, and ethnicity were gathered from patient demographic data present in the NMEDW. Prior autoimmune diseases were collected from NMEDW diagnosis codes (Supplementary Table 2).
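The regular expression search for ICI names can be sketched as follows. This is a minimal illustration, not the study's actual query: the generic names come from the cohort definition above, while the brand-name list is an assumption standing in for Supplementary Table 1, and `ICI_NAMES`, `build_ici_pattern`, and `find_icis` are hypothetical names.

```python
import re

# Generic names are taken from the cohort definition; the brand names are an
# assumption standing in for Supplementary Table 1, which is not shown here.
ICI_NAMES = {
    "pembrolizumab": ["keytruda"],
    "nivolumab": ["opdivo"],
    "cemiplimab": ["libtayo"],
    "atezolizumab": ["tecentriq"],
    "avelumab": ["bavencio"],
    "durvalumab": ["imfinzi"],
    "ipilimumab": ["yervoy"],
    "tremelimumab": ["imjudo"],
}


def build_ici_pattern(names):
    """Compile one case-insensitive pattern with a named group per generic drug."""
    groups = []
    for generic, brands in names.items():
        alternatives = "|".join(re.escape(n) for n in [generic, *brands])
        groups.append(f"(?P<{generic}>{alternatives})")
    return re.compile(r"\b(?:" + "|".join(groups) + r")", re.IGNORECASE)


ICI_PATTERN = build_ici_pattern(ICI_NAMES)


def find_icis(medication_name):
    """Return the sorted generic names of all ICIs mentioned in a medication string."""
    found = set()
    for match in ICI_PATTERN.finditer(medication_name):
        # lastgroup is the name of the group that matched, i.e. the generic name
        found.add(match.lastgroup)
    return sorted(found)
```

Mapping brand names back to one generic name per drug keeps downstream regimen logic independent of how a medication order happened to be written.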

Adjudication of ICI-induced inflammatory arthritis and statistical test control cohorts
Patient charts in NM's Epic EHR were manually reviewed (SDT) with clinician guidance (CG, JL, AK, JS) to identify a case cohort of patients with ICI-IA (Figure 1A). We reviewed the charts of all patients who had the keywords "arthritis", "arthralgia", or "joint" in the assessment and plan section of any clinical notes written after the patient received their first ICI dose. Cases were classified with ICI-IA if the patient had de novo joint pain/arthralgia/arthritis (no history of arthritis or presentation different from what the patient has experienced in the past), and an oncologist or rheumatologist noted suspicion of the presentation being secondary to ICI therapy. Case status, date of ICI-IA onset, cancer, ICI regimen, joint involvement, rheumatoid factor, anti-CCP, anti-nuclear antibodies (ANA), and treatment for ICI-IA were recorded in a REDCap database (45). Cases were additionally reviewed for other irAEs the patients experienced: cutaneous AEs, thyroid dysfunction, hypophysitis or adrenal insufficiency, diabetes, hepatic AEs, diarrhea, colitis, pneumonitis, cardiovascular AEs, and encephalitis. They were classified with an irAE if an oncologist or the relevant specialist noted suspicion of the presentation being secondary to ICI therapy without other likely etiologies. See Supplementary Methods for details.
The remaining patients without ICI-IA in the full ICI cohort were used as controls for our machine learning models. To compare irAE associations with ICI-IA, a random sample of 871 patients without ICI-IA (ICI-NoArthritis) from the overall ICI cohort of 2451 were chart reviewed for all irAEs (same irAEs and classification threshold as above) to serve as the control cohort for the statistical tests (SDT, GMP, KJR, CDM, JDJ, JT, KV, PD, SM, UR) (Figure 1A). Following initial adjudication, SDT reviewed a random sample of 10% of the charts and found greater than 90% agreement.

EHR code selection for machine learning
We collected all EHR clinical codes for every patient in our ICI cohort: ICD-9-CM and ICD-10-CM diagnosis codes, Logical Observation Identifiers Names and Codes (LOINC) laboratory codes, Unified Medical Language System (UMLS) RxNorm medication codes, and Current Procedural Terminology (CPT) procedure codes (Figure 1B). To determine if a machine learning model could select sensible ICI-IA-relevant codes and discover new predictors of ICI-IA, we input all EHR codes into our models. ICD-9-CM diagnosis codes were translated to ICD-10-CM to prevent duplication of diagnosis codes, using the concept relationships in the Observational Medical Outcomes Partnership (OMOP) common data model vocabulary tables (46). For each code, we added a temporal modifier specifying whether the code occurred before or after ICI initiation. We then dichotomized each code to presence or absence.
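The temporal modifier and dichotomization steps can be illustrated with a short sketch. The function names, the `code|before` / `code|after` feature naming, the toy records, and the choice to treat events on the day of ICI initiation as "after" are all assumptions made for illustration; the source does not specify them.

```python
from datetime import date


def build_binary_features(events, ici_start):
    """Map each patient to the set of 'code|before' / 'code|after' features present.

    events: iterable of (patient_id, code, event_date) tuples.
    ici_start: dict of patient_id -> date of first ICI dose.
    Set membership is the dichotomized value: present or absent.
    """
    features = {pid: set() for pid in ici_start}
    for pid, code, event_date in events:
        if pid not in ici_start:
            continue  # skip events for patients outside the ICI cohort
        # Assumption: events on the day of ICI initiation count as "after".
        modifier = "before" if event_date < ici_start[pid] else "after"
        features[pid].add(f"{code}|{modifier}")
    return features


def to_matrix(features):
    """Return (patient_ids, feature_names, rows) where rows hold 0/1 entries."""
    patient_ids = sorted(features)
    feature_names = sorted(set().union(*features.values()))
    rows = [[1 if name in features[pid] else 0 for name in feature_names]
            for pid in patient_ids]
    return patient_ids, feature_names, rows


# Hypothetical toy records (codes chosen only as examples)
ici_start = {"p1": date(2019, 3, 1), "p2": date(2020, 6, 15)}
events = [
    ("p1", "ICD10CM:M25.50", date(2019, 5, 2)),
    ("p1", "ICD10CM:M25.50", date(2019, 7, 9)),  # repeats collapse to a single 1
    ("p1", "LOINC:3016-3", date(2018, 12, 1)),
    ("p2", "LOINC:3016-3", date(2020, 8, 1)),
]
patient_ids, feature_names, rows = to_matrix(build_binary_features(events, ici_start))
```

Because each code is split into a "before" and an "after" feature, the same laboratory test ordered before and after ICI initiation contributes two separate columns to the model input.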

Machine learning to identify ICI-induced inflammatory arthritis in EHR data
The EHR codes were fed into logistic regression with ridge (L2) penalty and random forest machine learning models to classify patients who experienced ICI-IA from the ICI cohort (Figure 1C). These models were selected for their clinical interpretability. We bootstrapped model development 100 times to calculate model performance metrics with 95% confidence intervals and consensus code contribution. For each of the 100 bootstrap rounds of model development, we used a random 50/50 training-testing split, stratified for consistent case/control proportions in the training and test sets. Cases were up-sampled during cross-validation and model training to balance the low ratio of ICI-IA cases to ICI controls (47). Fivefold cross-validation with 25 iterations was used to optimize model parameters in the training set. The test set was left untouched for performance evaluation. Models were evaluated using area under the receiver operating characteristic curve (AUROC). Accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were measured for all models at the classification threshold that maximized Youden's J, balancing sensitivity and specificity (48). EHR code contribution was calculated from the logistic regression beta coefficients and the random forest feature importance.
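As a hedged illustration of the threshold selection step, the sketch below picks the score cutoff that maximizes Youden's J (J = sensitivity + specificity - 1) and reports the threshold-dependent metrics listed above. The function names and toy scores are hypothetical, and breaking ties in favor of the lowest threshold is an assumption of this sketch rather than a detail from the study.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics_at(y_true, scores, threshold):
    """Compute the threshold-dependent performance metrics."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
        "npv": tn / (tn + fn) if tn + fn else 0.0,
        "youden_j": sensitivity + specificity - 1,
    }


def best_youden_threshold(y_true, scores):
    """Return the observed score that maximizes Youden's J (lowest wins ties)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: metrics_at(y_true, scores, t)["youden_j"])


# Hypothetical test-set labels (1 = ICI-IA) and model scores
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]
threshold = best_youden_threshold(y_true, scores)
metrics = metrics_at(y_true, scores, threshold)
```

AUROC is threshold-free, whereas accuracy, sensitivity, specificity, PPV, and NPV all depend on the cutoff chosen here.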

Key EHR codes in the ICI-induced inflammatory arthritis machine learning model
Feature importance from the random forest model was analyzed for the key EHR codes used to identify ICI-IA, averaging feature importance across the 100 bootstrapped models; codes with higher feature importance contribute more to the model for identifying ICI-IA (Figure 1D). While not a direct equivalence, these codes could indicate clinical features potentially describing ICI-IA and allow us to determine if the models were using ICI-IA-relevant information to capture patients with ICI-IA. Fisher's exact test was used to determine the association between ICI-IA and the codes. Models were trained on decreasing percentages of the top codes (50% to 0.003%) and performance was compared to the full models.
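The association test between a single code and ICI-IA can be sketched with a small two-sided Fisher's exact test written against the hypergeometric distribution. This is an illustrative implementation, not the software used in the study; the function name and the 2x2 layout (code present or absent, by ICI-IA case or control) are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a = cases with the code, b = cases without it,
    c = controls with the code, d = controls without it.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)
```

For example, `fisher_exact_two_sided(3, 1, 1, 3)` sums the four outer tables of an 8-patient example and evaluates to 34/70.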

Cancer type and immune checkpoint inhibitor association with ICI-induced inflammatory arthritis
Multivariate logistic regression was used to calculate unadjusted and adjusted odds ratios (ORs) of developing ICI-IA given cancer type and first ICI regimen (specific ICI drug) (Figure 1D). Lung cancer and pembrolizumab were used as reference groups for cancer type and ICI regimen, respectively, as they represented our largest cancer type and treatment regimen. Covariates included in the calculation of adjusted ORs were sex, age, race, and ethnicity, plus cancer type for ICI ORs and ICI regimen for cancer ORs. Cancer type was determined by ICD-9/10 codes and ICI was determined by regex search as described in the patient population section (Supplementary Table 1). Cancer determination by ICD code compared to chart review showed <10% discrepancy. An ICI regimen was classified as combination ICI therapy if two different ICIs were infused on the same date.
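The combination rule above lends itself to a short sketch. The version below derives a patient's first ICI regimen from dated infusion records and flags it as a combination when two different ICIs share the earliest infusion date. Applying the same-date rule to the first infusion date, and the `+`-joined regimen label, are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def first_ici_regimen(infusions):
    """Return the first regimen label, e.g. 'pembrolizumab' or 'ipilimumab+nivolumab'.

    infusions: iterable of (infusion_date, generic_ici_name) tuples for one patient.
    Returns None when the patient has no recorded infusions.
    """
    by_date = defaultdict(set)
    for infusion_date, drug in infusions:
        by_date[infusion_date].add(drug.lower())
    if not by_date:
        return None
    first_date = min(by_date)
    # Two or more distinct ICIs on the same date form a combination regimen
    return "+".join(sorted(by_date[first_date]))


def is_combination(regimen):
    """True when the regimen label names more than one ICI."""
    return regimen is not None and "+" in regimen


# Hypothetical infusion history
history = [
    (date(2019, 1, 10), "nivolumab"),
    (date(2019, 1, 10), "ipilimumab"),
    (date(2019, 1, 31), "nivolumab"),
]
regimen = first_ici_regimen(history)
```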

Determining irAE associations with ICI-induced inflammatory arthritis
Multivariate logistic regression was used to calculate unadjusted and adjusted ORs of developing non-arthritis irAEs given ICI-IA, compared to ICI-NoArthritis control patients (Figure 1D). IrAE associations were tested at three timeframes relative to ICI-IA development: irAE development any time after ICI initiation, only irAEs occurring prior to ICI-IA development, and only irAEs occurring post ICI-IA development. In the ICI-NoArthritis controls, irAEs were included at any time after ICI initiation. Covariates included in the calculation of adjusted ORs were sex, age, race, ethnicity, cancer type, first ICI regimen, and presence of autoimmune disease prior to ICI initiation.
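For intuition about the unadjusted comparison, an odds ratio with a Wald 95% confidence interval can be computed directly from a 2x2 table, which matches what a logistic regression with ICI-IA as the only predictor would estimate. The sketch below is illustrative only: the function name and counts are hypothetical, and it does not reproduce the covariate-adjusted models.

```python
from math import exp, log, sqrt


def odds_ratio_wald(cases_with, cases_without, controls_with, controls_without, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    cases_*: ICI-IA patients with / without the irAE of interest.
    controls_*: ICI-NoArthritis patients with / without the irAE.
    """
    cells = (cases_with, cases_without, controls_with, controls_without)
    if min(cells) <= 0:
        # A zero cell makes the OR or its standard error undefined
        raise ValueError("all four cells must be positive")
    odds_ratio = (cases_with * controls_without) / (cases_without * controls_with)
    standard_error = sqrt(sum(1 / n for n in cells))
    lower = exp(log(odds_ratio) - z * standard_error)
    upper = exp(log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study tables
odds_ratio, lower, upper = odds_ratio_wald(10, 20, 5, 40)
```

A confidence interval whose lower bound sits at or near 1.0 signals a borderline association, which is why adjustment for covariates can move a result across the significance boundary.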

ICI-induced inflammatory arthritis cohort clinical characteristics
From the NMEDW, 2451 patients with a diagnosis of cancer who received at least one ICI dose between March 1, 2011 and January 1, 2021 were identified. Expert-guided chart review identified 89 cases of ICI-IA based on clinical suspicion without other likely etiologies in the oncology and rheumatology clinical notes. There were 2362 patients without ICI-IA remaining from the ICI cohort to serve as machine learning controls, with 871 of those patients adjudicated for other irAEs serving as controls for our association tests.

EHR-based machine learning reliably identified ICI-induced inflammatory arthritis
Machine learning models were trained to identify ICI-IA in EHR data on the 89 ICI-IA cases and 2362 machine learning controls. We identified 32,682 EHR codes representing all  3).

Key EHR codes in the ICI-induced inflammatory arthritis model were potentially related to other irAEs
From the random forest model, we evaluated the 55 most important codes to elucidate clinical features relevant to identifying ICI-IA. Codes were categorized as ICI-IA relevant (codes matching elements of ICI-IA presentation, diagnosis, or management in the literature), potentially relevant to other irAEs, and those that were related to other medical history elements. Twenty-two were relevant to ICI-IA, including diagnosis codes for unspecified joint pain, knee pain, and unspecified osteoarthritis (osteoarthritis was likely a placeholder early in the diagnostic workflow); laboratory test and procedure codes for erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), rheumatoid factor, and anti-CCP antibody tests; and medication codes for prednisone and methylprednisolone ('ICI-IA' codes in Figure 3). Sixteen were potentially relevant to other irAEs, including codes for endocrine disorder screening, thyroid function tests, cortisol labs, medication codes for triamcinolone, and diagnosis codes for pruritus and myositis ('irAE' codes in Figure 3). Seventeen were related to other medical history elements such as COVID-19 or chemotherapy administration ('Other' codes in Figure 3).
In unadjusted models, ICI-IA patients had higher odds of developing cutaneous irAE, OR = 1.65 (95% CI 1.00-2.72), and gastrointestinal irAE, OR = 1.76 (95% CI 1.07-2.90), prior to ICI-IA development compared to the ICI-NoArthritis control developing these irAEs at any time. ICI-IA patients had higher odds of developing thyroid dysfunction, OR = 2.01 (95% CI 1.01-4.00), post ICI-IA development compared to the ICI-NoArthritis control developing thyroid dysfunction at any time (Table 5). However, these temporal associations were no longer statistically significant when adjusting for sex, age, race, ethnicity, cancer type, first ICI regimen, and presence of prior autoimmune disease.

Discussion
Using an ICI cohort of 2451 patients, 89 of whom had ICI-IA, we developed an EHR-based machine learning model that could rapidly identify patients who had possible ICI-IA with AUROCs of 0.80-0.81 and accuracy of 0.79-0.80. Our machine learning model captured key EHR codes relevant to ICI-IA as well as codes indicative of cutaneous and endocrine irAEs. On further investigation of these irAE-related codes, we found that ICI-IA was associated with development of additional cutaneous, endocrine, and gastrointestinal irAEs independent of cancer type and ICI regimen.
To develop our EHR-based machine learning model of ICI-IA, we created a cohort of 2451 patients who had cancer and received an ICI, including 89 ICI-IA patients, and 871 patients without ICI-IA, who were fully adjudicated for irAEs by manual chart review.
We expanded on an approach described by Thangaraj et al. and used all diagnosis, laboratory test, procedure, and medication codes in the EHR to develop an ICI-IA identification algorithm (43). Both logistic regression and random forest machine learning models performed well on the full code set (32,682 codes) and maintained high performance while reducing the number of EHR codes used to develop the models: at 31 codes, logistic regression had an AUROC of 0.80 (95% CI 0.75-0.86) while random forest had an AUROC of 0.81 (95% CI 0.75-0.86). However, both models had low PPVs of 0.13, despite good AUROC and accuracy. This is an inherent limitation of PPV in rare conditions such as ICI-IA, which had a prevalence of only 3.6% in our ICI cohort. This also speaks to the purpose of our ICI-IA model as a filter for potential ICI-IA cases for clinician verification rather than as a replacement of clinical expertise.
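The dependence of PPV on prevalence follows from Bayes' rule: PPV = (sensitivity x prevalence) / (sensitivity x prevalence + (1 - specificity) x (1 - prevalence)). The sketch below works through this with the cohort prevalence of 89/2451; the sensitivity and specificity values are hypothetical placeholders, not the figures reported in Table 3.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value implied by sensitivity, specificity, and prevalence."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Prevalence from the cohort counts in the text; sensitivity and specificity
# below are hypothetical values chosen only to illustrate the effect.
cohort_prevalence = 89 / 2451
ppv_rare = ppv_from_prevalence(0.75, 0.80, cohort_prevalence)
ppv_balanced = ppv_from_prevalence(0.75, 0.80, 0.50)
```

With these placeholder values the same classifier yields a PPV of roughly 0.12 at the cohort prevalence but roughly 0.79 if cases and controls were equally common, which illustrates why a low PPV is expected for a rare condition even when discrimination is good.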
A major advantage of this approach compared to traditional manual selection of model features is that our EHR model provided information on codes useful for identifying ICI-IA patients. While these codes are not a direct equivalence, they were indicators of clinical features that could describe ICI-IA. Both models maintained high performance down to 0.1% of the total code set, suggesting that the majority of information important for identifying ICI-IA patients was held by a small fraction of EHR codes. We evaluated the top codes from the random forest model to understand the key predictive elements. Twenty-two of the 55 top codes were determined to be relevant to ICI-IA, including lab and procedure codes for ESR, CRP, and rheumatoid factor, diagnosis codes for joint pain, and medication codes for steroids. These factors are consistent with the ICI-IA literature regarding ICI-IA presentation, diagnostic tests, and therapy (27,29,37,49). The density of ICI-IA relevant codes in the group of features with the highest feature importance (the top 8 codes were ICI-IA relevant codes) indicates that our strategy constructed a clinically sensible model to capture ICI-IA relevant information without manual guidance on which codes should be used, increasing the credibility of the other unexpected key EHR codes we found.
Model performance (area under the receiver operating characteristic curve, AUROC) versus percentage of the top features used to develop random forest and logistic regression models. Models maintain high performance with decreasing percentage of top features included before dropping performance with fewer than 0.1% of features or 31 features (vertical dotted line).
While many of the top codes identified by our models were similar to those identified by clinical experts in ICI-IA, 16 of the 55 top codes appeared related to other irAEs, suggesting a possible relationship between ICI-IA and other irAEs. We found codes for thyroid stimulating hormone (TSH) and T3 thyroid function tests, cortisol tests, pruritus, and triamcinolone, all after ICI initiation. These codes indicated a potential association of ICI-IA with endocrine and cutaneous irAEs. Indeed, the literature suggests comorbid cutaneous, endocrine, gastrointestinal, and other irAEs with development of ICI-IA (26,27). However, to our knowledge we are the first to show statistically that ICI-IA patients were significantly more likely than ICI patients without ICI-IA to also develop cutaneous, OR = 2.66 (95% CI 1.63-4.35), endocrine, OR = 2.09 (95% CI 1.15-3.80), and gastrointestinal irAEs, OR = 2.88 (95% CI 1.76-4.72), adjusting for sex, age, race, ethnicity, cancer type, first ICI regimen, and presence of autoimmune disease prior to ICI initiation. ICI-IA patients were also more likely to experience any other irAE, OR = 2.53 (95% CI 1.49-4.31). To our knowledge, this is the first study that has conducted tests of association of irAEs comparing ICI-IA to a control cohort of ICI patients without ICI-IA, with both cohorts fully adjudicated for irAEs. This confirms the results of previous studies (26,30,32,38) and shows that patients with ICI-IA are actually experiencing higher rates of cutaneous, endocrine, and gastrointestinal irAEs compared to the general ICI population. Additionally, our findings suggest that the association between ICI-IA and cutaneous, endocrine, and gastrointestinal irAEs is independent of cancer type, ICI regimen, and prior autoimmune disease.
Conducting temporal association tests for irAEs that occurred before and after ICI-IA, we found that ICI-IA patients were more likely to have developed cutaneous irAEs and gastrointestinal irAEs prior to ICI-IA, in an unadjusted model. Conversely, ICI-IA patients were more likely to develop thyroid dysfunction after ICI-IA. Our temporal association findings indicate a possible common phenotype of patients who receive ICI therapy, develop skin and/or gastrointestinal irAEs, followed by ICI-IA, and finally thyroid dysfunction. However, when adjusting for demographics, cancer type, ICI regimen, and prior autoimmune disease, these temporal associations were no longer statistically significant, suggesting that the temporal phenotypes may be dependent on cancer type and ICI regimen or specific to our cohort and will require larger cohort sizes and future study.
As part of our analysis, we recapitulated findings in the literature of associations between cancer and ICI regimen and development of ICI-IA. We found similar associations in our ICI-IA cohort to those observed previously by Cunningham-Bussel et al. (38) even with our differing inclusion protocol for ICI-IA, increasing our confidence in the representativeness of our ICI-IA and ICI cohorts. Melanoma patients, OR = 1.99 (95% CI 1.08-3.65), and renal cell carcinoma patients, OR = 2.03 (95% CI 1.06-3.84), had higher odds of developing ICI-IA compared to lung cancer patients, adjusting for sex, age, race, ethnicity, and first ICI regimen. We also found that patients whose first ICI regimen was combination nivolumab-ipilimumab had higher odds of developing ICI-IA, OR = 1.86 (95% CI 1.01-3.43), compared to patients who received pembrolizumab, adjusting for sex, age, race, ethnicity, and cancer type. We additionally found a similar time elapsed from ICI initiation to ICI-IA development compared to the ICI-IA literature: a median of 20 weeks (5 months) compared to 2.7-5 months (30,34,50).
Our study is not without limitations. Despite being larger than most published studies, our ICI-IA cohort is still relatively small (N=89) and constrained to a single site. This small cohort relative to the number of EHR codes could make training our machine learning models difficult. We mitigated this issue by performing feature reduction based on the feature importance provided by the initial random forest model and saw high performance using only 0.1% of the total code set. Another limitation is that our ICI-IA adjudication was done by retrospective chart review, therefore relying on clinician documentation of ICI-IA, and included patients who were not seen by a rheumatologist. Thus, some of these patients may have arthralgia or polymyalgia rheumatica rather than arthritis. However, as we expand this study to further sites and patients, we will be able to better make this distinction. We chose to accept these limitations to capture a broader cohort of patients with potentially less severe ICI-IA or who had other barriers to seeing a rheumatologist. Our finding that only 18 of the 89 ICI-IA patients were seen by a rheumatologist further highlights the shortcomings of the current standard for defining ICI-IA cohorts and the importance of our model for identifying ICI-IA patients.
In summary, we developed a novel EHR-based machine learning algorithm that was able to identify ICI-IA patients in the EHR with high performance. This EHR algorithm could be adapted to other EHR systems to facilitate cohort definition for ICI-IA for multicenter research studies and to assist in clinical practice with recommending patients suspected to have ICI-IA to follow-up care with a rheumatologist. Our machine learning strategy revealed hidden relationships between ICI-IA and other irAEs that were corroborated in association tests. It provided insight into the clinical features that were descriptive of ICI-IA, including features potentially pertaining to comorbid irAEs. By further examining these irAEs, we found that cutaneous, endocrine, and gastrointestinal irAEs were significantly more likely to be found in patients with ICI-IA compared to cancer patients receiving ICI therapy who did not experience ICI-IA, independent of cancer type and ICI regimen. We also found potential temporal relationships between ICI-IA and other irAEs. This indicates that other such temporal phenotypes may exist in patients who develop irAEs that could be captured in EHR data. Overall, further exploration of irAEs in EHR data and investigation of irAE temporal phenotypes may reveal leads for mechanistic studies of irAE development and improve care for these patients through more rapid identification of irAEs and timelier recommendation for follow-up care.

FIGURE 1 Methods diagram. (A) Manual adjudication of ICI cohort (N=2451 patients) for immune checkpoint inhibitor-induced inflammatory arthritis (ICI-IA) cases (N=89 patients). From the ICI cohort, 871 random patients without ICI-IA (ICI-NoArthritis) were adjudicated for all irAEs. (B) Electronic health record data extraction of all diagnosis (ICD-9-CM, ICD-10-CM), medication (RxNorm), laboratory test (LOINC), and procedure (CPT) codes. Individual code occurrences were modified to specify whether they occurred before or after ICI initiation, and dichotomized to presence/absence of the code. Data was extracted for the full ICI cohort of 2451 patients. (C) Logistic regression (LR) and random forest (RF) machine learning models were trained on the EHR codes to identify ICI-IA. (D) Feature importance was analyzed to characterize ICI-IA patients in the EHR. Multivariate logistic regression was used to calculate odds ratios for development of ICI-IA given cancer and ICI regimen, as well as development of non-arthritis irAEs given ICI-IA versus ICI-NoArthritis.

FIGURE 3 Key EHR codes in the machine learning models and association with ICI-induced inflammatory arthritis (ICI-IA). Left: the random forest model's feature importance for identifying ICI-IA patients. Right: odds of the patient having an EHR code if they developed ICI-IA versus if they did not, by Fisher's exact test. Error bars are 95% confidence intervals. 'ICI-IA' codes are those directly relevant to ICI-IA. 'irAE' codes are those potentially describing other irAEs. 'Other' codes are those describing other parts of the patient medical history. The top codes are predominantly ICI-IA relevant codes, with a high concentration of relevant codes occupying the topmost importance. The top irAE-related codes are endocrine (cortisol, thyroid function tests, estradiol), myositis, and cutaneous (medication order for triamcinolone and pruritus). The majority of the top codes are positively associated with ICI-IA. Codes are labeled by name, code vocabulary, code, and temporal modifier (before or after ICI therapy initiation).

TABLE 1
Demographic and clinical variables for cancer patients receiving Immune Checkpoint Inhibitor Therapy between March 1, 2011 and January 1, 2021.

TABLE 2
Immune checkpoint inhibitor-induced inflammatory arthritis (ICI-IA) laboratory tests, joint involvement, and comorbid irAEs for our cohort of patients with ICI-IA.

TABLE 3
Logistic regression and random forest machine learning model performance metrics for identifying ICI-induced inflammatory arthritis (ICI-IA). Accuracy, sensitivity, specificity, PPV, and NPV were calculated optimizing Youden's J. AUROC, area under the receiver operating characteristic curve; PPV, positive predictive value; NPV, negative predictive value. Confidence intervals in parentheses are 95% confidence intervals calculated by bootstrapping model development 100 times.

TABLE 4
Odds of developing ICI-induced inflammatory arthritis (ICI-IA) given cancer and first ICI.

TABLE 5 Continued
Covariates included for adjusted odds ratios (ORs) were sex, age, race, ethnicity, cancer type, first ICI regimen, and prior autoimmune disease. IrAE anytime: in cases, irAEs included were those that developed at any time in the patient medical history. IrAE prior to ICI-IA: in cases, irAEs included were those that developed prior to ICI-IA. IrAE post ICI-IA: irAEs included were those that developed after ICI-IA. In controls, irAEs that developed at any time in the medical history were included for all comparisons. Endocrine irAEs included thyroid, hypophysitis/adrenal insufficiency (Insuff.), and diabetes. Gastrointestinal irAEs included diarrhea/constipation and colitis. *These irAEs had zero cases in our ICI-IA cohort at these timepoints.