AUTHOR=Adeluwa Temidayo , McGregor Brett A. , Guo Kai , Hur Junguk TITLE=Predicting Drug-Induced Liver Injury Using Machine Learning on a Diverse Set of Predictors JOURNAL=Frontiers in Pharmacology VOLUME=Volume 12 - 2021 YEAR=2021 URL=https://www.frontiersin.org/journals/pharmacology/articles/10.3389/fphar.2021.648805 DOI=10.3389/fphar.2021.648805 ISSN=1663-9812 ABSTRACT=A major challenge in drug development is safety and toxicity concerns due to drug side effects. One such side effect, drug-induced liver injury (DILI), is considered a primary factor in regulatory clearance. The goal of the Critical Assessment of Massive Data Analysis (CAMDA) 2020 CMap Drug Safety Challenge was to develop prediction models based on gene perturbation of preselected cell lines (CMap), extended structural information (MOLD2), toxicity data (TOX21), and FDA reporting of adverse events (FAERS). The drug perturbation gene expression signatures are from the CMap L1000 assay of six human cell lines, namely, PHH, HEPG2, HA1E, A375, MCF7, and PC3. Four DILI classes were targeted: DILI1 (severity score ≥ 6), DILI3 (withdrawn, box warning, warning, and precaution), DILI5 (assigned endpoint 1), and DILI6 (assigned endpoint 2). The L1000 cell expression data have low drug coverage across cell lines, with only 247 of the 617 drugs in the study measured in all six cell lines. We addressed this by using KRU-BOR ranked merging to generate a single drug expression signature across all six cell lines. These merged signatures were then narrowed down to the top- and bottom-ranked 100, 250, 500, or 1000 genes most perturbed by drug treatment. These signatures were subjected to feature selection using Fisher’s exact test to identify genes predictive of DILI status. Models based solely on expression signatures had varying results for each DILI subtype, with the 1000-gene merged signature reaching 73.2% accuracy but a Matthews correlation coefficient (MCC) of only 0.065 for DILI1 prediction. 
However, the merged cell expression models performed poorly in predicting DILI3, DILI5, and DILI6, often reaching only around 50% accuracy. Models developed using MOLD2 predictors achieved an accuracy of 99% with an MCC value of 0.959 in predicting DILI6. All other DILI classification models based on FAERS, TOX21, and MOLD2 performed poorly. Overall, our experiments suggest that these data may not be adequate to classify DILI status.
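
The abstract reports a model with 73.2% accuracy but an MCC close to zero. The minimal sketch below illustrates how that combination can arise when one class dominates the test set. The confusion-matrix counts are hypothetical values chosen for illustration only and are not taken from the study; the function names `accuracy` and `mcc` are likewise assumptions of this sketch.

```python
import math


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical, imbalanced test set (NOT data from the study):
# 26 DILI-positive drugs and 74 DILI-negative drugs, with a model
# that labels almost every drug as negative.
tp, fp, tn, fn = 2, 3, 71, 24

acc = accuracy(tp, fp, tn, fn)
score = mcc(tp, fp, tn, fn)
print(f"accuracy = {acc:.3f}, MCC = {score:.3f}")
```

Because accuracy rewards agreement with the majority class, a model that rarely predicts the positive class can still score well on it, whereas MCC uses all four cells of the confusion matrix and stays near zero for such a model.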