AUTHOR=Cockrell Chase, Larie Dale, An Gary

TITLE=Preparing for the next pandemic: Simulation-based deep reinforcement learning to discover and test multimodal control of systemic inflammation using repurposed immunomodulatory agents

JOURNAL=Frontiers in Immunology

VOLUME=Volume 13 - 2022

YEAR=2022

URL=https://www.frontiersin.org/journals/immunology/articles/10.3389/fimmu.2022.995395

DOI=10.3389/fimmu.2022.995395

ISSN=1664-3224

ABSTRACT=Background: Preparing to address the critical gap in a future pandemic between non-pharmacological measures and the deployment of new drugs/vaccines requires addressing two factors: 1) finding virus/pathogen-agnostic pathophysiological targets to mitigate disease severity and 2) finding a more rational approach to repurposing existing drugs. It is increasingly recognized that acute viral disease severity is heavily driven by the immune response to the infection (“cytokine storm” or “cytokine release syndrome”). Numerous clinically available biologics suppress various pro-inflammatory cytokines/mediators, but it is extremely difficult to identify clinically effective treatment regimens with these agents. We propose that this is a complex control problem that resists standard methods of developing treatments and requires the application of simulation-based deep reinforcement learning (DRL) in a fashion akin to training successful game-playing artificial intelligences (AIs). This proof-of-concept study determines whether simulated sepsis (e.g., infection-driven cytokine storm) can be controlled in the absence of effective antimicrobial agents by targeting cytokines for which FDA-approved biologics currently exist.
Methods: A previously validated agent-based model, the Innate Immune Response Agent-based Model (IIRABM), was used for control discovery using DRL. Training used a Deep Deterministic Policy Gradient (DDPG) approach with a clinically plausible control interval of 6 hours and manipulation of 6 cytokines for which there are existing drugs: Tumor Necrosis Factor (TNF), Interleukin-1 (IL-1), Interleukin-4 (IL-4), Interleukin-8 (IL-8), Interleukin-12 (IL-12) and Interferon-γ (IFNg).
Results: DRL trained an AI policy that could improve outcomes from a baseline Recovered Rate of 61% to 90% over ~21 days of simulated time. Testing on 4 different parameterizations representing a range of host and microbe characteristics showed improvements in Recovered Rate ranging from +33% to +56%.
Discussion: This proof-of-concept study shows that disease severity mitigation can potentially be accomplished with existing anti-mediator drugs, but that doing so requires a multi-modal, adaptive treatment policy using an AI. While the actual clinical implementation of this approach is a projection for the future, the goal of this work is to inspire the development of a research ecosystem that marries improvement of simulation models with development of sensing/assay technologies to collect the data needed to iteratively refine those models.
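
NOTE: The control setup described in the Methods (one decision every 6 hours over roughly 21 simulated days, with one control value for each of the 6 cytokines) can be illustrated with a minimal Python sketch. This is not the authors' code and does not use the IIRABM. The function names, the dummy observation, the random placeholder policy, and the [-1, 1] action range are all hypothetical assumptions chosen only to show the shape of the loop; a trained DDPG actor network would take the place of the placeholder.

```python
import random

# Cytokine targets and timing taken from the abstract.
CYTOKINES = ["TNF", "IL-1", "IL-4", "IL-8", "IL-12", "IFNg"]
CONTROL_INTERVAL_HOURS = 6
HORIZON_DAYS = 21  # the abstract gives "~21 days"; treated as exact here


def decision_points(horizon_days=HORIZON_DAYS,
                    interval_hours=CONTROL_INTERVAL_HOURS):
    """Number of times the controller acts over the simulated horizon."""
    return (horizon_days * 24) // interval_hours


def placeholder_policy(observation, rng):
    """Hypothetical stand-in for a trained DDPG actor.

    Returns one continuous value per cytokine. The [-1, 1] range
    (suppress vs. augment) is an assumption, not taken from the source.
    """
    return {name: rng.uniform(-1.0, 1.0) for name in CYTOKINES}


def run_episode(seed=0):
    """Collect the action chosen at every 6-hour decision point."""
    rng = random.Random(seed)
    # Dummy state; a real run would read mediator levels from the simulator.
    observation = {name: 0.0 for name in CYTOKINES}
    actions = []
    for _ in range(decision_points()):
        actions.append(placeholder_policy(observation, rng))
        # A real simulator would now advance 6 simulated hours and
        # return an updated observation and a reward.
    return actions


if __name__ == "__main__":
    episode = run_episode()
    print(len(episode), "decisions, each setting",
          len(episode[0]), "cytokine controls")
```

Under these assumptions, a 21-day episode contains 84 decision points, each of which sets six continuous controls at once. This gives a sense of why the abstract frames regimen discovery as a control problem rather than a search over fixed dosing schedules.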