AUTHOR=Su Longxiang, Li Yansheng, Liu Shengjun, Zhang Siqi, Zhou Xiang, Weng Li, Su Mingliang, Du Bin, Zhu Weiguo, Long Yun

TITLE=Establishment and Implementation of Potential Fluid Therapy Balance Strategies for ICU Sepsis Patients Based on Reinforcement Learning

JOURNAL=Frontiers in Medicine

VOLUME=Volume 9 - 2022

YEAR=2022

URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2022.766447

DOI=10.3389/fmed.2022.766447

ISSN=2296-858X

ABSTRACT=Objective 
This study aimed to establish and validate a machine learning model for judging the direction of fluid therapy.
Method
This study included 2,705 sepsis patients from the Peking Union Medical College Hospital Intensive Care Medical Information System and Database (PICMISD) from Jan. 2016 to April 2020. Data from Jan. 2016 to June 2019 were randomly divided into a training set and a test set. Twenty-seven features were extracted for modeling: 25 state features (bloc, vital signs, laboratory examinations, blood gas assays and demographics), 1 action feature (fluid balance) and 1 outcome feature (ICU survival or death). SARSA was used to learn the state-action values Q(s,a) from the training set. Deep Q-learning (DQN) was used to learn the relationship between states and actions in the training set and to predict the next fluid balance. A doubly robust estimator was used to evaluate the average expected reward of the deep Q-learning model on the test set.
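To make the SARSA step concrete, the following is a minimal tabular sketch, not the authors' implementation. It assumes discrete state indices (the study uses 25 state features, so some state encoding would be needed), a reward that arrives only at the end of an ICU stay (+1 for survival, -1 for death), and hypothetical names such as `sarsa_update` and `fit_sarsa`.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=0.99):
    """One on-policy TD update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    target = r if done else r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def fit_sarsa(trajectories, n_states, n_actions=5, epochs=50, alpha=0.1, gamma=0.99):
    """Sweep recorded (state, action, reward) trajectories and learn Q from them."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(epochs):
        for traj in trajectories:
            for t, (s, a, r) in enumerate(traj):
                done = t == len(traj) - 1
                # The next state and action come from the recorded data, not from a greedy max.
                s_next, a_next = (0, 0) if done else traj[t + 1][:2]
                sarsa_update(Q, s, a, r, s_next, a_next, done, alpha, gamma)
    return Q

# Two toy stays (assumed data): the reward is given only on the final step.
survivor = [(0, 1, 0.0), (1, 1, 1.0)]       # ends in ICU survival
non_survivor = [(0, 4, 0.0), (2, 4, -1.0)]  # ends in ICU death
Q = fit_sarsa([survivor, non_survivor], n_states=3)
```

In this toy example, the action taken in the stay that ended in survival receives a higher Q value in the shared starting state than the action taken in the stay that ended in death.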
Results
The training set and test set were extracted from the same database, and their distributions of fluid balance were similar. Actions were divided into 5 intervals corresponding to the 0-20%, 20-40%, 40-60%, 60-80%, and 80-100% percentiles of fluid balance. The higher the Q(s,a) value calculated by SARSA from the training set, the lower the mortality rate. Deep Q-learning indicated that fluid balance differences that were either too high or too low were associated with increased mortality. The more consistent the predicted fluid balance was with the actual one, and the smaller the difference between them, the lower the mortality rate. The doubly robust estimator showed that the model has satisfactory stability. In the validation set, the mortality rate was lowest in the “predicted negative fluid balance and actual negative fluid balance” subgroup, a statistically significant result, indicating that the model can be used for clinical verification.
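The division of actions into five percentile intervals can be sketched as follows. This is an illustrative example under assumptions: the fluid balance values are made up, the unit (mL) is assumed, and the helper names `quintile_edges` and `to_action` are hypothetical.

```python
import numpy as np

def quintile_edges(fluid_balance):
    """Return the 20th, 40th, 60th and 80th percentiles of the given values."""
    return np.percentile(np.asarray(fluid_balance, dtype=float), [20, 40, 60, 80])

def to_action(fluid_balance, edges):
    """Map each fluid balance value to an action index 0..4 (lowest to highest interval)."""
    return np.searchsorted(edges, np.asarray(fluid_balance, dtype=float), side="right")

# Hypothetical fluid balance values in mL (negative = net fluid removal).
train_balance = np.array([-1500.0, -800.0, -300.0, -50.0, 0.0,
                          120.0, 400.0, 900.0, 1600.0, 2500.0])
edges = quintile_edges(train_balance)      # cut points estimated on training data only
actions = to_action(train_balance, edges)  # one action index per record
```

Estimating the cut points on the training data and reusing them for the test data keeps the action definition identical across both sets.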
Conclusion
We used reinforcement learning to propose a possible prediction model for guiding the direction of fluid therapy for sepsis patients in the ICU.