Early Detection of Sepsis With Machine Learning Techniques: A Brief Clinical Perspective

Sepsis is a major cause of death worldwide. Over the past years, prediction of clinically relevant events through machine learning models has gained particular attention. In the present perspective, we provide a brief, clinician-oriented vision on the following relevant aspects concerning the use of machine learning predictive models for the early detection of sepsis in daily practice: (i) the controversy of sepsis definition and its influence on the development of prediction models; (ii) the choice and availability of input features; (iii) the measure of model performance, the output, and their usefulness in clinical practice. The increasing involvement of artificial intelligence and machine learning in health care cannot be disregarded, despite important pitfalls that should always be carefully taken into consideration. In the long run, a rigorous multidisciplinary approach to enrich our understanding of the application of machine learning techniques for the early recognition of sepsis may show potential to augment medical decision-making when facing this heterogeneous and complex syndrome.

Sepsis is a complex and evolving concept: (i) complex, because it involves two different actors (the infection and the host response) whose relative contribution to organ damage may vary across patients and over time in the same patient (6,7); (ii) evolving, because various definitions have been developed and adopted over the last years, reflecting the complexity of the elusive concept to be defined. Although the evolution of the concept of sepsis could reflect a positive trajectory aimed at providing more precise definitions for both clinical and research purposes, the long-lasting process of defining sepsis has led to the use of different terminology and definitions across research studies (8,9). As a consequence, the comparability of study results has been reduced, and the smooth development of predictive models of sepsis has been hindered. Nonetheless, the development of good predictive models of sepsis remains a timely topic, since it would help clinicians to readily identify patients at higher risk of (or likely to already have) sepsis, thereby allowing close monitoring and/or early treatment and potentially reducing mortality (10).
Prediction of sepsis through the use of machine learning models has gained particular attention over the past few years (11-17). In the present perspective, we provide a brief clinician-oriented vision on some relevant aspects concerning the use of machine learning predictive models for the prediction of sepsis in daily practice. In particular, although predictive models may be developed for either prediction sensu stricto or early detection (i.e., as early diagnostic tools), we will mainly focus on the ability of these tools to detect patients with sepsis early.

Brief Introduction on Prediction of Clinical Events Through Machine Learning Models and the Need for a Multidisciplinary Approach
Through machine learning algorithms, computers are conferred the ability to learn from data (18,19). Conceptually, machine learning, which is a branch of artificial intelligence, differs from the standard computer expert systems that help clinicians in daily decision making (20). Indeed, the latter are explicitly programmed to perform a given task, whereas machine learning algorithms are more generally programmed to discover associations. For example, in supervised learning, an outcome (e.g., sepsis) is predicted through calculations starting from input features (e.g., patient demographic and clinical characteristics). Notably, the outcome and the input of machine learning algorithms may also represent the dependent and independent variables of classical statistical predictive models. Not surprisingly, there is an important conceptual overlap between classical statistics and machine learning techniques (21). A difference along this continuum may lie in the fact that some machine learning algorithms could be able to discover composite features that are difficult or impossible for humans to define. In turn, this may improve the accuracy of prediction (22). However, this also fuels the issue of the interpretability of the model, with computations that may become less transparent and not completely explainable once results are produced (23). This may hamper recognition of biases, thereby leading to additional ethical and legal implications connected to the use of machine learning techniques in healthcare (24-26).
In addition, it should also be taken into consideration that the field of machine learning is closely connected with the concept of "big data," since, at least in general, the possibility of including complex features in prediction models requires far larger samples than those usually employed in classical prediction studies (22,27). Therefore, the possible future availability of large datasets of medical data from electronic medical records (EMR), laboratory databases, and vital signs monitors will inevitably require a multidisciplinary approach. This would guarantee standardized extraction of data, data security, interpretability or sufficient explanation of the employed machine learning models, extrapolation of useful outputs from a clinical perspective, and full compliance with the most up-to-date ethical and legal regulations. This will also apply to the prediction of sepsis, both for research studies and for real-time prediction at the bedside in daily clinical practice.

The Controversy of Sepsis Definition and Its Influence on the Development of Prediction Models
According to the recent Sepsis-3 criteria, sepsis is formally defined as an acute increase of ≥2 points in the Sequential Organ Failure Assessment (SOFA) score, consequent to a suspected or proven infection (1). Increases in the SOFA score, detected through monitoring of laboratory and clinical parameters, reflect possible impairments in cardiovascular, respiratory, renal, hepatic, coagulation, and neurological systems (28). This is in line with the novel definition of sepsis as a life-threatening and progressive organ dysfunction. Previously, sepsis was defined as the presence of a systemic inflammatory response syndrome (SIRS) caused by a proven or suspected infection and based on laboratory and clinical parameters (white blood cell count, respiratory rate, heart rate, and body temperature) that could be altered before manifestation of sepsis-related organ dysfunction (29). However, this previous, more general definition lacked specificity, since SIRS may be present in several non-infectious conditions. On the other hand, it had very high sensitivity because of the large denominator of patients with SIRS (although some patients with organ dysfunction due to infection may not present with SIRS, and a few cases may still be missed) (30). Overall, the novel definition offers a better performance than the previous one for identifying septic patients at higher risk of mortality in intensive care units (ICU) (1). However, a better prediction of mortality does not inherently reflect the best timing for intervention. For example, the optimal timing for starting antimicrobial therapy remains unclear. More specifically, while it is intuitive to initiate antimicrobials if sepsis is first recognized at the time of organ dysfunction (according to the novel sepsis definition), what if the patient is already suspected to have sepsis before developing any organ dysfunction (according to previous sepsis definitions)? Can we wait, or should we initiate antimicrobials immediately? Notably, there is no high-level, evidence-based answer to this question yet, and the choice is usually made upon clinical judgment at the patient's bedside, on a case-by-case basis.
However, case-by-case clinical judgment is hardly reproducible as a basis for defining what should be predicted by both classical statistical and machine learning models, which are meant to support clinicians in administering antimicrobials at the best time to reduce sepsis mortality while, in line with antimicrobial stewardship principles, not administering them indiscriminately to all patients with SIRS but no sepsis. Prediction of sepsis through machine learning techniques is not exempt from this unresolved controversy, as reflected by the various definitions of sepsis employed in the different models available in the literature (see Table 1) (85). Against this backdrop, it is worth noting that the possible solution of using expert physicians' judgement to label sepsis cases as the gold standard for model training may not resolve the issue because of suboptimal agreement across physicians (86). A reasonable alternative strategy explored by some authors may be the use of unsupervised machine learning techniques (i.e., recognizing patterns in the data without a labeled outcome, unlike in supervised learning) for the identification of novel phenotypes of sepsis based on clinical and laboratory values (77,87,88). This could help in the identification of specific subgroups of patients to be included in dedicated studies (preferably randomized clinical trials) to assess the impact of early antimicrobial therapy (77,89,90). Furthermore, this would allow the trial outcome to be used as a measure of the a posteriori accuracy of sepsis classification based on the phenotypes identified with machine learning techniques.
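To make the labeling controversy concrete, the following minimal sketch applies the two definitions discussed in this section to the same patient snapshot. It is an illustration only: the `Observation` fields and function names are hypothetical, and the SIRS thresholds (and the requirement of at least two criteria) are the conventional ones, which are not spelled out in the text above and are stated here as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One patient snapshot; field names are illustrative, not from the article."""
    suspected_infection: bool
    temperature_c: float
    heart_rate: float        # beats per minute
    respiratory_rate: float  # breaths per minute
    wbc_k_per_ul: float      # white blood cells, thousands per microliter
    baseline_sofa: int
    current_sofa: int


def sirs_count(obs: Observation) -> int:
    """Count SIRS criteria using conventional thresholds (an assumption here)."""
    return sum([
        obs.temperature_c > 38.0 or obs.temperature_c < 36.0,
        obs.heart_rate > 90,
        obs.respiratory_rate > 20,
        obs.wbc_k_per_ul > 12.0 or obs.wbc_k_per_ul < 4.0,
    ])


def sepsis_by_sirs(obs: Observation) -> bool:
    """Previous definition: suspected infection plus at least two SIRS criteria."""
    return obs.suspected_infection and sirs_count(obs) >= 2


def sepsis_by_sepsis3(obs: Observation) -> bool:
    """Sepsis-3: suspected infection plus an acute SOFA increase of >= 2 points."""
    return obs.suspected_infection and (obs.current_sofa - obs.baseline_sofa) >= 2


# A made-up patient in the "gray area": febrile, tachycardic, and tachypneic,
# but with a SOFA increase of only one point.
gray_area_patient = Observation(
    suspected_infection=True,
    temperature_c=38.6,
    heart_rate=112,
    respiratory_rate=24,
    wbc_k_per_ul=9.5,
    baseline_sofa=1,
    current_sofa=2,
)

print("Labeled as sepsis by SIRS-based rule:", sepsis_by_sirs(gray_area_patient))
print("Labeled as sepsis by Sepsis-3 rule:", sepsis_by_sepsis3(gray_area_patient))
```

The same patient receives different labels under the two rules, so a supervised model trained on one label would learn a different target than a model trained on the other.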

The Choice and Availability of Input Features
The increasing use of EMR implies the immediate availability, in electronic form, of relevant clinical and laboratory data, which supports the development and real-time use of right-aligned predictive models for the early detection of sepsis (91-93). Indeed, these models may continuously update their prediction of sepsis by utilizing the unceasing stream of electronic data from the EMR and/or vital signs monitors (57,74,85,92,94). A good predictive model may thus allow one to correctly classify in real time a true case of sepsis in the controversial gray area in between the previous and novel sepsis definitions. Amongst others, two important questions need to be addressed: (i) which input features, and what minimum number of them, should be used for developing a good predictive model for the early detection of sepsis? (ii) once a good predictive model is developed, would an early antimicrobial treatment improve the prognosis of these patients?
Intuitively, there can be no answer to the latter question without first developing a good predictive model. Therefore, the appropriate selection of input features remains paramount. In this regard, it is still unclear whether a parsimonious (even of a few vital and/or laboratory parameters) or an expanded (for example, considering also information from unstructured physicians' free text in EMR notes) selection of input features is preferable, and where precisely any desirable middle ground between these two extremes lies. In the literature, there are encouraging experiences with the use of a few vital and/or laboratory parameters for the prediction of severe sepsis according to previous definitions, and a positive impact on survival has also been registered in a small, randomized, single-center clinical trial and in some observational studies (44,95-97). However, the possibility of employing input features relying on patients' clinical data and medical histories remains attractive, if only for attempting a concomitant prediction of etiology (e.g., of a multidrug-resistant causative agent based on a combination of previous microbiological isolates and risk factors for multidrug-resistant infections, such as previous antibiotic use), which may impact the appropriateness of empirical antimicrobial therapy while waiting for blood culture results (21). Notably, this raises some important issues regarding the automated extraction of unstructured data from EMR. Examples are the need for standardization of extraction, sufficient accuracy of automated extraction, internal/external validation, and continuous recalibration of automated extraction over time. Furthermore, besides these technical aspects, the presence of missing data [which is inherent in EMR, since they are not an instrument designed for research purposes (50)] and their potential impact on the uncertainty of prediction should be properly handled.
It should also be considered that a different amount of information is expected between ICU patients (more closely monitored clinically and through laboratory tests) and patients in other wards. Therefore, besides inherent differences in their characteristics, the different amount of available data may also influence both the selection of the best set of features and the predictive performance of models developed for ICU patients and models developed for other populations (e.g., to be used in the emergency department), which may not be interchangeable. Finally, the choice of the input features could also be dictated by the local availability of the necessary infrastructure. For example, lack of validation or of the ability to properly extract clinical data from the local EMR may only allow the use of laboratory values for building predictive models (manual extraction of additional clinical data from EMR may be unfeasible or extremely time-consuming if large samples are concerned).
Table 1 (fragment): ICU patients; 40,336; Vital signs, laboratory data, demographics; DT (6), NB (1), SVM (3), Ensemble learners (5); Sepsis labeled according to Sepsis-3 criteria
Frontiers in Medicine | www.frontiersin.org
[...] the portion of the curve that maximizes sensitivity (in order not to miss true cases and not to delay a potentially life-saving antimicrobial treatment). Once a low risk of missing true cases is established, this would allow models to be compared in terms of specificity (aiming at reducing the number of false positive cases treated with antimicrobials, thereby reducing useless toxicity and resistance selection). In any case, the definitive proof of usefulness of any classification model should be provided by randomized clinical trials comparing sepsis management based on its use vs. standard identification of sepsis with respect to a clinically relevant primary endpoint (e.g., short-term mortality). Finally, it should be kept in mind that the present perspective was aimed at highlighting some issues that may hamper the comparability and extrapolation of results of available models, and not primarily at assessing their performance, which is not reported in detail. Nonetheless, we think it may be of interest to highlight the wide heterogeneity of model results. For example, the best AUROC for the prediction of sepsis ranged from 0.68 to 0.99 in a recent systematic review, whereas sensitivity and specificity were too inconsistently reported to allow proper comparison (92).
Another important aspect is the output of the model. While sepsis is the outcome (or dependent variable), the output of the model may be different according to the type of model (for a brief summary of the technical characteristics of different ML models employed in available studies of sepsis prediction, see Supplementary Figure 1) (98).
For example, the output of logistic regression (but also, for example, of a sigmoid function-based output layer of a neural network), intended for a single patient, is his/her probability of experiencing the outcome (in this case having sepsis or not), based on his/her input features and the calculations of the trained predictive algorithm. Let's say, for example, that the algorithm calculates for a given new patient, based on their clinical and laboratory data, a 40% probability of having sepsis. In terms of usefulness, this probability provided to doctors may be a clinically understandable value that can be easily weighed in the balance when deciding whether or not to administer antimicrobials, possibly also improving acceptance of the implementation of machine learning-based sepsis alerts in daily practice (99). However, clinicians should also be aware of the limitations of the model, in order to further improve an appropriate understanding and use of the output. In this regard, there are no standardized directives on the optimal way to present model pitfalls to clinicians together with the model output, although "model facts" labels have started to be proposed (100).
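As a minimal sketch of the two ideas discussed in this section, the example below (i) turns input features into a probability through a sigmoid function, and (ii) selects the highest decision threshold that still reaches a target sensitivity, reporting the specificity obtained at that threshold. The intercept, coefficients, feature names, scores, and labels are all hypothetical and chosen only for illustration; they do not come from any of the cited models.

```python
import math

# Hypothetical intercept and coefficients, chosen only for illustration;
# they do not come from the article or from any published model.
INTERCEPT = -6.0
COEFFICIENTS = {"heart_rate": 0.03, "respiratory_rate": 0.05, "lactate": 0.4}


def sepsis_probability(features):
    """Logistic (sigmoid) output: a probability between 0 and 1 for one patient."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_specificity(probabilities, labels, threshold):
    """Classify as sepsis when probability >= threshold, then score against labels."""
    pairs = list(zip(probabilities, labels))
    tp = sum(1 for p, y in pairs if y == 1 and p >= threshold)
    fn = sum(1 for p, y in pairs if y == 1 and p < threshold)
    tn = sum(1 for p, y in pairs if y == 0 and p < threshold)
    fp = sum(1 for p, y in pairs if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_sensitivity(probabilities, labels, target_sensitivity):
    """Return the highest threshold reaching the target sensitivity, with its metrics."""
    if 1 not in labels or 0 not in labels:
        raise ValueError("need at least one septic and one non-septic case")
    # Lowering the threshold can only keep or raise sensitivity, so the first
    # threshold that meets the target is also the most specific one that does.
    for threshold in sorted(set(probabilities), reverse=True):
        sens, spec = sensitivity_specificity(probabilities, labels, threshold)
        if sens >= target_sensitivity:
            return threshold, sens, spec
    raise ValueError("target sensitivity cannot be reached")


# Made-up patient: the model output is a single probability (roughly 0.40 here).
new_patient = {"heart_rate": 100, "respiratory_rate": 20, "lactate": 4.0}
print(f"Estimated probability of sepsis: {sepsis_probability(new_patient):.2f}")

# Made-up validation scores and true labels (1 = sepsis, 0 = no sepsis).
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
outcomes = [1, 1, 1, 0, 0, 0]
threshold, sens, spec = threshold_for_sensitivity(scores, outcomes, target_sensitivity=1.0)
print(f"threshold={threshold}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Fixing sensitivity first and then reading off specificity mirrors the comparison strategy described above: two models can be compared on how many false positives each produces once neither is allowed to miss true cases.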

LIMITATIONS AND CONCLUSIONS
Given the narrative (and not systematic) nature of this brief perspective, some original works may not have been included, a fact that, together with the lack of an in-depth description of computational aspects of the different algorithms, may represent a major limitation of the present paper. Nonetheless, our intention was to provide a brief perspective for clinicians by addressing some topics of clinical interest concerning the application of machine learning techniques for the early detection of sepsis in daily clinical practice. For this reason, we feel an extended focus on technical aspects would have been beyond the scope of the present manuscript.
In conclusion, the use of predictive tools based on machine learning may support medical decision-making by providing novel elements to improve the correct and early identification of patients with sepsis. Although it cannot yet be said whether this will ultimately improve patient survival and relevant antimicrobial stewardship outcomes, the increasing involvement of artificial intelligence and machine learning in health care cannot be ignored. In the long run, a rigorous multidisciplinary approach to enrich our understanding of the application of machine learning techniques to the early recognition of sepsis may be worth the trip and truly augment medical decision-making when facing this heterogeneous and complex syndrome.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
DG made substantial contributions to the study concept and design, first drafting of the manuscript, and critical revision of the manuscript for important intellectual content. AS, MG, PP, SM, LC, CR, LB, and MB made substantial contributions to the study concept and design and critical revision of the manuscript for important intellectual content. FD, AV, and FB made substantial contributions to the literature review and critical revision of the manuscript for important intellectual content. All authors read and approved the final manuscript.