- 1Department of Emergency, Affiliated Binhai Hospital, Kangda College of Nanjing Medical University, Yancheng, Jiangsu, China
- 2Department of Neurosurgery, Affiliated Binhai Hospital, Kangda College of Nanjing Medical University, Yancheng, Jiangsu, China
1 Ethical and operational implications
The authors report a 17% reduction in false positives, yet the clinical consequences of the residual false alarms, a persistent challenge in AI deployment, warrant further analysis. False positives can increase clinician cognitive load, contribute to alarm fatigue, and erode patient trust when interventions are triggered unnecessarily. For example, AI models may misclassify non-infectious systemic inflammatory response syndrome (SIRS) as early sepsis, prompting unnecessary antibiotic administration or invasive monitoring, thereby straining intensive care unit (ICU) resources and exposing patients to avoidable risk (2, 3). Existing frameworks such as the Ethics Guidelines for Trustworthy AI (European Commission, 2019) (4) and the AMA’s policy on augmented intelligence in health care (2019) (5) provide valuable reference points: both emphasize transparency, accountability, and shared decision-making, principles that could be embedded into AI deployment strategies to mitigate these risks.
2 Interpretability of LDA-derived topics
The application of LDA for topic modeling in unstructured notes is innovative; however, explainability remains a key barrier to clinical adoption. While the authors use LDA to extract latent topics from clinical narratives, the study lacks explicit detail on whether these topics were validated or manually labeled to confirm alignment with known clinical indicators of sepsis. This is a critical gap, as the interpretability of extracted topics determines clinician trust and model transparency. For instance, an elevated heart rate (tachycardia), rapid breathing (tachypnea), altered mental status, and low blood pressure are often early signs of sepsis. If an AI model highlights these parameters—or their semantic equivalents in unstructured notes—as predictive features, clinicians are more likely to trust and act on the output.
Conversely, if the model relies on opaque or non-clinical latent topics, adoption may be hindered (6). For example, topics dominated by administrative language such as “insurance documentation”, “bed transfer”, or “discharge planning” could appear predictive due to correlations in the training dataset, yet lack direct physiological relevance to sepsis (7). If such opaque topics were emphasized without clinician oversight, they could undermine trust, leading physicians to discount the algorithm’s recommendations (8–11). Enhancing transparency through topic coherence scores, clinical expert annotation, and mapping extracted features to established ontologies like SNOMED-CT would help translate complex AI decisions into actionable insights. Future work should prioritize post hoc labeling of LDA topics and associate them explicitly with sepsis-relevant pathophysiological constructs to bridge the gap between machine reasoning and clinical intuition.
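To make the proposed transparency check concrete, topic coherence can be computed directly from document co-occurrence counts. The sketch below uses a synthetic corpus and hypothetical topic word lists (none of which come from the SERA study) to implement the UMass coherence score: a clinically consistent topic, whose words travel together across notes, scores higher than a topic that incoherently mixes physiological and administrative terms.

```python
from math import log

# Toy corpus of de-identified note snippets (synthetic, for illustration only).
docs = [
    "tachycardia hypotension fever lactate elevated",
    "fever tachypnea altered mental status hypotension",
    "bed transfer insurance documentation discharge planning",
    "discharge planning insurance documentation completed",
    "tachycardia fever tachypnea lactate rising",
]
corpus = [set(d.split()) for d in docs]

def doc_freq(*words):
    """Number of documents containing every given word."""
    return sum(all(w in doc for w in words) for doc in corpus)

def umass_coherence(top_words):
    """UMass coherence: pairwise log co-occurrence of a topic's top words.
    Scores nearer zero indicate words that reliably appear together."""
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += log((doc_freq(top_words[i], top_words[j]) + 1)
                         / doc_freq(top_words[j]))
    return score

clinical_topic = ["fever", "tachycardia", "tachypnea", "hypotension"]
mixed_topic = ["fever", "insurance", "tachycardia", "planning"]  # incoherent mix

print(umass_coherence(clinical_topic))  # clinical terms co-occur
print(umass_coherence(mixed_topic))     # penalized: words rarely co-occur
```

In practice the same score would be computed over the full note corpus for each LDA topic, and low-coherence topics flagged for expert annotation before deployment.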
3 External validation in diverse healthcare settings is essential
The study’s single-center design limits generalizability, as variations in EMR systems, documentation practices, and patient populations across institutions could affect model performance. External validation in diverse healthcare settings is essential. Additionally, while the algorithm’s accuracy is impressive, its “black-box” nature may hinder clinical adoption. Explainability, such as identifying which topics or variables drive predictions, could bridge this gap. Ethical and operational challenges, such as false positive management and liability concerns, also warrant attention. For instance, while the algorithm reduces false positives by 17%, the impact of the remaining false alarms on resource utilization remains unexplored. Finally, comparative benchmarking against other AI sepsis tools, such as the multicenter trauma-sepsis prediction model of Sun et al. (12), would clarify SERA’s unique contributions.
Future research should build on this work by exploring multimodal data integration, such as real-time vital signs and wearable devices, to further refine predictions, particularly in the critical 4–6 hour window before sepsis onset (13, 14). To support cross-institutional validation while protecting patient privacy, federated learning frameworks such as FedAvg or SplitNN can train decentralized models across institutions without transferring raw patient data. Dynamic risk stratification, in which predictions update in real time as new clinical data arrive, could enhance responsiveness. Adapting the model for low-resource settings, where the sepsis burden is high but EMR infrastructure is limited, would broaden its global impact (15). Lightweight NLP models such as DistilBERT, MobileBERT, or TinyBERT can be adapted for local deployment, offering efficient language processing with reduced computational overhead (16–18); these models can extract sepsis-relevant clinical patterns from brief physician notes or basic triage descriptions. Finally, real-time streaming pipelines built on platforms such as Apache Kafka, Apache Flink, or TensorFlow Serving can support continuous data ingestion and model updates, enabling near-instantaneous risk recalibration. By adopting these scalable, efficient strategies, future iterations of the SERA algorithm may achieve broader utility across diverse healthcare ecosystems, including resource-limited settings and distributed hospital networks.
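The federated averaging idea referred to above can be illustrated in a few lines. The sketch below is a minimal FedAvg round on a toy one-feature logistic model: the hospital datasets, the feature (a scaled lactate value), and the labels are all synthetic assumptions, not SERA data. Each "hospital" trains locally from the current global weights, and only the weights, never the raw records, return to the server, which averages them proportionally to each site's sample count.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def local_update(weights, data, lr=0.1, epochs=20):
    """One client's local training: stochastic gradient descent on logistic loss."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(w * x + b)
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

def fed_avg(global_weights, client_datasets):
    """Server round: broadcast global weights, then average the returned
    local weights, weighted by each client's sample count."""
    n_total = sum(len(d) for d in client_datasets)
    w_avg, b_avg = 0.0, 0.0
    for data in client_datasets:
        w, b = local_update(global_weights, data)
        w_avg += len(data) / n_total * w
        b_avg += len(data) / n_total * b
    return w_avg, b_avg

# Two hospitals; x = scaled lactate level, y = sepsis label (synthetic).
hospital_a = [(0.2, 0), (0.4, 0), (1.6, 1), (1.9, 1)]
hospital_b = [(0.3, 0), (1.5, 1), (1.8, 1)]

weights = (0.0, 0.0)
for _ in range(10):  # communication rounds
    weights = fed_avg(weights, [hospital_a, hospital_b])
print(weights)
```

A production deployment would use an established framework with secure aggregation rather than this plain average, but the data-minimization principle is the same.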
4 Causal relevance and target trial emulation
While the SERA algorithm demonstrates strong predictive performance, whether early identification of sepsis causally improves clinical outcomes remains an open question. Most existing models, including SERA, are evaluated using retrospective association metrics such as AUC, sensitivity, and specificity, which do not guarantee that earlier prediction will lead to better patient outcomes. To address this, emerging frameworks such as target trial emulation offer a promising approach to infer causal relationships from observational data. Specifically, retrospective ICU datasets containing timestamped interventions such as antibiotic initiation, fluid resuscitation, or admission to the ICU could be used to emulate randomized controlled trials (19–21). Patients with comparable baseline risk profiles could be contrasted based on whether they received earlier intervention following AI-based alerts versus standard care (22). This methodology simulates randomized controlled trials using routinely collected clinical data, providing more robust evidence of effectiveness (23). Moreover, causal inference methods such as inverse probability weighting (24, 25), g-computation (26), and marginal structural models (27) may further support estimation of the effect of early prediction on sepsis-related morbidity and mortality. Applying these tools to intervention-timestamped ICU data would allow researchers to evaluate whether timely alerts from the SERA algorithm result in earlier antibiotic administration, fluid management, or escalation of care, and ultimately improve survival and reduce complications (1, 28–30). Future research incorporating these methods is essential to bridge the gap between prediction and clinical impact.
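To make the weighting step of such an emulation concrete, the sketch below applies inverse probability weighting to entirely synthetic records with a hypothetical two-level baseline-risk stratum; in a real target trial emulation the propensity model would condition on the full set of measured baseline confounders, and confidence intervals would be obtained by bootstrapping.

```python
# Each record: (baseline-risk stratum, early treatment flag, death flag).
# Synthetic data, for illustration only.
records = [
    ("high", 1, 0), ("high", 1, 0), ("high", 1, 1), ("high", 0, 1), ("high", 0, 1),
    ("low", 1, 0), ("low", 0, 0), ("low", 0, 0), ("low", 0, 1),
]

def propensity(stratum):
    """P(early treatment | baseline-risk stratum), estimated empirically."""
    group = [r for r in records if r[0] == stratum]
    return sum(r[1] for r in group) / len(group)

def ipw_risk_difference():
    """IPW (Horvitz-Thompson) estimate of the treated-vs-untreated
    mortality risk difference; negative favors early treatment."""
    n = len(records)
    treated = sum(t * y / propensity(s) for s, t, y in records) / n
    control = sum((1 - t) * y / (1 - propensity(s)) for s, t, y in records) / n
    return treated - control

print(ipw_risk_difference())
```

Note that the estimator is only valid when every stratum has a non-zero probability of both treatment and non-treatment (positivity), which holds in this toy dataset by construction.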
5 Conclusion
Goh et al. have developed a sophisticated AI tool that addresses a critical unmet need in sepsis care. Their work highlights the untapped potential of unstructured clinical data and sets a new benchmark for early sepsis prediction. Nevertheless, the reliability and equity of deployment will hinge on explicit management of training-data quality, documentation heterogeneity, and clinical bias, with performance and calibration reported across pediatric, geriatric, and non-English documentation cohorts. These safeguards should be coupled with rigorous external validation, enhanced model interpretability, and deeper consideration of ethical and operational challenges. By incorporating domain-specific explainability, ethical safeguards, and practical deployment strategies, along with causal validation frameworks such as target trial emulation, future research can transform SERA into a scalable, dynamic, and globally applicable AI solution for sepsis and other time-sensitive conditions.
Author contributions
AH: Project administration, Visualization, Funding acquisition, Validation, Resources, Data curation, Formal analysis, Conceptualization, Supervision, Writing – review & editing, Methodology, Writing – original draft, Software, Investigation. LX: Supervision, Conceptualization, Project administration, Validation, Writing – review & editing, Software, Funding acquisition, Data curation, Writing – original draft, Methodology, Formal analysis, Resources, Investigation, Visualization. SY: Methodology, Data curation, Writing – original draft, Investigation, Validation, Project administration, Resources, Software, Funding acquisition, Supervision, Formal analysis, Writing – review & editing, Visualization, Conceptualization.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This study was sponsored by the Yancheng Science and Technology Bureau (YCBE202365) and the Jiangsu Vocational College of Medicine’s School-Local Collaborative Innovation Research Project (202491001).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Goh KH, Wang L, Yeow AYK, Poh H, Li K, Yeow JJL, et al. Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare. Nat Commun. (2021) 12:711. doi: 10.1038/s41467-021-20910-4
2. Bae EY, Smith TT, and Monogue ML. A case-control study evaluating the unnecessary use of intravenous broad-spectrum antibiotics in presumed sepsis and septic-shock patients in the emergency department. Antimicrob Steward Healthc Epidemiol. (2022) 2:e193. doi: 10.1017/ash.2022.341
3. Dykes LA, Heintz SJ, Heintz BH, Livorsi DJ, Egge JA, Lund BC, et al. Contrasting qSOFA and SIRS criteria for early sepsis identification in a veteran population. Fed Pract. (2019) 36:S21–S24.
4. Ryan M. In AI We Trust: ethics, artificial intelligence, and reliability. Sci Eng Ethics. (2020) 26(5):2749–67. doi: 10.1007/s11948-020-00228-y
5. Crigger E and Khoury C. Making policy on augmented intelligence in health care. AMA J Ethics. (2019) 21:188–91. doi: 10.1001/amajethics.2019.188
6. Wong A, Otles E, Donnelly JP, Krumm A, McCullough J, DeTroyer-Cooley O, et al. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Intern Med. (2021) 181:1065–70. doi: 10.1001/jamainternmed.2021.2626
7. Weissman GE, Hubbard RA, Himes BE, Goodman-O'Leary KL, Harhay MO, Ginestra JC, et al. Sepsis prediction models are trained on labels that diverge from clinician-recommended treatment times. AMIA Annu Symp Proc. (2024) 2024:1215–24.
8. Davis SE, Matheny ME, Balu S, and Sendak MP. A framework for understanding label leakage in machine learning for health care. J Am Med Inform Assoc. (2023) 31:274–80. doi: 10.1093/jamia/ocad178
9. Arnold CW, Oh A, Chen S, and Speier W. Evaluating topic model interpretability from a primary care physician perspective. Comput Methods Programs BioMed. (2016) 124:67–75. doi: 10.1016/j.cmpb.2015.10.014
10. Ramon-Gonen R, Dori A, and Shelly S. Towards a practical use of text mining approaches in electrodiagnostic data. Sci Rep. (2023) 13:19483. doi: 10.1038/s41598-023-45758-0
11. Brosula R, Corbin CK, and Chen JH. Pathophysiological features in electronic medical records sustain model performance under temporal dataset shift. AMIA Jt Summits Transl Sci Proc. (2024) 2024:95–104.
12. Sun B, Lei M, Wang L, Wang X, Li X, Mao Z, et al. Prediction of sepsis among patients with major trauma using artificial intelligence: a multicenter validated cohort study. Int J Surg. (2025) 111:467–80. doi: 10.1097/JS9.0000000000001866
13. Sadasivuni S, Saha M, Bhatia N, Banerjee I, and Sanyal A. Fusion of fully integrated analog machine learning classifier with electronic medical records for real-time prediction of sepsis onset. Sci Rep. (2022) 12:5711. doi: 10.1038/s41598-022-09712-w
14. Ghiasi S, Zhu T, Lu P, Hagenah J, Khanh PNQ, Hao NV, et al. Sepsis mortality prediction using wearable monitoring in low-middle income countries. Sensors (Basel). (2022) 22:3866. doi: 10.3390/s22103866
15. Zharima C, Griffiths F, and Goudge J. Exploring the barriers and facilitators to implementing electronic health records in a middle-income country: a qualitative study from South Africa. Front Digit Health. (2023) 5:1207602. doi: 10.3389/fdgth.2023.1207602
16. Silva Barbon R and Akabane AT. Towards transfer learning techniques-BERT, DistilBERT, BERTimbau, and DistilBERTimbau for automatic text classification from different languages: A case study. Sensors (Basel). (2022) 22:8184. doi: 10.3390/s22218184
17. Majid I, Mishra V, Ravindranath R, and Wang SY. Evaluating the performance of large language models for named entity recognition in ophthalmology clinical free-text notes. AMIA Annu Symp Proc. (2024) 2024:778–87.
18. Bologna F, Thalken R, Pepin K, and Wilkens M. Endometriosis communities on reddit: quantitative analysis. J Med Internet Res. (2025) 27:e57987. doi: 10.2196/57987
19. Li H, Zang C, Xu Z, Pan W, Rajendran S, Chen Y, et al. Federated target trial emulation using distributed observational data for treatment effect estimation. NPJ Digit Med. (2025) 8:387. doi: 10.1038/s41746-025-01803-y
20. White KC, Costa-Pinto R, Blank S, Whebell S, Quick L, Luke S, et al. Effect of early adjunctive vasopressin initiation for septic shock patients: a target trial emulation. Crit Care. (2025) 29:188. doi: 10.1186/s13054-025-05401-y
21. Liu R, Hunold KM, Caterino JM, and Zhang P. Estimating treatment effects for time-to-treatment antibiotic stewardship in sepsis. Nat Mach Intell. (2023) 5:421–31. doi: 10.1038/s42256-023-00638-0
22. Pak TR, Young J, McKenna CS, Agan A, DelloStritto L, Filbin MR, et al. Risk of misleading conclusions in observational studies of time-to-antibiotics and mortality in suspected sepsis. Clin Infect Dis. (2023) 77:1534–43. doi: 10.1093/cid/ciad450
23. Porcellato E, Lanera C, Ocagli H, and Danielis M. Exploring applications of artificial intelligence in critical care nursing: a systematic review. Nurs Rep. (2025) 15:55. doi: 10.3390/nursrep15020055
24. Shen C, Li X, Li L, and Were MC. Sensitivity analysis for causal inference using inverse probability weighting. Biom J. (2011) 53:822–37. doi: 10.1002/bimj.201100042
25. Syriopoulou E, Rutherford MJ, and Lambert PC. Inverse probability weighting and doubly robust standardization in the relative survival framework. Stat Med. (2021) 40:6069–92. doi: 10.1002/sim.9171
26. Tchetgen Tchetgen EJ, Fulcher IR, and Shpitser I. Auto-G-computation of causal effects on a network. J Am Stat Assoc. (2021) 116:833–44. doi: 10.1080/01621459.2020.1811098
27. Rodriguez Duque D, Stephens DA, Moodie EEM, and Klein MB. Semiparametric Bayesian inference for optimal dynamic treatment regimes via dynamic marginal structural models. Biostatistics. (2023) 24:708–27. doi: 10.1093/biostatistics/kxac007
28. Hayden GE, Tuuri RE, Scott R, Losek JD, Blackshaw AM, Schoenling AJ, et al. Triage sepsis alert and sepsis protocol lower times to fluids and antibiotics in the ED. Am J Emerg Med. (2016) 34:1–9. doi: 10.1016/j.ajem.2015.08.039
29. Henry KE, Adams R, Parent C, Soleimani H, Sridharan A, Johnson L, et al. Factors driving provider adoption of the TREWS machine learning-based early warning system and its effects on sepsis treatment timing. Nat Med. (2022) 28:1447–54. doi: 10.1038/s41591-022-01895-z
Keywords: artificial intelligence, sepsis, early prediction, diagnosis, sepsis early risk assessment
Citation: He A, Xu L and Yang S (2025) Opinion on “Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare”. Front. Immunol. 16:1629766. doi: 10.3389/fimmu.2025.1629766
Received: 16 May 2025; Accepted: 26 August 2025;
Published: 15 September 2025.
Edited by:
Simon Mitchell, Brighton and Sussex Medical School, United Kingdom
Reviewed by:
Zhongheng Zhang, Sir Run Run Shaw Hospital, China
Qinghe Meng, Upstate Medical University, United States
Binggang Liu, The Central Hospital of Yongzhou, China
Copyright © 2025 He, Xu and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Shengkai Yang, MTczNTIzNjE3MzZAMTYzLmNvbQ==
†These authors have contributed equally to this work
Leiming Xu1†