AUTHOR=Lehmann Sebastian, Vychopen Martin, Güresir Erdem, Wach Johannes TITLE=GPT-based prediction of short-term survival following decompressive hemicraniectomy in malignant middle cerebral artery infarction JOURNAL=Frontiers in Neurology VOLUME=Volume 16 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/neurology/articles/10.3389/fneur.2025.1603536 DOI=10.3389/fneur.2025.1603536 ISSN=1664-2295 ABSTRACT=Introduction: This study analyzes the ability of the large language model (LLM) Generative Pre-trained Transformer (GPT) to predict short-term survival and functional outcomes in patients with malignant middle cerebral artery (MCA) infarction following decompressive hemicraniectomy. Methods: This retrospective study included 100 patients with malignant MCA infarction who underwent decompressive craniectomy (DC). GPT-4 and GPT-4 Omni were used to predict patient outcomes based on 20 patient-specific factors. Each GPT version was tested with and without context enrichment (CE). The CE versions were provided with the current (2019) AHA/ASA guidelines and meta-analyses of randomized controlled trials (RCTs) to inform decision-making. The patients' real-life outcomes, measured by the modified Rankin Scale (mRS), served as the reference. The following endpoints were evaluated: survival during the inpatient stay, and achievement of a functional status of mRS 0–4 at discharge and at 3, 6, and 12 months post-discharge. We assessed GPT's prognostic predictions by calculating the area under the receiver operating characteristic curve (AUC) and determining the optimal cutoff separating the divergent outcomes using the Youden index. After dichotomization at this cutoff, a two-sided chi-squared test was performed. Results: GPT-4 and GPT-4 Omni were both able to estimate survival during the in-hospital stay. For both versions, the CE model outperformed the non-CE model. GPT-4 Omni (CE) achieved an AUC of 0.67 (95% CI: 0.54–0.79; p = 0.002), while GPT-4 (CE) reached an AUC of 0.70 (95% CI: 0.57–0.82; p = 0.018). 
Even without CE, GPT-4 achieved statistical significance (AUC of 0.66; 95% CI: 0.53–0.78; p = 0.018). In contrast, the non-CE version of GPT-4 Omni did not reach significance in predicting in-hospital survival (AUC of 0.60; 95% CI: 0.48–0.73; p = 0.07). For questions regarding the patients' functional outcomes, neither GPT version produced a sufficient prognostic prediction. However, when provided with the pre-stroke mRS, GPT-4 Omni was able to predict the mRS at discharge (p = 0.01; Pearson's correlation coefficient = 0.696). Conclusion: The study demonstrates the considerable potential that AI already has for predicting short-term outcomes. It also reveals current limitations in evaluating more complex questions, such as functional outcomes.
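
The Methods describe a three-step evaluation: AUC, a Youden-optimal cutoff, and a two-sided chi-squared test on the dichotomized predictions. The following is a minimal Python sketch of that pipeline, not the authors' code. It assumes GPT's output is a numeric survival estimate per patient and the outcome is coded 1 = survived, 0 = died. The function names, the toy data, and the choice of an uncorrected Pearson chi-squared (no Yates correction) are illustrative assumptions. AUC confidence intervals are not computed here.

```python
import math

def auc(scores, labels):
    """Rank-based AUC: probability that a random survivor gets a higher
    score than a random non-survivor (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_cutoff(scores, labels):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1,
    predicting 'survives' when score >= cutoff."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_c, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < c and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lowest cutoff on ties
            best_c, best_j = c, j
    return best_c, best_j

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared for the table [[a, b], [c, d]], without
    continuity correction, df = 1. Returns (statistic, p-value)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column: the test is undefined
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of chi-squared with 1 df equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def evaluate(scores, labels):
    """AUC, Youden cutoff, then chi-squared on the dichotomized predictions."""
    cutoff, j = youden_cutoff(scores, labels)
    pred = [1 if s >= cutoff else 0 for s in scores]
    a = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
    b = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 0)
    c = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 1)
    d = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 0)
    chi2, p = chi2_2x2(a, b, c, d)
    return {"auc": auc(scores, labels), "cutoff": cutoff,
            "youden_j": j, "chi2": chi2, "p": p}

if __name__ == "__main__":
    # Hypothetical toy data, NOT study data: GPT-estimated survival
    # probability (%) and observed in-hospital survival (1 = survived).
    scores = [90, 80, 70, 60, 40, 30, 20, 10]
    labels = [1, 1, 1, 0, 1, 0, 0, 0]
    print(evaluate(scores, labels))
```

The AUC and the chi-squared p-value answer different questions. The AUC summarizes ranking ability over all thresholds, while the chi-squared test only asks whether the single Youden-optimal split is associated with the outcome. This is one reason the two need not move together across models.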