OPINION article

Front. Surg., 25 February 2025

Sec. Orthopedic Surgery

Volume 12 - 2025 | https://doi.org/10.3389/fsurg.2025.1524396

ChatGPT4o's theranostic performance in the management of thoracolumbar spine fractures

  • 1. Department of Orthopedics, Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan, China

  • 2. The First Affiliated Hospital of Shihezi University, Shihezi, Xinjiang Uyghur, China

Introduction

ChatGPT, developed by OpenAI (https://chat.openai.com), is a publicly accessible tool that uses machine learning algorithms to process large volumes of data and generate responses to user inquiries. On May 13, 2024, OpenAI launched the ChatGPT4o model, which the OpenAI website describes as its latest, fastest, and most advanced version. The model supports a context length of up to 128k tokens (roughly the length of a long novel) and offers multimodal capabilities, including text and image inputs and text, image, and audio outputs (https://help.openai.com). Numerous studies have explored ChatGPT's potential applications and challenges in the biomedical field (1, 2), but few have examined the specific capabilities of ChatGPT4o in the medical domain. A review article (3) published in Frontiers in Surgery notes that ChatGPT lacks sufficient expertise and background understanding in specialized fields; ChatGPT4o may have the potential to change this situation. To evaluate the model, we investigated the theranostic performance of ChatGPT4o in managing thoracolumbar spine fractures and assessed its potential effectiveness and applications in clinical practice.

Method

For our evaluation, we formulated 38 clinical questions based on the diagnostic, treatment, and management guidelines for thoracolumbar fractures established by the Congress of Neurological Surgeons (CNS) (4–14) and the Chinese Medical Association (CMA) (15). We entered all 38 questions into ChatGPT4o (OpenAI, accessed November 3, 2024) without providing additional context or guidelines. Each question was posed once, and the first generated response was recorded. To minimize variability, prompts were not iteratively refined. The responses were anonymized and compiled in Supplementary Material S1. Three independent spine surgery experts then reviewed each response against both the established guidelines and their own clinical experience. Each expert rated the responses on a five-point Likert scale: (1) completely incorrect; (2) more incorrect than correct; (3) an equal mix of correct and incorrect; (4) more correct than incorrect; and (5) completely correct. The median of the three expert scores was used as the final rating to minimize bias.
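The aggregation step described above can be illustrated with a minimal Python sketch. The function name `final_rating` and the dictionary layout are illustrative assumptions, not part of the study's materials; the three example rows reuse the expert scores reported for questions 1, 22, and 24 in Table 1.

```python
from statistics import median

def final_rating(expert_scores):
    """Return the median of one response's expert ratings (five-point Likert)."""
    # Reject anything outside the 1-5 scale used by the experts.
    if any(s not in (1, 2, 3, 4, 5) for s in expert_scores):
        raise ValueError("Likert scores must be integers from 1 to 5")
    return median(expert_scores)

# Scores from Expert 1, Expert 2, Expert 3 for three questions in Table 1.
ratings = {
    1: (5, 5, 4),
    22: (2, 2, 2),
    24: (3, 4, 3),
}

# Final rating per question: the median of the three expert scores.
final = {question: final_rating(scores) for question, scores in ratings.items()}
print(final)
```

With an odd number of raters the median is always one of the scores actually given, so a single outlying rating cannot shift the final value by itself.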

Result

When ChatGPT4o was presented with “yes or no” questions, it typically responded with comprehensive diagnostic criteria and therapeutic principles rather than a simple “yes” or “no.” According to our results (Table 1), 0 responses (0%) received a score of 1, 1 response (2.63%) a score of 2, 1 response (2.63%) a score of 3, 8 responses (21.05%) a score of 4, and 28 responses (73.68%) a score of 5. Overall, 36 of 38 responses (approximately 94.7%) scored 4 or 5, meaning they were largely or entirely accurate.
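The percentages above follow directly from the counts of median scores. The short Python sketch below reproduces the arithmetic; the helper names `score_distribution` and `share_at_least` are illustrative assumptions, and the counts are those reported in this section.

```python
# Number of responses per median score, as reported in this section.
median_counts = {1: 0, 2: 1, 3: 1, 4: 8, 5: 28}

def score_distribution(counts):
    """Percentage of responses at each score, rounded to two decimals."""
    total = sum(counts.values())
    return {score: round(100 * n / total, 2) for score, n in counts.items()}

def share_at_least(counts, threshold):
    """Percentage of responses whose score is at or above the threshold."""
    total = sum(counts.values())
    hits = sum(n for score, n in counts.items() if score >= threshold)
    return round(100 * hits / total, 2)

distribution = score_distribution(median_counts)
mostly_correct = share_at_least(median_counts, 4)

print(sum(median_counts.values()))  # total number of questions
print(distribution)
print(mostly_correct)
```

Here "largely or entirely accurate" is taken to mean a median score of 4 or 5, which is the reading consistent with the reported figure of approximately 94.7%.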

Table 1

Five-point Likert scores (a) for responses to inquiries posed to ChatGPT4o.

| Question | Expert 1 | Expert 2 | Expert 3 | Median |
|---|---|---|---|---|
| 1. Which patients need to consider combined thoracolumbar spinal cord injury? | 5 | 5 | 4 | 5 |
| 2. How to immobilize, transport, and transfer patients with suspected thoracolumbar spinal cord injury? | 5 | 5 | 5 | 5 |
| 3. How to assess the degree of neurological injury in patients with acute thoracolumbar spinal cord injury? | 5 | 5 | 4 | 5 |
| 4. How to perform radiological assessment for patients with suspected acute thoracolumbar spinal cord injury? | 4 | 4 | 4 | 4 |
| 5. How to assess the morphology of injury in patients with acute thoracolumbar spinal cord injury? | 4 | 4 | 4 | 4 |
| 6. Is the use of high-dose corticosteroids and gangliosides recommended for the treatment of spinal cord injury? | 5 | 5 | 5 | 5 |
| 7. What are the indications for the treatment selection in patients with acute thoracolumbar spinal cord injury? | 5 | 5 | 5 | 5 |
| 8. What are the recommended conservative treatment methods for thoracolumbar spinal cord injury? | 5 | 5 | 5 | 5 |
| 9. What is the timing for surgery in patients with acute thoracolumbar spinal cord injury? | 4 | 5 | 4 | 4 |
| 10. How to choose the surgical approach for patients with acute thoracolumbar spinal cord injury? | 5 | 5 | 4 | 5 |
| 11. When is laminectomy necessary for patients with acute thoracolumbar spinal cord injury? | 5 | 5 | 5 | 5 |
| 12. How to select the fixation segment for posterior surgery in patients with acute thoracolumbar spinal cord injury? | 5 | 5 | 5 | 5 |
| 13. Is it necessary to fix the injured vertebrae posteriorly in patients with acute thoracolumbar spinal cord injury? | 4 | 5 | 4 | 4 |
| 14. What are the indications for percutaneous internal fixation in patients with acute thoracolumbar spinal cord injury? | 5 | 5 | 5 | 5 |
| 15. Is it necessary to perform bone graft fusion for patients with acute thoracolumbar spinal cord injury who undergo surgery? | 5 | 4 | 5 | 5 |
| 16. Is simple pedicle-based bone grafting effective in patients with acute thoracolumbar spinal cord injury? | 5 | 5 | 5 | 5 |
| 17. How to manage the urinary system in patients with acute thoracolumbar spinal cord injury? | 5 | 5 | 5 | 5 |
| 18. How to prevent and treat deep vein thrombosis in patients with acute thoracolumbar spinal cord injury? | 5 | 5 | 5 | 5 |
| 19. Is it necessary to prevent and treat pressure sores in patients with acute thoracolumbar spinal cord injury? | 5 | 5 | 5 | 5 |
| 20. How to manage neurogenic bowel in patients with acute thoracolumbar spinal cord injury? | 5 | 5 | 5 | 5 |
| 21. Does early surgical intervention improve outcomes for patients with thoracic and lumbar fractures? | 4 | 5 | 4 | 4 |
| 22. Does the choice of surgical approach (anterior, posterior, or combined anterior-posterior) improve clinical outcomes in patients with thoracic and lumbar fractures? | 2 | 2 | 2 | 2 |
| 23. Are there radiographic findings in patients with traumatic thoracolumbar fractures that can predict the need for surgical intervention? | 5 | 5 | 5 | 5 |
| 24. Are there radiographic findings in patients with traumatic thoracolumbar fractures that can assist in predicting clinical outcomes? | 3 | 4 | 3 | 3 |
| 25. Does routine screening for DVT prevent PE (or VTE-associated morbidity and mortality) in patients with thoracic and lumbar fractures? | 5 | 5 | 5 | 5 |
| 26. For patients with thoracic and lumbar fractures, is one regimen of VTE prophylaxis superior to others with respect to prevention of PE (or VTE-associated morbidity and mortality)? | 5 | 5 | 5 | 5 |
| 27. Is there a specific treatment regimen for documented VTE that provides fewer complications than other treatments in patients with thoracic and lumbar fractures? | 4 | 4 | 4 | 4 |
| 28. Does the administration of a specific pharmacologic agent (e.g., methylprednisolone) improve clinical outcomes in patients with thoracic and lumbar fractures and spinal cord injury? | 5 | 5 | 5 | 5 |
| 29. Does the surgical treatment of burst fractures of the thoracic and lumbar spine improve clinical outcomes compared to nonoperative treatment? | 4 | 5 | 5 | 5 |
| 30. Does the surgical treatment of nonburst fractures of the thoracic and lumbar spine improve clinical outcomes compared to nonoperative treatment? | 4 | 5 | 4 | 4 |
| 31. Does the addition of arthrodesis to instrumented fixation improve outcomes in patients with thoracic and lumbar burst fractures? | 5 | 5 | 5 | 5 |
| 32. How does the use of minimally invasive techniques (including percutaneous instrumentation) affect outcomes in patients undergoing surgery for thoracic and lumbar fractures compared to conventional open techniques? | 5 | 5 | 5 | 5 |
| 33. Does the use of external bracing improve outcomes in the nonoperative treatment of neurologically intact patients with thoracic and lumbar burst fractures? | 5 | 5 | 5 | 5 |
| 34. Which neurological assessment tools have demonstrated internal reliability and validity in the management of patients with thoracic and lumbar fractures (i.e., do these instruments provide consistent information between different care providers)? | 4 | 5 | 4 | 4 |
| 35. Are there any clinical findings (e.g., presenting neurological grade/function) in patients with thoracic and lumbar fractures that can assist in predicting clinical outcomes? | 5 | 5 | 5 | 5 |
| 36. Does the active maintenance of arterial blood pressure after injury affect clinical outcomes in patients with thoracic and lumbar fractures? | 5 | 5 | 5 | 5 |
| 37. Are there classification systems for fractures of the thoracolumbar spine that have been shown to be internally valid and reliable (i.e., do these instruments provide consistent information between different care providers)? | 5 | 4 | 5 | 5 |
| 38. In treating patients with thoracolumbar fractures, does employing a formally tested classification system for treatment decision-making affect clinical outcomes? | 5 | 5 | 5 | 5 |

DVT, deep vein thrombosis; PE, pulmonary embolism; VTE, venous thromboembolism.

(a) Five-point Likert score system: 1 means completely incorrect; 2 means more incorrect than correct; 3 means equally incorrect and correct; 4 means more correct than incorrect; 5 means completely correct.

Discussion

When asked, “Does the choice of surgical approach (anterior, posterior, or combined anterior-posterior) improve clinical outcomes in patients with thoracic and lumbar fractures?”, ChatGPT4o provided an affirmative answer along with detailed explanations. However, the CNS guidelines state that for patients with burst fractures of the thoracolumbar spine, surgeons may use an anterior, posterior, or combined approach, because the choice of approach does not significantly affect clinical or neurological outcomes (Grade B recommendation). ChatGPT4o explained the indications for each approach in detail, and the experts noted that the explanation was generally accurate, but its final conclusion was not entirely consistent with the guideline recommendation; all three experts rated this response 2 (Table 1). Furthermore, while ChatGPT4o appears capable of conducting targeted searches on open websites, its “independent reasoning” abilities require further refinement.

In summary, ChatGPT4o demonstrates promising performance in diagnosing and treating thoracolumbar trauma. Its ability to search open websites and provide detailed responses could be a useful reference for clinical practitioners. However, ChatGPT4o does not consistently provide fully accurate answers, particularly with “yes or no” questions. Its dependence on specific sources for data retrieval may introduce biases that limit its broader application in the field of spine surgery. ChatGPT requires substantial medical data for further training to enhance model performance. Moreover, given the specific ethical considerations in medicine, ChatGPT4o's use in clinical settings must ensure patient safety, data privacy, ethical standards, and adherence to relevant “AI regulations”. Although ChatGPT4o's responses may improve clinical efficiency, it should only serve as a clinical assistant, with spine surgeons validating the accuracy of its information.

This study has several methodological limitations: firstly, the lack of comparative analyses with established AI systems (e.g., Google Med-PaLM, IBM Watson) or traditional decision-support tools hinders definitive performance benchmarking; secondly, simulated testing environments may overestimate system efficacy, as diagnostic performance degradation in real-world clinical settings requires urgent empirical validation; finally, the rapid evolution of AI technology necessitates dynamically updated training databases and ethical evaluation frameworks. To address these gaps, subsequent research will incorporate the Partial Credit Model (PCM) and Item Response Theory (IRT) through latent trait modeling, systematically quantifying AI response difficulty levels, refining multidimensional scoring criteria, and strengthening clinical applicability assessments to establish a psychometrically-based evaluation framework. This methodological advancement will enhance the granular understanding of AI's role in complex medical decision-making (e.g., surgical approach selection, prognostic stratification). Future research priorities include: (1) comparative effectiveness studies across AI systems, (2) real-world clinical validation of performance, and (3) development of specialty-specific human-AI collaboration guidelines to systematically improve the clinical utility of intelligent assistive tools in spinal surgery.
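For orientation, the Partial Credit Model mentioned above is usually written as follows. This is the standard textbook form, not a specification taken from this study: the symbols $\theta_n$ (latent trait of the unit being rated), $\delta_{ik}$ (difficulty of the $k$-th step of item $i$), and $m_i$ (highest score category) are conventional notation, and how they would map onto questions, responses, and raters here is an open assumption.

```latex
% Partial Credit Model: probability that unit n obtains score x on item i,
% with score categories x = 0, 1, ..., m_i.
\[
P(X_{ni} = x \mid \theta_n)
  = \frac{\exp\!\left( \sum_{k=1}^{x} (\theta_n - \delta_{ik}) \right)}
         {\sum_{h=0}^{m_i} \exp\!\left( \sum_{k=1}^{h} (\theta_n - \delta_{ik}) \right)},
\qquad x = 0, 1, \dots, m_i ,
\]
% where the empty sum (x = 0 or h = 0) is defined to be zero.
```

Under this form, a five-point Likert rating recoded to 0 through 4 would correspond to $m_i = 4$, giving four step difficulties per item.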

Statements

Author contributions

XJ: Investigation, Methodology, Writing – original draft. LM: Supervision, Writing – review & editing. YY: Data curation, Validation, Writing – original draft. YD: Data curation, Investigation, Writing – original draft. CS: Methodology, Validation, Writing – original draft. KZ: Data curation, Investigation, Methodology, Writing – original draft. YL: Data curation, Investigation, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that Generative AI was used in the creation of this manuscript. ChatGPT4o provided the answers to the guideline-related questions regarding thoracolumbar spine fractures.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fsurg.2025.1524396/full#supplementary-material

References

  • 1. Tian S, Jin Q, Yeganova L, Lai P-T, Zhu Q, Chen X, et al. Opportunities and challenges for ChatGPT and large language models in biomedicine and health. Brief Bioinform. (2023) 25:bbad493. doi: 10.1093/bib/bbad493

  • 2. Zhang J, Sun K, Jagadeesh A, Falakaflaki P, Kayayan E, Tao G, et al. The potential and pitfalls of using a large language model such as ChatGPT, GPT-4, or LLaMA as a clinical assistant. J Am Med Inform Assoc. (2024) 31:1884–91. doi: 10.1093/jamia/ocae184

  • 3. Giorgino R, Alessandri-Bonetti M, Luca A, Migliorini F, Rossi N, Peretti GM, et al. ChatGPT in orthopedics: a narrative review exploring the potential of artificial intelligence in orthopedic practice. Front Surg. (2023) 10:1284015. doi: 10.3389/fsurg.2023.1284015

  • 4. Dailey AT, Arnold PM, Anderson PA, Chi JH, Dhall SS, Eichholz KM, et al. Congress of neurological surgeons systematic review and evidence-based guidelines on the evaluation and treatment of patients with thoracolumbar spine trauma: classification of injury. Neurosurgery. (2019) 84:E24–7. doi: 10.1093/neuros/nyy372

  • 5. Dhall SS, Dailey AT, Anderson PA, Arnold PM, Chi JH, Eichholz KM, et al. Congress of neurological surgeons systematic review and evidence-based guidelines on the evaluation and treatment of patients with thoracolumbar spine trauma: hemodynamic management. Neurosurgery. (2019) 84:E43–5. doi: 10.1093/neuros/nyy368

  • 6. Harrop JS, Chi JH, Anderson PA, Arnold PM, Dailey AT, Dhall SS, et al. Congress of neurological surgeons systematic review and evidence-based guidelines on the evaluation and treatment of patients with thoracolumbar spine trauma: neurological assessment. Neurosurgery. (2019) 84:E32–5. doi: 10.1093/neuros/nyy370

  • 7. Hoh DJ, Qureshi S, Anderson PA, Arnold PM, John HC, Dailey AT, et al. Congress of neurological surgeons systematic review and evidence-based guidelines on the evaluation and treatment of patients with thoracolumbar spine trauma: nonoperative care. Neurosurgery. (2019) 84:E46–9. doi: 10.1093/neuros/nyy369

  • 8. Chi JH, Eichholz KM, Anderson PA, Arnold PM, Dailey AT, Dhall SS, et al. Congress of neurological surgeons systematic review and evidence-based guidelines on the evaluation and treatment of patients with thoracolumbar spine trauma: novel surgical strategies. Neurosurgery. (2019) 84:E59–62. doi: 10.1093/neuros/nyy364

  • 9. Rabb CH, Hoh DJ, Anderson PA, Arnold PM, Chi JH, Dailey AT, et al. Congress of neurological surgeons systematic review and evidence-based guidelines on the evaluation and treatment of patients with thoracolumbar spine trauma: operative versus nonoperative treatment. Neurosurgery. (2019) 84:E50–2. doi: 10.1093/neuros/nyy361

  • 10. Arnold PM, Anderson PA, Chi JH, Dailey AT, Dhall SS, Eichholz KM, et al. Congress of neurological surgeons systematic review and evidence-based guidelines on the evaluation and treatment of patients with thoracolumbar spine trauma: pharmacological treatment. Neurosurgery. (2019) 84:E36–8. doi: 10.1093/neuros/nyy371

  • 11. Raksin PB, Harrop JS, Anderson PA, Arnold PM, Chi JH, Dailey AT, et al. Congress of neurological surgeons systematic review and evidence-based guidelines on the evaluation and treatment of patients with thoracolumbar spine trauma: prophylaxis and treatment of thromboembolic events. Neurosurgery. (2019) 84:E39–42. doi: 10.1093/neuros/nyy367

  • 12. Qureshi S, Dhall SS, Anderson PA, Arnold PM, Chi JH, Dailey AT, et al. Congress of neurological surgeons systematic review and evidence-based guidelines on the evaluation and treatment of patients with thoracolumbar spine trauma: radiological evaluation. Neurosurgery. (2019) 84:E28–31. doi: 10.1093/neuros/nyy373

  • 13. Anderson PA, Raksin PB, Arnold PM, Chi JH, Dailey AT, Dhall SS, et al. Congress of neurological surgeons systematic review and evidence-based guidelines on the evaluation and treatment of patients with thoracolumbar spine trauma: surgical approaches. Neurosurgery. (2019) 84:E56–8. doi: 10.1093/neuros/nyy363

  • 14. Eichholz KM, Rabb CH, Anderson PA, Arnold PM, Chi JH, Dailey AT, et al. Congress of neurological surgeons systematic review and evidence-based guidelines on the evaluation and treatment of patients with thoracolumbar spine trauma: timing of surgical intervention. Neurosurgery. (2019) 84:E53–5. doi: 10.1093/neuros/nyy362

  • 15. Chinese Medical Doctor Association Orthopedics Branch, Editorial Board of the Evidence-based Clinical Practice Guidelines for Acute Thoracolumbar Spinal Cord Injury in Adults by the Chinese Medical Doctor Association Orthopedics Branch. Evidence-based clinical practice guidelines for orthopedics by the Chinese medical doctor association orthopedics branch: evidence-based clinical practice guidelines for acute thoracolumbar spinal cord injury in adults. Chin J Surg. (2019) 57(3):161–5. doi: 10.3760/cma.j.issn.0529-5815.2019.03.001

Summary

Keywords

ChatGPT4o, thoracolumbar spine fractures, theranostic performance, clinical practice, AI in medicine

Citation

Jia X, Ma L, Yang Y, Deng Y, Shen C, Zhang K and Li Y (2025) ChatGPT4o's theranostic performance in the management of thoracolumbar spine fractures. Front. Surg. 12:1524396. doi: 10.3389/fsurg.2025.1524396

Received: 07 November 2024

Accepted: 12 February 2025

Published: 25 February 2025

Edited by

Wencai Liu, Shanghai Jiao Tong University, China

Reviewed by

Harish Kempegowda, Boston University, United States

Hartanto Hartanto, Universitas Widya Dharma, Indonesia

Nicola Manocchio, University of Rome Tor Vergata, Italy

*Correspondence: Litai Ma
