Optimized BERT-based NLP outperforms Zero-Shot Methods for Automated Symptom Detection in Clinical Practice

Diaz Ochoa, Juan Guillermo; Layer, Natalie; Mahr, Jonas; Mustafa, Faizan E; Menzel, Christian U.; Müller-Schilling, Martina; Schilling, Tobias; Illerhaus, Gerald; Knott, Markus; Krohn, Alexander

doi:10.3389/fdgth.2025.1623922

ORIGINAL RESEARCH article

Front. Digit. Health

Sec. Health Informatics

This article is part of the Research Topic: AI in Healthcare: Transforming Clinical Risk Prediction, Medical Large Language Models, and Beyond

Provisionally accepted

Juan Guillermo Diaz Ochoa¹*

Natalie Layer²

Jonas Mahr³

Faizan E Mustafa³

Christian U. Menzel⁴

Martina Müller-Schilling⁵

Tobias Schilling⁴

Gerald Illerhaus²

Markus Knott⁶

Alexander Krohn⁴,⁷*

¹PERMEDIQ GmbH, Wang, Germany
²Klinikum Stuttgart, Stuttgart Cancer Center - Tumorzentrum Eva Mayr-Stihl DE, Stuttgart, Germany
³QuiBiQ GmbH, Stuttgart, Germany
⁴Department for Emergency and Intensive Care Medicine (DIANI), Klinikum Stuttgart, Stuttgart, Germany
⁵Department of Internal Medicine I, University Hospital Regensburg, Regensburg, Germany
⁶Klinikum Stuttgart, Stuttgart Cancer Center - Tumorzentrum Eva Mayr-Stihl DE, Stuttgart, Germany
⁷Department of Internal Medicine I, University Hospital Regensburg, Regensburg, Germany

The final, formatted version of the article will be published soon.

BACKGROUND: Large Language Models (LLMs) have raised broad expectations for clinical use, particularly for processing complex medical narratives. In practice, however, more targeted Natural Language Processing (NLP) approaches may offer higher precision and feasibility for extracting symptoms from real-world clinical texts. NLP provides promising tools for extracting clinical information from unstructured medical narratives, but few studies have focused on integrating symptom information from German free text, particularly for complex patient groups such as emergency department (ED) patients. The ED setting presents specific challenges: high documentation pressure, heterogeneous language styles, and the need for secure, locally deployable models due to strict data protection regulations. Furthermore, German remains a low-resource language in clinical NLP.

METHODS: We implemented and compared two zero-shot models (GLiNER and Mistral) and a fine-tuned BERT-based model, SCAI-BIO/BioGottBERT, for named entity recognition (NER) of symptoms, anatomical terms, and negations in German ED anamnesis texts. Manual annotations of 150 narratives were used for model validation. Postprocessing included confidence-based filtering, negation exclusion, symptom standardization, and integration with structured oncology registry data. All computations were performed on-premises on local hospital servers to ensure full data protection compliance.

RESULTS: The fine-tuned SCAI-BIO/BioGottBERT model outperformed both zero-shot approaches, achieving an F1 score of 0.84 for symptom extraction and demonstrating superior performance in negation detection. The validated pipeline enabled systematic extraction of affirmed symptoms from ED free text and transformed them into structured data. This method allows large-scale analysis of symptom profiles across patient populations and serves as a technical foundation for symptom-based clustering and subgroup analysis.

CONCLUSIONS: Our study demonstrates that modern NLP methods can reliably extract clinical symptoms from German ED free text, even under strict data protection constraints and with limited training resources. Fine-tuned models offer a precise and practical solution for integrating unstructured narratives into clinical decision-making. This work lays the methodological foundation for systematically analyzing large patient cohorts on the basis of free-text data. Beyond symptoms, this approach can be extended to extracting diagnoses, procedures, or other clinically relevant entities.
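The postprocessing steps named in the Methods (confidence-based filtering, negation exclusion, and symptom standardization) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no threshold, negation scope rule, label names, or synonym table, so `MIN_SCORE`, `NEGATION_WINDOW`, the labels `SYMPTOM` and `NEGATION`, the `CANONICAL` lookup, and the example sentence are all assumptions chosen for illustration. The sketch takes NER output as character spans with a confidence score and returns the affirmed, standardized symptoms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str    # "SYMPTOM" or "NEGATION" (label names are assumed)
    text: str     # surface form found in the narrative
    start: int    # character offset where the span begins
    end: int      # character offset just past the span
    score: float  # model confidence in [0, 1]


# Hypothetical settings; the source states neither a threshold nor a scope rule.
MIN_SCORE = 0.5
NEGATION_WINDOW = 20  # max characters between a negation cue and a symptom

# Hypothetical synonym table used for standardization.
CANONICAL = {"atemnot": "Dyspnoe", "luftnot": "Dyspnoe", "fieber": "Fieber"}


def is_negated(symptom, negations):
    """True if a negation cue ends shortly before the symptom starts."""
    return any(0 <= symptom.start - neg.end <= NEGATION_WINDOW for neg in negations)


def affirmed_symptoms(entities):
    """Filter by confidence, drop negated symptoms, map to canonical names."""
    confident = [e for e in entities if e.score >= MIN_SCORE]
    negations = [e for e in confident if e.label == "NEGATION"]
    result = []
    for e in confident:
        if e.label != "SYMPTOM" or is_negated(e, negations):
            continue
        name = CANONICAL.get(e.text.lower(), e.text)
        if name not in result:  # keep each symptom once, in order of appearance
            result.append(name)
    return result


# Invented example: "Patient complains of shortness of breath, no fever."
TEXT = "Patient klagt über Atemnot, kein Fieber."


def span(label, surface, score):
    """Build an Entity by locating the surface form in TEXT."""
    start = TEXT.index(surface)
    return Entity(label, surface, start, start + len(surface), score)


entities = [
    span("SYMPTOM", "klagt", 0.20),    # low-confidence false positive
    span("SYMPTOM", "Atemnot", 0.97),
    span("NEGATION", "kein", 0.95),
    span("SYMPTOM", "Fieber", 0.91),
]

print(affirmed_symptoms(entities))
```

In this example only the affirmed symptom survives: the low-confidence span is filtered out, and "Fieber" is dropped because a negation cue directly precedes it. A fixed character window is a deliberately crude stand-in for negation scope; a production pipeline would more plausibly use sentence boundaries or the model's own negation spans.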

Keywords: Natural Language Processing (NLP), Named Entity Recognition (NER), symptom extraction, Large Language Models (LLM), fine-tuning, clinical NLP

Received: 06 May 2025; Accepted: 27 Oct 2025.

Copyright: © 2025 Diaz Ochoa, Layer, Mahr, Mustafa, Menzel, Müller-Schilling, Schilling, Illerhaus, Knott and Krohn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Juan Guillermo Diaz Ochoa, juan.diaz@permediq.de
Alexander Krohn, a.krohn@outlook.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.