REVIEW article

Front. Artif. Intell.

Sec. Natural Language Processing

Volume 8 - 2025 | doi: 10.3389/frai.2025.1584203

Artificial Intelligence in Healthcare Text Processing: A Review Applied to Named Entity Recognition

Provisionally accepted
  • 1Federal University of Sergipe, São Cristóvão, Brazil
  • 2Instituto Federal do Rio Grande do Norte, Natal, Rio Grande do Norte, Brazil
  • 3Federal University of Rio Grande do Norte, Natal, Rio Grande do Norte, Brazil
  • 4School of Engineering of the City of Paris, Paris, France
  • 5Department of Informatics Engineering, Faculty of Engineering, University of Porto, Porto, Portugal
  • 6Coimbra Nursing School, Coimbra, Coimbra, Portugal

The final, formatted version of the article will be published soon.

We examine the growing importance of Named Entity Recognition (NER) in the analysis of healthcare texts. NER, a fundamental technique in Natural Language Processing (NLP), automatically identifies and categorizes named entities in text, such as names of people and organizations or, in medical texts, medical conditions and drug names. This facilitates better information retrieval, personalized medicine approaches and clinical decision support systems.
Problem: Traditional methods such as rule-based systems, word embeddings (e.g. Word2Vec, GloVe) and sequence tagging models such as CRFs and HMMs have difficulty capturing the complex and nuanced context of medical texts, leading to low precision and inflexibility. These methods also often require large and difficult-to-obtain labeled datasets.
Solution: This systematic mapping focuses on advanced language models, specifically Transformer-based models such as BERT. These models are known for capturing complex semantic dependencies and linguistic nuances, which are crucial for accurate processing of medical texts. Unlike traditional techniques such as CNNs and RNNs, Transformer architectures are better suited to the contextual and semantic nature of medical texts because they can handle long sequences and meet the need for high precision.
Results: The results indicate that Transformer-based models, in particular BERT and its specialized variants (e.g. ClinicalBERT), consistently demonstrate high performance on NER tasks, with F1 scores often exceeding 97%, outperforming traditional and hybrid methods. When examining the geographical distribution of contributions, the research identifies a significant contribution from China, followed by the United States. These findings have crucial implications for the integration of NER technologies into the Brazilian National Health System (SUS).
In conclusion, this systematic review contributes to the advancement of NER in health texts by evaluating methods, reporting results and highlighting the wider implications for the field. The article is structured into the following sections: Methodology, Bibliometric analysis, Results and discussion, Threats to validity, Future work and Conclusion. This organization provides a comprehensive review of the research, its impact and future directions, and highlights the importance of keeping up to date with advances in the field to increase the relevance of NER applications in healthcare.
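For readers unfamiliar with how F1 scores for NER are typically obtained, the following is a minimal Python sketch of entity-level precision, recall and F1 over BIO-tagged sequences. It is an illustration under stated assumptions, not the evaluation protocol of any study covered by the review: the BIO tagging scheme, the exact-match criterion, the helper names `bio_to_spans` and `entity_f1`, and the example sentence with its `DRUG` and `CONDITION` labels are all hypothetical choices made here.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); a hypothetical span format
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # Assumption: an I- tag with no matching open entity is ignored.
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between gold and predicted tags."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_positives else 0.0
    return precision, recall, f1


# Hypothetical sentence: "Patient takes metformin for type 2 diabetes"
gold = ["O", "O", "B-DRUG", "O", "B-CONDITION", "I-CONDITION", "I-CONDITION"]
# The prediction finds the drug but truncates the condition span.
pred = ["O", "O", "B-DRUG", "O", "O", "B-CONDITION", "I-CONDITION"]

precision, recall, f1 = entity_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under exact matching, the truncated condition span counts as both a false positive and a false negative, so a model can label most tokens correctly and still score well below its token-level accuracy. Reported F1 values are therefore only comparable across studies that share the same matching criterion and tagging scheme.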

Keywords: Named Entity Recognition (NER), Health texts, BERT model, Advanced language models, ChatGPT, SUS

Received: 27 Feb 2025; Accepted: 12 May 2025.

Copyright: © 2025 Santana De Almeida, Silva Fontes, Pareja Credidio Freire Alves, Júnior, José Pinheiro Caldeira Silva, Ramalho Cortez, Morais, Medeiros Machado, Gonçalo Oliveira, Cunha-Oliveira, dos Santos and Valentim. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Samuel Santana De Almeida, Federal University of Sergipe, São Cristóvão, Brazil

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.