ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Natural Language Processing

Volume 8 - 2025 | doi: 10.3389/frai.2025.1634774

This article is part of the Research Topic: Medical Knowledge-Assisted Machine Learning Technologies in Individualized Medicine Volume II

Named Entity Recognition for Chinese Electronic Medical Records by Integrating Knowledge Graph and ClinicalBERT

Provisionally accepted
Xiang Xu, Kai Ma*
  • Xuzhou Medical University, Xuzhou, China

The final, formatted version of the article will be published soon.

General-purpose language models often struggle to identify domain-specific medical terminology accurately, leading to suboptimal performance in named entity recognition tasks. This study proposes a method for named entity recognition in Chinese electronic medical records that combines ClinicalBERT, a language model pre-trained on clinical corpora, with structured knowledge from a medical knowledge graph. To enhance semantic understanding, entity representations derived using Translating Embeddings (TransE) are incorporated into the model. In addition, the method integrates multiple character-level features, including positional labels, contextual category clues, and semantic embeddings, which help improve boundary detection in Chinese texts that lack explicit word delimiters. The input texts are first annotated using the Begin, Inside, Outside, End, Single (BIOES) tagging scheme, then encoded by ClinicalBERT and passed through a bidirectional long short-term memory (BiLSTM) network followed by a conditional random field (CRF) layer for label prediction. Experimental results on public datasets show that the proposed method achieves an F1 score of 89.44%, outperforming existing baselines and demonstrating its effectiveness for clinical applications.
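
To illustrate the character-level BIOES annotation step mentioned in the abstract, the following is a minimal sketch rather than the authors' implementation. The function name `to_bioes`, the span format `(start, end, label)` with an exclusive end index, the example sentence, and the `DISEASE` label are all assumptions made for illustration; the article's actual label set and preprocessing are not given here.

```python
from typing import List, Tuple


def to_bioes(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Assign one BIOES tag to each character of `text`.

    `spans` holds hypothetical (start, end, label) entity annotations,
    with `end` exclusive. Spans are assumed not to overlap.
    """
    tags = ["O"] * len(text)  # every character starts as Outside
    for start, end, label in spans:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span ({start}, {end}) is outside the text")
        if end - start == 1:
            # A one-character entity gets the Single tag.
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"        # Begin
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"        # Inside
            tags[end - 1] = f"E-{label}"      # End
    return tags


# Illustrative sentence: "The patient has a history of hypertension."
# Characters 3..5 ("高血压", hypertension) form one assumed DISEASE entity.
sentence = "患者有高血压病史"
for char, tag in zip(sentence, to_bioes(sentence, [(3, 6, "DISEASE")])):
    print(char, tag)
```

Because Chinese text has no explicit word delimiters, tagging each character in this way lets the explicit Begin, End, and Single tags mark entity boundaries directly, which is the property the downstream CRF layer can exploit when predicting label sequences.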

Keywords: named entity recognition, ClinicalBERT, Chinese electronic medical records, knowledge graphs, BiLSTM, CRF

Received: 25 May 2025; Accepted: 18 Aug 2025.

Copyright: © 2025 Xu and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Kai Ma, Xuzhou Medical University, Xuzhou, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.