AUTHOR=Xu Xiang , Li Zhengxiong , Zhang Hongwei , Ma Kai TITLE=Named entity recognition for Chinese electronic medical records by integrating knowledge graph and ClinicalBERT JOURNAL=Frontiers in Artificial Intelligence VOLUME=Volume 8 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1634774 DOI=10.3389/frai.2025.1634774 ISSN=2624-8212 ABSTRACT=IntroductionGeneral purpose language models often struggle with accurately identifying domain specific terminology in the medical field, resulting in suboptimal performance in named entity recognition (NER) tasks. This challenge is particularly pronounced in Chinese electronic medical records (EMRs), which lack clear word boundaries and contain complex medical expressions.MethodsThis study proposes a novel NER method for Chinese EMRs that integrates ClinicalBERT, a language model pre trained on clinical corpora, with structured knowledge from a medical knowledge graph. Entity representations derived via Translating Embeddings (TransE) are incorporated to inject external semantic knowledge. Furthermore, the model fuses multiple character level features, including positional labels, contextual category clues, and semantic embeddings, to enhance boundary detection. The input text is annotated using the BIOES (Begin, Inside, Outside, End, Single) tagging scheme and subsequently encoded by ClinicalBERT. The encoded features are then passed through a bidirectional long short term memory (BiLSTM) network and a conditional random field (CRF) layer for final label prediction.ResultsExperiments conducted on publicly available datasets demonstrate that the proposed approach achieves an F1 score of 89.44 percent, surpassing multiple existing baseline models in performance.DiscussionThese findings confirm that the integration of domain specific language modeling, structured medical knowledge, and enriched character level features significantly enhances NER accuracy in Chinese EMRs. The proposed method shows strong potential for practical deployment in clinical information extraction systems.