ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Machine Learning and Artificial Intelligence

Volume 8 - 2025 | doi: 10.3389/frai.2025.1638971

This article is part of the Research Topic: The Use of Large Language Models to Automate, Enhance, and Streamline Text Analysis Processes. Large Language Models Used to Analyze and Check Requirement Compliance.

LegNER: A Domain-Adapted Transformer for Legal Named Entity Recognition and Text Anonymization

Provisionally accepted
  • 1 Rochester Institute of Technology - Dubai Campus, Dubai, United Arab Emirates
  • 2 Ionio Panepistemio, Corfu, Greece
  • 3 Department of Informatics, Ionian University, Corfu, Greece

The final, formatted version of the article will be published soon.

The growing demand for scalable, privacy-preserving processing of legal documents has intensified the need for accurate Named Entity Recognition (NER) systems tailored to the legal domain. In this work, we introduce LegNER, a domain-adapted transformer model designed for both legal NER and text anonymization. The model is trained on a corpus of 1,542 manually annotated court cases and enriched with an extended legal vocabulary, enabling robust recognition of six critical entity types, including PERSON, ORGANIZATION, LAW, and CASE REFERENCE. Built on BERT-base and enhanced through domain-specific pretraining and span-level supervision, LegNER consistently outperforms established legal NER baselines. Experimental results show 99% accuracy, an F1 score above 99%, and an inference throughput of more than 12 documents per second, supporting both its precision and its scalability. Beyond these quantitative results, qualitative evaluation highlights LegNER's ability to generate coherent anonymized outputs, a crucial requirement for GDPR-compliant redaction and automated legal analytics. Taken together, these results position LegNER as a reliable and effective solution for high-precision entity recognition and anonymization in compliance-sensitive legal workflows.
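As an illustration of how span-level entity predictions can drive anonymization, the following is a minimal sketch and not the authors' implementation. It assumes that some NER model has already produced character-offset spans labeled with entity types such as PERSON and ORGANIZATION. The `anonymize` function, the `(start, end, label)` span format, and the `[LABEL_n]` placeholder scheme are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# Assumed span format (hypothetical): (start offset, end offset, entity label).
Span = Tuple[int, int, str]


def anonymize(text: str, spans: List[Span]) -> str:
    """Replace each entity span with a placeholder such as [PERSON_1].

    The same surface form with the same label always maps to the same
    placeholder, so the anonymized text stays readable and coherent.
    """
    placeholders: Dict[Tuple[str, str], str] = {}  # (label, surface) -> placeholder
    counters: Dict[str, int] = {}                  # label -> distinct entities seen
    pieces: List[str] = []
    cursor = 0  # end of the last region copied or replaced

    for start, end, label in sorted(spans):
        if start < cursor:
            # Skip a span that overlaps one already replaced.
            continue
        surface = text[start:end]
        key = (label, surface)
        if key not in placeholders:
            counters[label] = counters.get(label, 0) + 1
            placeholders[key] = f"[{label}_{counters[label]}]"
        pieces.append(text[cursor:start])   # untouched text before the entity
        pieces.append(placeholders[key])    # placeholder instead of the entity
        cursor = end

    pieces.append(text[cursor:])  # remaining text after the last entity
    return "".join(pieces)


# Hand-written example spans; in practice they would come from an NER model.
example = "John Smith sued Acme Ltd. John Smith won."
spans = [(0, 10, "PERSON"), (16, 24, "ORGANIZATION"), (26, 36, "PERSON")]
print(anonymize(example, spans))
```

Mapping repeated mentions to a single placeholder is one simple way to keep anonymized output coherent. A production system would also need to handle name variants and coreference, which this sketch does not attempt.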

Keywords: Legal NLP, named entity recognition, Transformer models, Text anonymization, Domain adaptation, GDPR compliance

Received: 31 May 2025; Accepted: 20 Oct 2025.

Copyright: © 2025 Karamitsos, Roufas, Al-Hussaeni and Kanavos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Andreas Kanavos, andreas.kan@gmail.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.