
METHODS article

Front. Artif. Intell.

Sec. Medicine and Public Health

Volume 8 - 2025 | doi: 10.3389/frai.2025.1644084

This article is part of the Research Topic GenAI in Healthcare: Technologies, Applications and Evaluation.

Privacy-, Linguistic-, and Information-Preserving Synthesis of Clinical Documentation through Generative Agents

Provisionally accepted
Mark van Velzen1,2, Robert Frans van der Willigen2,3,4*, Vincent J. De Beer2,4, Helen I. de Graaf-Waar1,2, Ester R.C. De Graaf-Waar5,6,7, Esther van Leeuwen8, Micha F. van der Willigen2,3, Martijn J. Van Der Willigen2,3,9, Gavin Renardus2,3, Rayan El Maaroufi2,3, Sven J Satimin2,3, Larissa M. Hartog10,2, Tim Hulsen11,2, Nico L.U. van Meetreren12,13, Mark C. Scheper14,15,2,3
  • 1Department of Anesthesiology, Erasmus MC, Rotterdam, Netherlands
  • 2Data Supported Healthcare: Data‑Science unit, Research Center Innovations in Care, Rotterdam University of Applied Sciences, Rotterdam, Netherlands
  • 3School of Communication, Media and Information Technology, Rotterdam University of Applied Sciences, Rotterdam, Netherlands
  • 4HR Datalab EAS, School of Engineering and Applied Science, Rotterdam University of Applied Sciences, Rotterdam, Netherlands
  • 5Department of Orthopedic Surgery, VieCuri Medical Centre, Venlo, Netherlands
  • 6Radboud Institute for Health Sciences, IQ Health, Radboud University Medical Center, Nijmegen, Netherlands
  • 7School of Allied Health, HAN University of Applied Sciences, Nijmegen, Netherlands
  • 8Medifit Bewegingscentrum, Oss, Netherlands
  • 9Institute for Engineering and Applied Science, Hogeschool Rotterdam, Rotterdam, Netherlands
  • 10Top Sector Life Sciences and Health (Health~Holland), The Hague, Netherlands
  • 11Data Science & AI Engineering, Philips, Eindhoven, Netherlands
  • 12Department of Anesthesiology, Erasmus Medical Center, Rotterdam, Netherlands
  • 13Top Sector Life Sciences and Health (Health~Holland), The Hague, Netherlands
  • 14Allied Health Professions, Faculty of Medicine and Science, Macquarie University, Sydney, Australia
  • 15Solid Start Coalition, Erasmus Medical Center, Rotterdam, Netherlands

The final, formatted version of the article will be published soon.

The widespread adoption of generative agents (GAs) is reshaping the healthcare landscape. Nonetheless, broad utilization is impeded by restricted access to high-quality, interoperable clinical documentation from electronic health records (EHRs), owing to persistent legal, ethical, and technical barriers. Synthetic health data generation (SHDG), leveraging pre-trained large language models (LLMs) instantiated as GAs, could offer a practical solution by creating synthetic patient information that mimics genuine EHRs. The use of LLMs, however, is not without issues: significant concerns remain regarding privacy, potential bias propagation, the risk of generating inaccurate or misleading content, and the lack of transparency in how these models make decisions. We therefore propose a privacy-, linguistic-, and information-preserving SHDG protocol that employs multiple context-aware, role-specific GAs. Guided by targeted prompting and by authentic EHRs that serve as structural and linguistic templates, role-specific GAs can, in principle, operate collaboratively through multi-turn interactions. We hypothesized that utilizing GAs in this fashion permits LLMs not only to produce synthetic EHRs that are accurate, consistent, and contextually appropriate, but also to expose the underlying decision-making process.

To test this hypothesis, we developed a no-code, GA-driven SHDG workflow as a proof of concept, implemented within a predefined, multi-layered data science infrastructure (DSI) stack: an integrated ensemble of software and hardware designed to support rapid prototyping and deployment. The DSI stack streamlines implementation for healthcare professionals, improving accessibility, usability, and cybersecurity.
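The role-specific, multi-turn collaboration described above can be illustrated with a minimal Python sketch. It is not the authors' no-code workflow: the `GenerativeAgent` class, the `synthesize` loop, the role names, the `"APPROVE"` reply convention, and the toy functions standing in for real LLM calls are all hypothetical. A writer agent drafts a note from an authentic EHR template, reviewer agents (here a single privacy reviewer) return feedback over several turns, and every turn is logged so the decision trace remains inspectable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical LLM interface: (system_prompt, message) -> reply.
# In practice this would wrap a call to a hosted or local model.
LLM = Callable[[str, str], str]


@dataclass
class GenerativeAgent:
    role: str           # hypothetical role name, e.g. "writer"
    system_prompt: str  # targeted, role-specific instructions
    llm: LLM

    def act(self, message: str) -> str:
        return self.llm(self.system_prompt, message)


@dataclass
class Transcript:
    # Ordered (role, text) pairs: the inspectable decision trace.
    turns: List[Tuple[str, str]] = field(default_factory=list)

    def log(self, role: str, text: str) -> None:
        self.turns.append((role, text))


def synthesize(template_ehr: str, writer: GenerativeAgent,
               reviewers: List[GenerativeAgent],
               max_turns: int = 3) -> Tuple[str, Transcript]:
    """Draft a synthetic note from a template EHR, then revise it until
    every reviewer replies "APPROVE" or max_turns review rounds have run."""
    transcript = Transcript()
    draft = writer.act(f"Template EHR:\n{template_ehr}")
    transcript.log(writer.role, draft)
    for _ in range(max_turns):
        feedback = []
        for reviewer in reviewers:
            verdict = reviewer.act(draft)
            transcript.log(reviewer.role, verdict)
            if verdict != "APPROVE":
                feedback.append(verdict)
        if not feedback:
            break  # all reviewers approved the current draft
        request = f"Revise draft:\n{draft}\nFeedback:\n" + "\n".join(feedback)
        draft = writer.act(request)
        transcript.log(writer.role, draft)
    return draft, transcript


# Toy, deterministic stand-ins for real LLM calls (hypothetical).
def toy_writer_llm(system_prompt: str, message: str) -> str:
    header, _, rest = message.partition("\n")
    if header == "Revise draft:":
        draft = rest.split("\nFeedback:\n")[0]
        return draft.replace("Jan Jansen", "[NAME]")
    return rest  # first pass: mirror the template's structure and wording


def toy_privacy_llm(system_prompt: str, message: str) -> str:
    return "Remove patient name." if "Jan Jansen" in message else "APPROVE"


writer = GenerativeAgent("writer", "Write a realistic synthetic clinical note.",
                         toy_writer_llm)
privacy = GenerativeAgent("privacy_reviewer", "Flag any direct identifiers.",
                          toy_privacy_llm)

# Fictitious template note; "Jan Jansen" is a placeholder name.
template = "Patient Jan Jansen, 64 y, post-op day 2, NRS pain 3/10."
final, transcript = synthesize(template, writer, [privacy])
print(final)
for role, text in transcript.turns:
    print(f"{role}: {text}")
```

With a real backend, `toy_writer_llm` and `toy_privacy_llm` would be replaced by model calls, and further reviewers (for example, a linguistic or clinical-consistency reviewer) could be appended to the `reviewers` list without changing the loop.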
To deploy and validate GA-assisted workflows, we implemented a fully automated SHDG evaluation framework, co-developed with GenAI technology, which holistically compares the informational and linguistic features of synthetic, anonymized, and real EHRs at both the document and corpus levels. Our findings indicate that SHDG implemented through GAs offers a scalable, transparent, and reproducible methodology for unlocking the potential of clinical documentation to drive innovation, accelerate research, and advance the development of learning health systems. The source code, synthetic datasets, toolchains, and prompts created for this study are available in the HR-DataLab-Healthcare/RESEARCH_SUPPORT GitHub repository, under PROJECTS/Generative_Agent_based_Data-Synthesis on the main branch.
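The abstract does not spell out which informational and linguistic features the evaluation framework compares. As one possible illustration, the Python sketch below assumes a simple information-theoretic comparison: the Shannon entropy of unigram distributions per document and per corpus, plus the Jensen-Shannon divergence (JSD) between a reference corpus (real or anonymized EHRs) and a synthetic one. The function names, the naive tokenizer, and the choice of metrics are assumptions, not the study's implementation.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, List


def tokenize(text: str) -> List[str]:
    # Naive lowercase word tokenizer (assumption; a clinical tokenizer may be preferable).
    return re.findall(r"\w+", text.lower())


def distribution(docs: Iterable[str]) -> Dict[str, float]:
    # Relative unigram frequencies pooled over all documents.
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def shannon_entropy(p: Dict[str, float]) -> float:
    # H(p) = -sum p * log2(p), in bits.
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def jensen_shannon(p: Dict[str, float], q: Dict[str, float]) -> float:
    # JSD in bits: 0 for identical distributions, 1 for disjoint supports.
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_m(a: Dict[str, float]) -> float:
        # Kullback-Leibler divergence KL(a || m); m[t] > 0 wherever a[t] > 0.
        return sum(v * math.log2(v / m[t]) for t, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def compare_corpora(reference: List[str], candidate: List[str]) -> Dict[str, object]:
    # Document level: entropy of each candidate note.
    # Corpus level: entropies of the pooled distributions and their JSD.
    p, q = distribution(reference), distribution(candidate)
    return {
        "doc_entropy": [shannon_entropy(distribution([d])) for d in candidate],
        "reference_entropy": shannon_entropy(p),
        "candidate_entropy": shannon_entropy(q),
        "jsd": jensen_shannon(p, q),
    }


# Fictitious example notes.
real = ["Pain score 3/10 after surgery.", "Patient mobilised on day 2."]
synthetic = ["Pain score 4/10 after surgery.", "Patient mobilised on day 1."]
report = compare_corpora(real, synthetic)
print(report)
```

The same `compare_corpora` call could be repeated with anonymized notes as the reference, giving real-versus-synthetic and anonymized-versus-synthetic comparisons side by side. A JSD near 0 would suggest that the synthetic corpus preserves the reference vocabulary distribution; unigram statistics alone say nothing about clinical correctness.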

Keywords: Healthcare, Data Synthesis, Privacy, Generative Agents, Linguistics, Information Theory, Synthetic Health Data Generation (SHDG), Clinical Natural Language Processing (NLP)

Received: 09 Jun 2025; Accepted: 25 Aug 2025.

Copyright: © 2025 Velzen, van der Willigen, De Beer, de Graaf-Waar, De Graaf-Waar, van Leeuwen, van der Willigen, Van Der Willigen, Renardus, El Maaroufi, Satimin, Hartog, Hulsen, van Meetreren and Scheper. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Robert Frans van der Willigen, Data Supported Healthcare: Data‑Science unit, Research Center Innovations in Care, Rotterdam University of Applied Sciences, Rotterdam, Netherlands

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.