METHODS article
Front. Artif. Intell.
Sec. Medicine and Public Health
Volume 8 - 2025 | doi: 10.3389/frai.2025.1644084
This article is part of the Research Topic: GenAI in Healthcare: Technologies, Applications and Evaluation
Privacy-, Linguistic-, and Information-Preserving Synthesis of Clinical Documentation through Generative Agents
Provisionally accepted
- 1Erasmus MC Afdeling Anesthesiologie, Rotterdam, Netherlands
- 2Data Supported Healthcare: Data‑Science unit, Research Center Innovations in Care, Rotterdam University of Applied Sciences, Rotterdam, Netherlands
- 3School of Communication, Media and Information Technology, Rotterdam University of Applied Sciences, Rotterdam, Netherlands
- 4HR Datalab EAS, School of Engineering and Applied Science, Rotterdam University of Applied Sciences, Rotterdam, Netherlands
- 5Department of Orthopedic Surgery, VieCuri Medical Centre, Venlo, Netherlands
- 6Radboud Institute for Health Sciences, IQ Health, Radboud University Medical Center, Nijmegen, Netherlands
- 7School of Allied Health, HAN University of Applied Sciences, Nijmegen, Netherlands
- 8Medifit Bewegingscentrum, Oss, Netherlands
- 9Hogeschool Rotterdam Instituut voor Engineering en Applied Science, Rotterdam, Netherlands
- 10Top Sector Life Sciences and Health (Health~Holland), The Hague, Netherlands
- 11Data Science & AI Engineering, Philips, Eindhoven, Netherlands
- 12Department of Anesthesiology, Erasmus Medical Center, Rotterdam, Netherlands
- 13Top Sector Life Sciences and Health (Health~Holland), The Hague, Netherlands
- 14Allied Health Professions, Faculty of Medicine and Science, Macquarie University, Sydney, Australia
- 15Solid Start Coalition, Erasmus Medical Center, Rotterdam, Netherlands
The widespread adoption of generative agents (GAs) is reshaping the healthcare landscape. Nonetheless, broad utilization is impeded by restricted access to high-quality, interoperable clinical documentation from electronic health records (EHRs) due to persistent legal, ethical, and technical barriers. Synthetic health data generation (SHDG), leveraging pre-trained large language models (LLMs) instantiated as GAs, could offer a practical solution by creating synthetic patient information that mimics genuine EHRs. The use of LLMs, however, is not without issues: significant concerns remain regarding privacy, potential bias propagation, the risk of generating inaccurate or misleading content, and the lack of transparency in how these models make decisions. We therefore propose a privacy-, linguistic-, and information-preserving SHDG protocol that employs multiple context-aware, role-specific GAs. Guided by targeted prompting and by authentic EHRs that serve as structural and linguistic templates, role-specific GAs can, in principle, operate collaboratively through multi-turn interactions. We theorized that utilizing GAs in this fashion permits LLMs not only to produce synthetic EHRs that are accurate, consistent, and contextually appropriate, but also to expose the underlying decision-making process. To test this hypothesis, we developed a no-code GA-driven SHDG workflow as a proof of concept, which was implemented within a predefined, multi-layered data science infrastructure (DSI) stack, an integrated ensemble of software and hardware designed to support rapid prototyping and deployment. The DSI stack streamlines implementation for healthcare professionals, improving accessibility, usability, and cybersecurity.
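To make the idea of collaborating, role-specific GAs concrete, the following is a minimal Python sketch, not the authors' no-code workflow. The `Agent` class, the `run_dialogue` function, the two roles, and the `echo_llm` stand-in are all hypothetical names introduced for illustration; a real deployment would replace `echo_llm` with a call to an actual LLM. The sketch only shows how a template can be passed between agents over several turns while every turn is logged, which is one simple way to expose the intermediate decisions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical type: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class Agent:
    role: str          # illustrative role name, e.g. "writer" or "reviewer"
    instructions: str  # role-specific prompt
    llm: LLM           # model backend (assumed; stubbed below)

    def respond(self, context: str) -> str:
        # Prefix the shared context with this agent's role and instructions.
        prompt = f"[{self.role}] {self.instructions}\n\n{context}"
        return self.llm(prompt)


def run_dialogue(agents: List[Agent], template: str,
                 turns: int = 2) -> List[Tuple[str, str]]:
    """Pass the latest reply from agent to agent, logging every turn."""
    transcript: List[Tuple[str, str]] = []
    context = f"TEMPLATE:\n{template}"
    for _ in range(turns):
        for agent in agents:
            reply = agent.respond(context)
            transcript.append((agent.role, reply))
            # The next agent sees the template plus the most recent reply.
            context = f"TEMPLATE:\n{template}\n\nLATEST REPLY:\n{reply}"
    return transcript


def echo_llm(prompt: str) -> str:
    # Stand-in for a real model call: echoes the first line of the prompt.
    return f"reply to {prompt.splitlines()[0]}"


agents = [
    Agent("writer", "Draft a synthetic note that follows the template's structure.", echo_llm),
    Agent("reviewer", "Flag any identifying detail copied from the template.", echo_llm),
]
# Placeholder template text, not taken from any real record.
transcript = run_dialogue(agents, "Subjective: ... Objective: ...", turns=2)
for role, reply in transcript:
    print(f"{role}: {reply}")
```

Keeping the full transcript rather than only the final draft is the design choice that matters here: it leaves an auditable trail of which agent proposed or objected to what.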
To deploy and validate GA-assisted workflows, we implemented a fully automated SHDG evaluation framework, co-developed with GenAI technology, which holistically compares the informational and linguistic features of synthetic, anonymized, and real EHRs at both the document and corpus levels. Our findings highlight that SHDG implemented through GAs offers a scalable, transparent, and reproducible methodology for unlocking the potential of clinical documentation to drive innovation, accelerate research, and advance the development of learning health systems. The source code, synthetic datasets, toolchains and prompts created for this study can be accessed in the GitHub repository HR-DataLab-Healthcare/RESEARCH_SUPPORT (main branch), under PROJECTS/Generative_Agent_based_Data-Synthesis.
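As an illustration of what comparing informational and linguistic features at the document and corpus levels can look like, the sketch below computes two common measures: Shannon entropy of the token distribution and the type-token ratio. These metrics, the regex tokenizer, and the function names are assumptions chosen for this example; the abstract does not specify which features the authors' framework uses, and their implementation is the one in the repository.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Simple lowercase word tokenizer (an assumption, not a clinical tokenizer).
    return re.findall(r"\w+", text.lower())


def shannon_entropy(tokens: List[str]) -> float:
    """Entropy in bits of the empirical token distribution."""
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def type_token_ratio(tokens: List[str]) -> float:
    """Distinct tokens divided by total tokens (lexical diversity)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def profile(tokens: List[str]) -> Dict[str, float]:
    return {"entropy_bits": shannon_entropy(tokens),
            "ttr": type_token_ratio(tokens)}


def corpus_report(docs: List[str]) -> Dict[str, object]:
    """Document-level profiles plus one profile over the pooled corpus."""
    tokenized = [tokenize(d) for d in docs]
    pooled = [tok for doc in tokenized for tok in doc]
    return {"documents": [profile(t) for t in tokenized],
            "corpus": profile(pooled)}


# Invented toy sentences, standing in for real and synthetic notes.
real = ["Patient reports knee pain after running.",
        "Knee pain improved after rest."]
synthetic = ["Client describes shoulder pain after lifting.",
             "Shoulder pain reduced after rest."]

real_report = corpus_report(real)
synthetic_report = corpus_report(synthetic)
# Corpus-level gap per metric; values near zero suggest similar distributions.
gap = {k: synthetic_report["corpus"][k] - real_report["corpus"][k]
       for k in real_report["corpus"]}
print(gap)
```

The same `corpus_report` call could be applied to an anonymized corpus as a third point of comparison.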
Keywords: healthcare, data synthesis, privacy, generative agents, linguistics, information theory, synthetic health data generation (SHDG), clinical natural language processing (NLP)
Received: 09 Jun 2025; Accepted: 25 Aug 2025.
Copyright: © 2025 Velzen, van der Willigen, De Beer, de Graaf-Waar, De Graaf-Waar, van Leeuwen, van der Willigen, Van Der Willigen, Renardus, El Maaroufi, Satimin, Hartog, Hulsen, van Meetreren and Scheper. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Robert Frans van der Willigen, Data Supported Healthcare: Data‑Science unit, Research Center Innovations in Care, Rotterdam University of Applied Sciences, Rotterdam, Netherlands
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.