TECHNOLOGY AND CODE article

Front. Digit. Health

Sec. Digital Mental Health

Volume 7 - 2025 | doi: 10.3389/fdgth.2025.1625444

This article is part of the Research Topic: Advances in Generative Artificial Intelligence for Mental Health

Synthetic Patient and Interview Transcript Creator: An Essential Tool for LLMs in Mental Health

Provisionally accepted
Aleyna Warner, Jeffrey Ledue, Yutong Cao, Joseph Tham, Tim Murphy*
  • University of British Columbia, Vancouver, Canada

The final, formatted version of the article will be published soon.

Developing high-quality training data is essential for tailoring large language models (LLMs) to specialized applications such as mental health. To address the privacy and legal constraints associated with real patient data, we designed a synthetic patient and interview generation framework that can be tailored to regional patient demographics. The system employs two locally run instances of Llama 3.3:70B, one acting as the interviewer and the other as the patient. Guided by a customizable question bank, these models produce contextually rich interview transcripts with lexical diversity similar to that of normal human conversation. Median Distinct-1 scores were 0.44 for the patient model outputs and 0.33 for the interviewer model outputs, compared with an average of 0.50 ± 0.11 across 10,000 episodes of a radio program dialog. Central to this approach is the patient generation process, which begins with a locally run Llama 3.3:70B model. Given the full question bank, the model generates a detailed profile template that combines predefined variables (e.g., demographic data or specific conditions) with LLM-generated content that fills in contextual details. This hybrid method ensures that each patient profile is both diverse and realistic, providing a strong foundation for generating dynamic interactions. Demographic distributions of the generated patient profiles were not significantly different from real-world population data and exhibited the expected variability. For the patient profiles, we also assessed LLM metrics and found an average Distinct-1 score of 0.8 (maximum 1), indicating diverse word usage. By integrating detailed patient generation with dynamic interviewing, the framework produces synthetic datasets that may aid the adoption and deployment of LLMs in mental health settings.
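
For readers unfamiliar with the Distinct-1 metric reported above, it is commonly defined as the number of unique unigrams divided by the total number of unigrams in a text, so 1 means no word is repeated. The sketch below is illustrative only and is not the authors' code: the function name `distinct_n`, the lowercasing, the regular-expression tokenizer, and the choice to score each turn separately before taking the median are all assumptions, since the abstract does not describe these details.

```python
import re
from statistics import median


def distinct_n(text: str, n: int = 1) -> float:
    """Return unique n-grams / total n-grams for one text (0.0 if empty).

    Assumption: tokens are lowercase runs of letters, digits and
    apostrophes; the article may tokenize differently.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


# Hypothetical patient turns, invented here purely for illustration.
patient_turns = [
    "I have been feeling tired most days this month.",
    "I sleep badly and I wake up early and I feel tired.",
]

# Score each turn, then summarize with the median, mirroring the
# "median Distinct-1" summary in the abstract (aggregation is assumed).
turn_scores = [distinct_n(turn) for turn in patient_turns]
median_distinct_1 = median(turn_scores)
print(turn_scores, median_distinct_1)
```

Because the score depends on text length (longer texts repeat more words), comparisons such as the one against radio dialog are most meaningful when the texts being compared are scored at a similar granularity.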

Keywords: LLM, Python, psychiatry, large language models, synthetic data, automation

Received: 08 May 2025; Accepted: 18 Aug 2025.

Copyright: © 2025 Warner, Ledue, Cao, Tham and Murphy. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Tim Murphy, University of British Columbia, Vancouver, Canada

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.