ORIGINAL RESEARCH article

Front. Digit. Health

Sec. Digital Mental Health

This article is part of the Research Topic: Digital Health Past, Present, and Future.

High-Accuracy Prediction of Mental Health Scores from English BERT Embeddings Trained on LLM-Generated Synthetic Self-Reports: A Synthetic-Only Method Development Study

Provisionally accepted
  • 1Division of Speech Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
  • 2Karolinska Institutet, Stockholm, Sweden
  • 3Karolinska Universitetssjukhuset, Stockholm, Sweden

The final, formatted version of the article will be published soon.

Objective. To assess whether synthetic-only, first-person clinical self-reports generated by a large language model (LLM) can support accurate prediction of standardized mental-health scores, enabling a privacy-preserving path for method development and rapid prototyping when real clinical text is unavailable.

Methods. We prompted an LLM (Gemini 2.5; July 2025 snapshot) to produce English-language, first-person narratives paired with target scores for three instruments: PHQ-9 (including suicidal ideation), LSAS, and PCL-5. No real patients or clinical notes were used; narratives and labels were created synthetically and manually screened for coherence and label alignment. Each narrative was embedded with bert-base-uncased (mean-pooled 768-d vectors). We trained linear and regularized linear models (Linear, Ridge, Lasso) and ensemble models (Random Forest, Gradient Boosting) for regression, and Logistic Regression/Random Forest for suicidal-ideation (SI) classification. Evaluation used 5-fold cross-validation (PHQ-9/SI) and 80/20 held-out splits (LSAS/PCL-5). Metrics: MSE, R², and MAE for regression; classification metrics are reported for SI.

Results. Within the synthetic distribution, models fit the label–text signal strongly (e.g., PHQ-9 Ridge: MSE 4.41 ± 0.56, R² 0.92 ± 0.02; LSAS Gradient Boosting test: MSE 75.00, R² 0.95; PCL-5 Ridge test: MSE 35.62, R² 0.85).

Conclusions. LLM-generated self-reports encode a score-aligned signal that standard ML models can learn, indicating utility for privacy-preserving, synthetic-only prototyping. This is not a clinical tool: the results do not imply generalization to real patient text. We clarify terminology (synthetic vs. real text) and provide a roadmap for external validation, bias/fidelity assessment, and scope-limited deployment considerations before any clinical use.
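The regression protocol described in the Methods (768-d mean-pooled BERT embeddings, a Ridge model, 5-fold cross-validation scored with MSE/R²/MAE) can be sketched as below. This is a minimal illustration, not the authors' code: random vectors stand in for the bert-base-uncased embeddings, a synthetic PHQ-9-like target (0-27 scale) stands in for the LLM-generated labels, and the sample size, seed, and alpha are illustrative assumptions.

```python
# Sketch of the abstract's regression protocol with stand-in data.
# In the real pipeline, X would come from bert-base-uncased, e.g.
# outputs.last_hidden_state.mean(dim=1) -> one 768-d vector per narrative.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)

n_texts, dim = 200, 768                      # 768-d mean-pooled vectors, as in the paper
X = rng.normal(size=(n_texts, dim))          # stand-in for narrative embeddings
w = rng.normal(size=dim)
# Stand-in PHQ-9-like scores (0-27), loosely tied to the embeddings plus noise.
y = np.clip(13.5 + 4.0 * (X @ w) / np.sqrt(dim)
            + rng.normal(scale=1.0, size=n_texts), 0, 27)

model = Ridge(alpha=1.0)                     # one of the regressors named in Methods
scores = cross_validate(                     # 5-fold CV, as for PHQ-9/SI
    model, X, y, cv=5,
    scoring=("neg_mean_squared_error", "r2", "neg_mean_absolute_error"),
)

mse = -scores["test_neg_mean_squared_error"]
r2 = scores["test_r2"]
mae = -scores["test_neg_mean_absolute_error"]
print(f"MSE {mse.mean():.2f} ± {mse.std():.2f}, "
      f"R² {r2.mean():.2f} ± {r2.std():.2f}, "
      f"MAE {mae.mean():.2f} ± {mae.std():.2f}")
```

The LSAS/PCL-5 results would instead use a single 80/20 `train_test_split` and report test-set MSE/R² from one held-out fit.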

Keywords: BERT, digital mental health, large language models, LSAS, natural language processing, PCL-5, PHQ-9, privacy-preserving evaluation

Received: 28 Aug 2025; Accepted: 15 Dec 2025.

Copyright: © 2025 Moell and Sand Aronsson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Birger Moell

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.