ORIGINAL RESEARCH article
Front. Digit. Health
Sec. Digital Mental Health
This article is part of the Research Topic: Digital Health Past, Present, and Future.
High-Accuracy Prediction of Mental Health Scores from English BERT Embeddings Trained on LLM-Generated Synthetic Self-Reports: A Synthetic-Only Method Development Study
Provisionally accepted
1 Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
2 Karolinska Institutet, Stockholm, Sweden
3 Karolinska Universitetssjukhuset, Stockholm, Sweden
Objective. To assess whether synthetic-only first-person clinical self-reports generated by a large language model (LLM) can support accurate prediction of standardized mental-health scores, enabling a privacy-preserving path for method development and rapid prototyping when real clinical text is unavailable.

Methods. We prompted an LLM (Gemini 2.5; July 2025 snapshot) to produce English-language first-person narratives paired with target scores for three instruments: PHQ-9 (including suicidal ideation), LSAS, and PCL-5. No real patients or clinical notes were used; narratives and labels were created synthetically and manually screened for coherence and label alignment. Each narrative was embedded with bert-base-uncased (mean-pooled 768-dimensional vectors). We trained linear and regularized linear models (Linear, Ridge, Lasso) and ensemble models (Random Forest, Gradient Boosting) for score regression, and Logistic Regression and Random Forest for suicidal-ideation (SI) classification. Evaluation used 5-fold cross-validation (PHQ-9/SI) and 80/20 held-out splits (LSAS/PCL-5), reporting MSE, R², and MAE for regression and standard classification metrics for SI.

Results. Within the synthetic distribution, models fit the label–text signal strongly (e.g., PHQ-9 Ridge: MSE 4.41 ± 0.56, R² 0.92 ± 0.02; LSAS Gradient Boosting test: MSE 75.00, R² 0.95; PCL-5 Ridge test: MSE 35.62, R² 0.85).

Conclusions. LLM-generated self-reports encode a score-aligned signal that standard ML models can learn, indicating utility for privacy-preserving, synthetic-only prototyping. This is not a clinical tool: the results do not imply generalization to real patient text. We clarify terminology (synthetic vs. real text) and provide a roadmap for external validation, bias and fidelity assessment, and scope-limited deployment considerations before any clinical use.
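The evaluation pipeline described in the abstract (mean-pooled BERT embeddings fed to a ridge regressor, scored with 5-fold cross-validated MSE and R²) can be sketched in a few lines. The sketch below is illustrative only: it substitutes random 768-dimensional vectors with a planted linear signal for the real bert-base-uncased embeddings (which would require a model download), and all names, sample sizes, and the regularization strength are assumptions, not values from the study.

```python
import numpy as np

# Stand-in data: in the study, each synthetic narrative is embedded with
# bert-base-uncased and mean-pooled into a 768-d vector; here we plant a
# linear signal in random vectors so the pipeline has something to learn.
rng = np.random.default_rng(0)
n, d = 2000, 768
X = rng.normal(size=(n, d))                       # stand-in for BERT embeddings
true_w = 0.1 * rng.normal(size=d)                 # hypothetical score-aligned signal
y = X @ true_w + rng.normal(scale=0.5, size=n)    # stand-in for PHQ-9-style scores

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X'X + alpha*I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def cv_scores(X, y, alpha=1.0, k=5):
    """k-fold cross-validated mean MSE and R^2 (k=5 as in the abstract)."""
    folds = np.array_split(rng.permutation(len(y)), k)
    mses, r2s = [], []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[train], y[train], alpha)
        mse = np.mean((y[test] - X[test] @ w) ** 2)
        mses.append(mse)
        r2s.append(1.0 - mse / np.var(y[test]))
    return float(np.mean(mses)), float(np.mean(r2s))

mse, r2 = cv_scores(X, y)
print(f"5-fold CV: MSE {mse:.2f}, R2 {r2:.2f}")
```

In the real pipeline one would replace the random `X` with mean-pooled last-hidden-state vectors from bert-base-uncased (e.g., via the Hugging Face `transformers` library) and `y` with the LLM-assigned instrument scores; the cross-validation and closed-form ridge step are otherwise unchanged.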
Keywords: BERT, digital mental health, Large language models, LSAS, Natural Language Processing, PCL-5, PHQ-9, Privacy-preserving evaluation
Received: 28 Aug 2025; Accepted: 15 Dec 2025.
Copyright: © 2025 Moell and Sand Aronsson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Birger Moell
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.