
ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Medicine and Public Health

Volume 8 - 2025 | doi: 10.3389/frai.2025.1669896

This article is part of the Research Topic: The Use of Large Language Models to Automate, Enhance, and Streamline Text Analysis Processes. Large Language Models Used to Analyze and Check Requirement Compliance.

LLMCARE: Alzheimer’s Detection via Transformer Models Enhanced by LLM-Generated Synthetic Data

Provisionally accepted
Ali Zolnour1, Hossein Azadmaleki1, Yasaman Haghbin1, Fatemeh Taherinezhad1, Mohamad Javad Momeni Nezhad1, Sina Rashidi1, Masoud Khani2, AmirSajjad Taleban2, Samin Mahdizadeh Sani3, Maryam Dadkhah1, James M. Noble1, Suzanne Bakken1, Yadollah Yaghoobzadeh3, Abdol-Hossein Vahabie3, Masoud Rouhizadeh4, Maryam Zolnoori1*
  • 1Columbia University, New York City, United States
  • 2University of Wisconsin-Milwaukee, Milwaukee, United States
  • 3University of Tehran, Tehran, Iran
  • 4University of Florida, Gainesville, United States

The final, formatted version of the article will be published soon.

Background: Alzheimer's disease and related dementias (ADRD) affect nearly five million older adults in the United States, yet more than half remain undiagnosed. Speech-based natural language processing (NLP) provides a scalable approach to identify early cognitive decline by detecting subtle linguistic markers that may precede clinical diagnosis.

Objective: This study aims to develop and evaluate a speech-based screening pipeline that integrates transformer-based embeddings with handcrafted linguistic features, incorporates synthetic augmentation using large language models (LLMs), and benchmarks unimodal and multimodal LLM classifiers. External validation was performed to assess generalizability to a cohort limited to mild cognitive impairment (MCI).

Methods: Transcripts were obtained from the ADReSSo 2021 benchmark dataset (n = 237; derived from the Pitt Corpus, DementiaBank) and the DementiaBank Delaware corpus (n = 205; clinically diagnosed MCI vs. controls). Audio was automatically transcribed using Amazon Web Services Transcribe (general model). Ten transformer models were evaluated under three fine-tuning strategies. A late-fusion model combined embeddings from the best-performing transformer with 110 linguistically derived features. Five LLMs (LLaMA-8B/70B, MedAlpaca-7B, Ministral-8B, GPT-4o) were fine-tuned to generate label-conditioned synthetic speech for data augmentation. Three multimodal LLMs (GPT-4o, Qwen-Omni, Phi-4) were tested in zero-shot and fine-tuned settings.

Results: On the ADReSSo dataset, the fusion model achieved an F1-score of 83.32 (AUC = 89.48), outperforming both transformer-only and linguistic-only baselines. Augmentation with MedAlpaca-7B synthetic speech improved performance to F1 = 85.65 at 2× scale, whereas higher augmentation volumes reduced gains. Fine-tuning improved unimodal LLM classifiers (e.g., MedAlpaca-7B, F1 = 47.73 → 78.69), while multimodal models demonstrated lower performance (Phi-4 = 71.59; GPT-4o omni = 67.57). On the Delaware corpus, the pipeline generalized to an MCI-only cohort, with the fusion model plus 1× MedAlpaca-7B augmentation achieving F1 = 72.82 (AUC = 69.57).

Conclusions: Integrating transformer embeddings with handcrafted linguistic features enhances ADRD detection from speech. Distributionally aligned LLM-generated narratives provide effective but bounded augmentation, while current multimodal models remain limited. Crucially, validation on the Delaware corpus demonstrates that the proposed pipeline generalizes to early-stage impairment, supporting its potential as a scalable approach for clinically relevant early screening. All code for LLMCARE is publicly available on GitHub.
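
As an illustration of the late-fusion and F1 evaluation steps named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes one common form of late fusion: two separately trained classifiers (one on transformer embeddings, one on linguistic features) each output a positive-class probability, and the two are blended by a weighted average. The abstract does not specify the fusion mechanism, so the function names, the `weight` parameter, the 0.5 decision threshold, and the toy scores are all hypothetical. The F1 here is computed for the positive class only, which may differ from the averaging used in the study.

```python
from typing import List, Sequence


def late_fusion(p_embed: Sequence[float], p_ling: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Blend per-sample positive-class probabilities from two classifiers.

    `weight` (hypothetical parameter) is the share given to the
    embedding-based classifier; the remainder goes to the linguistic one.
    """
    if len(p_embed) != len(p_ling):
        raise ValueError("both classifiers must score the same samples")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_embed, p_ling)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1), as a fraction in [0, 1]."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # without true positives, F1 is defined here as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up scores for four transcripts (not data from the study).
p_embed = [0.9, 0.4, 0.2, 0.7]   # classifier on transformer embeddings
p_ling = [0.7, 0.8, 0.1, 0.2]    # classifier on linguistic features
y_true = [1, 1, 0, 1]            # 1 = impaired, 0 = control

fused = late_fusion(p_embed, p_ling)
y_pred = [1 if p >= 0.5 else 0 for p in fused]  # assumed 0.5 threshold
print(f"fused F1 (%): {100 * f1_score(y_true, y_pred):.2f}")
```

Averaging probabilities keeps the two feature views independent until the final decision, which makes it easy to compare the fused score against each single-view baseline, as the Results do.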

Keywords: Alzheimer's disease, mild cognitive impairment (MCI), large language models, data augmentation, transformers, natural language processing

Received: 20 Jul 2025; Accepted: 13 Oct 2025.

Copyright: © 2025 Zolnour, Azadmaleki, Haghbin, Taherinezhad, Momeni Nezhad, Rashidi, Khani, Taleban, Mahdizadeh Sani, Dadkhah, Noble, Bakken, Yaghoobzadeh, Vahabie, Rouhizadeh and Zolnoori. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Maryam Zolnoori, m.zolnoori@gmail.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.