ORIGINAL RESEARCH article

Front. Lang. Sci.

Sec. Psycholinguistics

This article is part of the Research Topic: Insights in Psycholinguistics: 2025

ChatGPT-simulated sentence-plausibility in event contexts, with teens, younger and older adults, in fiction and newspaper texts

Provisionally accepted
  • Faculty of Language, Literature and Humanities, Humboldt University of Berlin, Berlin, Germany

The final, formatted version of the article will be published soon.

In four pilot studies and two main experiments, ChatGPT-4o / 5 plausibility ratings were simulated via the graphical user interface using written prompts, factorial designs, Latin-square counterbalanced lists, and N = 200 simulated participants per level of each between-participant factor. The setup thus closely resembled that of in-laboratory experiments with human participants. At issue was to what extent the large language model (LLM) would produce simulations close enough to human world knowledge to serve as pilot data for human experimentation. LLMs are developing rapidly, and if they were sufficiently accurate databases of human world knowledge, they would open up interesting opportunities for empirical research by giving access to a very comprehensive model of that knowledge. This possibility was assessed by simulating human plausibility ratings and their variation depending on (i) the presence versus absence of an event description and (ii) the age of LLM-simulated participants (Pilot 1, Pilot 2, and Experiment 1a), and (iii) LLM-simulated participants' expectations about distinct text sources / genres (Experiment 1b). As a baseline, plausibility ratings generated via the LLM chat interface were compared against human plausibility ratings reported in prior research. Overall, ChatGPT produced simulated ratings that were on average higher for plausible than for implausible sentences and higher when an event description supported the event conveyed by the target sentence. The simulated ratings also showed fine-grained differences depending on simulated participant age, context-sentence relations, and genre. These differences can guide the formulation of testable hypotheses for future research with human participants.
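The Latin-square counterbalancing mentioned in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical 2 x 2 design (sentence plausibility by presence of an event description) and eight items; the condition labels, the item count, and the function name `latin_square_lists` are illustrative assumptions, not details taken from the article.

```python
from itertools import product


def latin_square_lists(n_items, conditions):
    """Build one presentation list per condition via Latin-square rotation.

    Each list contains every item exactly once, and across lists every
    item appears in every condition exactly once.
    """
    n_cond = len(conditions)
    if n_items % n_cond != 0:
        raise ValueError("n_items must be a multiple of the number of conditions")
    lists = []
    for k in range(n_cond):
        # Rotate the condition assignment by k positions for list k.
        lists.append(
            [(item, conditions[(item + k) % n_cond]) for item in range(n_items)]
        )
    return lists


# Hypothetical 2 x 2 factorial design (labels are assumptions for illustration).
conditions = list(
    product(["plausible", "implausible"], ["event_context", "no_context"])
)
lists = latin_square_lists(8, conditions)

for k, lst in enumerate(lists):
    print(f"List {k + 1}: {lst}")
```

Under these assumptions, each simulated participant would be assigned to one of the four lists, so that no participant sees the same item in more than one condition while every condition contributes equally often to each list.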

Keywords: ChatGPT, Inter- and intra-individual differences, Large language models, plausibility ratings, Simulations, world knowledge

Received: 28 Oct 2025; Accepted: 19 Jan 2026.

Copyright: © 2026 Knoeferle. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Pia Knoeferle

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.