AUTHOR=Acciai Alessandro, Guerrisi Lucia, Perconti Pietro, Plebe Alessio, Suriano Rossella, Velardi Andrea

TITLE=Narrative coherence in neural language models

JOURNAL=Frontiers in Psychology

VOLUME=Volume 16 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2025.1572076

DOI=10.3389/fpsyg.2025.1572076

ISSN=1664-1078

ABSTRACT=Neural language models, although to a first approximation they may be described simply as predictors of the next token in a given sequence, exhibit linguistic behaviors surprisingly akin to human ones. This suggests the existence of a sophisticated underlying cognitive system in language production. This intriguing circumstance has inspired the adoption of psychological theories as investigative tools, and the present research falls within this line of inquiry. What we aim to establish is the potential existence of a core of coherent integration in language production, metaphorically parallel to a human speaker's personal identity. To investigate this, we employed a well-established psychological theory of narrative coherence in autobiographical stories. This theory offers the theoretical advantage of a strong correlation between narrative coherence and a high integrative level of the personal knowledge system. It also provides the empirical advantage of methodologies for quantifying coherence and its characteristic dimensions through the analysis of autobiographical texts. The same methodology was applied to 2,010 autobiographical stories generated by GPT-3.5 and an equal number from GPT-4, elicited by asking the models to assume roles that varied in attributes such as gender, mood, and age. The large number of stories ensures adequate sampling given the stochastic nature of the models, and was made possible by an automated coherence evaluation procedure. We initially asked the models to generate 192 autobiographical stories, which were then analyzed by a team of professional psychologists. Based on this sample, we constructed a training set for fine-tuning GPT-3.5 as an automatic evaluator. Overall, our results from the 4,020 autobiographical stories show a level of narrative coherence in the models fully in line with data on human subjects, with slightly higher values for GPT-4. These results suggest a high level of knowledge unification in the models, comparable to the integration of the self in human beings.