AUTHOR=Polonioli Andrea 

TITLE=Moving LLM evaluation forward: lessons from human judgment research

JOURNAL=Frontiers in Artificial Intelligence

VOLUME=Volume 8 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1592399

DOI=10.3389/frai.2025.1592399

ISSN=2624-8212

ABSTRACT=This paper outlines a path toward more reliable and effective evaluation of Large Language Models (LLMs). It argues that insights from the study of human judgment and decision-making can illuminate current challenges in LLM assessment and help close critical gaps in how models are evaluated. By drawing parallels between human reasoning and model behavior, the paper advocates moving beyond narrow metrics toward more nuanced, ecologically valid frameworks.