AUTHOR=Baldwin, Peter
TITLE=Audit-style framework for evaluating bias in large language models
JOURNAL=Frontiers in Education
VOLUME=10
YEAR=2025
URL=https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2025.1592037
DOI=10.3389/feduc.2025.1592037
ISSN=2504-284X
ABSTRACT=One concern with AI systems is their potential to produce biased output. These biases can be difficult to detect due to the complex and often proprietary nature of the systems, which limits transparency. We propose an evaluation framework for assessing whether a system exhibits biased behavior. This evaluation consists of a series of tasks in which an AI system is instructed to select one of two students for an award based on their performance on a standardized assessment. The two students are implicitly associated with different demographic subgroups, and the evaluation is designed such that students from each subgroup perform equally well on average. In this way, any consistent preference for a particular subgroup can be attributed to bias in the system’s output. The proposed framework is illustrated using GPT-3.5 and GPT-4, with racial subgroups (Black/White) and an assessment composed of math items. In this demonstration, GPT-3.5 favored Black students over White students by a factor of approximately 2:1 (66.5%; 1,061 out of 1,596 non-equivocal choices). In contrast, GPT-4 showed a slight numerical preference for Black students (291 vs. 276; 51.3%), but this difference was not statistically significant (p = 0.557), indicating no detectable bias. These results suggest that the proposed audit is sensitive to differences in system bias in this context.
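
The significance figures quoted in the abstract can be sanity-checked by testing the non-equivocal choice counts against a 50% null preference rate. The sketch below is not from the paper: the use of a two-sided exact binomial test via scipy.stats.binomtest is an assumption about how such counts might be checked, since the paper's exact procedure is not given here. For the GPT-4 counts (291 vs. 276) it yields a p-value close to the reported p = 0.557.

```python
from scipy.stats import binomtest

# Non-equivocal counts reported in the abstract.
# Null hypothesis: no preference, i.e., each subgroup chosen with probability 0.5.
gpt35 = binomtest(k=1061, n=1596, p=0.5)      # GPT-3.5: Black student chosen 1,061 of 1,596 times
gpt4 = binomtest(k=291, n=291 + 276, p=0.5)   # GPT-4: 291 vs. 276

print(f"GPT-3.5: {1061 / 1596:.1%} preference, p = {gpt35.pvalue:.3g}")
print(f"GPT-4:   {291 / (291 + 276):.1%} preference, p = {gpt4.pvalue:.3f}")
```

Under this check, the GPT-3.5 split is far from the 50% null (vanishingly small p-value), while the GPT-4 split is consistent with no preference, matching the abstract's interpretation.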