Large Language Models in Forensic Analysis: A Utility-Based Risk Assessment Framework for Mental Health Evaluations

Marcos Abreu Filho, Regio

doi:10.3389/fpsyt.2026.1785052

ORIGINAL RESEARCH article

Front. Psychiatry

Sec. Computational Psychiatry

This article is part of the Research Topic Advancing Biostatistics and Informatics Applications in Mental Health Research.


Provisionally accepted

Regio Marcos Abreu Filho*

PMERJ, Rio de Janeiro, Brazil

The final, formatted version of the article will be published soon.

Large language models (LLMs) are increasingly used for summarization, triage, and drafting in clinical and legal workflows. In forensic psychiatry, however, decisions combine low base rates, asymmetric harms, and adversarial scrutiny, conditions under which accuracy alone cannot establish whether model-assisted inferences are fit for use. We present a decision-analytic governance framework that makes admissibility conditional on (i) jurisdiction-specific harm weights, (ii) local base rates, and (iii) auditable evidence integrity. Using decision curve analysis (DCA), we evaluate net benefit across policy-relevant threshold regions rather than through single-point metrics. We introduce an evidence-integrity score Q ∈ [0, 1] that operationalizes provenance, corroboration, contradiction handling, and coverage via a claim–evidence map. We provide illustrative demonstrations of (a) the base-rate "prevalence trap" under high-sensitivity screening, (b) an auditable unsupported-claim metric with integrity gating, and (c) a Bayesian evidence map for causal nexus judgments with explicit dependence on source integrity. Because net benefit is harm-weighted and real-valued, and can therefore be negative, we add extended demonstrations spanning positive to strongly negative net-benefit regimes across a 2 × 2 grid of operating points (high/low true-positive rate [TPR]; high/low false-positive rate [FPR]) and low- vs. high-stakes thresholds. Finally, we provide concrete workflow vignettes for criminal responsibility and violence risk assessment, clarifying that the LLM functions as a decision-support and documentation aid rather than a jurisdictional decision tool. The proposed framework is contestable, measurable, and designed to reduce epistemic risk while supporting transparent decision-analytic evaluation in forensic practice.
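The "prevalence trap" and the sign changes in net benefit described above can be reproduced with a few lines of arithmetic. The sketch below assumes the standard decision-curve formula, in which net benefit per evaluated case is TPR·π − FPR·(1 − π)·pt/(1 − pt), with π the local base rate and pt the threshold probability whose odds act as the harm weight of a false positive relative to a true positive. The article's own harm weighting may differ. The function names and every number (base rate 0.05, the four operating points, the thresholds 0.10 and 0.50) are hypothetical and are not results from the study.

```python
"""Hypothetical illustration of DCA net benefit and the base-rate "prevalence trap".

Assumes the standard decision-curve net-benefit formula; the article's own harm
weighting may differ. All rates, base rates and thresholds are invented.
"""


def harm_weight(threshold: float) -> float:
    """Odds at threshold pt: weight of one false positive relative to one true positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return threshold / (1.0 - threshold)


def net_benefit(tpr: float, fpr: float, prevalence: float, threshold: float) -> float:
    """Net benefit per evaluated case: TPR*pi - FPR*(1 - pi) * pt / (1 - pt).

    Passing tpr = fpr = 1.0 gives the 'flag everyone' reference strategy.
    """
    return tpr * prevalence - fpr * (1.0 - prevalence) * harm_weight(threshold)


def ppv(tpr: float, fpr: float, prevalence: float) -> float:
    """Positive predictive value: share of flagged cases that are true positives."""
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    prevalence = 0.05  # hypothetical local base rate
    # Hypothetical 2 x 2 grid of operating points: (TPR, FPR).
    operating_points = {
        "high TPR / high FPR": (0.95, 0.30),
        "high TPR / low FPR": (0.95, 0.05),
        "low TPR / high FPR": (0.60, 0.30),
        "low TPR / low FPR": (0.60, 0.05),
    }
    # Hypothetical thresholds: a higher pt means costlier false positives.
    thresholds = {"low-stakes pt=0.10": 0.10, "high-stakes pt=0.50": 0.50}

    for name, (tpr, fpr) in operating_points.items():
        cells = ", ".join(
            f"{label}: NB={net_benefit(tpr, fpr, prevalence, pt):+.4f}"
            for label, pt in thresholds.items()
        )
        print(f"{name} | PPV={ppv(tpr, fpr, prevalence):.3f} | {cells}")
```

Under these assumed values, the high-TPR/high-FPR point has a PPV of about 0.14, so roughly six of every seven flags are false despite 95% sensitivity. Its net benefit is small but positive at pt = 0.10 (about +0.016) and drops to about -0.24 at pt = 0.50. The low-TPR/high-FPR point is already slightly negative (about -0.002) at the low-stakes threshold. The threshold is kept as an explicit parameter so that a jurisdiction's harm weighting can be varied and audited rather than fixed inside a single accuracy figure.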

Keywords: Bayesian networks, decision curve analysis, evidence integrity, forensic psychiatry, governance, large language models, risk assessment, utility

Received: 10 Jan 2026; Accepted: 16 Feb 2026.

Copyright: © 2026 Marcos Abreu Filho. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Regio Marcos Abreu Filho

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.