METHODS article

Front. Digit. Health

Sec. Human Factors and Digital Health

Volume 7 - 2025 | doi: 10.3389/fdgth.2025.1460236

Think FAST: A Novel Framework to Evaluate Fidelity, Accuracy, Safety, and Tone in Conversational AI Health Coach Dialogues

Provisionally accepted
Martha Neary*, Emily Fulton, Victoria Rogers, Julia Wilson, Zoe Griffiths, Ram Chuttani, Dr. Paul Sacher
  • Allurion Technologies, Natick, United States

The final, formatted version of the article will be published soon.

Developments in machine learning-based conversational and generative artificial intelligence (GenAI) have created opportunities for sophisticated conversational agents to augment elements of healthcare. While not a replacement for professional care, AI offers opportunities for scalability, cost-effectiveness, and automation of many aspects of patient care. However, to realize these opportunities and deliver AI-enabled support safely, interactions between patients and AI must be continuously monitored and evaluated against an agreed-upon set of performance criteria. This paper presents one such set of criteria, developed to evaluate interactions with an AI Health Coach that was designed to support patients receiving obesity treatment and deployed with an active patient user base. The evaluation framework evolved through an iterative process of development, testing, refining, training, reviewing, and supervision. The framework evaluates interactions at both the individual message level and the overall conversation level, rating them as Acceptable or Unacceptable in four domains: Fidelity, Accuracy, Safety, and Tone (FAST), with a series of questions to be considered for each domain. Processes to ensure consistent evaluation quality were established, and additional patient safety procedures were defined for escalations to healthcare providers based on clinical risk. The framework can be implemented by trained evaluators and offers a method by which healthcare settings deploying AI to support patients can review quality and safety, thus ensuring safe adoption.
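
As an illustration only, the rating scheme described above (Acceptable or Unacceptable in each of the four FAST domains, recorded at both message and conversation level) could be captured in a small data structure. The sketch below is not the authors' implementation: the class names, field names, and the `flagged_domains` helper are hypothetical, and the rule that a domain is flagged whenever any rating in it is Unacceptable is an assumption made here for demonstration. The per-domain questions and the clinical escalation procedures are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Domain(Enum):
    """The four FAST domains named in the abstract."""
    FIDELITY = "Fidelity"
    ACCURACY = "Accuracy"
    SAFETY = "Safety"
    TONE = "Tone"


class Rating(Enum):
    """The two rating values named in the abstract."""
    ACCEPTABLE = "Acceptable"
    UNACCEPTABLE = "Unacceptable"


@dataclass
class MessageEvaluation:
    """Evaluator ratings for one AI message (hypothetical structure)."""
    message_id: str
    ratings: dict[Domain, Rating]


@dataclass
class ConversationEvaluation:
    """Evaluator ratings for a whole conversation (hypothetical structure)."""
    conversation_id: str
    messages: list[MessageEvaluation] = field(default_factory=list)
    overall: dict[Domain, Rating] = field(default_factory=dict)

    def flagged_domains(self) -> list[Domain]:
        # Assumption: a domain is flagged if it is rated Unacceptable at the
        # conversation level or in any individual message.
        flagged = []
        for domain in Domain:
            in_overall = self.overall.get(domain) is Rating.UNACCEPTABLE
            in_messages = any(
                m.ratings.get(domain) is Rating.UNACCEPTABLE
                for m in self.messages
            )
            if in_overall or in_messages:
                flagged.append(domain)
        return flagged


if __name__ == "__main__":
    all_ok = {d: Rating.ACCEPTABLE for d in Domain}
    tone_issue = {**all_ok, Domain.TONE: Rating.UNACCEPTABLE}
    conversation = ConversationEvaluation(
        conversation_id="c1",
        messages=[
            MessageEvaluation("m1", all_ok),
            MessageEvaluation("m2", tone_issue),
        ],
        overall=all_ok,
    )
    print([d.value for d in conversation.flagged_domains()])
```

Keeping message-level and conversation-level ratings as separate fields mirrors the abstract's statement that both levels are evaluated; how the two levels relate in practice is defined by the framework itself, not by this sketch.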

Keywords: GenAI, AI, Health coach, Weight Loss Coach, AI evaluation, Conversational Generative AI Evaluation, Large language models, machine learning

Received: 05 Jul 2024; Accepted: 30 Apr 2025.

Copyright: © 2025 Neary, Fulton, Rogers, Wilson, Griffiths, Chuttani and Sacher. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Martha Neary, Allurion Technologies, Natick, United States
