ORIGINAL RESEARCH article

Front. Comput. Sci.

Sec. Human-Media Interaction

Volume 7 - 2025 | doi: 10.3389/fcomp.2025.1575296

This article is part of the Research Topic: Embodied Perspectives on Sound and Music AI

A Model of Vocal Persona: Context, Perception, Production

Provisionally accepted
  • 1 Amplifier Health, Calgary, Canada
  • 2 Center for Computer Research in Music and Acoustics, School of Humanities and Sciences, Stanford University, Stanford, California, United States

The final, formatted version of the article will be published soon.

We present a contextualized production–perception model of vocal persona developed through deductive thematic analysis of interviews with voice and performance experts. Our findings reveal that vocal persona is a dynamic, context-responsive set of vocal behaviors that frames and bounds expressive interactions—both biological and synthesized—while centering the speaker's agency. By examining how experts adapt their vocal output through both broad persona shifts and fine-grained paralinguistic adjustments, our model identifies a key missing mechanism in current approaches to expressive speech synthesis: the integration of high-level persona prompting with detailed paralinguistic control. This work bridges an important gap in the literature on expressive and interactive speech technologies and offers practical insights for improving voice user interfaces and augmentative and alternative communication systems. Incorporating this vocal persona framework into expressive speech synthesis holds the potential to enhance user agency and embodiment during communication, fostering a heightened sense of authenticity and a more intuitive relationship with voice interaction technology and one's environment.

Keywords: vocal persona, social communication, expression, paralinguistics, synthesized voice, augmentative and alternative communication (AAC), voice user interface (VUI)

Received: 12 Feb 2025; Accepted: 05 Sep 2025.

Copyright: © 2025 Noufi, May and Berger. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Camille Noufi, Amplifier Health, Calgary, Canada

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.