ORIGINAL RESEARCH article

Front. Commun. | doi: 10.3389/fcomm.2021.675704

Prosodic differences in human- and Alexa-directed speech, but similar local intelligibility adjustments

Provisionally accepted. The final, formatted version of the article will be published soon.

  • 1University of California, Davis, United States
  • 2UC Davis Phonetics Lab, United States

The current study tests whether individuals (n = 53) produce distinct speech adaptations during pre-scripted spoken interactions with a voice-AI assistant (Amazon’s Alexa) relative to those with a human interlocutor. Interactions crossed intelligibility pressures (staged word misrecognitions) and emotionality (hyper-expressive interjections) as conversation-internal factors that might influence participants’ intelligibility adjustments in Alexa- and human-directed speech (DS). Overall, we find speech style differences: Alexa-DS has a slower speech rate, higher mean f0, and greater f0 variation than human-DS. Adjustments in response to misrecognitions were similar toward both interlocutors: participants produced more distinct vowel backing in target words (enhancing the contrast between the target word and the misrecognition), and their sentences were louder and slower, with higher mean f0 and greater f0 variation. No differences were observed between human- and Alexa-DS following displays of emotional expressiveness by the interlocutors, and expressiveness did not mediate intelligibility adjustments in response to a misrecognition. Taken together, these findings support proposals that speakers presume voice-AI has a ‘communicative barrier’ (relative to human interlocutors), but that speakers adapt to conversation-internal intelligibility pressures similarly in human- and Alexa-DS. This work contributes to our understanding of human-computer interaction, as well as to theories of speech style adaptation.
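For readers unfamiliar with the sentence-level prosodic measures reported above, the sketch below shows one common way such measures are extracted in phonetics research, using the praat-parselmouth library. This is a minimal illustration, not the article's method; the file name is hypothetical and the default Praat pitch settings are an assumption.

    # Illustrative sketch (not the article's analysis pipeline):
    # extracting mean f0 and f0 variation from one utterance.
    import parselmouth

    snd = parselmouth.Sound("utterance.wav")   # hypothetical recording
    pitch = snd.to_pitch()                     # default Praat pitch settings (assumed)

    f0 = pitch.selected_array["frequency"]
    f0 = f0[f0 > 0]                            # drop unvoiced frames (reported as 0 Hz)

    print(f"mean f0: {f0.mean():.1f} Hz")      # the 'mean f0' measure
    print(f"f0 variation (SD): {f0.std():.1f} Hz")  # one common 'f0 variation' measure

Analogous sentence-level values for intensity (loudness) and duration (speech rate) can be obtained from the same Sound object, which is why toolkits like parselmouth are standard in this kind of speech-style comparison.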

Keywords: voice-activated artificially intelligent (voice-AI) assistant, speech register, human-computer interaction (HCI), computer personification, speech intelligibility

Received: 03 Mar 2021; Accepted: 21 Jun 2021.

Copyright: © 2021 Cohn and Zellou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dr. Michelle Cohn, University of California, Davis, Davis, United States, mdcohn@ucdavis.edu