Edited by: Guangtao Zhai, Shanghai Jiao Tong University, China
Reviewed by: Xiongkuo Min, University of Texas at Austin, United States; Yucheng Zhu, Shanghai Jiao Tong University, China
This article was submitted to Perception Science, a section of the journal Frontiers in Psychology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Perceived quality of experience for speech listening is influenced by cognitive processing and can affect a listener's comprehension, engagement and responsiveness. Quality of Experience (QoE) is a paradigm used within the media technology community to assess media quality by linking quantifiable media parameters to perceived quality. The established QoE framework provides a general definition of QoE, categories of possible quality influencing factors, and an identified QoE formation pathway. These help researchers to implement experiments and to evaluate perceived quality for any application. The QoE formation pathways in the current framework do not attempt to capture the effects of cognitive effort, and the standard experimental assessments of QoE minimize the influence of cognitive processes. The impact of cognitive processes and how they can be captured within the QoE framework have not been systematically studied by the QoE research community. This article reviews research from the fields of audiology and cognitive science regarding how cognitive processes influence the quality of listening experience. The cognitive listening mechanism theories are compared with the QoE formation mechanism in terms of the quality contributing factors, experience formation pathways, and measures for experience. The review prompts a proposal to integrate mechanisms from audiology and cognitive science into the existing QoE framework in order to properly account for cognitive load in speech listening. The article concludes with a discussion regarding how an extended framework could facilitate measurement of QoE in broader and more realistic application scenarios where cognitive effort is a material consideration.
Quality of experience (QoE) is a paradigm that assesses media quality by mimicking human judgement. The goal is to understand and quantify how consumers perceive media quality. Instead of using measurable signal parameters, QoE researchers evaluate the quality of a multimedia event based on quality ratings reported by participants in subjective experimental studies. To avoid biases arising from interpersonal differences, a mean opinion score (MOS) is used to represent the average perceived quality. The subjective ratings from experiments are also used to develop signal-based QoE prediction models (also called objective models). Such models are expected to predict quality judgements for multimedia applications. Thus, the QoE evaluation approach has been widely adopted to rapidly test the perceptual effect of new products and services.
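As an illustration of how a MOS summarizes individual ratings, the following minimal sketch averages a set of ratings and reports an approximate 95% confidence interval. The function name, the example ratings, and the use of a normal approximation (1.96 standard errors) are assumptions made for illustration only; they are not prescribed by the QoE framework.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes ratings are numeric scores on a common scale (for example 1-5)
    and uses a normal approximation, which is only a rough guide for
    small panels.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    mos = statistics.mean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation
    half_width = 1.96 * sd / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from eight participants for one stimulus.
ratings = [4, 5, 3, 4, 4, 2, 5, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean makes visible how much of the interpersonal variation the single MOS value averages away.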
Despite the wide applicability of QoE evaluation methods, current QoE evaluations for naturalistic multimedia consumption scenarios (for example, a person listening to podcasts while driving) are limited. They do not consider a person's comprehension, engagement, effort, or other aspects of mental state. The current QoE framework, a conceptual model that characterizes how QoE forms, adopts a simple filtering structure that collapses all the interactions of different influencing factors into a single outcome: people's internal comparison between their expectation of the signal properties and what they actually perceive, which can be observed from the subjective quality judgement. Such a framework has been widely adopted and works well for many scenarios. For instance, the telecommunication industry uses it to analyse the quality impact of a change in network capacity or system parameters. However, how cognitive processes affect multimedia QoE is addressed neither by the framework nor by the evaluation methods.
As multimedia consumption scenarios become more complex, the cognitive aspects of the experience need to be taken into account. QoE evaluation methods applicable to more natural scenarios are important for understanding the impact of potential technological changes. Although cognitive aspects are highly personal and hard to model, the theories and empirical studies in cognitive science can provide practical tools to systematically evaluate the impact of cognitive processes. This paper reviews the existing QoE framework as well as the cognitive listening methods and models from the audiology and cognitive psychology domains. The paper then discusses potential ways to integrate cognitive effort into the existing QoE framework. While this paper focuses on listening effort, the review prompts consideration of a broader and more realistic QoE framework for application scenarios where cognitive effort is a factor.
The QoE framework is a conceptual model that describes a QoE formation mechanism for any multimedia consumption scenario. It can be applied as a template to characterize a quality judgement formation for an experience. The QoE framework identifies the QoE formation pathways, the QoE observables, and the QoE influencing factors (see
The QoE framework adapted from the QoE whitepaper (Brunnström et al.,
Building on the QoE formation mechanism,
The two commonly used QoE evaluation approaches, the “descriptive” and the “integrated” (Katz and Nicol,
The cognitive processes are modeled in the QoE framework through the pathways connecting the human influencing factors (orange box in bottom left of
From a multimodal perspective, the existing pathways in the QoE framework are not exhaustive in modeling the effect of different source signals. The combined effect of audio and visual input signals has been shown to produce shifts in attention in various studies (Talsma et al.,
Attentional saliency, comprehension, fatigue level, task performance, and emotional status are important building blocks for understanding QoE in realistic listening scenarios. These aspects cannot be captured or fully understood through the quality judgement alone, which is the standard QoE observable adopted by the community. The existing QoE framework lacks an explicit systematic model to guide effective studies exploring the impact of cognitive processes on QoE. Attentional control can be influenced by the source signals (e.g., multimodal interaction) as well as by human influencing factors (e.g., mental capacity). This study focuses on the latter and uses a uni-modal input signal as an example to show how studies from cognitive hearing and perception theory could supplement the existing QoE framework.
To integrate listening effort into the QoE framework model, we consider three questions: (i) what contributes to an increase in cognitive effort; (ii) how increased effort affects QoE; (iii) how to quantify the effect of effort on QoE. These questions correspond to the three core components in the QoE framework: influencing factors, QoE pathways, and the observables.
This section addresses each question and discusses how each component in the existing QoE framework can be adapted with reference to two cognitive hearing models: the Framework for understanding Effortful Listening (FUEL) (Pichora-Fuller et al.,
Listening effort increases along with the listening demand (McGarrigle et al.,
Sources of listening effort and their corresponding influencing factor categories in the QoE framework and the FUEL.
Source of listening effort | QoE framework category | FUEL category |
Voice degradation | System | Transmission |
Bandwidth limit | System | Transmission |
Noise | System | Transmission |
Reverberation | System | Transmission |
Multi-talker | Signal | Source & context |
Spatial separation | Signal | Source & context |
Synthesized voice | Signal | Source |
Sustained speech | Context | Source |
Voice similarity | Signal | Source |
Foreign language | Signal & context | Message & context |
Reward | Human | Motivation |
Hearing loss | Human | Listener |
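The mapping in the table can also be held as a small lookup structure, which makes it easy to list the effort sources that fall under a given influencing factor category when planning an experiment. The sketch below copies the rows from the table; the variable and function names are hypothetical, and splitting combined entries such as "Signal & context" into separate categories is an assumption made for illustration.

```python
from collections import defaultdict

# (source of listening effort, QoE framework category, FUEL category),
# copied row by row from the table above.
EFFORT_SOURCES = [
    ("Voice degradation", "System", "Transmission"),
    ("Bandwidth limit", "System", "Transmission"),
    ("Noise", "System", "Transmission"),
    ("Reverberation", "System", "Transmission"),
    ("Multi-talker", "Signal", "Source & context"),
    ("Spatial separation", "Signal", "Source & context"),
    ("Synthesized voice", "Signal", "Source"),
    ("Sustained speech", "Context", "Source"),
    ("Voice similarity", "Signal", "Source"),
    ("Foreign language", "Signal & context", "Message & context"),
    ("Reward", "Human", "Motivation"),
    ("Hearing loss", "Human", "Listener"),
]


def group_by_qoe_category(rows):
    """Group effort sources by QoE influencing factor category.

    A source listed under a combined category (e.g. "Signal & context")
    is placed in each of the named categories.
    """
    grouped = defaultdict(list)
    for source, qoe_category, _fuel_category in rows:
        for category in qoe_category.split(" & "):
            grouped[category.strip().capitalize()].append(source)
    return dict(grouped)


by_qoe = group_by_qoe_category(EFFORT_SOURCES)
for category, sources in by_qoe.items():
    print(f"{category}: {', '.join(sources)}")
```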
The formation pathways in a model identify the possible mechanisms through which the influencing factors can affect an outcome. Although the formation pathways are not concrete, they are depicted in the models to guide the design of research protocols that evaluate the effect of the factors of interest. The implications of increased listening effort are the result of complex combinations of interactions. The existing QoE formation pathways collapse the contributions of influencing factors to an internal comparison, which limits the capacity to capture the wider cognitive effects that make up our listening experience. Cognitive hearing studies (McGarrigle et al.,
It has yet to be shown whether the effect of multiple effort formation pathways can be simplified to a single pathway. Therefore, we show multiple potential effort formation pathways so that systematic investigations into the cognitive impact can be designed. Multiple pathways might result in different experiential implications in addition to the quality judgement; additional measurements that capture different aspects of an experience therefore need to be recorded so that differences in the perceptual experiences can be compared.
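To make the need for additional measurements concrete, the following sketch records several observables per trial and summarizes them per listening condition. Everything here is hypothetical: the field names, the rating scales, the condition labels, and the numbers are invented purely to illustrate the data layout and are not results from any study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    condition: str         # hypothetical label for the listening condition
    quality_rating: int    # quality judgement, assumed 1-5 scale
    effort_rating: int     # self-reported effort, assumed 1-7 scale
    recall_correct: float  # proportion of items recalled after the task


def summarise(trials):
    """Return the mean of each observable for every condition."""
    by_condition = {}
    for trial in trials:
        by_condition.setdefault(trial.condition, []).append(trial)
    return {
        condition: {
            "quality": mean(t.quality_rating for t in group),
            "effort": mean(t.effort_rating for t in group),
            "recall": mean(t.recall_correct for t in group),
        }
        for condition, group in by_condition.items()
    }


# Invented values: both conditions receive the same quality ratings,
# but differ in reported effort and recall.
trials = [
    Trial("clean", 4, 2, 0.9),
    Trial("clean", 4, 2, 0.9),
    Trial("noisy", 4, 5, 0.6),
    Trial("noisy", 4, 5, 0.6),
]
summary = summarise(trials)
for condition, observables in summary.items():
    print(condition, observables)
```

In this invented example the quality judgement alone would not separate the two conditions, whereas the effort and recall observables would, which is the kind of difference that a single-observable protocol cannot reveal.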
The observables are used by researchers to infer the impact of influencing factors. The choice of the observables depends on the outcome of interest and the corresponding formation pathways. For instance, the corresponding observables for the percept (Johnsrude and Rodd,
The most direct observables for listening effort are the self-reported ratings or descriptions. Ratings are more commonly adopted as they are both scalable and easier to process. The NASA-TLX mental effort scale (Hart and Staveland,
Behavioral responses are also used to indicate effort. These include memory recall and speech comprehension (observed after the task), and attention-related task performance (observed during the task). The Span Test (Conway et al.,
Psychophysiological changes are also used to indicate the effort involved in a listening task. Some physiological observables (e.g., pupil dilation, cardiac responses, skin conductance, and hormonal changes) are the result of sympathetic or parasympathetic responses to stress or effort (de Waard,
Identifying the potential and appropriate observables is critical in order to select the methods that will capture how effort affects different aspects of our experience. Using multiple observables is also recommended to reduce the structural interference in data analysis (Kahneman,
This review introduced the QoE framework model used by the media technology community to assist in designing and selecting appropriate methods to empirically evaluate quality of experience. We introduced the rationale behind the framework and explained its structural components: influencing factors, pathways and observables. The limited capability within the framework to capture and quantify how effort interacts with QoE was highlighted. With a focus on listening effort, this paper reviewed multiple listening effort formation pathways from the cognitive science domain to complement the existing QoE formation pathway. A review of literature and methods drawn from the audiology and cognitive science domains illustrated how the QoE framework could be expanded and how QoE experimental methods could be applied to naturalistic listening scenarios where the cognitive process plays a significant part in QoE formation. Pathways and observables beyond self-reported quality ratings were reviewed. We believe the review warrants adding a cognitive dimension to the QoE framework. It would allow for more direct comparisons of different subjective experiments. It would encourage the community to design subjective experiments that consider the impact of less explored cognitive processes. Furthermore, subjective experiments guided by such a framework should provide new insights into the more nuanced experiential aspects of our multimedia consumption experience.
More generally, the review highlights the flexibility within the framework for extension and the potential to capture a better understanding of audio influence within wider QoE studies, e.g., listening effort impacting video or immersive QoE. This review also presents an opportunity to apply a similar approach beyond listening, identifying new pathways and observables within the QoE framework, for visual, haptic or multimodal interactions.
PS and AH both contributed to writing, development, and editing. Both authors contributed to the article and approved the submitted version.
This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number 12/RC/2289_P2.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.