Measuring Mindreading: A Review of Behavioral Approaches to Testing Cognitive and Affective Mental State Attribution in Neurologically Typical Adults

Mindreading refers to the ability to attribute mental states, including thoughts, intentions and emotions, to oneself and others, and is essential for navigating the social world. Empirical mindreading research has predominantly featured children, groups with autism spectrum disorder and clinical samples, and many standard tasks suffer ceiling effects with neurologically typical (NT) adults. We first outline a case for studying mindreading in NT adults and proceed to review tests of emotion perception, cognitive and affective mentalizing, and multidimensional tasks combining these facets. We focus on selected examples of core experimental paradigms including emotion recognition tests, social vignettes, narrative fiction (prose and film) and participative interaction (in real and virtual worlds), highlighting challenges for studies with NT adult cohorts. We conclude that naturalistic, multidimensional approaches may be productively applied alongside traditional tasks to facilitate a more nuanced picture of mindreading in adulthood, and to ensure construct validity whilst remaining sensitive to variation at the upper echelons of the ability.


INTRODUCTION
Mindreading describes the ability to attribute mental states to oneself and others, and is essential for predicting behavior (Nichols and Stich, 2003). It comprises cognitive and affective components dissociable at the neural level (Shamay-Tsoory and Aharon-Peretz, 2007; though see Pessoa, 2008), and can be both explicit (deliberate) and implicit (automatic; Heyes and Frith, 2014), expressed via two-systems (Apperly and Butterfill, 2009) and multi-systems cognitive models (Christensen and Michael, 2016). Mindreading is also referred to as Theory of Mind (ToM; Wimmer and Perner, 1983) and mind perception (Gray et al., 2010). As ToM alludes to an elaborate accumulation of concepts and mind perception minimizes agency, the term mindreading is employed here.
Since Premack and Woodruff (1978) posed the question, "does the chimpanzee have a ToM?", empirical mindreading research has focused on child development, autism spectrum disorder (ASD) and, more recently, clinical groups, whereas studies featuring neurologically typical (NT) adults are less frequent. There are compelling arguments for investigating adults' mindreading: mindreading ability changes across the lifespan (Maylor et al., 2002; Duval et al., 2010) and is positively associated with effective interpersonal relationships (Castano, 2012) and prosocial behavior (Paal and Bereczkei, 2007; Johnson, 2012). Studying adults facilitates the construction of theoretical models, which supports an understanding of mindreading development and identification of diagnostic markers for ASD, clinical and neurodegenerative disorders (e.g., Poletti et al., 2012; Guastella et al., 2013).
Explicit and implicit mindreading abilities appear dissociable in children, ASD and clinical samples (Onishi and Baillargeon, 2005; Senju et al., 2009), but closely related in NT adults (Kanske et al., 2015); hence this paper focuses on explicit measures. Whereas implicit mindreading is measured indirectly (e.g., via eye-gaze), explicit tasks probe deliberate mental state reasoning. A challenge for researchers lies in establishing behavioral tools sensitive to variation at the upper echelons of mindreading, since NT adults tend to perform at ceiling (at or near 100% accuracy) on standard explicit measures.
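As a minimal illustration of how a ceiling effect might be flagged in a dataset, the sketch below computes the proportion of participants scoring at or near the maximum of a task. The function name `ceiling_rate`, the `margin` parameter and the example scores are hypothetical and are not drawn from any study reviewed here; what counts as "near" ceiling is a choice for the researcher.

```python
def ceiling_rate(scores, max_score, margin=0):
    """Return the proportion of participants scoring within `margin`
    raw points of `max_score` (hypothetical helper for illustration)."""
    if not scores:
        raise ValueError("scores must not be empty")
    if margin < 0:
        raise ValueError("margin must be non-negative")
    cutoff = max_score - margin
    # Count participants at or above the cutoff.
    at_ceiling = sum(1 for s in scores if s >= cutoff)
    return at_ceiling / len(scores)


# Hypothetical accuracy scores (out of 10) for four participants.
example_scores = [10, 10, 9, 7]
share_at_max = ceiling_rate(example_scores, max_score=10)
share_near_max = ceiling_rate(example_scores, max_score=10, margin=1)
```

A high proportion on such an index would suggest that the task leaves little room to detect variation among the most able participants.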
Mindreading is a multidimensional construct, which has led to inconsistent definitions across the literature (Schaafsma et al., 2015). We aim to cover the empirical ground by addressing: (1) emotion recognition tests [emotion perception reflects a low-level process in affective mindreading (Mitchell and Phillips, 2015)]; (2) cognitive and affective mentalizing tasks measuring attribution of beliefs, intentions, desires, and emotions, respectively; (3) multidimensional measures combining these facets. Rather than present an exhaustive review of the literature, we focus on selected experimental paradigms illustrative of four core approaches: emotion recognition, social vignettes, narrative fiction, and participative interaction, highlighting challenges for use with NT adult samples.

EMOTION RECOGNITION
The ability to recognize emotions precedes affective mindreading (Mitchell and Phillips, 2015). Emotion recognition tests traditionally require participants to identify basic emotions (happiness, sadness, anger, fear, surprise, contempt, and disgust; Ekman and Friesen, 1971, though see Awasthi and Mandal, 2015) presented in photographs or brief video-clips of posed facial expressions. The emotion perception literature has primarily focused on macroexpressions: full-face unconcealed expressions lasting more than 0.5 s. However, strong agreement on the basic emotions can result in ceiling effects with NT adults. One approach is to speed up presentation so that stimuli represent microexpressions (Ekman and Friesen, 1976), which last up to 0.25 s and are usually fragmentary (appearing on the top or bottom half of the face). Briefly presented images can remain on the retina for longer than the intended exposure [e.g., Brief Affect Recognition Test (BART; Ekman and Friesen, 1974)], though this is resolvable by incorporating neutral expressions as forward-backward masks (Matsumoto et al., 2000). Microexpressions are involuntary, tending to signal concealed or altered emotion expressions, so perceiving them likely reflects the advanced capacity to detect deception in real-life interactions (Frank and Svetieva, 2015).
The Reading the Mind in the Eyes Test (Eyes Test; Baron-Cohen et al., 2001a) requires participants to attribute the most appropriate mental state term (e.g., "ashamed," "nervous," "suspicious," and "indecisive") to photographs of the eye-regions of faces. The task probes non-automatic processes (Bull et al., 2008), was designed to detect subtle deficits (Baron-Cohen et al., 1997), and has been applied to a range of domains, including brain studies (Adolphs et al., 2002), dementia (Gregory et al., 2002), and clinical disorders (e.g., Fett et al., 2011). The Eyes Test demonstrates particularly strong predictive power with ASD groups, supporting its validity as a measure of the social cognitive deficits characteristic of ASD: In the original study, performance negatively correlated with Autism Spectrum Quotient scores (Baron-Cohen et al., 2001b), which may be due to the "purity" of the stimuli minimizing the opportunity to depend on alternative (e.g., verbal) cues (cf. Happé, 1995). The Eyes Test is one of few "classic" mindreading tasks sensitive to variation in NT adults; however, it measures emotion recognition rather than mindreading per se. This is an important distinction as emotion recognition and other mindreading dimensions can dissociate (Oakley et al., 2016).
Emotion recognition stimuli processed via a single modality [including facial/body images or auditory voice recordings (e.g., Rutherford et al., 2002)] present a specific problem for research with NT adults, and a general issue of ecological validity. Older adults tend to perform poorly compared to young adults on static emotion recognition tests, whilst outperforming them at recognizing continuous emotions in dyadic interactions (Sze et al., 2012). Dynamic stimuli can be used to circumvent problems faced using static images (Biele and Grabowska, 2006; Halberstadt et al., 2011), although both static and dynamic, visual and prosodic affective stimuli lack contextual information (Achim et al., 2013). Therefore, emotion recognition tasks may be most fruitfully applied in conjunction with mental state reasoning measures to facilitate a more comprehensive approach.

Social Vignettes
Cognitive mentalizing entails setting aside one's own perspective to attribute states to other agents. Both children and adults demonstrate automatic egocentric bias in verbal and visual perspective-taking tasks (e.g., Epley et al., 2004); however, NT adults can partially correct for it (Wang et al., 2014). Belief attribution, for example, has been shown to be non-automatic in adults (Back and Apperly, 2010). The concept that mindreading ability is indicated by understanding not simply what someone knows, but their mistaken beliefs (Dennett, 1978), led to the development of Wimmer and Perner's (1983) false-belief task (FBT), which depicts belief-states through social vignettes. In the traditional object-transfer paradigm, participants must identify a target agent's mistaken belief about the location of an object, through understanding that the agent lacks knowledge that the object has moved. For example, A wrongly believes that the sweets are in the opaque jar, because they did not witness B move them to the cupboard (first-order); B wrongly believes A will look for the sweets in the jar, unaware that A secretly watched them being moved (second-order).
False-belief tasks have been applied to child development studies (for a meta-analysis, see Wellman et al., 2001), ASD (Baron-Cohen et al., 1985), psychiatric disorders (Frith and Corcoran, 1996), brain damage, stroke (Happé et al., 1999), and Alzheimer's (Le Bouc et al., 2012). As children typically pass first- and second-order FBTs aged 4-5 (Astington and Dack, 2008) and 6-7 (Perner and Wimmer, 1985) respectively, these tasks tend to show ceiling effects with adults. Adaptations for use with NT adults include a version where participants rate the likelihood that protagonist "Sally" will look for an object in various locations (Birch and Bloom, 2007). Participants are privy to the object's location in one condition, and the task is sensitive to the interference of that knowledge ("reality bias"; Mitchell et al., 1996).
False-belief understanding has become synonymous with mindreading; however, the construct validity of FBTs has been called into question (e.g., Bloom and German, 2000). For example, the False-Belief Localizer tool for isolating the neural basis of false-belief representation (Saxe and Kanwisher, 2003; Dodell-Feder et al., 2011) is often referred to as the "ToM Localizer," yet the neural pattern diverges from meta-analytic accounts of the ToM network (Spunt and Adolphs, 2014). In developmental populations, poor FBT performance may reflect general task demands (Siegal and Beattie, 1991; Sullivan et al., 1994), and some individuals with ASD pass second-order tasks whilst exhibiting real-life social cognitive difficulties (Happé, 1994), suggesting they may recruit compensatory verbal strategies (Happé, 1995) such as knowledge of complement syntax (Lind and Bowler, 2009). Social animation tasks (e.g., Castelli et al., 2000) circumvent this issue, requiring participants to attribute intentions to animated geometric shapes, though they lack the range of epistemological and emotional information present in ecological stimuli.
The computerized Yoni Test (Shamay-Tsoory and Aharon-Peretz, 2007) requires integration of visual and verbal cues, and generates both behavioral and neuroimaging data. A series of vignettes feature a central character, "Yoni," depicted by a simple cartoon "smiley," and four images of a single category (e.g., faces, animals, and transport) alongside sentences containing blanks. Participants indicate, by mouse-clicking the appropriate image, what Yoni is close to, thinks about, loves, does not love, or identifies with (first-order), and whose misfortune Yoni gloats over, whose success Yoni envies, and items Yoni thinks about, has or loves, that another character thinks about, has or loves (second-order). The task entails interpretation of proximity, eye-gaze and facial expressions, and measures response time and accuracy across cognitive, affective and physical (control) trials.
In the original study, performance was higher on affective than on cognitive trials, a finding replicated by Kalbe et al. (2010), who suggested that additional facial expression cues in the affective condition facilitated decision-making (the scoring system does not separate out the emotion recognition dimension). Nonetheless, second-order differences between controls and patients with ventromedial frontal lobe damage were observed only in the affective condition, indicating that cognitive and affective neural systems are partially dissociable (Shamay-Tsoory and Aharon-Peretz, 2007). The Yoni Test has shown sensitivity to variation in NT adults where FBTs have proven insufficient (e.g., Kidd and Castano, 2013); however, the simplistic stimuli may enable participants to form basic object-agent associations rather than engage in mindreading (also a criticism of FBTs; Perner and Ruffman, 2005). The Why/How Task (Spunt and Adolphs, 2014), an alternative approach to linking neuroscientific and behavioral data, prevents the formation of basic associations by asking participants how (physical) and why (mindreading) questions about human behaviors depicted through photographs. Designed for fMRI studies, the Why/How Task also generates reliable behavioral (accuracy and response time) data. Whilst simple social images do not reflect the complexity of real-world mindreading stimuli, they present opportunities to examine the brain basis for behavioral differences between participants.

Narrative Fiction (Prose)
Naturalistic narrative stimuli allow mindreading targets to be contextually embedded (e.g., Frith and Corcoran, 1996; Saxe and Wexler, 2005), which may inhibit non-mindreading strategies (Happé, 1995). Happé's (1994) Strange Stories Task assessed comprehension of short, naturalistic narratives including joke, lie, appearance/reality, and contrary emotions. The range of narratives proved more sensitive to subtle between-group differences than FBTs, paving the way for more complex narrative-based approaches.
Participants in the Short Story Task (SST; Dodell-Feder et al., 2013) read a fictional story about two characters whose romantic relationship breaks down (Hemingway, 2003). It contains first- and second-order mental states and requires synthesis of contextual, verbal and physical information. Semi-structured questions probe explicit and spontaneous mentalizing; explicit items are scored from 0 to 2, and a single spontaneous question is scored as a dichotomous yes/no variable. However, as the spontaneous question prompts participants to provide "the character's thoughts, feelings and intentions when it applies to the question" (Dodell-Feder et al., 2013, p. 4), the implicit/explicit distinction is not clear cut. The coding scheme does not distinguish cognitive and affective, or first- and second-order attributions (indicated through low internal consistency; α = 0.54), signaling the need for a scoring system to support a more nuanced picture of mindreading (see Dodell-Feder et al., 2013, for some recommendations).
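For readers less familiar with the internal consistency statistic reported above, the following is a minimal sketch of how Cronbach's α can be computed from a participants-by-items score matrix, using the standard formula α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the example data are hypothetical; the data are invented 0 to 2 item scores for illustration only and do not reproduce the SST dataset or its scoring procedure.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Compute Cronbach's alpha.

    `scores` is a list of rows, one per participant, each row holding
    that participant's score on every item (hypothetical input format).
    """
    n_participants = len(scores)
    if n_participants < 2:
        raise ValueError("at least two participants are required")
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in scores):
        raise ValueError("every participant needs a score on every item")

    # Sample variance of each item across participants.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of participants' total scores.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Invented example: five participants, four items scored 0, 1 or 2.
example = [
    [2, 1, 2, 1],
    [1, 1, 0, 1],
    [2, 2, 2, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 2],
]
alpha = cronbach_alpha(example)
```

Values closer to 1 indicate that items covary strongly, whereas a lower value is compatible with items tapping partly distinct processes, which is one reading of the SST result.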
The original SST demonstrated sensitivity to variation among NT adults (scores ranged from 2 to 14 of 16 points), and concurrent validity with the Eyes Test and Interpersonal Reactivity Index fantasy subscale (IRI; Davis, 1983) supported it as a measure of the mindreading construct. Notably, recent evidence suggests that reading literary fiction can enhance performance on mindreading measures including the Eyes Test (Kidd and Castano, 2013), indicating that processes associated with fiction-engagement may prime the mindreading mechanism. In the original SST, participants completed all mindreading measures after reading, so future studies should vary task order to control for potential priming effects.

MULTIDIMENSIONAL MEASURES
Narrative Fiction (Film)
Film stimuli enable researchers to present dynamic interactions (e.g., Golan et al., 2008; Barnes et al., 2009; Bazin et al., 2009), but can lack the range and complexity of ecological mindreading. Using actors to simulate social scenarios offers increased control over context and content variables. The Movie for the Assessment of Social Cognition (MASC; Dziobek et al., 2006) features four characters at a dinner party. A script development process (Field et al., 2001) generated realistic characters (displaying stable traits and transient states) and prominent themes are romance and friendship. Participants answer direct questions about the characters' cognitive and affective mental states, requiring interpretation of vocal, physical and contextual information, alongside classic mindreading concepts such as false-beliefs, metaphor and faux pas.
In the validation study, the MASC converged with three extant mindreading measures: a basic emotion recognition task, the Eyes Test and Strange Stories Task (shortened). However, MASC scores predicted Strange Stories Task performance in participants with Asperger Syndrome, and emotion recognition in controls, which indicated that verbal strategies may have compensated for facial processing difficulties. The authors recommended future studies vary mental state complexity and part of the face focused on (eyes/mouth). Additionally, both groups performed at ceiling on the control questions, so future revisions should incorporate more challenging questions to account for other cognitive processes (Heavey et al., 2000; Dziobek et al., 2006). Notably, the MASC was more sensitive to group differences than the established measures, supported by a recent finding that participants with ASD showed impaired MASC, but not Eyes Test performance, when compared to participants with alexithymia (a condition characterized by impaired emotion recognition that often co-occurs with ASD; Oakley et al., 2016). This suggested that the emotion recognition deficit deemed characteristic of ASD may be due to alexithymia, highlighting the MASC's sensitivity to selective deficits and diagnostic potential.
Versions of the MASC include the original German and dubbed English editions [dubbing did not interfere with participants' task focus (Dziobek et al., 2006) and generally does not impact information processing (Koolstra et al., 2002)]. Generalizability and longevity may be limited, however, due to the contextually specific nature of mental state attributions (interactions may be better understood by similar age-groups to the characters, for example; Griffiths, 1997). This is, to some extent, also true for fictional prose. Moreover, in light of evidence that reading fiction can enhance mindreading, we have suggested that narrative-engagement may prime task performance. While task order was not reported for the MASC, similar effects have been found for television dramas (Black and Barnes, 2015), so future researchers should consider the potentially moderating effects of narrative-engagement processes when employing either fiction approach.

Participative Interaction
Individuals do not only observe the social world; they interact with it. Interactive approaches to measuring mindreading include a participative version of the Empathic Accuracy Paradigm (Ickes et al., 1990). Pairs of participants are covertly filmed waiting to participate in an experiment. After debriefing, they individually watch the footage back to identify their thoughts and feelings, and infer the mental states of their partner. Partner inferences are scored for accuracy. The procedure is socially valid, but contingent on individuals accurately articulating their own mental states (Cuff et al., 2014), and limited to the range of states naturally occurring in the context.
The advancement of virtual environment (VE) technology enables the construction of more complex interactive scenarios. The Interactive Real World Task (Spiers and Maguire, 2006) requires participants to retrospectively describe their thoughts during a driving simulation. The original study allowed researchers to observe patterns in fMRI data in relation to participants' mentalizing. The task was designed to elicit spontaneous mindreading; however, direct questions and a coding system containing accuracy and complexity variables could facilitate a temporal view of explicit decoding and reasoning processes, whilst advancing knowledge of the neural network underlying mindreading.
Training is required prior to participation in VEs; however, this could prove a beneficial tradeoff for studying live, interactive mindreading whilst incorporating the range of variables available to fiction approaches. As with fiction tasks, processes associated with interpreting the narrative features of VEs may impact mindreading performance; research has shown that in-game storytelling enhances affective mindreading (e.g., Bormann and Greitemeyer, 2015). While this necessitates additional measures to control for individual differences in narrative-engagement, it also signifies the potential utility of VEs in interpersonal skills training. VE training offers greater ecological validity than previous tools (e.g., false-belief training; Parsons and Mitchell, 2002) and preliminary data from ASD groups indicate that it can improve both emotion recognition and mentalizing abilities (Kandalaft et al., 2013).

CONCLUSION
Extant mindreading research has primarily focused on children, ASD, and clinical populations, and standard measures can suffer ceiling effects with NT adults. However, several classic tests have been adapted for research with adult cohorts. Tasks measuring specific mindreading dimensions such as emotion perception and cognitive mentalizing risk subtracting out key processes, particularly as social displays can be suppressed, and so interpreting mental states may require integration of verbal, physical and contextual information (McDonald et al., 2003). In contrast, complex, naturalistic approaches, including fiction-based and interactive tasks, reflect the multiplicity of ecological mindreading and speak to a multi-systems cognitive architecture (Christensen and Michael, 2016). We suggest, however, that researchers using fictional prose, film and VEs should consider the potentially moderating processes associated with narrative-engagement. As VEs have proven efficacious both in interpersonal skills development and studying the neural basis of mindreading, this may prove a worthwhile tradeoff. Multidimensional approaches are often resource-heavy and necessitate complex scoring systems to avoid compensatory strategies masking selective deficits. Therefore, we suggest the concomitant use of established multidimensional and simpler measures, to assess concurrent validity and probe mindreading variation both within and between participants. In this way, multidimensional stimuli need not problematize construct validity, but could prove fruitful to the development of multi-systems approaches, and studies of the neural architecture underlying mindreading in adulthood, which in turn may expand the wider social cognition literature.