Evidence from paranoid schizophrenia for more than one component of theory of mind

We previously reported that performance was impaired on four out of five theory of mind (ToM) tests in a group of 21 individuals diagnosed with paranoid schizophrenia (pScz), relative to a non-clinical group of 29 individuals (Scherzer et al., 2012). Only the Reading the Mind in the Eyes Test did not distinguish between groups. A principal components analysis revealed that the results on the ToM battery could be explained by one general ToM factor with the possibility of a latent second factor. In addition, the tests were not equally sensitive to the pathology. There was also over-mentalization in some ToM tests and under-mentalization in others. These results led us to postulate that there is more than one component to ToM. We hypothesized that correlations between the different executive function (EF) measures and ToM tests would differ sufficiently within and between groups to support this hypothesis. We considered the relationship between the performance on eight EF tests and five ToM tests in the same diagnosed and non-clinical individuals as in the first study. The ToM tests shared few EF correlates and each had its own best EF predictor. These findings support the hypothesis of multiple ToM components.


Theory of Mind
Theory of mind (ToM) is but one component of social cognition (Green et al., 2008). ToM is defined as the ability to attribute, correctly or incorrectly, beliefs, knowledge, feelings or intentions to others, in order to understand and predict their behavior (Perner, 1991;Perner and Lang, 2000;Green et al., 2008). The discovery of this ability, coupled with the large inventory of tests used to measure it, allowed researchers to make strides in the understanding of atypical development and specifically of schizophrenia (Scz) (Frith, 1992;Frith and Corcoran, 1996;Langdon and Coltheart, 1999;Champagne et al., 2005;Champagne-Lavau et al., 2006;Uhlhaas et al., 2006;Martino et al., 2007).

Theory of Mind and Schizophrenia
Research in social cognition in Scz has revealed reliable and large impairments in understanding first and second order false beliefs (Bora et al., 2009a;Bozikas et al., 2011), understanding indirect messages (Corcoran et al., 1995;Greig et al., 2004), inferring affect based on photos of the area around the eyes (Baron-Cohen et al., 2001), identifying irony and faux pas (Shamay-Tsoory et al., 2005;Chung et al., 2014), and making inferences concerning real time social interactions (Bazin et al., 2009;Ouellet et al., 2010;Montag et al., 2011).

Executive Functions and ToM in Schizophrenia
Executive functions are considered to be a critical cognitive mediator for ToM (Perner and Lang, 2000). The link between executive deficits and ToM impairment may be explained by difficulty in inhibiting one's own perspective and distinguishing it from others (Ruby and Decety, 2003). A difficulty in making non-literal interpretations may be due to a difficulty in inhibiting a usual interpretation (Leslie et al., 2004), a lack of flexibility that is reflected in difficulties judging the relative importance of each aspect of a script and attributing the appropriate importance to the pertinent information (Channon and Crawford, 2000).
Significant correlations have been found between a large variety of EF tests and ToM in patients with Scz (see for example Langdon et al., 2001;Bell and Mishara, 2006;Bora et al., 2006a; but see Lysaker et al., 2008 for a contrary opinion and results). Pickup (2008) analyzed 17 studies, eight of which reported a significant correlation between ToM and EF. Although there is a link between the two, he found that the patients were impaired on tests of ToM compared with control subjects, even when EF were factored out. On the other hand, he found that EF shared 65% of the variance on ToM tests in the clinical groups while there was no significant correlation in the non-clinical group (Pickup, 2008). Bora et al. (2006a) used a battery to probe the relationship between insight into illness, ToM (RMET, first and second order false belief stories) and EF (WCST, Digit Span, verbal fluency, letter-number sequencing). RMET correlated with Digit Span backward and letter-number sequencing, but not WCST. Second order false beliefs correlated with letter-number sequencing, WCST perseveration and categories, but not with Digit Span backward. These results could lead one to consider that there are likely different ToM components, as different ToM tests load differentially on EF tests. These discrepancies in results and divergent approaches in the analyses point to inconsistencies in the literature and raise questions about the structure or content of what is being measured. Montag et al. (2011) identified one ToM component, the cognitive/emotional content (see also Shamay-Tsoory et al., 2007). Hynes et al. (2006) identified a differential activation pattern depending on whether the social perspective-taking task was emotional or cognitive, with activation of the medial orbitofrontal lobe distinguishing between the two conditions. Shamay-Tsoory et al. (2006) found that patients with ventromedial prefrontal lesions performed better in the cognitive than in the emotional condition.
The differentiation between the two components was further confirmed using ToM tests (Shamay-Tsoory et al., 2006; see Abu-Akel and Shamay-Tsoory, 2011 for a summary of the neural circuitry of the components that they identify, which include cognitive and emotional components). Salvatore et al. (2008, p. 193, paragraph 1) present an argument, based on data from different populations, in support of a multi-component ToM. However, the clinical evidence extracted from an interview with two patients diagnosed with schizophrenia leaves the debate unresolved.

Objectives and Hypotheses
The present study is an attempt to reexamine the link between ToM (faux pas, lies, indirect messages, inferring facial expressions of emotions, etc.) and EF (cognitive flexibility, deductive reasoning, etc.) in a group of patients diagnosed with paranoid schizophrenia (pScz), to determine if different EF measures are equally good predictors of performance on a battery of ToM tests and if the ToM tests share the same relationship with the EF measures. We predict that the clinical group will be impaired relative to a non-clinical group on the ToM measures, with the exception of the RMET (Scherzer et al., 2012), but that performance on some ToM measures will distinguish between the two groups better than performance on others.
We further predict that performance on all the EF tests will be impaired in the pScz group relative to the non-clinical group but that the pScz group will perform better on some EF measures than on others. Finally, we predict that correlations between the different EF measures and ToM tests will differ sufficiently within and between groups to support the contention that there is more than one component to ToM.

MATERIALS AND METHODS
The ethics committee of the Département de psychologie, Université du Québec à Montréal and the ethics and scientific committee (Comité d'éthique de la recherche) of the Hôpital Louis H. Lafontaine (recently renamed Institut Universitaire en Santé Mentale de Montréal) approved the protocol. Informed written consent was obtained from each subject prior to study entry.

Participants
Twenty-one patients diagnosed with pScz and a group of 29 non-clinical individuals were recruited for the study (more details in Scherzer et al., 2012). The subjects were all males, between 18 and 35 years old, either native French speakers or having received all of their schooling in French. Their IQ (VIQ, PIQ, and FSIQ) was ≥85. Individuals with Axis I and/or II comorbidity, neurologic problems, head trauma, alcoholism, substance abuse or dependence, or non-corrected visual deficits, as determined by the medical records, were excluded from the study. In addition, the attending psychiatrists verified that the participants were not under the influence of any recreational drugs or alcohol prior to experimentation.
The diagnosis of pScz was made by the attending psychiatrists and confirmed by ES according to DSM-IV (American Psychiatric Association [APA], 1994) diagnostic criteria. The clinical group was recruited from the outpatient clinic for young adults with psychosis of the Institut Universitaire en Santé Mentale de Montréal. Their medication had to be stable 2 weeks prior to data collection.
The non-clinical group was recruited from the community via posters, word-of-mouth and from talking to groups of individuals. Their socio-demographic profile (age, education, parental education) was comparable to that of the clinical group and their immediate family history (parents, siblings) had to be free of schizophrenia and other psychosis related disorders. They had to be free of Axis I and II comorbidity, neurological problems, head trauma, alcoholism, substance abuse or dependence, and non-corrected visual deficits. This information was obtained during an extensive telephone interview with the potential candidates.

Measures
Clinical Evaluation (Kay et al., 1987)
The clinical group was evaluated using the Positive and Negative Syndrome Scale (PANSS) during a semi-structured interview. A second psychiatrist independently validated the ratings on the PANSS. The patients rated >4 on one or more of the following characteristic symptoms on the PANSS: grandiose ideas, delusions, or hallucinations, and persecutory ideation.

WAIS-III (Wechsler, 1997)
An abridged French version of the WAIS-III (Pilgrim et al., 1999) was used to evaluate the intelligence of the participants in order to control for any potential contribution of this variable to the ToM measures. An estimate of VIQ was obtained using the following subtests: Information, Similarities, Digit Span, and Arithmetic. An estimate of PIQ was obtained using the following subtests: Picture Completion, Block Design Substitution.

Theory of mind tasks
Theory of mind was measured using five different tests: Reading the Mind in the Eyes Test (RMET; Baron-Cohen et al., 2001), Hinting Task (Corcoran et al., 1995;Marjoram et al., 2005), Strange Stories (Happé et al., 1998), Faux Pas (Stone et al., 1998), and Conversations and Insinuations (C and I; Ouellet et al., 2010).
Reading the mind in the eyes test (Baron-Cohen et al., 2001). The RMET is a first order false belief test of recognition of mental states and emotions (Baron-Cohen et al., 2001;Craig et al., 2004). It consists of 36 images of the facial area around the eyes, each image illustrating a different mental state. The test has been found to distinguish between patients with schizophrenia and non-clinical participants (Bora et al., 2008;Kettle et al., 2008). The French version of the multiple choices was taken from the web site www.autismresearchcenter.com.
Hinting task (Corcoran et al., 1995;Marjoram et al., 2005). The Hinting Task is a verbal measure of first and second order false beliefs (Bora et al., 2009a,b). It tests the ability to infer the real intentions behind indirect messages of the speaker (Corcoran et al., 1995;Bliksted et al., 2014). There are two versions of this test, each having 10 stories of social interactions in which one person sends an indirect message to another. These two versions were found to distinguish between patients with schizophrenia (Uhlhaas et al., 2006) with a high level of social functioning and those with a low level of social functioning (Bora et al., 2006b), and between patients with schizophrenia and paranoid symptoms and patients with negative symptoms (Bora et al., 2008). A combined score derived from both versions was used in this study.
Strange stories (Happé, 1994;Happé et al., 1998). This test consists of eight stories requiring an inference concerning the mental state of a protagonist (Happé et al., 1998) and eight control stories requiring a physical inference. This test was found to distinguish between subjects at high risk for schizophrenia and non-clinical subjects (Chung et al., 2008), as well as between patients with schizophrenia and non-clinical subjects Ward, 2009, 2010).
Faux Pas (Stone et al., 1998;Baron-Cohen et al., 1999). This test consists of 10 stories describing the interaction between two people, one of whom unknowingly makes a comment about the other person that is insulting or hurtful. It has been found to distinguish between schizophrenic patients with and without a history of violence (Abu-Akel and Abushua'leh, 2004) as well as between schizophrenic patients with negative symptoms and non-clinical control participants (Martino et al., 2007).
Conversations and Insinuations (C and I; Ouellet et al., 2010). This test is composed of four self-contained clips of approximately 2 min duration each, taken from popular French TV programs. The subject is required to make inferences in order to understand the social interactions, indirect messages, faux pas, white lies, and sarcasm used in the conversation. Each scene is independent of the others and can be understood without any further information and without having viewed previous episodes of the program. See Ouellet et al. (2010) for a more complete description of C and I.

Procedure
The entire battery required two sessions of approximately 2 h each to administer. The sequence of tests was counterbalanced between subjects. At the end of the testing, each participant received eight dollars in compensation. Participants in the clinical group were tested in the hospital whereas those in the nonclinical group were tested in a university laboratory.

Statistical Analyses
Analyses of covariance (ANCOVA) adjusted for FSIQ and effect size (η²p) were used to compare the performances between groups on each ToM test. Student t-tests or analyses of covariance (ANCOVA) adjusted for IQ and education where appropriate, were used to compare the performance of the two groups on the measures derived from the tests of executive functions. Partial Pearson correlations within groups were used to examine the shared variance between EF and ToM measures, after controlling for FSIQ if pertinent.
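As a rough illustration of the partial correlations described above, the following minimal sketch computes a first-order partial Pearson correlation between an EF score and a ToM score while controlling for a single covariate such as FSIQ. The function names and the example scores are hypothetical and are not taken from the study, and the sketch makes no claim about the software actually used for the analyses.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y with the linear effect of z removed from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical scores for six participants (illustration only, not study data).
ef_score = [12, 15, 11, 18, 14, 16]
tom_score = [55, 70, 52, 81, 60, 75]
fsiq = [95, 102, 90, 110, 99, 104]

r_partial = partial_corr(ef_score, tom_score, fsiq)
shared_variance_pct = 100 * r_partial ** 2  # squared partial r, as a percentage
print(f"partial r = {r_partial:.3f}, shared variance = {shared_variance_pct:.1f}%")
```

Squaring the partial correlation gives the proportion of variance that the two measures share once the covariate has been accounted for, which is the quantity expressed as a percentage in this kind of analysis.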

All the transformations used are noted in detail in the subtext of the relevant tables, where appropriate.

Between Group Comparisons of Socio-demographic Characteristics
Student t-tests revealed significant group differences for age, education and IQ measures (Table 1).

Comparison between Groups on ToM Tests
Given the group differences in background variables that could influence the ToM or EF variables, the group differences on these variables were tested first by including education and FSIQ as covariables. Only covariables that were significant predictors of the dependent ToM variable at p < 0.05 were kept, and the homogeneity of regression slopes was verified. If homogeneity was rejected (i.e., covariable × group interaction significant at p < 0.05), the covariable was excluded and the situation flagged in the tabled report. The standard deviations reported are the original ones, not reduced by the retained covariables, if any.
Two ToM variables were significantly negatively skewed in controls because of ceiling effects. The skewness index (skewness divided by its standard error) was z = −1.486/0.434 = −3.42 for Strange Stories, with 18 of the 29 controls at ceiling. For Faux Pas, it was z = −2.958/0.434 = −6.82, with 11 at ceiling. For patients, the index was, respectively, z = −0.713/0.501 = −1.42 and z = −0.795/0.501 = −1.59. Without the participants performing at ceiling, the skewness for Strange Stories reduced to z = −0.294/0.661, but for Faux Pas it reduced only to z = −2.438/0.536 = −4.55. The latter value, in association with the negative skewness in the patients, warranted a scale transformation for this variable. The transformation L Faux Pas = 2 − log10(60.3 − Faux Pas) brought skewness to z = 0.151/0.501 for the patients, to z = −0.282/0.434 for all 29 controls and to z = 0.698/0.536 for the 18 controls not at ceiling. The transformed version is used in the statistical analyses, and its group means are reported back-transformed to the original scale.
The scores on the ToM tests were scaled to a maximum of 100 prior to analyses to allow for comparisons between tests. The effect size (r²p) was also calculated in order to identify the tests that best distinguished between the two groups. The Hinting Task appears to be the most sensitive, followed by C and I, Strange Stories, Faux Pas, and RMET, in that order. Only RMET did not distinguish between the groups (Table 2) and was not considered for further analysis.
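The skewness checks and the Faux Pas transformation can be sketched as follows. The standard errors quoted in the text (0.434, 0.501, 0.536, 0.661) agree with the usual large-sample formula for the standard error of sample skewness at n = 29, 21, 18 and 11, so that formula is assumed here; the text does not state how they were computed. The function names are hypothetical, and the constants 60.3 and 2 are taken from the transformation as written in the text.

```python
import math

def skewness_se(n):
    """Standard error of sample skewness for a sample of size n (assumed formula)."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def skewness_z(skew, n):
    """Skewness index: the skewness statistic divided by its standard error."""
    return skew / skewness_se(n)

def reflect_log(score, constant=60.3, offset=2.0):
    """Reflect-and-log transformation from the text: offset - log10(constant - score)."""
    return offset - math.log10(constant - score)

def back_transform(value, constant=60.3, offset=2.0):
    """Inverse of reflect_log, returning a value on the original score scale."""
    return constant - 10 ** (offset - value)

# Standard errors for the sample sizes mentioned in the text.
for n in (29, 21, 18, 11):
    print(n, round(skewness_se(n), 3))

# Skewness index for Faux Pas in the 29 controls, using the reported skewness.
print(round(skewness_z(-2.958, 29), 2))
```

Because the score is subtracted from a constant before the logarithm and the result is subtracted again from 2, the transformation keeps higher scores higher while compressing the long lower tail, which is why it reduces negative skew. Small differences from the z values reported in the text can arise because the text divides by standard errors rounded to three decimals.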

Group Comparisons of Executive Measures
Student t-tests or ANCOVA controlling for IQ and education, were used to compare the performance of the two groups on the 24 measures derived from the eight EF tests. Only the significant results are presented in Table 3.
Performance of the clinical group was impaired compared with the non-clinical group on 8 out of 24 measures derived from the eight EF tests. The effect size was largest for TOL planning time (tested by t for unequal variances, effect size using pooled variance d = 1.01 and using control variance d = 1.13) and TOL number of moves (η²p = 43.4%), followed by Hayling inhibition errors (η²p = 38.8%) and by Zoo Map execution time (η²p = 24.3%).
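The two d values reported for TOL planning time differ only in the standard deviation used as the denominator. A minimal sketch of both versions is given below; the function names and the scores are hypothetical and serve only to show the arithmetic, not to reproduce the study's values.

```python
import math
import statistics

def d_pooled(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def d_control(group_a, control):
    """Standardized mean difference using the control group's standard deviation only."""
    return (statistics.mean(group_a) - statistics.mean(control)) / statistics.stdev(control)

# Hypothetical scores (illustration only, not study data).
clinical = [2.0, 4.0, 6.0]
controls = [1.0, 2.0, 3.0]

print(round(d_pooled(clinical, controls), 3))
print(round(d_control(clinical, controls), 3))
```

When the two groups have unequal variances, as was the case for this measure, the two denominators differ and so do the resulting effect sizes; the version based on the control variance is larger whenever the control group is the less variable of the two.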

Relation between Performance on EF and ToM Measures
Correlations between ToM variables and EF variables were examined with Education and FSIQ as covariables where appropriate, first in patients and then in controls. Transformed versions were used, when indicated (see Tables 2 and 3). Only EF variables that have at least one significant correlation with a ToM variable are reported in Table 4. Table 5 presents the percentage of variance (r²p) of each ToM measure that is explained by the scores on the respective EF tests.
There is no overlap between the best EF and ToM measures within groups. The shared variance differs between tests and groups, ranging from 29.6 to 44.6% in the clinical group and from 15.8 to 24.9% in the non-clinical group, which in the latter group leaves 75.1 to 84.2% of the variance unaccounted for.

DISCUSSION
We first predicted that some ToM measures would be more sensitive and better discriminate between the pScz and non-clinical group and this hypothesis was confirmed. Hinting Task is the most sensitive [η²p (%) = 64.6] while RMET does not distinguish between the two groups. Hinting Task measures indirect speech acts, requiring distinguishing between explicit, literal, unambiguous content and the intended, implicit, ambiguous content (Lukas, 2011;Bliksted et al., 2014). It requires a sharing of information, some knowledge of the conventions of conversation, the ability to infer the non-literal primary directive component of speech, i.e., the ability to identify and decode the attempt by the speaker to get the listener to do something (Searle, 1975;Hagoort and Indefrey, 2014). The complexity of the task or any of these requirements may explain the sensitivity of this measure. The other tests, C and I, Strange Stories, and Faux Pas are progressively less sensitive in distinguishing between the pScz and non-clinical groups.
In contrast to our prediction but in agreement with Chung et al. (2008), few of the EF measures used in the present study were sensitive to the pathology, even after controlling for IQ. Of the 24 measures derived from the eight EF tests, only eight are significant. The most sensitive of these measures was Tower of London planning time. The clinical group took less time to plan their moves and as a consequence made more moves than the non-clinical group before finding a solution to the problem, although they made a comparable number of errors. However, when they took as much time as the non-clinical group to plan their moves on another task (Zoo Map) [η²p (%) = 0.6], it took them significantly longer to find the solution [η²p (%) = 24.3] and they made significantly more errors than the non-clinical group [η²p (%) = 10.7]. Finally, we predicted that correlations between the different EF measures and ToM tests would differ sufficiently within and between groups to support the contention that there is more than one component to ToM. As predicted, each ToM test had its own best EF correlate. These findings, as well as the differences between the best EF predictor of ToM in the pScz and non-clinical groups, tend to support the contention that there is more than one ToM component.
A secondary goal of the study of ToM in clinical populations should be to help elucidate the processes involved in the pathology. To this end, and based on the content of the tests, we can derive the following composite image. Patients with paranoid schizophrenia have problems with on-line planning and anticipation of the consequences of their actions (TOL errors) and it is harder for them to switch from one mode of responding to another (Stroop Condition 4, flexibility time; see also Ibáñez et al., 2014). They especially take more time to plan when confronted with a complex task that requires a lot of thought and planning before initiating any action (Tower of London planning time). They are also more likely to abandon any effort to deduce a rule when it changes without notice (Brixton).
At a social cognitive level, they have difficulties (1) correctly interpreting another's state of mind in order to best be able to explain what one might consider unusual behavior in the context; (2) correctly perceiving and interpreting indirect messages; (3) detecting and understanding an inadvertent, inappropriate comment and the effect that this comment could provoke. These difficulties overlap well with the list of ToM abilities described by Lysaker et al. (2008).
If one is inclined to agree that there is more than one component to ToM, then what is needed is a model of these components rather than a list. Such a model should be based on the developmental trajectory of the components, the dissociation of neural pathways, and the link between components at different stages of development (Tager-Flusberg and Sullivan, 2000; e.g., are first order beliefs a prerequisite to the development of second order beliefs?). One such model could be as follows: first → second order beliefs ↔ emotional beliefs → order beliefs of intention, where ↔ indicates the possibility of an overlap (see for example Weimer et al., 2012; see also Baron-Cohen et al., 1999;Brüne et al., 2007;Martino et al., 2007;Zalla et al., 2009).

LIMITATIONS
The results of this study provide evidence for a multicomponent model of ToM and signal to researchers the need to identify what these components might be and how they may be affected in various clinical populations. However, there is an important need for replication studies, given the relatively small sample and the fact that the results of this study, taken at face value, could be interpreted as being attributable to chance (24 comparisons between two groups, eight significant results; Table 3). It should be noted, though, that seven of the eight measures were derived from just three of the eight tests: Wisconsin Card Sorting Test (3), Zoo Map (2), Tower of London (2). If the distribution were random, one would expect the results from more EF tests to be significant. Even so, chance remains a viable alternative explanation for most of the correlations in Table 4 (four ToM measures, 24 EF measures and two groups: 11 significant results, 4 of which were at the 0.01 level). These results remain to be confirmed by others.
In addition, the differences between the results on the EF measures and the correlations between these measures and the ToM measures may be attributable to the limited psychometric qualities of these tests (see Green et al., 2008) and to the fact that we did not control for anxiety (Lysaker et al., 2010;Achim et al., 2011, 2013). There is also a lack of important information concerning the psychometric qualities of each ToM test (Green et al., 2008), although concomitant validity in terms of predicting group membership appears to be acceptable, as does test-retest reliability. Finally, the EF and ToM measures used in this study, while numerous, do not cover all the tests and measures used in this domain. It is quite possible that the inclusion of other tests would modify the findings. More participants from this clinical group need to be tested on this and similar batteries in order to determine the reliability and validity of the results and the applicability of the clinical description to the population.
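The chance argument above can be made concrete with a simple binomial sketch. It assumes a significance threshold of 0.05 and, as a strong simplification, treats the tests as independent; measures derived from the same EF test are likely correlated, so the probabilities are only indicative. The function names are hypothetical.

```python
import math

def expected_false_positives(n_tests, alpha=0.05):
    """Number of significant results expected by chance alone."""
    return n_tests * alpha

def prob_at_least(k, n_tests, alpha=0.05):
    """P(X >= k) for X ~ Binomial(n_tests, alpha), assuming independent tests."""
    return sum(
        math.comb(n_tests, i) * alpha ** i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )

# Group comparisons: 24 EF measures, eight significant.
print(expected_false_positives(24), prob_at_least(8, 24))

# Correlations: 4 ToM measures x 24 EF measures x 2 groups = 192, eleven significant.
n_correlations = 4 * 24 * 2
print(expected_false_positives(n_correlations), prob_at_least(11, n_correlations))
```

Under these assumptions about 1.2 of the 24 group comparisons would be expected to reach significance by chance, well below the eight observed, whereas about 9.6 of the 192 correlations would be expected by chance, close to the 11 observed. This is consistent with treating the group differences as more robust than the individual correlations.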