Graded Representations of Emotional Expressions in the Left Superior Temporal Sulcus

Perceptual categorization is a fundamental cognitive process that gives meaning to an often graded sensory environment. Previous research has subdivided the visual pathway into posterior regions that processes the physical properties of a stimulus, and frontal regions that process more abstract properties such as category information. The superior temporal sulcus (STS) is known to be involved in face and emotion perception, but the nature of its processing remains unknown. Here, we used targeted fMRI measurements of the STS to investigate whether its representations of facial expressions are categorical or noncategorical. Multivoxel pattern analysis showed that even though subjects were performing a categorization task, the left STS contained graded, noncategorical representations. In the right STS, representations showed evidence for both stimulus-related gradations and a categorical boundary.

either categorical or noncategorical representations of facial expressions. The goal of this study is to test whether STS representations of facial expressions are categorical or noncategorical, while subjects are performing a categorization task.
Instead of observing the responses of individual neurons or individual voxels, we investigate the distributed representations of populations of voxels (Haxby et al., 2001;O'Toole et al., 2005;Norman et al., 2006;Haushofer et al., 2008;Op de Beeck et al., 2008). Thus, in this study, neural representation is operationally defi ned as a spatial map of parameter estimates on each trial. The similarity structure of neural representations is not necessarily the same as the similarity structure obtained from fMRI adaptation. Indeed, in experiments not related to categorization, it has been shown that the distributed representations are more faithful to the physical similarity structure of the stimuli in some brain areas, whereas the adaptation-based representations are more faithful in other areas (Drucker and Aguirre, 2009).
Subjects viewed videos of facial expressions, where each video occupied a point along a continuum between fear and anger. Subjects were required to indicate whether the expression was closer to fear or anger. We enforced the category boundary with feedback in order to minimize potential changes in boundary positions caused by adaptation (Webster et al., 2004).

INTRODUCTION
Perceptual categorization is a fundamental cognitive ability because it underlies the recognition of abstract, behaviorally relevant classes in an often graded sensory environment. In the brain, there is evidence for a frontal-posterior division of labor for visual categorization (Freedman et al., 2001(Freedman et al., , 2003Op de Beeck et al., 2001). Experiments on the inferior temporal cortex (ITC) of nonhuman primates have shown that as the shape of a stimulus moves parametrically away from the a neuron's preferred shape, the neuron's response decreases in a graded manner (Op de Beeck et al., 2001). In contrast, when primates are trained to separate stimuli varying along a morph continuum into categories, there is evidence that neurons in prefrontal cortex (PFC) tend to represent the categories of the stimuli, instead of their position along the continuum (Freedman et al., 2001(Freedman et al., , 2003. Consistent with the evidence from nonhuman primates, fMRI adaptation studies on humans have shown mostly noncategorical tuning for specifi c shapes in lateral occipital cortex (LOC), but tuning for categories in lateral PFC (Jiang et al., 2007). Idealized illustrations of categorical and noncategorical representations are shown in Figure 1.
In this functional magnetic resonance imaging (fMRI) study, we used spatially targeted measurements to study representations of facial expressions in the superior temporal sulcus (STS), an area that it is poorly understood (Hein and Knight, 2008). While the STS is known to respond to facial expressions (Hasselmo et al., 1989;Andrews and Ewbank, 2004;Leslie et al., 2004;Engell and Haxby, 2007;Montgomery and Haxby, 2007;Thielscher and Pessoa, 2007), little is known about how facial expressions are represented. In particular, the STS is thought to have both "high level" and "low level" properties (Binder et al., 1997;Allison et al., 2000;Grossman and Blake, 2002;Noguchi et al., 2005;Calder et al., 2007;Hein and Knight, 2008), and thus could plausibly be hypothesized to have

VIDEO STIMULI
Subjects viewed computer generated videos of emotional expressions. Each video was either anger, fear, or an intermediate expression taken from a morph continuum between anger and fear. All videos started with a neutral expression held for 167 ms, and then moved for 767 ms towards an endpoint expression, which was then held for an additional 767 ms. The full range of morph levels for the endpoint frame was 0%, 10%, 20%, 30%, 40%, 60%, 70%, 80%, 90% and 100% anger. For brevity, the phrase "k% anger" here and henceforth implicitly designates an expression that is also (100-k)% fear. In order to maintain subject engagement in the task, the expressions were posed by fi ve different identities. An example of the different morphing levels for one identity is shown in Figure 2.
Each video frame was created with the FaceGen program 1 and custom VBA code. As stated above, endpoint frames were linear morphs between the 100% anger setting and the 100% fear setting in FaceGen. The remaining frames in each video were linear morphs between the neutral setting and the endpoint. Frames were stitched together in QuickTime Pro and played at 30 frames per second.

fMRI PARADIGM
Subjects were presented with a Type 1, Index 1 sequence (Finney and Outhwaite, 1956), so that each morph level preceded and followed every other morph level a balanced number of times. This type of sequence allows for the effi cient, rapid presentation fMRI designs (Buracas and Boynton, 2002;Aguirre, 2007). A different sequence was used for each subject. Pure emotions (0% anger and 100% anger) appeared three times as often as any other morph level in order to increase the classifi er training set. Each of the fi ve identities posed each morph level a balanced number of times. The sequence of identities was different for each subject. A total of 420 stimulus trials were shown, along with 30 rest trials during which a fi xation cross was shown. Like the stimulus trials, rest trials were included in the Type 1, Index 1 order counterbalancing. Subjects were free to move their eyes around the stimuli. Each trial was preceded by a 3300 ms intertrial interval consisting of a fi xation cross (Figure 3).
To allow subjects to take breaks, the sequence was broken up into fi ve "runs" of 6:50 min each. Each run began with an additional 10 s of fi xation to allow the scanner to stabilize, and ended with another 10 s of fi xation to allow time to measure the BOLD response from the last trial of the run.
Subjects used their right hand to indicate with a button press whether each stimulus was closer to fear or closer to anger. One button corresponded to fear and the other to anger. To enforce the category boundary, a red X was fl ashed after any incorrect response and after any miss. These trials were not included in the main fMRI analysis, and were analyzed separately. Subjects were trained on the categories during the anatomical scan, which preceded the functional scans.

fMRI ACQUISITION
The BOLD signal was used as a measure of neural activation (Ogawa et al., 1990;Kwong et al., 1992). EPI images were acquired with a Siemens 3.0-T Allegra scanner (Siemens, Erlangen, Germany) and FIGURE 2 | Final frames of the movies from all ten different morph levels for one particular identity. From left to right: 0%, 10%, 20%, 30%, 40%, 60%, 70%, 80%, 90%, and 100% anger. Low values indicate the brain is in the "α" state. High values indicate the brain is in the "β" state.
a Nova Medical head transmit coil with receive-only bitemporal array coils (Nova Medical, Inc, Wilmington, MA, USA). The receiveonly array coils were positioned directly on the lateral surface of the head to achieve high signal-to-noise ratio (SNR) from the STS. We acquired 34 interleaved 3.6-mm axial slices with an interslice gap of 0.36 mm (TR = 2000 ms, TE = 20 ms, fl ip angle = 90°, FoV = 192 mm). Voxel dimensions were 3 × 3 × 3.96 mm 3 . At the beginning of each scan session, a high-resolution anatomical image was acquired (T1-MPRAGE, TR = 2500 ms, TE = 4.38 ms, fl ip angle = 8°, FoV = 256 mm).

fMRI PREPROCESSING
Image preprocessing was performed in Analysis of Functional NeuroImages, commonly known as AFNI (Cox, 1996). After discarding the fi rst fi ve EPI images from each run to allow the MR signal to reach steady-state equilibrium, the remaining images were despiked using the AFNI program 3dDespike. Images were then slice time corrected and motion corrected to the third image of the fi rst run using a six-parameter 3-D motion correction algorithm. A 3-mm full width at half maximum smoothing kernel was then applied to the images before conversion to percent signal change from the mean. Neurons in monkey and human STS are sensitive to both facial expression and facial identity (Hasselmo et al., 1989;Winston et al., 2004;Calder and Young, 2005). Thus, analysis of facial expression is made diffi cult because of extra variance due to effects of facial identity. We removed the effects of facial identity by modeling the BOLD time series with a General Linear Model (GLM) consisting of regressors for each of the six identities, each convolved with a gamma function, as well as six regressors for motion parameters. The residual time series was then extracted for further analysis. We also controlled for the effect of task diffi culty by creating a GLM with regressors for very easy (0% and 100% anger), easy (10% and 90% anger), moderate (20% and 80% anger), diffi cult (30% and 70% anger), and very diffi cult (40% and 60%) trials. + + + FIGURE 3 | Example of the sequence subjects observed. Each face was separated by a 3-s intertrial interval, and the sequence was Type 1, Index 1 counterbalanced (see Materials and Methods). Video play buttons are shown here for illustration, and were not seen by the subjects. From left to right: 20%, 100%, and 60% anger from three different identities.
Next, we modeled the residuals with a GLM consisting of a regressor for every stimulus trial, each convolved with a gamma function. Thus, each trial became associated with a map of parameter estimate values (one parameter estimate for each voxel and trial), which could then be submitted to a classifi er. Since we used gamma convolved regressors, the parameter estimate for a particular voxel and a particular trial can be thought of as the weighted sum of fMRI signals shortly after trial onset. We operationally defi ne the neural representation for each trial as this map of parameter estimates.

ANATOMICAL REGIONS OF INTEREST
Anatomical masks were created by bringing the T1-weighted images into a standard space using the @auto_tlrc program to match the TT_N27 template (Holmes et al., 1998). The left STS and right STS were hand drawn on coronal slices (Figure 3) from -64.5 mm to -1.5 mm posterior to the anterior commisure. All anatomical masks were then brought back into original space, multiplied by a separate mask of brain versus non-brain areas, and then applied to the functional data (Figure 4).

CLASSIFIER
Classifi cation was performed on the parameter estimates using the Princeton Multi-Voxel Pattern Analysis Toolbox for Matlab 2 . For both ROIs, a Sparse Multinomial Logistic Regression (Krishnapuram et al., 2005) classifi er was used to measure the neural representations. SMLR is based on logistic regression, but uses a sparsitypromoting Laplacian prior λ to limit the norm of the weight vector and therefore selects only a subset of relevant voxels during training. Based on our observations of classifi er performance in a previous study on the STS, we set λ to 0.01 for this experiment. This reduced the number of used voxels by 6% from the total available in the anatomical ROI. To avoid circularity, all decisions about classifi er parameters were made before training, and all feature weights were computed before testing.
The classifi er was trained to distinguish the neural representations associated with 0% anger from the representations associated with 100% anger. The classifi er was then tested on the intermediate morph levels, and its logit output was used as a measure of how much the neural representation for each morph level was similar to the representation of pure anger relative to pure fear. (High logit values indicate the neural representation is similar to the pure anger representation. Low values indicate it is similar to the pure fear representation.) The logit is the weighted sum of the voxel values, where the weights are the regression coeffi cients. As such, it provides a continuous measure of the classifi er's belief at each trial. It is more appropriate than other measures of a classifi er's belief -such as guess rate -since it is not passed through any nonlinearity, which will bias the results towards categorical representations 3 .

TESTING FOR CATEGORICAL AND NONCATEGORICAL EFFECTS
For each morph level, the classifi er provided a measure of the degree to which the brain state was similar to anger relative to fear. Under the strictest hypotheses, the function relating this measure and morph level could either be either a step function (Figure 1A), implying categorical representations, or a linear function (Figure 1B), implying graded, noncategorical representations. We fi t both of these models to the data using a least squares approach. To compare the models we computed the likelihood ratio, which is a ratio of the probabilities of the data given the two models. The likelihood function for a particular model is the normal distribution with a mean of zero and a variance of β −1 , and T is the transpose operator (Bishop, 2006). The likelihood ratio is the likelihood for the linear function divided by the likelihood for the step function. Since the functions were not nested, we could not perform the likelihood ratio signifi cance test.
Finally, to ensure that our results were not caused by idiosyncratic properties of our SMLR classifi er, we repeated the analysis with a two-layer (no hidden layer) neural network classifi er trained with backpropagation (Rumelhart et al., 1986). The average net input to the anger output unit was used as the measure of neural representation. Higher input values indicated the brain was closer to the pure anger representation. Lower values indicated the brain was closer to the pure fear representation. Unlike output unit activation values, which are passed through a nonlinear activation function, input values provide a measure of neural representations that are unbiased towards categorical representations. In order to minimize the noise effects caused by the random initial weight assignments, we ran the classifi er 20 times and averaged the net input values across those 20 iterations.
Reaction times increased as morph levels became more diffi cult, t(16) = 11.7, p < 0.05, and are reported in Table 1.

fMRI RESULTS
We trained a SMLR classifi er to distinguish the neural representation associated with 0% anger from the neural representation associated with 100% anger. We then tested the classifi er on intermediate morphs. In this analysis we only considered trials on which the subject made the correct response. Compared to chance classifi cation accuracy (0.500, determined by separately shuffl ing the labels of the training data and the labels of the testing data 1000 times per subject), overall classifi cation accuracy of intermediate morphs in the bilateral STS was low but signifi cant (M = 0.527, t(16) = 2.57, p < 0.05). This was the case in both the left STS (M = 0.521, t(16) = 2.48, p < 0.05) and the right STS (M = 0.523, t(16) = 2.56, p < 0.05). Finer grained information contained in the logits provided enough signal to test for categorical versus noncategorical representations. Logits are the weighted sums of the features in logistic regression and, unlike classifi er guess rates, provide a measure of neural representations that is not biased towards the categorical hypothesis (see Materials and Methods).
In the left STS, there was a signifi cant linear fi t relating morph level to neural representation ( Figure 6A; t(16) = 2.67, p < 0.05) as well as a signifi cant step function fi t [t(16) = 2.39….]. However, the fact that the step function fi t was signifi cant does not imply that it is the best model for the data. To compare the linear function to the step function, we computed a likelihood ratio of 303.7, which means that the observed data was 303.7 times more likely under the linear model than under the step model, thus providing strong evidence for the linear model in this ROI. To confi rm that our results were not due to idiosyncratic properties of SMLR, we repeated the analysis using a two-layer (no hidden layer) neural network classifi er instead. During testing, the net input of the output unit was used as a measure of the classifi er's belief, with higher values indicating neural representations closer to anger. Again, a high likelihood ratio of approximately 141.7 provided strong evidence for the linear function ( Figure 7A).
In the right STS the SMLR analysis revealed a signifi cant linear fi t [t(16) = 2.22, p < 0.05] and a marginally signifi cant step function fi t [t(16) = 2.00, p < 0.10]. However, the likelihood ratio computed on the group average data was 0.34, which is weak evidence for the step function ( Figure 6B). In contrast, the neural network analysis showed a weak preference for the linear function, with a likelihood ratio of 2.44 ( Figure 7B). Collectively, the evidence is inconclusive as to whether a step function or linear function best fi ts the data in the right STS. In both cases, the relationship between neural representation and morph level appeared to show stimulus-related gradations as well as a category boundary.

ERROR TRIALS
It is also important to ask whether the representations in the STS track the actual stimulus or the perceived stimulus, as refl ected in task responses. To try to answer this, we measured the neural representations during trials in which subjects made an error. Hardly any errors were made for most morph levels, although somewhat more were made near the category boundary at 40% anger and 60%  anger (see Figure 5). We restricted our analyses to these morph levels but, perhaps in part due to the small number of samples, none of these analyses yielded signifi cant results. Using the results from SMLR we found that in the left STS, logits for error trials at 40% anger and 60% anger were both not signifi cantly different from zero (M = −.49, SD = 2.29, t(16) = −.88, p > 0.05 for 40% anger; M = −0.24, SD = 4.21, t(16) = −0.24, p > 0.05 for 60% anger.) The logits for errors at these two morph levels were not signifi cantly different from each other either (dependent t = −0.19, p > 0.05). Similarly for the right STS, neither of the two boundary morph level logits were signifi cantly different from zero (M = −0.52, SD = 3.8, t(16) = −0.56, p > 0.05 for 40% anger; M = −0.52, SD = 3.14, t(16) = −0.69, p > 0.05 for 60% anger.) Again, the logits for errors at these two morph levels were not signifi cantly different from each other either (dependent t = 0.004, p > 0.05). Therefore, due to the relatively small number of error trials, the data may be underpowered to answer whether the STS tracks the actual stimulus or the perceived stimulus.

UNIVARIATE ANALYSIS
It is also possible to compare linear and step function fi ts using a univariate approach. Unlike multivoxel pattern analysis, this approach determines the effect of stimulus level on each voxel individually. For each subject, we used the general linear model to determine the effect of each stimulus level on every voxel. Then, after spatially registering each map of parameter estimates to Talairach space, we tested for linear and step function fi ts to each voxel's set of parameter estimates. At a voxelwise cutoff of p < 0.001, no clusters were large enough to pass correction for multiple comparisons for either the linear fi t or step function fi t. (The largest cluster of address this issue. Finally, it is possible that some types of stimuli with natural categories, such as bird species, are represented categorically in the STS, while other types of stimuli that are often found along a continuum in nature, such as facial expressions, are represented less categorically (Susskind et al., 2007). In this study, subjects were not required to fi xate while viewing the stimuli. While this provides a more ecologically valid setting to study facial expression perception, it opens up the possibility that differential responses to different stimuli were driven in part by the location at which subjects fi xated. However, this concern is tempered to some extent by the fact that visual scanning of fearful faces is very similar to visual scanning of angry faces (Wong et al., 2005). Another potential explanation for our fi ndings is that differences in arousal levels between fear and anger could have driven the differential responses in the STS. In general, fear and anger are associated with similarly high levels of arousal (Russell, 1980), and so any differences in arousal levels in this experiment are likely to be small. Nevertheless, it is possible, in principle, that this explanation could still explain the data. If so, our main fi nding of noncategorical representations would remain valid, but the interpretation would shift from one about visual representations to one about arousal levels.

MOTION OR BIOLOGICAL MOTION?
Our study showed mostly linear functions between morph level and distributed representations in the STS. In principle, the population responses could refl ect distinctions among facial expressions as well as lower-level aspects of visual motion. Previous fMRI studies have shown STS sensitivity both to motion coherence (Braddick et al., 2000), and to static facial expressions as compared to neutral expressions (Engell and Haxby, 2007). The population responses for variations of facial expression may show gradations that refl ect subtle variations within coarsely-defi ned categories, such as anger or fear.

CATEGORICAL REPRESENTATIONS
If the STS contains mostly noncategorical representations of facial expressions, the question arises of where they are represented categorically, if anywhere. Our targeted surface coil measurements prevented us from obtaining good SNR in brain areas beyond the STS, and so the investigation of other brain areas remains a topic for future research. However, if the results of previous studies on other types of stimuli can generalize to emotional expressions, then it is likely that categorical representations will be found in the PFC (Freedman et al., 2001(Freedman et al., , 2003Jiang et al., 2007). At least in the macaque, there are known projections from the STS to frontal cortex (Luppino et al., 2001).

ACKNOWLEDGMENTS
We thank Greg Detre, Joe McGuire, Per Sederberg, Sam Gershman, and Mike Todd for helpful discussions. This work was supported by National Science Foundation Grants BCS-0446846 and BCS-0823749. 6 voxels was lower than the 16 required for a corrected alpha of 0.05, as determined by Monte Carlo simulations of null hypothesis data with the AFNI program AlphaSim.) Thus, differential responses to stimulus level was best detected with the multivariate approach, and not the univariate approach.

DISCUSSION
In this study we showed that the multivoxel representations of facial expressions in the left STS are graded and noncategorical. In the right STS, we found evidence for both stimulus-related gradations and a category boundary. We obtained these results in an experiment in which subjects viewed videos of facial expressions, ranging on a continuum from pure fear to pure anger. Subjects were asked to categorize each expression as either fear or anger. A classifi er was fi rst trained to distinguish the fMRI patterns of activation for pure fear from the patterns for pure anger, and was then tested on the intermediate morphs. In the left STS, both a SMLR classifi er and a neural network classifi er showed a linear relationship between morph level and neural representation. In the right STS, the fi ts were generally more intermediate between a linear function and step function, suggesting both stimulus-related gradations and a category boundary. To the extent that there are differences in laterality, they are similar to recent work on the fusiform gyrus, which shows categorical representations for faces in the right, but not left hemisphere (Meng et al., 2008). It is unlikely that the emergence of some category information in the right STS was driven by motor responses, since all subjects used their right hand (and therefore their left motor cortex) to control the button box.
The fi nding that representations in the left STS are noncategorical is similar to fi ndings from research on other posteriors areas (Freedman et al., 2003;Jiang et al., 2007) including a multivariate pattern analysis study on moving dots stimuli (Li et al., 2007). This is also consistent with multivariate work on nonhuman primates showing mostly graded responses in IT cortex to high level stimuli (Meyers et al., 2008). Here, for the fi rst time, we have demonstrated this multivariate effect in humans using high level stimuli.
One recent paper using fMRI adaptation and bird stimuli has found clusters in the left STS that appear to show stronger categorical representations than we have shown here (van der Linden et al., 2009). The differences between the results in that study and present study could be due to a number of causes, which are not mutually exclusive. First, the categorical clusters reported in that study were relatively small, with a total volume of 2262 mm 3 , which is about 7% of the volume of our average anatomical ROI. It is possible that an analysis of that data in the entire left STS would show representations that are less categorical, as we have found here. Indeed, different regions of the STS may include more categorical or less categorical representations than others. Second, subjects in that study received longer training periods than in ours. Perhaps the neural representations in our study would become increasingly categorical with longer training. Future studies could attempt to