“It's Not What You Say, But How You Say it”: A Reciprocal Temporo-frontal Network for Affective Prosody

Humans communicate emotion vocally by modulating acoustic cues such as pitch, intensity and voice quality. Research has documented how the relative presence or absence of such cues alters the likelihood of perceiving an emotion, but the neural underpinnings of acoustic cue-dependent emotion perception remain obscure. Using functional magnetic resonance imaging in 20 subjects, we examined a reciprocal circuit consisting of superior temporal cortex, amygdala and inferior frontal gyrus that may underlie affective prosodic comprehension. Results showed that increased saliency of emotion-specific acoustic cues was associated with increased activation in superior temporal cortex [planum temporale (PT), posterior superior temporal gyrus (pSTG), and posterior middle temporal gyrus (pMTG)] and amygdala, whereas decreased saliency of acoustic cues was associated with increased inferior frontal activity and temporo-frontal connectivity. These results suggest that sensory-integrative processing is facilitated when the acoustic signal is rich in affective information, yielding increased activation in temporal cortex and amygdala. Conversely, when the acoustic signal is ambiguous, greater evaluative processes are recruited, increasing activation in inferior frontal gyrus (IFG) and IFG-STG connectivity. Auditory regions may thus integrate acoustic information with amygdala input to form emotion-specific representations, which are evaluated within inferior frontal regions.

INTRODUCTION
[...] each emotion as a function of this cue level change across items. We hypothesized that variation in cue salience level would be reflected in activation levels within a reciprocal temporo-frontal neural circuit as proposed by Schirmer and Kotz (2006) and others (Ethofer et al., 2006). Using F0 SD as a proxy for cue salience in fear and happiness allowed further differentiation: saliency-related performance increases are expected to correlate positively with pitch variability (F0 SD) for happy stimuli, and negatively with F0 SD for fear stimuli. Therefore, a similar activation pattern for increasing cue saliency for both happiness and fear would suggest that the activation observed relates to emotional salience as predicted, rather than to pitch variation alone.
The proposed temporo-frontal network that we expect to be affected by changes in cue saliency is grounded in neuroscience research. Initial lesion studies (Ross et al., 1988; Van Lancker and Sidtis, 1993; Borod et al., 1998) linked affective prosodic processing broadly to right hemispheric function (Hornak et al., 1996; Ross and Monnot, 2008). More recent neuroimaging studies (Morris et al., 1999; Adolphs et al., 2001; Wildgruber et al., 2005; Ethofer et al., 2006; Wiethoff et al., 2008, 2009) related prosodic processing to a distributed network including posterior aspects of superior and middle temporal gyrus (pSTG, pMTG), inferior frontal (IFG) and orbitofrontal (OFC) gyri, and sub-cortical regions such as basal ganglia and amygdala. In current models (Ethofer et al., 2006; Schirmer and Kotz, 2006), affective prosodic comprehension has been parsed into multiple stages: (1) elementary sensory processing, (2) temporo-spectral processing to extract salient acoustic features, (3) integration of these features into the emotional acoustic object, and (4) evaluation of the object for meaning and goal relevance. Together these processing stages comprise a circuit with reciprocal connections between nodes.
Prior neuroimaging studies compared prosodic vs. non-prosodic tasks (e.g., Mitchell et al., 2003), or prosodic identification of emotional vs. neutral stimuli (e.g., Wiethoff et al., 2008), and thereby identified a set of brain regions likely involved in affective prosody. Based on knowledge of functional roles of temporal cortex and IFG ('reverse inference'; Poldrack, 2006; Van Horn and Poldrack, 2009), it was assumed that temporal cortex mediates sensory-integrative functions while IFG plays an evaluative role (Ethofer et al., 2006; Schirmer and Kotz, 2006). However, these binary 'cognitive subtraction' designs did not permit a direct demonstration of the distinct roles of temporal cortex versus IFG.
Our parametric design, using stimuli varying in cue salience to create varying levels of stimulus-driven prosodic ambiguity, has two major advantages over prior study designs. First, analysis across varying levels of an experimental manipulation allows more robust and interpretable results linking activation to the manipulated variable than designs that use a binary comparison. Second, the parametric manipulation of cue saliency should produce a dissociation in the relationship of sensory vs. evaluative regions to the manipulated cue level. This allows direct evaluation of the hypothesis that IFG plays an evaluative role distinct from the sensory-integrative role of temporal cortex.
We hypothesized that during a simple emotion identification task, the presence of high levels of affectively salient cues within the acoustic signal should facilitate the extraction and integration of these cues into a percept that would be reflected in temporal cortex activation increases. We also hypothesized that increased cue saliency would correlate with amygdala activation. Amygdala activation is correlated with perceived intensity in non-verbal vocalizations (Fecteau et al., 2007; Bach et al., 2008b). Such activity may reflect automatic affective tagging of the stimulus intensity level (Bach et al., 2008a,b). Conversely, we predicted that decreasing cue saliency would be associated with increasing IFG activation, reflecting increased evaluation of the stimuli for meaning (Adams and Janata, 2002) and difficulty in selecting the proper emotion (Thompson-Schill et al., 1997). We thus expected that increased activation in this evaluation and response selection region (IFG) would be directly associated with decreased activity in feature extraction and integration regions (pSTG and pMTG). Thus, our parametric design aimed to characterize a reciprocal temporo-frontal network underlying prosodic comprehension and examine how activity within this network changes as a function of cue salience.

SUBJECTS
Informed consent was obtained from 20 male right-handed subjects with a mean age of 28 ± 5 years, 14.9 ± 2 years of education, and no reported history of psychopathology or hearing loss. One subject did not complete the scanning session due to a strong sensitivity to scanner noise. All procedures were conducted under the supervision of the local institutional review board.

STIMULI AND DESIGN
Recognition of emotional prosody was assessed using a subset of stimuli from Juslin and Laukka's (2001) prosody task. The stimuli consisted of audio recordings of two male and two female actors portraying three emotions (anger, fear, and happiness), as well as utterances with no emotional expression. The sentences spoken were semantically neutral and consisted of both statements and questions (e.g., "It is eleven o'clock", "Is it eleven o'clock?"). All speakers were native speakers of British English; these stimuli have been used successfully with American subjects (Leitman et al., 2008). All stimuli were less than 2 s in length. Each emotion was represented by 8-10 exemplars that had unique acoustic properties reflecting a particular level of cue salience for that emotion. These stimuli were repeated on average 5-7 times to yield 56 stimuli for each emotion. These stimuli were pseudo-randomly presented over four fMRI time series acquisitions (runs a-d) of 56 stimuli each, in such a manner that all runs were balanced for the type of sentence (question or statement), emotion, and gender of speaker.
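The trial counts given above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: it assumes that the neutral condition also comprised 56 stimuli and that "balanced" means an even split of each condition across the four runs. All names are hypothetical.

```python
# Trial bookkeeping for the design described in the text.
# Assumptions (not stated explicitly in the source): the neutral condition
# also has 56 stimuli, and each condition is split evenly across runs.
CONDITIONS = ["anger", "fear", "happiness", "neutral"]
STIMULI_PER_CONDITION = 56   # stated in the text
N_RUNS = 4                   # runs a-d
STIMULI_PER_RUN = 56         # stated in the text


def trials_per_condition_per_run(per_condition, n_runs):
    """Trials of one condition in each run under an even split."""
    if per_condition % n_runs != 0:
        raise ValueError("condition cannot be split evenly across runs")
    return per_condition // n_runs


def mean_repetitions(per_condition, n_exemplars):
    """Average number of presentations of each exemplar."""
    return per_condition / n_exemplars


total_trials = len(CONDITIONS) * STIMULI_PER_CONDITION
per_run = trials_per_condition_per_run(STIMULI_PER_CONDITION, N_RUNS)

print(total_trials, total_trials // N_RUNS, per_run)
print(mean_repetitions(56, 8), mean_repetitions(56, 10))
```

Under these assumptions the totals are internally consistent: four conditions of 56 stimuli fill four runs of 56 stimuli, and 8 to 10 exemplars per emotion imply between 5.6 and 7 presentations per exemplar, in line with the "5-7 times" stated above.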
For this stimulus set, measurement of all acoustic cues was conducted in PRAAT (Boersma, 2001) speech analysis software as described previously (Juslin and Laukka, 2001). F0 SD was transformed to a logarithmic scale for all analyses as done previously (Leitman et al., 2008). Our initial choice of these particular cues as our proxies for cue salience (F0 SD for happiness and fear, HF 500 for anger) was based on our prior findings with the full Juslin and Laukka stimulus set. There we found that the F0 SD ranges of happy and fear and the HF 500 range for anger were statistically distinct from the other emotions as a whole (see Leitman et al., 2008, Table 2) and that they provided the single strongest correlate of subject performance. For this study, due to time constraints, we reduced the emotions presented from six to four: anger, fear, happiness, or neutral. As Table 1 illustrates, in the present study the ranges for F0 SD and HF 500 for happiness and anger, respectively, are no longer statistically different from the three remaining emotions; nevertheless, they remained the strongest single predictors of performance among the acoustic features measured. Note that we had no a priori hypotheses regarding the neutral stimuli, which were included in the experiment in order to give subjects the option not to endorse an emotion. Our prior study (Leitman et al., 2008) indicated that when the cue salience of an emotional stimulus was low, subjects often endorsed it as neutral. With the inclusion of neutral stimuli, we were additionally able to replicate the more conventional binary contrasts of emotional versus neutral prosody used in prior studies. The task consisted of a simple forced-choice identification task and was presented in a fast event-related design whose timing and features are described in Figure 1. This design used compressed image acquisition to allow for a silent period in which audio stimuli could be presented.

FIGURE 1 | [...] After sound offset, this crosshair was replaced with a visual prompt containing emoticons representing the four emotion choices and the corresponding response button number. Auditory stimuli were presented through pneumatic headphones and sound presentation occurred between volume collections to minimize any potential impact of scanner noise on stimulus processing.

IMAGE ACQUISITION
Images were acquired on a clinical 3T Siemens Trio Scanner (Iselin, NJ, USA). A 5 min magnetization-prepared, rapid acquisition gradient-echo image (MPRAGE) was acquired for anatomic overlays of functional data and spatial normalization (Talairach and Tournoux, 1988). Functional BOLD imaging (Bandettini et al., 1992) used a single-shot gradient-echo (GE) echo-planar (EPI) sequence (TR/TE = 4000/27 ms, FOV = 220 mm, matrix = 64 × 64, slice thickness/gap = 3.4/0 mm). This sequence delivered a nominal voxel resolution of 3.4 × 3.4 × 3.4 mm. Thirty-four axial slices were acquired from the superior cerebellum up through the frontal lobe, aligning the slab orientation so that the middle slice was parallel to the lateral sulcus, in order to minimize signal drop-out in the temporal poles and ventral and orbitofrontal aspects of cortex. The extent of this scanning region is illustrated in Figure 2 along with a contrast of all stimuli > rest.
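The nominal voxel size and slab coverage follow directly from the acquisition parameters above. The sketch below only illustrates that arithmetic; the function names are hypothetical and nothing here reflects scanner software.

```python
# Voxel geometry implied by the EPI parameters reported in the text.
def in_plane_voxel_mm(fov_mm, matrix):
    """In-plane voxel edge length: field of view divided by matrix size."""
    return fov_mm / matrix


def slab_coverage_mm(n_slices, thickness_mm, gap_mm=0.0):
    """Extent covered by the slice stack, including inter-slice gaps."""
    return n_slices * thickness_mm + (n_slices - 1) * gap_mm


voxel = in_plane_voxel_mm(220.0, 64)       # FOV = 220 mm, 64 x 64 matrix
coverage = slab_coverage_mm(34, 3.4, 0.0)  # 34 slices, 3.4 mm, no gap

print(f"in-plane voxel: {voxel:.4f} mm")
print(f"slab coverage:  {coverage:.1f} mm")
```

The in-plane value of about 3.44 mm rounds to the nominal 3.4 mm reported, and a slab of roughly 116 mm is consistent with the partial coverage, from superior cerebellum through the frontal lobe, described in the text.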

IMAGE PROCESSING
The fMRI data were preprocessed and analyzed using FEAT (FMRI Expert Analysis Tool) Version 5.1, part of FSL (FMRIB's Software Library, www.fmrib.ox.ac.uk/fsl). Images were slice time corrected, motion corrected to the median image using trilinear interpolation with 6 degrees of freedom, high pass filtered (120 s), spatially smoothed (8-mm FWHM, isotropic) and scaled using mean-based intensity normalization. Resulting translational motion parameters were examined to ensure that there was not excessive motion (in our data, all subjects exhibited less than 1 mm displacement in any plane). BET was used to remove non-brain areas (Smith, 2002). The median functional image was coregistered to the T1-weighted structural volume and then normalized to the standard anatomical space (T1 MNI template) using trilinear interpolation (Jenkinson and Smith, 2001), and the transformation parameters were later applied to statistical images for group-level analysis.
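For readers less familiar with these preprocessing parameters, the following sketch converts the reported smoothing kernel and filter cutoff into more concrete quantities. It uses the standard relation between the FWHM and the standard deviation of a Gaussian and the TR of 4 s reported in the acquisition section; it does not reproduce FSL's internal computations, and the function names are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Standard deviation of a Gaussian kernel with the given FWHM."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def cutoff_in_volumes(cutoff_s, tr_s):
    """High-pass cutoff period expressed in number of volumes."""
    return cutoff_s / tr_s


sigma_mm = fwhm_to_sigma(8.0)                  # 8-mm FWHM kernel
cutoff_volumes = cutoff_in_volumes(120.0, 4.0)  # 120 s cutoff, TR = 4 s
cutoff_hz = 1.0 / 120.0

print(f"sigma = {sigma_mm:.3f} mm")
print(f"cutoff = {cutoff_volumes:.0f} volumes ({cutoff_hz:.5f} Hz)")
```

An 8-mm FWHM kernel thus corresponds to a Gaussian standard deviation of about 3.4 mm, roughly one acquired voxel, and the 120 s high-pass cutoff spans 30 volumes.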

Behavior
Variations in subject performance were examined using a general linear mixed effects model conducted with Stata 9.0 (StataCorp; College Station, TX, USA). In this model, subjects' prosodic identification served as the outcome variable, subjects (n = 19) were treated as random effects, and fixed effects included fMRI runs (a-d) and cue saliency level (10 for happy and anger, 8 for fear, each level reflecting a unique stimulus). Adjustment for the clustering (repeated measures from within individual) was accomplished within the mixed model using the sandwich estimator approach, which is the default adjustment method for this program. The significance levels of individual model parameters were assessed using the F-test statistic, which was appropriately adjusted for the non-independence of the repeated measures within individual, with an alpha criterion of p < 0.05.

TABLE 1 | [...] (Juslin and Laukka, 2001). *Stimulus repetitions were arranged so as to balance as well as possible the number of stimuli presented for each speaker.

FMRIB's Local Analysis of Mixed Effects (FLAME), used for the group-level imaging analysis, models the inter-session or inter-subject random-effects components of the mixed-effects variance, using Markov chain Monte Carlo sampling to estimate the true random-effects variance and degrees of freedom at each voxel (Woolrich et al., 2004).

As mentioned, saliency-related activation for happy stimuli was positively related to pitch variability (F0 SD), while saliency-related activation for fear stimuli was negatively related to F0 SD. In order to illustrate that activation changes correlating with cue level within our ROIs reflect emotion-specific changes and not directional changes in acoustic features, we conducted a conjunction analysis of happy and fear stimuli. This analysis examines correlated activation changes of increasing cue saliency (increasing F0 SD for happiness, decreasing F0 SD for fear) or decreasing cue saliency (decreasing F0 SD for happiness, increasing F0 SD for fear) within these emotions jointly.

Frontiers in Human
Statistical significance was based on both voxel height and spatial extent in the whole brain, using AFNI AlphaSim to correct for multiple comparisons by Monte Carlo simulation (10,000 iterations, voxel height threshold p < 0.01 uncorrected, cluster probability p < 0.01). This whole-brain correction required a minimum cluster size of 284 voxels (each 2 × 2 × 2 mm). Given the small size of the amygdala (319 voxels for both amygdalae combined) and our a priori prediction of amygdala involvement, this cluster threshold was deemed inappropriate for detecting amygdala activity. We therefore repeated the above AlphaSim correction using a mask restricted to the amygdala as defined anatomically by a standardized atlas (Maldjian et al., 2003), yielding a cutoff of >31 voxels.
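A brief calculation makes clear why the whole-brain cluster threshold could not be applied to the amygdala. The sketch below assumes that the "2 × 2 × 2" in the text refers to millimetres; the variable names are hypothetical.

```python
# Cluster extent thresholds expressed as volumes.
VOXEL_EDGE_MM = 2.0  # ASSUMPTION: 2 x 2 x 2 mm voxels


def cluster_volume_mm3(n_voxels, voxel_edge_mm=VOXEL_EDGE_MM):
    """Volume of a cluster of n isotropic voxels."""
    return n_voxels * voxel_edge_mm ** 3


WHOLE_BRAIN_MIN_CLUSTER = 284  # voxels, whole-brain correction
AMYGDALA_MASK_SIZE = 319       # voxels, both amygdalae combined
AMYGDALA_MIN_CLUSTER = 31      # voxels, small-volume correction

fraction_of_mask = WHOLE_BRAIN_MIN_CLUSTER / AMYGDALA_MASK_SIZE

print(cluster_volume_mm3(WHOLE_BRAIN_MIN_CLUSTER))  # whole-brain cutoff
print(cluster_volume_mm3(AMYGDALA_MIN_CLUSTER))     # amygdala cutoff
print(f"{fraction_of_mask:.0%} of the bilateral amygdala mask")
```

Under the whole-brain criterion, a single cluster would need to occupy close to nine tenths of the combined amygdala mask to reach significance, which motivates the separate small-volume threshold.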

Imaging
Subject-level time-series statistical analysis was carried out using FILM (FMRIB's Improved Linear Model) with local autocorrelation correction (Woolrich et al., 2001). Event-related first-stage analysis was conducted separately for the four time series, modeling each of the four conditions (angry, happy, fear, neutral) with a canonical hemodynamic response function (HRF) and its temporal derivative.
In order to compare our results to those of prior studies (Wiethoff et al., 2008, 2009), we contrasted anger, fear and happiness with neutral stimuli. In order to quantify the relationship of activation to parametrically varied cue saliency levels, we also included a parametric regressor, ZCUE, consisting of z-normalized values of the relevant cue value for each emotion (F0 SD for fear and happy, HF 500 for anger) across all emotions. A separate analysis was conducted for each of the three emotion conditions in which the HRF was scaled as a function of the relevant cue level for each stimulus (F0 SD for fear and happy, HF 500 for anger). These parametric regressors were orthogonalized relative to the fixed amplitude HRF regressor for the corresponding emotion, yielding a contrast that reflected cue level related variations above or below the average stimulus response.
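The two operations described above, z-normalization of cue values and orthogonalization of the parametric regressor against the fixed-amplitude regressor, can be sketched at the level of event amplitudes, before any convolution with the HRF. This is an illustration under stated assumptions rather than the FEAT implementation: the cue values are hypothetical, the population standard deviation is an arbitrary choice, and orthogonalization is written as a simple vector projection.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-normalize a list of cue values (population SD; an assumption)."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("cue values are constant; z-scores are undefined")
    return [(v - m) / s for v in values]


def orthogonalize(target, reference):
    """Remove from `target` its projection onto `reference`."""
    dot_tr = sum(t * r for t, r in zip(target, reference))
    dot_rr = sum(r * r for r in reference)
    if dot_rr == 0:
        raise ValueError("reference regressor is all zeros")
    scale = dot_tr / dot_rr
    return [t - scale * r for t, r in zip(target, reference)]


# Hypothetical log-transformed F0 SD values for five stimuli of one emotion.
log_f0sd = [1.2, 1.5, 1.9, 2.3, 2.6]
fixed_amplitude = [1.0] * len(log_f0sd)  # same height for every event

zcue = zscore(log_f0sd)
modulator = orthogonalize(log_f0sd, fixed_amplitude)
overlap = sum(m * f for m, f in zip(modulator, fixed_amplitude))

print([round(z, 2) for z in zcue])
print([round(m, 2) for m in modulator], round(overlap, 12))
```

At the event level, orthogonalizing against a constant-amplitude regressor amounts to mean-centering the cue values, which is why the resulting contrast captures variation above or below the average stimulus response.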
A second-level within-subject fixed effects analysis across all four runs was then conducted for each subject. The resulting single-subject contrast estimates were submitted to a third-level between-subjects (group) analysis employing FMRIB's Local Analysis of Mixed Effects (FLAME) (Beckmann et al., 2003).

FIGURE 2 | All stimuli > rest. Activation presented at an uncorrected p < 0.05 threshold. Grey shadow represents scanned regions of the brain.

Psychophysiological interaction (PPI) analysis (Friston et al., 1997) was used to evaluate effects of cue salience on the functional connectivity of right IFG with other regions in our affective prosodic model. PPI examines changes in the covariation of BOLD signal between brain regions in relation to the experimental paradigm. IFG was chosen as a seed region because we wished to clarify its role in prosodic "evaluation", which should increase with decreasing cue saliency. The mean time series was extracted from an 8-mm-radius sphere within the right IFG seed region, centered on the coordinates (MNI = 50, 22, 20) where the peak effect was observed in our initial parametric analysis of cue salience within each emotion. Using FSL FEAT and following the method of Friston et al. (1997), we created a regression model employing regressors reflecting the standardized estimate (Z score) of cue saliency for each cue by emotion (ZCUE), the mean time series of our rIFG sphere, and the ZCUE × time series interaction (the PPI regressor of interest). Additionally, we included mean global (whole brain) time series, slice time correction, and motion in our model to reduce non-specific sources of time series correlation.
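The structure of the PPI model described above, a psychological regressor, a physiological (seed) regressor, and their product, can be sketched as follows. This is a simplified illustration with hypothetical values: it omits HRF convolution and the nuisance regressors, and the centering choices are assumptions rather than the settings used in FEAT.

```python
from statistics import mean


def demean(values):
    """Subtract the mean so the regressor is centered on zero."""
    m = mean(values)
    return [v - m for v in values]


def ppi_regressors(zcue, seed_timeseries):
    """Return (psychological, physiological, interaction) regressors."""
    if len(zcue) != len(seed_timeseries):
        raise ValueError("regressors must have one value per volume")
    psych = demean(zcue)
    phys = demean(seed_timeseries)
    interaction = [p * s for p, s in zip(psych, phys)]
    return psych, phys, interaction


# Hypothetical per-volume values: cue saliency and right IFG seed signal.
zcue_per_volume = [1.0, -1.0, 1.0, -1.0]
rifg_signal = [2.0, 4.0, 6.0, 8.0]

psych, phys, interaction = ppi_regressors(zcue_per_volume, rifg_signal)
print(psych, phys, interaction)
```

A negative weight on the interaction term in a target region indicates that its coupling with the seed strengthens as ZCUE decreases, which is the pattern the hypothesis predicts for IFG and temporal cortex.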

BEHAVIOR
Emotion identification accuracy was well above chance for all four emotional categories (Figure 3A). Examination of identification rates within each emotion as a function of cue level revealed that the identification of anger stimuli significantly increased as a function of HF 500 (F(1, 1041) = 101.08, p < 0.0001) (Figure 3B). An inverse correlation indicated that decreasing F0 SD was associated with increased identification of fearful stimuli (F(1, 1037) = 12.32, p < 0.0005) (Figure 3C), while identification of happy prosodic stimuli significantly increased as a function of F0 SD (F(1, 1056) = 28.45, p < 0.0001) (Figure 3D). Although the experiment was divided into four runs (a-d), there was no effect of run number on performance for any of the emotions (all p's > 0.19).

Anatomical regions within significant clusters were identified using a Talairach atlas (Talairach and Tournoux, 1988), with supplemental divisions for regions such as planum temporale (PT) and IFG pars triangularis delineated using the Harvard-Oxford atlas created by the Harvard Center for Morphometric Analysis and the WFU PickAtlas (Maldjian et al., 2003), respectively. Using the cluster tool (FSL), we identified local maxima with connectivity of 26 voxels or more within these anatomical regions.

To assess the degree of lateralization within auditory regions for our cue × emotion interactions, we adopted a method akin to one used previously by Obleser et al. (2008). We contrasted activity within right and left structural ROIs containing PT, pSTG, and pMTG by calculating a lateralization quotient index (LQ). We used "Energy" as an activation measure, which takes into account both amplitude and spatial extent. Energy is calculated as: Energy = mean BOLD % signal change × number of voxels, where % signal change was calculated using FSL's Featquery tool from voxels greater than our chosen voxel height threshold (overall whole brain p < 0.01). The LQ was then computed from the Energy values of the right and left ROIs, where k denotes the number of voxels. As in Obleser et al. (2008), we used a jackknife procedure (Efron and Tibshirani, 1993) to determine the reliability of our emotion × cue effects, rerunning the model n times (n = 19, the number of our participants), each time omitting a different participant. This procedure resulted in n models with n-1 subjects, which, unlike lateralization analysis based on single subjects, preserved the advantages of second-level modeling such as greatly increased signal-to-noise ratio.
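The Energy measure and the leave-one-out logic of the lateralization analysis can be sketched as follows. The exact expression for the LQ is not recoverable from the text; the normalized right-minus-left difference used here is a commonly used form and is an assumption, including its sign convention. All names and numerical values are hypothetical.

```python
def energy(mean_pct_signal_change, k):
    """Energy = mean BOLD % signal change x number of suprathreshold voxels."""
    return mean_pct_signal_change * k


def lateralization_quotient(energy_right, energy_left):
    """ASSUMED form: (R - L) / (R + L); positive values mean right-lateralized."""
    total = energy_right + energy_left
    if total == 0:
        raise ValueError("no suprathreshold activation in either hemisphere")
    return (energy_right - energy_left) / total


def jackknife_subsets(subjects):
    """All leave-one-out subsets: n groups of n-1 subjects each."""
    return [subjects[:i] + subjects[i + 1:] for i in range(len(subjects))]


# Hypothetical ROI summaries: (% signal change, suprathreshold voxel count).
e_right = energy(0.5, 400)
e_left = energy(0.4, 250)
lq = lateralization_quotient(e_right, e_left)

subsets = jackknife_subsets(list(range(1, 20)))  # 19 participants
print(round(lq, 3), len(subsets), len(subsets[0]))
```

In the procedure described above, the group model is re-estimated once per leave-one-out subset and an LQ is computed from each resulting map, so the spread of the 19 LQ values indicates how strongly the lateralization depends on any single participant.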

All emotions > neutral
A contrast of emotional prosody versus neutral prosody revealed increased activation to emotional prosody in a cluster spanning Heschl's gyrus and posterior and middle portions of superior and middle temporal gyrus (pSTG, mSTG, pMTG) as well as clusters in inferior frontal (IFG) and orbitofrontal gyri (OFC) (Figure 4 and Table 2). Additional activation clusters were observed in anterior and middle portions of cingulate gyrus as well as sub-cortically within insula, caudate and thalamus. No activation within amygdala was observed even at reduced significance thresholds (uncorrected p < 0.05).

All emotions × cue saliency
A voxel-wise examination of ZCUE-correlated activation patterns for all emotions (anger, fear and happiness) revealed activation clusters spanning PT, pSTG, pMTG, and IFG that were modulated by cue saliency level (Figure 5A). Increasing cue saliency (increasing ZCUE) correlated with activation in PT, pSTG and pMTG. Conversely, decreasing cue saliency (decreasing ZCUE) was associated with IFG activation. Further, in contrast to the all emotions > neutral contrast, small volume analysis of amygdala revealed bilateral activation clusters that correlated with increasing cue saliency. Beyond these a priori ROIs, increasing cue saliency positively correlated with activation in posterior cingulate gyrus (pCG) bilaterally, right precuneus, and anterior-medial portions of paracingulate gyrus (Brodmann's areas 23, 7 and 32, respectively) (Table 3).

FIGURE 4 | All emotions > neutral.
A subtraction of neutral activation from all emotions (anger, fear and happiness) indicates activation clusters bilaterally in posterior superior/middle temporal gyrus (pSTG/pMTG), inferior frontal gyrus (IFG) and orbitofrontal cortex (OFC). The markers in red illustrate differences between this contrast and the subsequent parametric analysis: arrow = OFC activation; * = thalamic activation; circles = absence of amygdala activation bilaterally.

Anger × HF 500
Activation to anger stimuli was significantly modulated by HF 500 level (Figure 5B). Increasing cue saliency (greater HF 500) was associated with bilateral clusters of activation spanning PT, STG, and MTG. In contrast, decreasing cue saliency (lower HF 500) was associated with increased bilateral IFG activation. Within amygdala, small volume correction indicated activation clusters that were associated with increasing cue saliency.
Beyond these a priori ROIs, increasing cue saliency (here HF 500) in anger stimuli positively correlated with activation in pCG and precuneus (Table 4). Decreasing cue saliency correlated with activation in anterior cingulate (AC), left globus pallidus, and right caudate and insula.

Conjunction analysis of fear and happiness × F0 SD
Similarly, for fear and happiness, F0 SD-correlated activation patterns were observed in clusters spanning PT, pSTG, MTG, amygdala and IFG that were modulated by cue saliency level (Figure 5C). Increasing cue saliency (increasing F0 SD for happiness, decreasing F0 SD for fear) correlated with activation in PT, pSTG, pMTG and amygdala. Conversely, decreasing cue saliency (decreasing F0 SD for happiness, increasing F0 SD for fear) was associated with right IFG activation.
Beyond these a priori regions of interest, increasing cue saliency for fear and happy stimuli positively correlated with activation in anterior and ventral aspects of left MTG (Brodmann's areas 20, 34 and 24), bilateral pCG, and right supramarginal gyrus, right postcentral gyrus, right insula and right precuneus (Table 5).
These overall activation patterns observed in the conjunction analysis of happiness and fear were also seen within each emotion individually, albeit at a reduced significance threshold (see Figure 5D).

Functional connectivity
An examination of the psychophysiological interaction between ZCUE and right IFG activity indicated robust negative interactions centered in bilateral pSTG (Figure 6). This interaction suggests that the functional coupling of rIFG and STG/MTG significantly increases as ZCUE decreases.

DISCUSSION
We approached affective prosodic comprehension from an object-based perspective, which characterizes affective prosodic processing as a reciprocal circuit comprising sensory, integrative, and cognitive stages (Schirmer and Kotz, 2006). Our model locates sensory-integrative aspects of prosodic processing in posterior STG and MTG, while higher-order evaluation occurs in IFG. Sensory-integrative processing should be robust when the prosodic signal is rich in the acoustic cues that typify the affective intent (high cue saliency), yielding increased PT, pSTG, and pMTG activation. Such integration may be facilitated by amygdala. Conversely, when the prosodic signal is ambiguous (low cue saliency), greater evaluative processes are recruited, increasing activation in IFG. We tested this model by capitalizing on prior observations that acoustic cues, namely pitch variability (F0 SD) and high-frequency spectral energy (HF 500), correlate with the identification of specific emotions. We conducted a prosody identification task in which the stimuli varied parametrically in their cue salience. Our results were highly consistent with model predictions.

FIGURE 5 | [...] and fear (right) indicate activation clusters spanning pSTG, amygdala and IFG. For happiness, increasing F0 SD (red) is associated with activation increases in pSTG and amygdala, while decreasing F0 SD (blue) is associated with increasing IFG activation. The reverse pattern is seen for fear: decreasing F0 SD is associated with activation increases in pSTG and amygdala, while increasing F0 SD is associated with increasing IFG activation.

ACTIVATION RELATED TO SALIENCY OF EMOTION-SPECIFIC ACOUSTIC CUES
Consistent with our hypothesis, increased cue saliency was associated with right lateralized BOLD signal increases in PT, pSTG, pMTG and amygdala, as well as additional regions not included in our a priori model. Similarly, Wiethoff et al. (2008) reported pSTG activation to emotional prosody relative to neutral prosody that was abolished after covarying for acoustic features such as F0 SD and decibel level. This effect is consistent with our findings: a comparison between a contrast of all emotions > neutral and our maps of emotions × cue saliency revealed a high degree of overlap in pSTG, where increasing cue saliency produced correlated activation increases. We posit that these changes reflect increased facilitation in the extraction and integration of acoustic cues that characterize the emotion.
Again as predicted, decreased cue saliency was associated with increased activation in IFG (as well as anterior cingulate for anger, which was not part of our model). This activity, we propose, reflects increasing evaluation of the stimulus because ambiguity increases the difficulty of response selection.
These effects of salience were similar across the three emotions we examined, but depended on emotion-specific acoustic cues. Thus, saliency-related activation for happy stimuli was positively related to pitch variability (F0 SD), negatively related to F0 SD for fear stimuli, and positively associated with HF 500 for anger stimuli. This emotion-specific effect is highlighted by the conjunction analyses combining fear and happy conditions, where the same acoustic cue (F0 SD) produces opposite saliency effects. When the conjunction combined positive parametric effects of F0 SD across happy stimuli and negative parametric effects of F0 SD across fear stimuli, the predicted saliency patterns were robust. In contrast, in a control conjunction analysis (see Figure 7), examining effects of F0 SD independent of emotion (positive parametric effect across both happy and fear conditions), an unrelated pattern emerged. This pattern suggests that effects within auditory sensory regions are not due to pitch variability change alone. Rather, these auditory regions code acoustic features in an emotion-specific manner when individuals are engaged in vocal affect perception.

Neural representation of vocal affect
A comparison of our parametric model (Figure 5) with a standard binary contrast of all emotions > neutral (Figure 4) revealed a high degree of overlap in activation in temporal and inferior frontal regions. However, the all emotions > neutral contrast (Figure 4, red markers) also indicated activation clusters in ventral IFG/OFC and thalamus that were not present in our cue salience parametric model, even at reduced thresholds. This effect suggests that the modulation of evaluation resulting from stimulus-driven ambiguity may be restricted to portions of the frontal prosodic processing circuit. The absence of modulation of thalamic activity by cue salience suggests that such modulation may only begin at the corticolimbic level.
Notably, cue salience increases resulted in correlated activation increases in the amygdala that were not observed in a contrast of all emotions > neutral (Figure 4). An examination of all stimuli versus rest also failed to indicate significant amygdala activation (Figure 2) even at p < 0.05 uncorrected.
The literature regarding the role of amygdala in prosody and non-verbal vocalizations is mixed, with some studies (Phillips et al., 1998; Morris et al., 1999; Sander et al., 2005; Fecteau et al., 2007; Ethofer et al., 2009a; Wiethoff et al., 2009) indicating a role for the amygdala and others not (Mitchell and Crow, 2005). A number of studies (Morris et al., 1999; Adolphs, 2002) suggest that the amygdala may preferentially activate during implicit tasks and become deactivated during explicit tasks; other studies have indicated the opposite (Gur et al., 2002; Habel et al., 2007) or that amygdala activation may decrease over the duration of the experiment due to habituation (Wiethoff et al., 2009). Our results suggest that during explicit identification the amygdala may be sensitive to the degree of cue salience in the prosody. This sensitivity may relate to increasing arousal engendered by cue salience as well as the fact that identification accuracy was considerably higher for high cue saliency stimuli than for low cue saliency stimuli. Indeed, a study of facial affect has shown that identification accuracy is associated with increased amygdala activation. Thus, amygdala activation may reflect some form of concurrent visceral or automatic recognition of emotion that may facilitate explicit evaluation.

FUNCTIONAL INTEGRATION WITHIN THE AFFECTIVE PROSODY CIRCUIT
To examine how cue salience modulates the functional coupling between IFG and other regions in the prosody network, we also conducted a psychophysiological interaction (PPI) functional connectivity analysis (Friston et al., 1997). The IFG time series was positively correlated with the regions in the model including auditory cortex and amygdala (not shown), demonstrating the expected functional connectivity within the network. Also consistent with our hypothesis, we found that IFG-STG connectivity was modulated by cue saliency, increasing as cue saliency decreased (Figure 6). Ethofer et al. (2006) suggested that bilateral IFG regions receive parallel input from right temporal cortex during prosodic processing. Our results build on this finding, demonstrating that temporal auditory processing regions and inferior frontal evaluative regions exhibit a reciprocal interaction, whose balance is determined by the degree to which the cues that typify the emotion are present. When this cue saliency is low, evaluation of the stimulus and selection of the appropriate response become more difficult.
These observations demonstrate the integrated action of regions within a functional circuit. They support the view that in affective prosodic identification tasks IFG is involved in evaluation (Adams and Janata, 2002; Wildgruber et al., 2004) and response selection (Thompson-Schill et al., 1997), increasing top-down modulation on auditory sensory-integrative regions in temporal cortex when stimuli are more ambiguous.

LATERALITY EFFECTS
While the reciprocal effects of salience in our a priori regions were similar across all three emotions, parametric modulation of HF 500 for anger yielded a bilateral response that was slightly left lateralized, in contrast to the expected strongly right-predominant response seen for happy and fear. Prosodic identification is considered to be a predominantly right-hemisphere process (Ross, 1981; Heilman et al., 1984; Borod et al., 1998). Several considerations may explain the bilateral effects seen for anger. First, voice quality as indexed by HF 500 is highly correlated with decibel level (here r = 0.83). While spectral changes appear to predominantly engage right auditory cortex, intensity or energy changes are likely reflected in auditory cortex bilaterally (Zatorre and Belin, 2001; Obleser et al., 2008). However, Grandjean et al. (2005) observed bilateral activation to anger seemingly independent of isolated acoustic cues such as intensity. This finding suggests that the lateralization of affective prosody is emotion specific.

LIMITATIONS AND FUTURE DIRECTIONS
Our study had several limitations. First, we parametrically varied cue levels using non-manipulated speech stimuli. This enhances ecological validity, and we chose cues (F0 SD, HF 500) that are tightly linked to the relevant emotions and that maximally differentiated the emotions portrayed in our stimulus set [see (Leitman et al., 2008) for details]. However, in natural speech stimuli these cues are also correlated with other acoustic features that result from the vocal gestural changes eliciting the particular cue change. These additional features could contribute to the observed relationship between our selected cues and variation in performance and neural activity. Future studies could employ synthetic stimuli that permit precise and independent modulation of one cue at a time. Second, while fMRI provides high spatial resolution, its relatively low temporal resolution cannot capture many details of the temporally complex and dynamic processes contributing to prosody identification. Electrophysiological studies indicate prosodic distinctions occurring at multiple timepoints, ranging from ∼200 ms in mismatch studies to ∼400 ms in N400 studies, supporting a multi-stage "objects" model of prosodic processing (Schirmer and Kotz, 2006). Combining EEG and fMRI may provide a more complete description of prosodic circuit function and allow us to discriminate processes we could not distinguish in the current study, such as feature extraction vs. feature integration.
Third, while our model incorporates the regions and processes most prominently implicated in prosodic identification, it is not comprehensive. The whole-brain analysis identified additional areas, such as posterior cingulate (pCG), whose activity varied with cue salience, as well as reinforcement-sensitive regions, such as caudate and insula. Prior studies have suggested that pCG and insula activation increases during prosodic processing are linked to increased sensory integration of acoustic cues such as F0 modulation (Hesling et al., 2005). Our finding that cue saliency increases correlate with pCG and insula activation increases strongly supports this assertion. The exact role of pCG in facilitating sensory integration is not known, but perhaps this region serves to coordinate STG integration of acoustic features between hemispheres. Future models of prosodic processing should incorporate insula and pCG more thoroughly.
Fourth, our population sample was limited to right-handed males, in order to avoid variation in prosodic processing and general language processing known to result from differences in handedness or gender (Schirmer et al., 2002, 2004). Future studies will need to examine factors such as handedness, gender and IQ directly and determine their impact on different processing stages within the model.
The purpose of our study was to explore the neural representation of acoustic-cue dependent perceptual change in affective prosody across all emotions. To accomplish this we formally tested a proposed multi-stage model of affective prosody that parses such perception into sensory-integrative and cognitive-evaluative stages. Consistent with our hypothesis, parametric manipulation of cue saliency revealed a reciprocal network underlying affective prosodic perception. Temporal auditory regions, which process acoustic features more generally, here in conjunction with amygdala, process acoustic features in an emotion-specific manner. This processing and its subsequent evaluation for meaning is modulated by inferior frontal regions, such that when the signal is ambiguous (as in the case of low cue-saliency), information processing in auditory regions is augmented by increased recruitment of top-down resources. While the current study identified responses to emotional salience common to multiple emotions, our results are not meant to suggest that there are no emotion-specific differences in neural activation between emotions. Indeed, recent work by Ethofer et al. (2009b)

FIGURE 6 | Psychophysiological interaction (PPI). This functional connectivity analysis map illustrates the negative interaction between ZCUE and the mean timeseries of IFG seed region (red sphere). This map indicates that functional connectivity between IFG and auditory processing regions is significantly modulated by cue saliency: Decreasing cue saliency increases IFG-STG functional coupling, while increasing cue saliency decreases this coupling.

FIGURE 7 | Control conjunction analyses.
Increasing or decreasing F0 SD across fear and happiness does not reveal activation in STG, IFG or amygdala at an uncorrected p < 0.05 threshold.