Frontiers in Integrative Neuroscience (www.frontiersin.org) | Original Research Article

Neural Underpinnings of Distortions in the Experience of Time across Senses

Auditory signals (A) are perceived as lasting longer than visual signals (V) of the same physical duration when the two are compared. Despite considerable debate about how this illusion arises psychologically, its neural underpinnings have not been studied. We used functional magnetic resonance imaging (fMRI) to investigate the neural bases of audiovisual temporal distortions and, more generally, of intersensory timing. Adults underwent fMRI while judging the relative duration of successively presented standard interval (SI)–comparison interval (CI) pairs, which were unimodal (A–A, V–V) or crossmodal (V–A, A–V). Mechanisms of time dilation and compression were identified by comparing the two crossmodal pairs. Mechanisms of intersensory timing were identified by comparing the unimodal and crossmodal conditions. The behavioral results showed that auditory CIs were perceived as lasting longer than visual CIs. There were three novel fMRI results. First, time dilation and compression were distinguished by differential activation of higher-sensory areas (superior temporal, posterior insula, middle occipital), which typically showed stronger effective connectivity when time was dilated (V–A). Second, when time was compressed (A–V), activation was greater in frontal cognitive-control centers, which guide decision making. These areas did not exhibit effective connectivity. Third, intrasensory timing was distinguished from intersensory timing partly by decreased striatal and increased superior parietal activation. These regions showed stronger connectivity with visual, memory, and cognitive-control centers during intersensory timing. Altogether, the results indicate that time dilation and compression arise from the connectivity strength of higher-sensory systems with other areas. Conversely, more extensive network interactions with core timing (striatum) and attention (superior parietal) centers are needed to integrate time codes for intersensory signals.


INTRODUCTION
Humans possess a remarkable ability to estimate the passage of time, which is vital for behavior. Yet the experience of time is not isomorphic to physical time; it depends on many factors, including properties of stimuli, past experiences, and behavioral context. For example, emotionally charged, larger-magnitude, and more intense stimuli are known to expand estimates of time, whereas events that are repeated, more probable, and non-salient tend to compress perceived duration (Tse et al., 2004; van Wassenhove et al., 2008; Eagleman and Pariyadath, 2009; Matthews et al., 2011). Decades of psychophysical studies have debated the mechanisms of temporal distortions. According to pacemaker-counter models (Penney et al., 2000; Ulrich et al., 2006), attention is a central factor that causes time to speed up or slow down by closing or opening a switch, which allows pulses generated by a clock during event timing to be accumulated and counted. Arousal is another factor, one that ostensibly increases the speed of the pacemaker. Indeed, the level of attention devoted to timing influences perceived duration (Casini and Macar, 1997; Coull et al., 2004), as does heightened physiological arousal induced by emotionally negative sounds (Mella et al., 2011). However, a more complete understanding of how temporal distortions arise has been hampered by the scarcity of investigations into the underlying neural mechanisms.
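To make the switch and rate mechanisms concrete, here is a minimal sketch of a generic pacemaker-accumulator, assuming Poisson pacemaker pulses. It is not a model fitted in this study: the function names and the parameters `rate_hz` (pacemaker speed, the factor arousal is thought to raise) and `switch_latency_ms` (how long the attention-gated switch takes to close after stimulus onset) are hypothetical illustration choices.

```python
import random


def simulate_count(duration_ms, rng, rate_hz=50.0, switch_latency_ms=0.0):
    """Count pacemaker pulses that pass the switch during one interval.

    Hypothetical model: pulses arrive as a Poisson process at rate_hz and are
    accumulated only after the attention-gated switch closes, which happens
    switch_latency_ms after stimulus onset.
    """
    t = switch_latency_ms / 1000.0  # time (s) at which the switch closes
    end = duration_ms / 1000.0
    count = 0
    while True:
        t += rng.expovariate(rate_hz)  # exponential inter-pulse interval
        if t > end:
            return count
        count += 1


def mean_count(duration_ms, trials=2000, seed=0, **params):
    """Average accumulator count over many simulated presentations."""
    rng = random.Random(seed)
    total = sum(simulate_count(duration_ms, rng, **params) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    base = mean_count(1500)                                 # about 75 pulses expected
    aroused = mean_count(1500, rate_hz=60.0)                # faster pacemaker
    distracted = mean_count(1500, switch_latency_ms=100.0)  # later switch closure
    print(f"baseline={base:.1f} faster={aroused:.1f} late_switch={distracted:.1f}")
```

In this toy version, a faster pacemaker or an earlier-closing switch raises the count, and therefore the judged duration, for the same physical interval.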
The present study used functional magnetic resonance imaging (fMRI) to investigate the neural underpinnings of the illusion that auditory (A) signals are perceived as lasting longer than visual (V) signals of the same physical duration when the two are compared (Wearden et al., 1998; Gamache and Grondin, 2010). This temporal distortion is of considerable interest because understanding its mechanisms may help elucidate how synchrony is maintained across senses to form coherent representations of multisensory events. The modality effect on perceived duration is often attributed to a pacemaker-accumulator "clock" system that runs faster for auditory than visual stimuli, possibly because an attentional switch allows pulses to accumulate faster for auditory information (Penney et al., 2000; Wearden et al., 2006). Audiovisual distortions are classically studied using the temporal bisection procedure. The present study instead employed a comparison procedure wherein a standard interval (SI) and a comparison interval (CI) were successively presented, and participants judged whether the CI was longer or shorter in duration than the SI (Ulrich et al., 2006). SI-CI pairs were either unimodal (A-A, V-V) or crossmodal (V-A, A-V). With this method, V-A pairs are perceived as lasting longer than A-V pairs, an effect attributed to the longer interpulse time for visual than auditory CIs (Ulrich et al., 2006). Our primary aim was to identify neural systems underlying time dilation and compression by comparing activation patterns in the crossmodal conditions (V-A versus A-V), where the amount of visual and auditory stimulation was the same. Our hypotheses were motivated by the striatal beat frequency (SBF) model (Matell and Meck, 2004), which suggests that audiovisual differences in timing could arise either from oscillatory patterns in the cortex or from the striatum.
Specifically, the time code for signal duration is thought to arise from the firing of cortical neurons with different oscillation rates, which should produce distinct temporal and spatial signatures for auditory and visual signals. The striatum, in turn, serves as a core timer by detecting and integrating cortical oscillatory states over time. Thus, activation in auditory and visual centers should differ between the crossmodal conditions if modality effects are related to different temporal signatures in sensory and association regions of the cortex. Alternatively, striatal activation should differ between the crossmodal conditions if modality effects are related to differences in the rate of detecting and integrating auditory and visual oscillatory states. We were also interested in whether the interactions of key regions that modulated modality effects with the rest of the brain (i.e., effective connectivity) were stronger in the time dilation than in the time compression condition. If time dilation is due to an attentional mechanism that favors auditory signals (Penney et al., 2000; Wearden et al., 2006), connectivity might be stronger for V-A than A-V comparisons.
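The SBF logic can be illustrated with a toy sketch. The simplifications are ours, not the model's full specification: cortical oscillators are phase-reset at interval onset, each is reduced to an on/off state (positive versus negative half-cycle of a cosine), and a striatal detector stores the oscillator pattern present at a reinforced duration and later responds in proportion to how many oscillators match that pattern. The 30-oscillator population and its 5-15 Hz range are arbitrary.

```python
import math


def oscillator_states(t_s, freqs_hz):
    """On/off state of each cortical oscillator t_s seconds after onset.

    All oscillators are assumed to be phase-reset at interval onset, so an
    oscillator counts as 'on' whenever cos(2*pi*f*t) is positive.
    """
    return [1 if math.cos(2 * math.pi * f * t_s) > 0 else 0 for f in freqs_hz]


def striatal_response(t_s, stored_pattern, freqs_hz):
    """Fraction of oscillators whose current state matches the stored pattern."""
    current = oscillator_states(t_s, freqs_hz)
    matches = sum(c == s for c, s in zip(current, stored_pattern))
    return matches / len(freqs_hz)


# Hypothetical population: 30 oscillators spanning 5-15 Hz.
FREQS = [5 + 10 * k / 29 for k in range(30)]
TIMES = [k / 100 for k in range(301)]  # 0.00-3.00 s in 10-ms steps

if __name__ == "__main__":
    target = TIMES[154]  # reinforced duration of about 1.54 s
    stored = oscillator_states(target, FREQS)  # pattern learned at reinforcement
    responses = [striatal_response(t, stored, FREQS) for t in TIMES]
    print("response at target:", striatal_response(target, stored, FREQS))
    print("mean response:", round(sum(responses) / len(responses), 2))
```

The detector reaches its maximum (all oscillators matching) at the stored duration while its average response hovers near chance. In these terms, modality-specific timing differences could live either in the oscillator population itself (here `FREQS`) or in how the detector reads it out, which is the contrast the hypotheses above set up between cortex and striatum.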
A secondary aim was to investigate neural mechanisms that distinguish intrasensory from intersensory timing by comparing the unimodal and crossmodal conditions in regions that did not exhibit time dilation or compression effects. Current knowledge of the neural underpinnings of temporal processing comes solely from studies of intrasensory timing. Intersensory timing presumably differs in that attention must be switched between senses and time codes must be integrated across senses. Although not explicitly addressed by the SBF model, the detection and integration of oscillatory states by the striatum might be enhanced when signals are timed within the same modality, because such signals share similar spatial signatures that facilitate temporal integration, thereby producing a more robust neuronal response than crossmodal timing. If the striatum differentially modulates intra- and intersensory timing, we also speculated that the strength of striatal interactions with the cortex would differ between unimodal and crossmodal timing.

PARTICIPANTS
Twenty healthy adults participated in the study (8 female and 12 male; mean age = 24.4 years, range: 19-35 years, SD = 4.5; mean education = 15.5 years, range: 13-20 years, SD = 1.6). Participants were excluded if they had a history of neurologic disturbance (e.g., seizures, head injury), learning disability, major psychiatric disturbance, or substance abuse. All participants gave their written informed consent according to guidelines of the Human Research Protections Program at the University of California San Diego (UCSD).

FMRI TASK
Participants performed a time perception task while undergoing fMRI scanning. The task involved presentation of filled auditory (1000-Hz pure tones) and visual (blue ellipse) stimuli. Tone stimuli were delivered binaurally through headphones, which together with earplugs attenuated background scanner noise by about 40 dB. Visual stimuli were viewed through a NordicNeuroLab goggle system. Participants made a two-choice key-press response on a button box using the index or middle finger of their right hand. Figure 1A shows the experimental design and trial events. Pairs of auditory (A) and/or visual (V) stimuli were successively presented. In two unimodal conditions, the SI and the CI were of the same modality (A-A, V-V), and in two crossmodal conditions they differed (A-V, V-A). Throughout the experiment, participants maintained fixation on a white cross at the center of the display. Prior to trial onset, a warning signal (i.e., a flashing yellow cross and a mixed 700-Hz tone) appeared for 350 ms, followed by a 500-ms delay. A trial began with the presentation of the SI, followed by a 1.5-s delay and then the CI. Participants indicated whether the CI was shorter or longer than the SI by pressing a key with the right index or middle finger, respectively. Four SIs (1467, 1540, 1620, and 1710 ms) were used to increase the demands of encoding an interval on each trial (Figure 1B). For each SI, there were three shorter and three longer CIs that differed from the SI by successive increments of ±7%. Accuracy and reaction time (RT; measured from CI offset to the key press) were recorded.
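The SI-CI design reduces to simple arithmetic. A short sketch, assuming that the ±7% steps are fixed percentages of the SI (7, 14, and 21% shorter or longer) and that durations were rounded to the nearest millisecond (the rounding is our assumption; exact CI values are not listed here). The helper names are hypothetical.

```python
STANDARDS_MS = [1467, 1540, 1620, 1710]
STEPS_PCT = [-21, -14, -7, 7, 14, 21]  # CI offsets as a percentage of the SI


def comparison_intervals(si_ms, steps_pct=STEPS_PCT):
    """Return the six CI durations (ms) paired with one standard interval."""
    return [round(si_ms * (1 + pct / 100)) for pct in steps_pct]


def correct_response(si_ms, ci_ms):
    """Correct key: 'longer' (middle finger) or 'shorter' (index finger)."""
    return "longer" if ci_ms > si_ms else "shorter"


if __name__ == "__main__":
    for si in STANDARDS_MS:
        print(si, comparison_intervals(si))
```

Because the steps scale with the SI, the four SIs yield 24 distinct CI durations, so no single CI duration is uniquely tied to one response.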
There were 24 trials per condition (i.e., A-A, V-V, A-V, V-A). Within each condition there were four trials per CI (i.e., 7, 14, and 21% shorter or longer than the SI). These trials were equally divided among the four different SI-CI combinations (Figure 1B). The order of the conditions was randomized across four runs, each of which contained 24 trials. At the end of the CI, there was a 3-s "filler" epoch (i.e., fixation) wherein subjects made their response. At the end of this response window, the inter-trial interval was jittered between 3 and 7 s to allow for the best sampling of the hemodynamic response and establishment of a baseline resting state in the model (i.e., fixation plus ambient scanner noise). Six additional filler images were added to the beginning and the end of each run to allow for T1 equilibration and for the delayed hemodynamic response of the final trial, respectively. Each run consisted of 180 images acquired over 6 min.
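The counts above fit together as 4 conditions × 6 CIs × 4 SIs = 96 trials, split into four runs of 24, and 180 images acquired over 6 min implies one image every 2 s. The sketch below builds such a schedule. How conditions were distributed across runs is not specified, so it assumes (hypothetically) six trials of each condition per run, shuffled within run, and a uniformly jittered 3-7-s inter-trial interval; `build_schedule` is an illustrative name.

```python
import itertools
import random

CONDITIONS = ["A-A", "V-V", "A-V", "V-A"]
STANDARDS_MS = [1467, 1540, 1620, 1710]
STEPS_PCT = [-21, -14, -7, 7, 14, 21]
N_RUNS = 4
TR_S = 6 * 60 / 180  # 180 images acquired over 6 min -> 2.0 s per image


def build_schedule(seed=0):
    """Return a list of runs; each run is a list of trial dicts."""
    rng = random.Random(seed)
    runs = [[] for _ in range(N_RUNS)]
    for cond in CONDITIONS:
        # One trial per SI x CI step: 4 x 6 = 24 trials per condition.
        trials = [
            {"condition": cond, "si_ms": si, "step_pct": step}
            for si, step in itertools.product(STANDARDS_MS, STEPS_PCT)
        ]
        rng.shuffle(trials)
        for r in range(N_RUNS):  # assumption: deal 6 trials of each condition per run
            runs[r].extend(trials[r::N_RUNS])
    for run in runs:
        rng.shuffle(run)
        for trial in run:
            trial["iti_s"] = rng.uniform(3.0, 7.0)  # jittered inter-trial interval
    return runs


if __name__ == "__main__":
    schedule = build_schedule()
    print("TR =", TR_S, "s;", [len(run) for run in schedule], "trials per run")
```

Balancing conditions within runs is one reasonable choice; fully random assignment across runs would also satisfy the counts given in the text.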

Image acquisition
Event-related fMRI was conducted at the UCSD Center for FMRI using a GE 3-T Excite MRI system equipped with an 8-channel head coil. Foam padding was used to limit head motion within the coil. Prior to functional imaging, high-resolution T1-weighted anatomic images were collected for anatomic localization and coregistration (TE 3.0 ms, TR 7.8 ms, 12° flip angle, NEX 1, 1-mm axial slice thickness, FOV 25 cm, 256 × 256 matrix). Echo-planar images were acquired using a single-shot, blipped, gradient-echo …

FIGURE 1 | (A) Prior to trial onset, a warning signal (flashing yellow cross and mixed 700-Hz tone) appeared for 350 ms followed by a 500-ms delay. A trial began with the presentation of the SI, followed by a 1.5-s delay, and then the CI. The participant indicated if the CI was shorter or longer than the SI by pressing a key with the right index or middle finger, respectively. (B) SI and CI durations. There were four different SIs. Each SI was paired with three shorter and three longer CIs that differed from the SI by successive increments of ±7%.

Image analysis
Functional images were generated using Analysis of Functional NeuroImages (AFNI) software. Time series images were spatially registered in three-dimensional space and corrected for slice-timing differences. The time series for each participant was deconvolved using trial onset (i.e., presentation of the SI), separately for each of the four conditions (A-A, V-V, A-V, V-A). This analysis produces voxel-wise hemodynamic response functions (HRFs) of the fMRI signal. The HRFs estimate the hemodynamic response for each condition relative to the baseline state (i.e., filler images) and are generated without a priori assumptions about the shape, delay, or magnitude of the response. The deconvolution was modeled for 8 time points (i.e., 16 s). Six head-motion parameters were included as covariates of no interest. Area under the curve (AUC) was calculated using the volumes that captured peak activation during the SI (volumes 2 and 3, beginning at 4.0 and 6.0 s post-trial onset) and during the CI and response (volumes 4 and 5, beginning at 8.0 and 10.0 s post-trial onset). AUC maps were then interpolated to volumes with 1-mm³ voxels, co-registered, converted to Talairach coordinate space, and blurred with a 6-mm full-width-at-half-maximum (FWHM) Gaussian filter. Repeated-measures analyses of variance (ANOVAs) were performed on a voxel-wise basis to generate statistical parametric maps that identified voxels showing main effects of timing condition (unimodal, crossmodal), CI modality (auditory, visual), and their interaction. Voxel-wise thresholds were derived from 3000 Monte Carlo simulations (AFNI AlphaSim), which computed the combination of voxel probability and minimum cluster size needed to obtain a familywise alpha of 0.05. Because cluster-size thresholds are biased against smaller activation clusters of a priori interest (i.e., in the basal ganglia), statistical thresholds were derived separately for basal ganglia and cortical volumes (Worsley et al., 1996).
Separate thresholds were obtained by creating a basal ganglia mask (i.e., putamen, globus pallidus, caudate) from the Talairach Daemon dataset; the mask was then expanded to include any voxels within a 2-mm radius. The cortical mask included all other regions of the brain. Each mask was used in the Monte Carlo simulations to determine the appropriate combination of individual voxel probability and minimum cluster-size threshold. For the basal ganglia volume, we used a voxel-wise threshold of p < 0.006 and a minimum cluster size of 0.338 ml. For the cortical volume, we used a voxel-wise threshold of p < 0.004 and a minimum cluster size of 0.675 ml. The objectives of our study were to investigate regional differences associated with (1) signal modality (A-A versus V-V), (2) timing condition (unimodal versus crossmodal), and (3) the interaction of modality × timing condition. Planned comparisons of significant interactions focused on the contrast between the two crossmodal pairs (A-V versus V-A), since this directly tests for regional activation associated with the time dilation effect while controlling for sensory processing demands. To accomplish these objectives, a functional region of interest (fROI) analysis was conducted to directly evaluate regional differences associated with each of these effects. The fROI map was generated by conjoining activated regions associated with the main effect and interaction tests identified in the above voxel-wise analyses. As some fROI were quite large, we separated them into smaller regions along minimum contour lines of the voxel-wise map using a watershed algorithm. This algorithm first uses AFNI 3dExtrema to find a set of local maxima separated by at least 20 mm and then creates boundaries for clusters containing these maxima along the minimum value contour lines (Cox, 1996).
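The logic of the Monte Carlo cluster-size thresholding described earlier in this section can be sketched with NumPy. This is not AFNI's AlphaSim code: it assumes one-sided voxel p values, Gaussian noise smoothed to a given FWHM, 6-connected clusters, and a toy grid, and `cluster_threshold` and `ml_to_voxels` are hypothetical helpers. With 1-mm³ voxels, 0.338 ml corresponds to 338 voxels.

```python
from collections import deque
from statistics import NormalDist

import numpy as np

FACE_NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def ml_to_voxels(volume_ml, voxel_mm3=1.0):
    """Convert a cluster volume in ml (1 ml = 1000 mm^3) to a voxel count."""
    return round(volume_ml * 1000 / voxel_mm3)


def smooth(vol, fwhm_vox):
    """Separable Gaussian smoothing with the given FWHM (in voxels)."""
    sigma = fwhm_vox / (2 * np.sqrt(2 * np.log(2)))
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    for axis in range(vol.ndim):
        vol = np.apply_along_axis(np.convolve, axis, vol, kernel, mode="same")
    return vol


def largest_cluster(mask):
    """Size of the largest 6-connected cluster of True voxels."""
    seen = np.zeros(mask.shape, dtype=bool)
    best = 0
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:
            x, y, z = queue.popleft()
            size += 1
            for dx, dy, dz in FACE_NEIGHBORS:
                n = (x + dx, y + dy, z + dz)
                inside = all(0 <= n[i] < mask.shape[i] for i in range(3))
                if inside and mask[n] and not seen[n]:
                    seen[n] = True
                    queue.append(n)
        best = max(best, size)
    return best


def cluster_threshold(shape, fwhm_vox, voxel_p, n_sims=1000, alpha=0.05,
                      region=None, seed=0):
    """Smallest cluster size whose chance of arising in pure noise is <= alpha."""
    z_cut = NormalDist().inv_cdf(1 - voxel_p)  # one-sided voxel threshold
    rng = np.random.default_rng(seed)
    maxima = []
    for _ in range(n_sims):
        noise = smooth(rng.standard_normal(shape), fwhm_vox)
        noise /= noise.std()  # re-standardize after smoothing
        supra = noise > z_cut
        if region is not None:
            supra &= region  # e.g., a basal ganglia or cortical mask
        maxima.append(largest_cluster(supra))
    for k in range(1, max(maxima) + 2):
        if sum(m >= k for m in maxima) / n_sims <= alpha:
            return k
```

Passing a smaller `region` (as with the basal ganglia mask) shrinks the search volume, so fewer and smaller chance clusters survive and a smaller minimum cluster size suffices; this is the rationale for the separate basal ganglia and cortical thresholds.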
The watershed algorithm was applied to the conjoined fROI map using the normalized maximum intensity value from each voxel. The results from F tests conducted on the fROI were the focus of the study.
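The deconvolution and AUC steps described at the start of this section can be illustrated with a least-squares finite impulse response (FIR) model, in the spirit of AFNI's shape-free deconvolution but not its actual code. This NumPy sketch assumes trial onsets are given in volume units (2-s images), fits 8 lags per condition plus a baseline and optional motion covariates, and takes AUC as the plain sum of the HRF estimates over the chosen volumes (our simplification). Function names are hypothetical.

```python
import numpy as np


def fir_design(onsets, n_vols, n_lags=8):
    """One indicator column per post-onset lag (volumes 0..n_lags-1)."""
    X = np.zeros((n_vols, n_lags))
    for onset in onsets:
        for lag in range(n_lags):
            if onset + lag < n_vols:
                X[onset + lag, lag] = 1.0
    return X


def deconvolve(y, onsets_by_condition, n_lags=8, motion=None):
    """Estimate one HRF (n_lags values) per condition relative to baseline."""
    n_vols = len(y)
    columns = [fir_design(on, n_vols, n_lags) for on in onsets_by_condition]
    columns.append(np.ones((n_vols, 1)))  # baseline (fixation) level
    if motion is not None:
        columns.append(motion)  # e.g., six head-motion parameters of no interest
    X = np.hstack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    n_cond = len(onsets_by_condition)
    return [beta[i * n_lags:(i + 1) * n_lags] for i in range(n_cond)]


def auc(hrf, volumes):
    """Area under the HRF over the selected volumes, as a simple sum."""
    return float(sum(hrf[v] for v in volumes))
```

For one condition's estimate `hrf`, the SI-related value would be `auc(hrf, [2, 3])` and the CI/response value `auc(hrf, [4, 5])`. Jittering the inter-trial interval is what keeps the overlapping FIR columns separable.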

Effective connectivity analyses
We also asked whether connectivity of key regions with the entire brain was modulated by the timing condition (unimodal versus crossmodal) or by dilation/compression effects on perceived duration (V-A versus A-V). This was achieved by conducting voxel-based tests of psychophysiological interactions (PPI; Friston et al., 1997) for key regions identified by the above fROI analyses. For PPI analyses pertaining to timing condition, key regions were selected that (1) exhibited differences between the unimodal and crossmodal conditions, (2) did not show a timing condition × CI modality interaction, and (3) have been implicated in temporal processing. For PPI analyses pertaining to time dilation/compression, key regions were selected that exhibited a timing condition × CI modality interaction related to differences in activation between the two crossmodal pairs (V-A versus A-V). Voxels in these key regions served as the seed ROI and were selected for each subject based on the conjunctive maps generated for the fROI analyses. Seeds were constructed by drawing a 5-mm-radius sphere centered close to the peak activation within a fROI. In one PPI analysis, the experimental variable was the timing condition (unimodal versus crossmodal). In the other PPI analysis, the experimental variable was the time dilation/compression effect (V-A versus A-V). Multiplying the deconvolved time series for the seed areas by each experimental variable formed the interaction term (i.e., the PPI regressor), which tested whether connectivity of a key region with the whole brain was modulated by the experimental variable. A p < 0.006 voxel-wise threshold and a 0.338-ml minimum cluster size were the criteria for significance.
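The construction of the PPI regressor can be sketched with NumPy. The sketch assumes the seed's neural-level (deconvolved) time series is already available, codes the experimental variable as +1/−1 (for example crossmodal versus unimodal, or V-A versus A-V), and uses a hypothetical gamma-shaped HRF sampled every 2 s. It illustrates the general PPI approach of Friston et al. (1997) rather than the specific software used in the study.

```python
import numpy as np


def gamma_hrf(tr_s=2.0, length_s=24.0):
    """Hypothetical gamma-shaped HRF peaking about 5 s after a neural event."""
    t = np.arange(0.0, length_s, tr_s)
    h = t**5 * np.exp(-t)
    return h / h.sum()


def ppi_regressors(seed_neural, psych, hrf):
    """BOLD-scale seed, task, and interaction (PPI) regressors."""
    n = len(seed_neural)

    def to_bold(x):
        return np.convolve(x, hrf)[:n]  # causal convolution, trimmed to the run

    interaction = seed_neural * psych  # PPI term formed at the neural level
    return to_bold(seed_neural), to_bold(psych), to_bold(interaction)


def ppi_beta(target_bold, seed_neural, psych, hrf):
    """GLM fit; returns the coefficient of the PPI (interaction) regressor."""
    seed, task, ppi = ppi_regressors(seed_neural, psych, hrf)
    X = np.column_stack([np.ones(len(target_bold)), seed, task, ppi])
    beta, *_ = np.linalg.lstsq(X, target_bold, rcond=None)
    return beta[3]
```

The seed and task regressors stay in the model so that the PPI beta captures only the change in seed-target coupling between conditions; under this coding, a positive beta means stronger coupling in the condition coded +1.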

BEHAVIORAL RESULTS
The analyses collapsed across SI duration. Hence, CIs that were ±7, 14, and 21% increments of the SI duration were also averaged. The main dependent measure was accuracy, which was converted to the percent longer responses for each CI. A repeated-measures ANOVA tested the main effects of CI modality (auditory, visual), timing condition (unimodal, crossmodal), and CI duration (±7, 14, 21%), and their interactions. The Huynh-Feldt correction was applied to effects with more than one degree of freedom to adjust for violations of sphericity. The main results are graphed in Figure 2.
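The Huynh-Feldt correction rescales both degrees of freedom of an F test by an estimated sphericity epsilon. A minimal NumPy sketch for a single within-subject factor with k levels, using the standard Greenhouse-Geisser estimate and the Huynh-Feldt adjustment for a one-group design; the function names are ours, and interaction terms (e.g., timing condition × CI duration) would use the corresponding product contrasts instead.

```python
import numpy as np


def orthonormal_contrasts(k):
    """k x (k-1) matrix of orthonormal contrasts, each orthogonal to the mean."""
    A = np.column_stack([np.ones(k), np.eye(k)[:, : k - 1]])
    Q, _ = np.linalg.qr(A)
    return Q[:, 1:]  # drop the column spanning the constant (grand mean)


def sphericity_epsilons(data):
    """Greenhouse-Geisser and Huynh-Feldt epsilons for an (n_subjects x k) array."""
    n, k = data.shape
    p = k - 1
    scores = data @ orthonormal_contrasts(k)
    S = np.cov(scores, rowvar=False)
    gg = np.trace(S) ** 2 / (p * np.trace(S @ S))
    hf = (n * p * gg - 2) / (p * (n - 1 - p * gg))
    return gg, min(hf, 1.0)  # Huynh-Feldt epsilon is capped at 1


def corrected_df(epsilon, k, n):
    """Sphericity-corrected (effect, error) degrees of freedom."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)
```

With 20 participants and 5 numerator degrees of freedom (as for effects involving the six-level CI duration factor), an epsilon of 0.88 would turn F(5, 95) into F(4.4, 83.6), in line with the F(4.4, 83) reported in the results.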
All first-order interactions were significant [CI modality × timing condition: F(1,19) = 96.3, p < 0.0001, η² = 0.84; timing condition × CI duration: F(4.4,83) = 8.3, p < 0.0001, η² = 0.30; CI modality × CI duration: F(5,95) = 2.8, p < 0.025, η² = 0.13]. Planned comparisons of the CI modality × timing condition interaction (Figure 2, left graph) showed that in the unimodal condition, visual CIs were perceived as lasting longer than auditory CIs. Although no difference was expected between A-A and V-V pairs, this effect has been reported previously (Ulrich et al., 2006) and has been related to the greater variability in timing visual signals (Merchant et al., 2008; Grondin and McAuley, 2009). In contrast, auditory CIs were perceived as lasting longer than visual CIs in the crossmodal condition. Pairwise comparisons between the unimodal and crossmodal conditions indicated that perceived duration was dilated for intersensory timing of auditory CIs and compressed for intersensory timing of visual CIs. The timing condition × CI duration interaction showed that differences between the two timing conditions grew as CI duration increased (Figure 2, right graph). The CI modality × CI duration interaction showed that CI modality differences also grew as CI duration increased. The second-order interaction was not significant [F(3.9,75) = 1.8, p = 0.14]. Secondary analyses of the RT data showed a trend for a CI modality × timing condition interaction [F(1,19) = 3.8, p = 0.067, η² = 0.17]. Planned comparisons showed that the interaction was due to faster RTs for V-A (mean = 776.3 ms, SE = 50.5) than A-V pairs (mean = 899.9 ms, SE = 51.0) [t(19) = 3.1, p < 0.01], whereas RTs did not differ between A-A (mean = 861.7 ms, SE = 50.2) and V-V pairs (mean = 874.1 ms, SE = 52.1).

FIGURE 2 | Task performance during fMRI scanning. Accuracy data were converted to mean percent "longer" responses (error bars show standard errors) and averaged across the standard interval (SI) conditions and their respective comparison intervals (CIs). The left graph shows the mean percent longer responses for each unimodal (A-A, V-V) and crossmodal (V-A, A-V) condition. The right graph plots the mean percent longer responses for the unimodal and crossmodal conditions as a function of CI duration. On the x axis, ±7, 14, and 21 designate CIs that were 7, 14, and 21% shorter (negative values) or longer (positive values) than the SI.
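The η² values reported above match partial eta squared, which can be recovered from an F ratio and its degrees of freedom as η²p = F·df_effect / (F·df_effect + df_error). A small check follows; because a sphericity epsilon multiplies both degrees of freedom, it cancels out of this ratio, so the uncorrected df can be used. The dictionary below simply restates the reported statistics.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# (F, df_effect, df_error, reported eta squared), taken from the results above.
REPORTED = {
    "CI modality x timing condition": (96.3, 1, 19, 0.84),
    "timing condition x CI duration": (8.3, 5, 95, 0.30),  # uncorrected df
    "CI modality x CI duration": (2.8, 5, 95, 0.13),
    "RT: CI modality x timing condition": (3.8, 1, 19, 0.17),
}

if __name__ == "__main__":
    for effect, (f, df1, df2, reported) in REPORTED.items():
        print(f"{effect}: {partial_eta_squared(f, df1, df2):.2f} (reported {reported:.2f})")
```

Plugging in the rounded corrected df, F(4.4, 83), gives 0.31 rather than 0.30 only because 83 is a rounded value of 0.88 × 95.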

Functional ROI results
The conjoined fMRI activation masks in Figures 3 and 4 display 25 regions that exhibited effects of CI modality, timing condition, and/or an interaction. Table 1 provides the details of these activation foci. For each fROI, the table also summarizes the results of statistical analyses that tested for the effects of signal modality (A-A versus V-V), timing condition (unimodal versus crossmodal), and the interaction of modality × timing condition. Figure 3 (left column; red) shows that the modality of unimodal pairs affected activation largely in posterior cortical areas, including the parietal (superior parietal cortex and precuneus), temporal (posterior portions of superior temporal cortex and insula, middle temporal cortex, parahippocampus, hippocampus), and occipital cortices (middle-occipital cortex and cuneus). An exception was the modality effect on activation of the medial frontal/anterior cingulate areas. In most regions, activation was greater for visual than auditory unimodal pairs, except in the medial frontal/anterior cingulate and superior temporal/insular cortices, wherein activation was greater for auditory than visual pairs.

FIGURE 3 | Twenty-one cortical fROI were identified by conjoining activation maps from the voxel-wise analyses. Tests of modality, timing condition, and the interaction were conducted on these fROI. The fROI are color coded according to whether activation was affected by each of these factors. In all three columns, purple denotes no effect of a particular factor on activation. For the test of modality (left column), red designates a significant difference between the A-A and V-V conditions. For the test of timing condition (middle column), yellow signifies a significant difference between the unimodal and the crossmodal conditions. In the right column, green signifies a significant CI modality × timing condition interaction.

FIGURE 4 | Subcortical functional ROI (fROI).
Four subcortical fROI were identified by conjoining the activation maps from the voxel-wise analyses. In all of the fROI, activation was greater in the unimodal than the crossmodal timing condition. No other effects were significant. z coordinates are the superior (+)/inferior (−) distance in millimeters from the anterior commissure.
(MTG), parahippocampus, hippocampus], and middle-occipital cortices, the thalamus (pulvinar nucleus, lateral geniculate body), and the basal ganglia (putamen, caudate body). Figure 5 displays graphs of signal change for the unimodal and crossmodal conditions in representative regions. For most regions, activation was greater for unimodal than crossmodal pairs. Exceptions included the MFG/IFG and superior parietal cortex, wherein activation was greater for crossmodal than unimodal pairs. Activation was also greater for the crossmodal than the unimodal condition, but negative, in the right SMA/paracentral lobule and the left parahippocampus/hippocampus (Table 1, clusters 14 and 15).

These regions showed greater activation for crossmodal than unimodal pairs. All other areas showed greater activation for unimodal than crossmodal pairs.

Table 1 and Figure 3 (right column; green) display regions wherein CI modality interacted with the timing condition. All posterior, but not anterior, regions that showed an interaction also showed modality effects (A-A versus V-V; Figure 3, left column; red). However, we were principally interested in whether activation differed between the two crossmodal pairs (V-A versus A-V), since this contrast directly tests for regional activation associated with time dilation and compression while controlling for the amount of auditory and visual stimulation. There were three patterns of interactions. First, for the left medial frontal/anterior cingulate, the interaction was due to greater activation in the auditory than the visual unimodal condition (Table 1), yet there was no difference between A-V and V-A pairs (p > 0.10). Second, Figure 6 shows that for the left preSMA and MFG/IFG, the interaction was due to greater activation for A-V than V-A pairs (p < 0.0001 and p < 0.02, respectively), whereas the auditory and visual unimodal conditions did not differ.
For the third interaction pattern, Table 1 and Figure 6 show that large regional biases for timing unimodal auditory pairs (right and left superior temporal/insula cortex) or visual pairs (right and left middle-occipital cortex) translated into smaller but significant differences between the two crossmodal conditions. Specifically, activation was greater for V-A pairs in the right and left superior temporal/insula cortex (p < 0.006 and p < 0.02, respectively) and greater for A-V pairs in the right and left middle-occipital cortex (…).

Figure 7 displays spatial maps of regions exhibiting significant effective connectivity with each seed that was modulated by the timing condition. Table 2 describes the details of these interacting regions. For all seeds, effective connectivity was stronger in the crossmodal than the unimodal condition. The striatum and most cortical ROI showed connectivity with encoding/retrieval hubs (posterior cingulate, precuneus). The right caudate also showed connectivity with cognitive control …

Of these seeds, effective connectivity was not found for the left preSMA, left MFG, and two occipital seeds (Table 1, clusters 1, 6, 16, and 17). Figure 8 displays spatial maps of regions showing significant effective connectivity with each seed that was modulated by the time dilation and compression conditions. Table 3 describes the details of these interacting regions. Two patterns of effective connectivity were found. First, the predominant pattern was characterized by stronger connectivity in the V-A "time dilation" condition. For this pattern, the right and/or left superior temporal cortex showed connectivity with cognitive-control [MFG (BA 6, 9, 10), IFG (BA 6), SMA, preSMA], attention/association (superior/inferior parietal), sensory integration (anterior insula, claustrum), and visual centers, and with the caudate body and culmen (Figure 8A).
Similarly, the left and/or right insula showed connectivity with cognitive-control (SMA, preSMA), higher association (inferior parietal), and sensory integration areas (anterior insula), and with the putamen (Figure 8B). By comparison, the left middle-occipital seed showed more limited connectivity with sensory integration (anterior insula) and visual centers (cuneus; Figure 8C). Second, a less common pattern was characterized by stronger connectivity of some seeds with medial cortical areas in the A-V "time compression" condition (Figures 8A,B, sagittal views). Specifically, the left superior temporal cortex and the left and right insula showed stronger connectivity with rostral medial frontal cortex (BA 9, 10) for A-V pairs. The right superior temporal cortex also showed stronger connectivity with the cingulate (BA 30, 31).

DISCUSSION
Our behavioral findings confirmed that auditory CIs were perceived as lasting longer than visual CIs in the crossmodal condition (Ulrich et al., 2006). Moreover, pairwise comparisons of each crossmodal and unimodal condition demonstrated that perceived duration was dilated when the CI was auditory (V-A) and compressed when it was visual (A-V). Additionally, crossmodal RTs were faster when perceived duration was dilated, possibly because auditory signals are more salient in the context of temporal processing, wherein audition dominates vision (Repp and Penel, 2002; Recanzone, 2003; Mayer et al., 2009). We also found that differences between the unimodal and crossmodal conditions in judgments of time grew with CI duration, irrespective of CI modality. In terms of pacemaker-accumulator models, this result suggests that intersensory timing affects the flow of pulses from the pacemaker rather than delaying the start of the clock, which would have a constant effect across CI durations (Wearden et al., 1998, 2010; Penney et al., 2000). The neural underpinnings of these behavioral findings were elucidated for the first time by the present study, which uncovered four main findings. First, we showed that time dilation and compression were distinguished by differential activation of higher-sensory areas (superior temporal/insula, middle occipital) associated with the modality of the CI. Effective connectivity of these areas with middle frontal and parietal cortices, anterior insula, and the striatum was typically stronger when perceived duration was dilated (V-A). We suspect that this result reflects the engagement of distributed neural networks when timing more salient auditory signals. Second, time compression (A-V) was characterized by greater activation of cognitive-control centers (preSMA, MFG/IFG), although these centers did not exhibit effective connectivity with other regions.
This finding suggests that A-V comparisons required more cognitive effort, consistent with the longer RTs when perceived duration was compressed. Third, audiovisual distortions in subjective duration were not mediated by the striatum, suggesting that the rate of detection or integration of cortical oscillatory states is not faster for auditory than visual signals. Fourth, intersensory timing was distinguished from intrasensory timing by decreased activation of the striatum and SMA, but increased activation of an attention center (superior parietal cortex). These regions showed stronger connectivity with frontal, parietal, and visual areas during crossmodal than unimodal timing, which may signify the greater demands on core timing and attention systems in integrating audiovisual time codes. We now turn to a more complete discussion of these findings.

FIGURE 7 | … Table 2 for details about individual activation foci.
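The pacemaker-accumulator argument in the discussion above (a difference that grows with CI duration points to pulse flow, not clock start) can be made concrete with expected pulse counts, count = rate × (duration − start delay). In this sketch, the 50-Hz baseline rate, the 10% rate change, the 100-ms start delay, and the three CI durations are all hypothetical values chosen for illustration.

```python
def expected_count(duration_ms, rate_hz, start_delay_ms=0.0):
    """Expected accumulator count: pulses that arrive after the clock starts."""
    return rate_hz * max(duration_ms - start_delay_ms, 0.0) / 1000.0


DURATIONS_MS = [1200, 1600, 2000]  # hypothetical CI durations

# Effect of a 10% faster pacemaker (50 -> 55 Hz): grows with duration.
rate_effect = [expected_count(d, 55.0) - expected_count(d, 50.0) for d in DURATIONS_MS]

# Effect of a 100-ms delay in starting the clock: constant across durations.
delay_effect = [
    expected_count(d, 50.0) - expected_count(d, 50.0, 100.0) for d in DURATIONS_MS
]

if __name__ == "__main__":
    print("rate effect:", rate_effect)    # [6.0, 8.0, 10.0]
    print("delay effect:", delay_effect)  # [5.0, 5.0, 5.0]
```

A rate change scales the count multiplicatively, so its effect widens as CIs lengthen, whereas a start delay removes a fixed number of pulses; only the former matches the growing unimodal-crossmodal difference in Figure 2.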

TIME DILATION AND COMPRESSION
Audiovisual distortions in perceived duration were largely distinguished by activity in higher-sensory areas, wherein both the magnitude of activation and the strength of effective connectivity depended on the time dilation/compression condition. Despite equivalent stimulation of the two senses, activation was greater in bilateral superior temporal and posterior insular cortex when perceived duration was dilated (V-A) and greater in bilateral middle-occipital cortex when it was compressed (A-V). These results indicate that the modality of the CI drove differential activation in these areas, consistent with their respective bias for timing unimodal auditory or visual signals. At the same time, secondary auditory and visual centers are multisensory (Ghazanfar and Schroeder, 2006) and are thought to support audiovisual integration (Calvert, 2001; Klemen and Chambers, 2011). This prospect was supported by our effective connectivity results, wherein higher-sensory areas typically showed stronger connectivity when time was dilated rather than compressed.
Common to all of these higher-sensory areas was stronger connectivity with the anterior insula. The insula integrates processing from disparate domains (e.g., interoception, working memory, emotion), including time (Nenadic et al., 2003; Harrington et al., 2010; Kosillo and Smith, 2010; Wittmann et al., 2010a). It has also been linked to the dilation of perceived duration by salient features of visual signals (Wittmann et al., 2010b). Importantly, the insula mediates the perception of audiovisual asynchrony (Bushara et al., 2001; Calvert et al., 2001), implicating it in the synthesis of crossmodal signals based on their temporal correspondence. The anterior insula is also thought to be an attentional hub that assists central executive networks in generating accurate responses to salient or task-relevant events (Menon and Uddin, 2010). Auditory signals are more salient than visual signals in the context of temporal processing (Repp and Penel, 2002; Recanzone, 2003; Mayer et al., 2009). This is likely due to past experience in timing principally via audition (e.g., music, speech), which over time may build up the connectivity strength of networks that mediate temporal processing of auditory signals. Thus, enhanced sensitivity of the anterior insula to auditory oscillatory patterns may contribute to time dilation. Time dilation was also related to stronger connectivity of superior temporal and insular cortices with the striatum (caudate and putamen), a putative core-timing system (Matell and Meck, 2004), and with higher association areas (parietal cortex), sensorimotor areas (cerebellum), and cognitive-control centers (preSMA, MFG, IFG), which are also involved in audiovisual integration (Lewis et al., 2000; Bushara et al., 2001; Calvert et al., 2001). By comparison, only one of three middle-occipital fROI showed effective connectivity, and this connectivity was restricted to the anterior insula. Taken together, these results indicate that a mechanism underlying audiovisual temporal distortions is the strength of superior temporal/posterior insular cortex connectivity with distributed networks that mediate multisensory integration, cognitive control, and timekeeping.

TABLE 2 | Regions showing effective connectivity with each seed region (bold font) are displayed in Figure 7. For all seed-interacting regions, connectivity was stronger for the crossmodal than the unimodal condition. Brodmann areas (BA) were defined by the Talairach and Tournoux atlas. Cerebellar lobules were defined by the Schmahmann atlas (Schmahmann et al., 2000).

TABLE 3 | Regions showing effective connectivity with each seed region (bold font) are displayed in Figure 8. Brodmann areas (BA) were defined by the Talairach and Tournoux atlas. Cerebellar lobules were defined by the Schmahmann atlas (Schmahmann et al., 2000). Coordinates represent distance in millimeters from the anterior commissure: x, right (+)/left (−); y, anterior (+)/posterior (−); z, superior (+)/inferior (−).
A less common finding was that, in the time compression condition, the left superior temporal and bilateral insular cortices showed stronger connectivity with medial cortical regions involved in more abstract decision making (rostral medial frontal; BA 9, 10) and executive control (posterior cingulate). This circumscribed connectivity pattern may reflect the greater difficulty of A-V than V-A judgments, consistent with their longer RTs. This possibility was also supported by our fROI analyses, wherein activation was greater in classic working memory and attention regions (preSMA, MFG, IFG) when time was compressed than when it was dilated. These regions, however, did not exhibit significant effective connectivity. This leads us to conclude that the preSMA and MFG/IFG are supramodal centers that direct attention and working memory resources during intersensory timing, but do not give rise to audiovisual effects on perceived duration per se.

See Table 3 for details about individual activation foci.

INTERSENSORY AND INTRASENSORY TIMING
Our fROI results did not suggest that audiovisual distortions in subjective duration were mediated by the striatum. Rather, we found that putamen and caudate activation was greater when timing unimodal than crossmodal signals, irrespective of CI modality. This result is not consistent with classic attentional switching accounts of striatal function (van Schouwenburg et al., 2010), which would predict greater activation in the crossmodal than the unimodal condition. Attentional switching should also produce a constant effect on perceived duration across CI durations in the crossmodal condition, which was not found. The mechanisms by which time is synthesized across the senses are not understood. Crossmodal stimulation often enhances neuronal responses in multisensory integration centers (e.g., superior colliculus, association areas; Calvert et al., 2001), including the striatum (Nagy et al., 2006), but depression of neuronal responses is also found, especially when intersensory signals are spatially incongruent or asynchronous, as in our study.
Increased striatal activation during unimodal timing may relate to the role of the striatum in detecting and integrating cortical oscillatory states, which provide the temporal code for signal duration (Matell and Meck, 2004). Stronger striatal responses might arise when timing unimodal signals because they share similar spatial signatures. Detection and temporal integration of oscillatory states might therefore speed up, because evidence for the time code accumulates faster when the CI duration can be mapped onto the neural time code of the SI modality, which is active in memory. Conversely, the different spatial signatures of crossmodal signals may render temporal integration noisy, resulting in a diminished striatal response. Though speculative, this account may also explain the increased activation and reduced suppression in the left and right SMA for unimodal relative to crossmodal timing. The SMA is sensitive to elapsed time (Pouthas et al., 2005; Mita et al., 2009; Wencil et al., 2010), but unlike the striatum, it mediates maintenance of both temporal and non-temporal information (Harrington et al., 2010). The SMA may therefore maintain temporal representations online for use by other networks in guiding behavior. Stronger SMA activation when timing unimodal than crossmodal signals may signify a stronger neural representation of the time code.

Despite the increased activation of the striatum and SMA during intrasensory timing, connectivity of these areas with the rest of the brain was stronger during crossmodal timing. For example, these regions showed stronger connectivity with a core memory hub (precuneus, posterior cingulate, parahippocampus), possibly signifying a greater dependence of the striatum and SMA on output from encoding and retrieval systems during intersensory timing. The caudate and SMA also showed stronger connectivity with visual centers (fusiform and lingual gyri, MTG), but not auditory centers, and with frontal cognitive-control centers (medial frontal, MFG, SFG, IFG). These findings may relate in part to the more deliberate timing of visual signals (Repp and Penel, 2002; Mayer et al., 2009), which renders synthesis of audiovisual temporal codes more difficult.
Intersensory timing was also associated with increased activation of a frontal-parietal attention network. Though increased MFG/IFG activation was largely related to the more difficult A-V judgments, our results suggest that the synthesis of audiovisual temporal codes increases attentional processing in the superior parietal cortex, irrespective of CI modality. This was consistent with the stronger connectivity of the superior parietal cortex during crossmodal timing with frontal control systems (MFG, IFG), but also with higher visual areas (MTG) and a memory encoding hub (precuneus, posterior cingulate). Altogether, these effective connectivity patterns suggest that timing intersensory signals requires more extensive network interactions with the striatum, SMA, and superior parietal cortex than timing intrasensory signals.

CONCLUSION
Our results indicate that audiovisual effects on the experience of time emanate from higher-sensory areas, in which connectivity is stronger and far more inter-regionally distributed when timing auditory rather than visual signals. Though we found greater activation in cognitive-control centers for the more difficult (time compression) than the easier (time dilation) crossmodal comparisons, effective connectivity of these regions was not modulated by the modality effect. This may suggest that cognitive-control centers play a supramodal role in directing attention or allocating working memory resources during decision making. We also found that audiovisual distortions in perceived duration were not driven by the striatum, suggesting that the presumed core-timing system (Matell and Meck, 2004) operates at the same rate for visual and auditory signals. Rather, during crossmodal timing, striatal activation was decreased and striatal connectivity with visual, memory-encoding, and cognitive-control centers was stronger. We attributed these findings to the greater demands that crossmodal time codes place on striatal integration. The present findings have implications for understanding neural mechanisms of temporal processing distortions in maturation and disease. For example, enhanced modality effects in children and in individuals at risk for schizophrenia (Penney et al., 2005) are due to impaired timing of visual signals. Our results suggest that this might arise from developmental differences and preclinical changes in frontal cognitive-control centers, as well as in the connectivity of higher-sensory association areas with executive control centers (medial cortex). Conversely, audiovisual distortions in perceived duration are diminished in patients diagnosed with schizophrenia, largely due to inaccurate timing of auditory signals (Carroll et al., 2008). This is consistent with changes in temporal cortex in schizophrenia, which may well alter inter-regional connectivity.
Altogether, the present study demonstrates that the intersensory synthesis of temporal information and the effects of time dilation and compression are mediated by different patterns of regional activation and inter-regional connectivity. Future studies are needed to further elucidate interactions among the multiple brain regions that are fundamental to temporal processing and that likely break down in certain neurological and psychiatric disorders.