Neural correlates of auditory temporal predictions during sensorimotor synchronization

Musical ensemble performance requires temporally precise interpersonal action coordination. To play in synchrony, ensemble musicians presumably rely on anticipatory mechanisms that enable them to predict the timing of sounds produced by co-performers. Previous studies have shown that individuals differ in their ability to predict upcoming tempo changes in paced finger-tapping tasks (indexed by cross-correlations between tap timing and pacing events) and that the degree of such prediction influences the accuracy of sensorimotor synchronization (SMS) and interpersonal coordination in dyadic tapping tasks. The current functional magnetic resonance imaging study investigated the neural correlates of auditory temporal predictions during SMS in a within-subject design. Hemodynamic responses were recorded from 18 musicians while they tapped in synchrony with auditory sequences containing gradual tempo changes under conditions of varying cognitive load (achieved by a simultaneous visual n-back working-memory task comprising three levels of difficulty: observation only, 1-back, and 2-back object comparisons). Prediction ability during SMS decreased with increasing cognitive load. Results of a parametric analysis revealed that the generation of auditory temporal predictions during SMS recruits (1) a distributed network of cortico-cerebellar motor-related brain areas (left dorsal premotor and motor cortex, right lateral cerebellum, SMA proper and bilateral inferior parietal cortex) and (2) medial cortical areas (medial prefrontal cortex, posterior cingulate cortex). While the first network is presumably involved in basic sensory prediction, sensorimotor integration, motor timing, and temporal adaptation, activation in the second set of areas may be related to higher-level social-cognitive processes elicited during action coordination with auditory signals that resemble music performed by human agents.


INTRODUCTION
The ability to predict the time course of events as they unfold in the immediate environment is a fundamental skill that underlies most activities in everyday life. This ability is paramount in situations that require the coordination of our actions in time and space with the actions of other people around us. Highly refined forms of such interpersonal action coordination can be found in the music domain, where action synchronization between performers is often required under challenging circumstances. For instance, in Western classical music, musical expression entails the introduction of local tempo deviations in order to communicate particular interpretations and to convey emotions to the audience (e.g., Palmer, 1997;Clarke, 1999;Gabrielsson and Lindström, 2001;Chaffin and Logan, 2006). Nevertheless, interpersonal action coordination in musical ensembles can still be extremely accurate. Studies of small ensembles with two or three instruments report that sounds that are notated to be produced simultaneously by different instrumental voices are typically played with small asynchronies between them, with a spread of only around 30-50 ms or less (Rasch, 1979(Rasch, , 1988Shaffer, 1984;Palmer, 1997;Goebl and Palmer, 2009;Keller and Appel, 2010). Furthermore, this high level of synchrony can be retained during expressive musical passages that are characterized by considerable deviations from isochronous tempo (Shaffer, 1984). Such a high degree of temporal precision during interpersonal synchronization cannot be achieved if musicians only react to the perceived actions of their co-performers. Musicians presumably rely on anticipatory mechanisms, among other processes, which allow them to predict the sounds that will be produced by their co-performers and coordinate their own anticipated actions with these predictions (see Keller, 2008).
Paced finger-tapping tasks are one of the most common paradigms used to investigate basic musical SMS processes in an experimental setting. In these tasks, individuals are normally asked to tap a finger in time with an auditorily presented pacing signal, while the asynchronies between taps and pacing events are analyzed to yield different measures of individual SMS performance (for an overview, see Repp, 2005Repp, , 2006Repp and Su, 2013). The auditory pacing signals can be regular (i.e., metronomic) or irregular, e.g., include local timing perturbations or gradual tempo changes as found in performed classical music. Repp (2002) proposed two correlational measures as indices of synchronization ability in tasks requiring finger tapping to expressive music. These measures are derived by computing the cross-correlation (CC) between an individual's inter-tap intervals (ITIs) and the inter-onset intervals (IOIs) between tones in the musical sequence at two different lags: lag-0 and lag-1. If an individual is able to predict upcoming tempo modulations in the auditory sequence, relatively high lag-0 CCs reflect the fact that he or she adjusted their ITIs more strongly on the basis of upcoming rather than preceding IOIs in the sequence. Conversely, relatively high lag-1 CCs reveal that the individual tended to react to rather than predict the pattern of tempo changes in the IOIs, i.e., the individual adjusted his or her tap timing more strongly on the basis of preceding rather than upcoming IOIs in the auditory sequence. Tracking behavior has been mainly observed for SMS with random or barely detectable timing modulations (e.g., Michon, 1967;Thaut et al., 1998Thaut et al., , 2009Stephan et al., 2002;Madison and Merker, 2005;Schulze et al., 2005), or when participants were unaware that they were tapping along with a pacing sequence that mirrored the expressive timing profile of a complex musical piece (Repp, 2002(Repp, , 2006. On the other hand, when timing variations in the auditory sequence are easily detectable and follow a regular or familiar pattern (e.g., the local tempo variations of a well-known musical piece), participants are able to anticipate these changes (e.g., Michon, 1967;Repp, 2005;Rankin et al., 2009) and, furthermore, their performance improves with training (Repp, 2002).
Our own previous research has focused on anticipation during SMS by investigating temporal prediction abilities in tasks requiring individuals to tap a finger in synchrony with a tempochanging pacing signal Keller, 2009a,b, 2011). In these studies, gradual tempo transitions were designed to resemble tempo variations found in real performed music (i.e., accelerando and ritardando). To assess the degree to which individuals predicted upcoming tempo changes, we computed the ratio of the lag-0 over the lag-1 CCs between the individual's ITIs and the IOIs of the pacing signal (cf. Repp, 2002). This prediction/tracking ratio reflects whether individuals show relatively stronger prediction (ratio > 1) or tracking (ratio < 1) tendencies during SMS with ongoing tempo changes. The first two studies (Pecenka and Keller, 2009a,b) revealed that about two-thirds of the individuals in our samples (comprised mostly of amateur musicians) tended to predict the timing variations, while the remaining individuals displayed a relatively stronger tendency to track the ongoing tempo changes. Prediction/tracking ratios were approximately normally distributed and positively correlated with SMS ability (see Pecenka and Keller, 2009a,b). In a third study (Pecenka and Keller, 2011) we found that the individual differences in prediction ability impacted upon interpersonal action coordination in a joint finger-tapping task. Furthermore, individual differences in prediction ability, which were in those settings highly reliable and stable across time were positively correlated with musical experience and self-reported tendency to take other performers' perspectives during musical interactions (Pecenka and Keller, 2011).
Although temporal prediction is inherent in practically any kind of SMS task, the underlying mechanisms and brain processes of temporal prediction during SMS have not yet been investigated. Keller (2008) has proposed that musicians make predictions about their co-performers' ongoing action outcomes by using internal simulation processes to generate anticipatory auditory images of the others' sounds. This assumption is supported by findings from our previous studies, which revealed a positive correlation between prediction/tracking ratios and the acuity of auditory imagery for pitch (Pecenka and Keller, 2009a) and timing (Pecenka and Keller, 2009b). Empirical and theoretical work has furthermore suggested that the formation of auditory images relies (although not solely) on working memory (Baddeley and Logie, 1992;Smith et al., 1995;see Hubbard, 2010 for a review), and corresponding brain areas have been found to be active during auditory imagery (Aleman and van't Wout, 2004).
We hypothesized that introducing a simultaneous workingmemory task during finger tapping with tempo-changing pacing sequences would lead to variations in the individual degree of prediction to the extent that both tasks draw on similar working-memory processes. A pilot study conducted beforehand supported this hypothesis and indicated that intraindividual prediction ability decreases with increasing cognitive load imposed by simultaneous working-memory computations 1 . The aim of the current study was to identify patterns of neural activation that covary as a function of prediction tendencies during synchronized finger tapping. To this end, we employed functional magnetic resonance imaging (fMRI) to investigate the neural correlates of temporal prediction during SMS. We have used a within-subjects experimental design (rather than a betweensubjects correlational approach based on individual differences), as it was possible to manipulate individual prediction/tracking ratios experimentally. Participants were asked to tap a finger in synchrony with an auditory pacing signal that changed its tempo gradually. Simultaneously, cognitive load was varied by means of a visual n-back working-memory task with three levels of increasing difficulty: object observation, 1-back, and 2-back object comparisons. According to our pilot study, we expected 1 The behavioral pilot study was conducted to ensure that our workingmemory manipulation had the intended effect on individual prediction tendencies measured during SMS. The task and procedure were practically identical to the present fMRI study (see Material and Methods), except that participants were tested sitting upright in a quiet laboratory and trials did not include temporal jitter (this was only necessary for the fMRI design). Out of the 39 participants that were tested (mean age 26.1 years, 19 females), 34 participants displayed a tendency to predict rather than track the tempo changes (as evidenced by a prediction/tracking ratio >1) during finger tapping and object observation (Tap), while this was the case for only 24 participants during SMS and 1-back comparisons (Tap1B) and for 19 participants during SMS and 2-back comparisons (Tap2B). An analysis of variance yielded a significant main effect of experimental condition [F (2, 76) = 71.36, p < 0.001, η 2 = 0.65]. Paired comparisons between individual conditions revealed that prediction/tracking ratios decreased significantly from Tap over Tap1B to Tap2B [t (38) > 3.84, ps < 0.001]. Twenty participants from this pilot sample volunteered to participate in the fMRI study about 6 months later. that the degree of temporal prediction measured during finger tapping will decrease with increasing visual working-memory load. A parametric analysis of the hemodynamic responses was conducted to reveal activations in brain networks that covary with the degree of temporal prediction measured during synchronized finger tapping with a tempo-changing pacing signal.

PARTICIPANTS
Twenty musicians (10 females) participated in the study. Two participants were excluded from the analyses due to incomplete behavioral data and excessive head movement during fMRI data acquisition. The mean age of the remaining sample of 18 participants (9 females) was 25.5 years (SD = 3.79). Participants were amateur musicians (n = 9) and university music students (n = 9) with varying amounts of musical experience (years of instrument playing/singing summed over all instruments: range = 11-57, M = 27.7, SD = 13.1). Participants had started playing an instrument (or singing) when aged between 4 and 16 years (M = 7.8, SD = 2.9) and played one to four instruments (M = 2.8, SD = 1.1). All participants were right-handed according to the Edinburgh handedness inventory (Oldfield, 1971) and had no history of psychiatric, major medical, or neurological disorder. After being informed about potential risks of participation in an fMRI study and being screened for exclusion criteria by a physician at the institute, participants gave their written informed consent to undertake the study. The experiment was performed in accordance with ethical standards compliant with the Declaration of Helsinki and approved by the ethics committee of the University of Leipzig.

Auditory stimuli
During each trial, 47-50 identical pacing sounds (1000 Hz sinewave tones with 200 ms duration and 10 ms linear rise and decay ramps) were presented (to guide finger tapping in experimental conditions 1-3, see Figure 1A). The number of initial pacing sounds was varied from 5-8 tones in order to implement a variable jitter in the time point at which the tempo changes started. These initial pacing sounds (isochronous phase) were presented with an IOI of 600 ms, after which the tempo changed gradually (tempo-changing phase). Tempo changes were designed to resemble tempo variations found in performed music (accelerando and ritardando) and followed quadratic functions. Each trial contained 6 tempo changes that proceeded over 5, 8, or 10 pacing tones (corresponding to 5, 7, or 11% change rate, respectively) and spanned a tempo range from 387 to 600 ms IOI. The end of each trial was signaled by an auditory stop signal (800 Hz sine-wave tone with 200 ms duration and 10 ms linear rise and decay ramp). Overall 14 trials with different pacing sequence versions were presented randomly intermixed, with each version being presented once per experimental condition (see for a sound example Audio S1 in the Supplementary Audio Material).

Visual stimuli
In order to implement a visual distractor task, a set of 11 novel abstract objects (from a catalog of 36 objects, called "Fribbles", Williams, 1998; obtained with permission from the Tarr lab website: http://tarrlab.cnbc.cmu.edu/; see Figure 1B) was presented sequentially during each trial. The visual objects were displayed centrally in blue on a black background, and presented one after another for a duration of 1000 ms with an intervening white fixation cross (duration 1000 ms, displayed at the center of the screen; see Figure 1A). Written task instructions were presented only at the beginning of the experiment. Prior to each trial, task conditions were coded symbolically: a constellation of filled and empty circles symbolized the n-back memory task requirement, while the presence of a line connecting these circles informed the participant about a tapping vs. no-tapping condition. This visual task instruction was displayed for 3000 ms at FIGURE 1 | (Panel A) provides an overview over auditory and visual stimuli presented in the experiment. Finger tapping is only performed in the tapping conditions (Tap, Tap1B, Tap2B). The question regarding n-back hits ("Hits?") is only presented in the n-back conditions (Tap1B, Tap2B, 2B). Red lines indicate temporal jitter implemented to decorrelate predictors of the fMRI design. Hemodynamic responses recorded during the tempo-changing phase (yellow box) were used to model the predictors of interest in the fMRI analyses. (Panel B) displays the experimental conditions of the dual task. Task instruction was given symbolically by a constellation of filled and empty circles (indicating the visual n-back task requirement) and a line connecting these circles (informing about a finger-tapping condition). A selection of novel objects ("Fribbles") is presented as an example. Red arrows indicate hits in the visual n-back task.

Frontiers in Human Neuroscience
www.frontiersin.org August 2013 | Volume 7 | Article 380 | 3 the beginning of each trial. We have used a non-verbal dual task in order to avoid covert subvocalization, which is assumed to activate motor regions [such as the supplementary motor area (SMA); e.g., Halpern et al., 2004], to reduce the likelihood of overlap with motor regions activated by the finger-tapping task.

PROCEDURE
The experiment included a total of four experimental conditions (see Figure 1B): (1) Finger tapping and object observation (Tap), (2) finger tapping and 1-back object comparisons (Tap1B), (3) finger tapping and 2-back object comparisons (Tap2B), and (4) 2-back object comparisons without finger tapping (2B). At the beginning of each trial, participants were symbolically instructed which of the experimental conditions they would encounter. Auditory stimulation began after the visual instruction had been on screen for 3 seconds (see Figure 1A). In finger-tapping conditions (Tap, Tap1B, Tap2B), participants were asked to tap their right index finger on a custom-built MRI-compatible tapping pad in synchrony with the pacing signal, starting with the third pacing sound. (Condition 2B served as a control condition, where participants only performed the 2-back working-memory task while receiving auditory stimulation without finger-tapping requirement). The tapping pad had dimensions 210 mm long, 150 mm wide, and 25 mm deep, with a 70 × 70 mm area on its upper surface marked as the target for finger taps. Taps on the pad were registered by an air-pressure sensor, which emitted a trigger that was sent to the computer running the experiment. Depending on jittering, visual object presentation started with the 5th, 6th, 7th, or 8th pacing tone, which coincided with the last regular tone of the isochronous phase. In the tapping only condition (Tap), participants were required to attend visually to the presented objects. In the remaining conditions, individuals were asked to either perform 1-back (Tap1B) or 2-back object identity comparisons (Tap2B and 2B), while keeping track of the total number of n-back hits in each trial. The end of auditory and visual object presentation was signaled by the auditory stop signal in each trial.
After a variable time interval (0-1500 ms, jittered in steps of 500 ms), participants were asked to enter the responses to either one (Tap) or two subsequently presented questions (Tap1B, Tap2B, 2B) by moving a cursor (left or right) on a horizontal 5-point visual rating scale. In conditions where n-back comparisons were performed (Tap1B, Tap2B, 2B), participants were visually instructed to report the number of n-back hits (prompted by the German equivalent for "hits?") in the foregoing trial, ranging from "1" to "5." In all four conditions, participants were asked to rate the overall difficulty (prompted by the German equivalent for "difficulty?") of each trial ranging from "very easy" (1) to "very difficult" (5). All responses were performed with the left hand on a two-button response box and participants were allowed 3 s to complete each question.
Depending on condition and jittering, the total trial length varied between ∼30 and 36 s. Trials were separated by a variable inter-trial interval of 10.5-12 s duration (jittered in steps of 500 ms). Trials were pseudo-randomized between conditions. Overall, 56 trials (14 trials per experimental condition) were completed during ∼42 min of continuous scanning.
Participants were familiar with the task prior to MRI measurement, as they had performed a near-identical dual task in a behavioral pilot experiment 6 months before scanning (details on the pilot experiment are included in Footnote 1). In addition, detailed verbal and written task instructions were given prior to scanning and participants could practice the task until they felt comfortable with performing it. During the whole experiment, participants lay supine on the scanner bed, with their right hand resting on a custom-built pressure-sensitive MIDI tapping pad. Middle and index finger of the left hand were positioned over two buttons of a response box. Participants were carefully stabilized with form-fitting cushions to prevent arm and head motion. To attenuate scanner noise, participants were provided with earplugs. Auditory stimuli were presented over MRI compatible headphones (MR Confon GmbH, www.mr-confon.de). Visual stimuli and instructions were projected by an LCD projector onto a screen positioned behind the participant's head, which was viewed through a mirror on top of the head coil. Stimulus delivery and response registration was controlled by Presentation 14.7 software (Neurobehavioral Systems, www.neurobs.com) running on a Windows computer. There was a transmission delay of 46 ms between the tapping pad and the registration software, which was subtracted from the recorded tap times before data analysis.

fMRI DATA ACQUISITION
Functional imaging was conducted on a 3T scanner (Siemens TRIO, Erlangen, Germany) equipped with the standard birdcage head coil. Thirty-one axial slices (matrix 64 × 64, field of view 192 mm, in plane resolution 3 × 3 mm, slice thickness 3 mm, interslice gap 1 mm) positioned parallel to the bicommisural plane (AC-PC) were acquired using a single-shot gradient echoplanar imaging (EPI) sequence (TE = 30 ms flip angle 90 • , acquisition bandwidth 116 kHz, repetition time 2000 ms, ascending slice acquisition order). In total, 1265 functional images (volumes) were acquired in a single run.
Geometric distortions were characterized by a B0 field-map scan. The field-map scan consisted of a gradient-echo readout (24 echoes, inter-echo time 0.95 ms) with a standard 2D phase encoding. The B0 field was obtained by a linear fit to the unwrapped phases of all odd echoes.

Finger tapping
In a first computational step in the analysis of SMS data, pacing sounds and corresponding finger taps were aligned by assigning each pacing sound to the closest finger tap within a time window of ±200 ms. Subsequently, three indicators of individual SMS performance were computed: (1) Mean absolute asynchrony (i.e., the unsigned time difference between each finger tap and the corresponding pacing sound) was calculated as an index of tapping accuracy, with low asynchronies indicating high SMS accuracy. Furthermore, we calculated (2) the variability of signed asynchronies (i.e., the SD of signed within-trial asynchronies) to inform about tapping precision, with low variability reflecting high SMS precision. Both measures were computed at the level of single trials, and then averaged across trials within each condition, as well as across conditions, for each participant. Finally, we computed (3) prediction tendencies based on the lag-0 and lag-1 cross-correlations between the ITIs produced by the participant and the IOIs of the pacing signal. As suggested by Repp (2002), both the prediction index (lag-0 CC) and the tracking index (lag-1 CC) were computed in relation to the lag-1 autocorrelation (AC) of the timing sequence (i.e., CC-AC/1-AC). After this correction, relatively low prediction and tracking indices indicate a relatively high degree of prediction and tracking, respectively. To yield a prediction/tracking ratio that is comparable with our previous studies (i.e., ratios > 1 indicate relatively stronger prediction than tracking, while the opposite is true for ratios < 1), it was computed as the ratio of the tracking (lag-1 CC) over prediction (lag-0 CC) index. First, prediction/tracking ratios were computed separately for each trial. In a second step, prediction/tracking ratios were averaged across trials of each condition, as well as across conditions (using Fisher Z-scores). In rare cases (i.e., in less than 2% of all trials) where an individual's prediction/tracking ratio in one trial yielded a value exceeding 2.5 SD of the individual's mean ratio within a condition, this prediction/tracking ratio was replaced by the mean value of the remaining non-outlying trials of this condition. These corrected prediction/tracking values for separate trials were used for the parametric analysis of fMRI data. In addition, the lag-0 and lag-1 CCs were analyzed separately to investigate whether the prediction and tracking indices were influenced differently by the experimental manipulation. Please note that these absolute values are not informative about an individual's prediction ability. It is only when considered in relation to one another (as a ratio) that they are correlated with SMS performance and reflect the degree to which an individual predicts rather than or tracks (or follows) the ongoing tempo changes in a pacing sequence.

Response to questions
Ratings of subjectively perceived task difficulty were averaged across all trials within each experimental condition. To yield an index of working-memory task performance, we computed the deviation of the given response from the correct number of identity hits (1-5) within each trial. These error values were then averaged across trials within each experimental condition.

Statistical analyses
Separate analyses of variance (ANOVAs) were run to investigate the global effect of the working-memory manipulation on the degree of prediction (i.e., the prediction/tracking ratio and the separate prediction and tracking indices), SMS performance and task difficulty. When the assumption of sphericity was not met, the Greenhouse-Geisser correction (on the degrees of freedom) was applied. Two-tailed paired t-tests were employed to compare individual experimental conditions with each other. The level of significance for these comparisons incorporated a Bonferroni correction, taking the number of tests into account.

fMRI DATA ANALYSIS
Functional imaging data were analyzed using Statistical Parametric Mapping (SPM8, Wellcome Trust Centre for Neuroimaging, University College London, www.fil.ion.ucl. ac.uk/spm) running under Matlab 7.11 (Mathworks Inc., www.mathworks.com). Images of each participant were realigned to the first image and unwarping was applied. Image distortions were corrected using a B0 field-map scan and images were coregistered to the 3D anatomical image of the participant. The 3D anatomical images were normalized into standard stereotactic space (Montreal Neurological Institute template) and obtained parameters were used for normalization of the functional data. Finally, functional images were spatially smoothed with an 8 mm full-width half-maximum Gaussian filter and high-pass filtered with a cutoff period of 400 s. Activated voxels were identified by the General Linear Model (GLM). In a first level analysis, two models were computed. Each of these two GLMs comprised predictors of interest that were obtained by modeling a boxcar function with a length of 22 s (corresponding to the length of the tempo-changing phase in a trial, see Figure 1A) convolved with the canonical hemodynamic response function and the temporal derivative. Furthermore, predictors of no interest-task instruction, the isochronous phase, and the responses to the questions in each trial-were included in each of the models.
The primary aim of this study was to reveal brain areas that covary systematically in Blood Oxygen Level Dependent (BOLD) signal amplitude with the degree of prediction during SMS. To this end, a parametric analysis was conducted within a GLM. At the level of each participant, all tapping trials (Tap, Tap1B, Tap2B) were modeled together as a single predictor and weighted with (a) the individual prediction/tracking ratio from each trial as regressor of interest and (b) two further regressors of no interest, i.e., participants' difficulty ratings (values ranging from 1 to 5) and working-memory demands (value 0 for the trials without a working-memory task, condition Tap; value 1 for trials with a 1-back working-memory task, condition Tap1B; value 2 for trials with a 2-back workingmemory task, condition Tap2B). In addition, trials in which only the 2-back working-memory task was performed (condition 2B) were modeled as a further predictor. Contrast images for parameters of the prediction/tracking ratio, the condition 2B and the contrast "prediction-2B" were used in second level for random effects analyses (one-sample t-test and conjunction analyses).
In a second set of analyses, we investigated the networks of brain areas that were related to the different experimental conditions. Within the first level GLM, four predictors of interest were modeled for each participant, corresponding to the four experimental manipulations: Tap, Tap1B, Tap2B, and 2B. For each participant, individual contrasts were generated for activations related to finger tapping (i.e., Tap2B-2B). These contrast images were entered into a second-level one-sample t-test for random effects group analysis. Furthermore, conjunction analyses (Nichols et al., 2005) were calculated in order to identify shared activations between brain networks activated by working memory (2B) and brain areas positively and negatively related to prediction.
For all second level whole brain group analyses, we only report clusters meeting an extent threshold of k > 10 voxels and a significance level of p < 0.05 corrected (using False Discovery Rate correction, FDR, for multiple comparisons).
Displays of activations were created by means of the software package MRIcron (www.mccauslandcenter.sc.edu/mricro/ mricron/index.html) by superimposing SPM t-maps resulting from the second level analyses on a mean anatomy obtained by averaging the normalized anatomy of the 18 analyzed participants. The probabilistic atlas by Diedrichsen et al. (2009) included in the SPM anatomy toolbox Version 1.8 (Eickhoff et al., 2005; Version 1.8 released in 2011) was used to determine cytoarchitectonic probabilities for cerebellar activations.

SMS performance
An overview over SMS performance, difficulty ratings and n-back performance errors in the three finger-tapping conditions is provided in Table 1. Repeated-measures ANOVAs were computed to test whether the experimental manipulation (visual n-back task) had the intended effect on the degree of prediction during SMS. The ANOVA yielded a significant main effect of experimental condition on the prediction/tracking ratio [F (2, 34) = 16.83, p < 0.001, η 2 = 0.50; see also Figure 2A]. Paired comparisons between individual conditions revealed that prediction/tracking ratios were significantly higher during finger tapping without working-memory demands (Tap) than in the Tap1B [t (17) = 4.07, p < 0.01] and Tap2B conditions [t (17) = 5.01, p < 0.001]. Prediction tendencies did, however, not differ significantly between the two working-memory conditions [t (17) = 2.34, p = 0.03; the significance threshold due to Bonferroni correction is p = 0.016].
In line with our previous studies (Pecenka and Keller, 2009a,b), about two-thirds of the participants (13 out of 18) tended to predict (ratio > 1) the timing variations during finger tapping only (Tap), while the remaining individuals displayed a stronger tendency to track the ongoing tempo changes. In the more demanding dual-task conditions (Tap1B and Tap2B), the number of participants that showed behavioral evidence for relatively strong prediction decreased systematically (9 and 6 out of 18 participants, respectively). Overall, prediction/tracking ratios were highly correlated across the different finger-tapping conditions [Tap-Tap1B: r = 0.85; Tap-Tap2B: r = 0.74; Tap1B-Tap2B: r = 0.83; ps < 0.001].
Additional ANOVAs were conducted to test whether the experimental manipulation had an effect on our other two measures of SMS performance: SMS accuracy (i.e., mean absolute asynchronies) and SMS precision (i.e., SD of signed mean asynchronies). Regarding SMS accuracy, a significant effect of the experiment manipulation was observed [F (1.2, 20.6) = 7.25, p < 0.05, η 2 = 0.30]. Local comparisons revealed that mean absolute asynchrony was lower for the tapping only (Tap) condition than for the dual-task conditions Tap1B [t (17) = −2.74, p = 0.014] and Tap2B [t (17) = −2.81, p = 0.012; the significance threshold due to Bonferroni correction is p = 0.016]. Differences

Positive correlates of temporal prediction during SMS
The parametric analysis revealed that brain activation in a distributed network of areas covaried positively with the degree of prediction measured during SMS with a pacing signal that changed its tempo gradually (see Table 2 and Figure 3, all activations are significant at p < 0.05, FDR corrected). Large clusters of activation were revealed in medial cortical areas, including orbitofrontal and medial prefrontal cortex (Brodmann area, BA 10 and BA 11) extending into anterior cingulate cortex (ACC, BA 32), as well as in medial superior frontal gyrus (BA 8 and BA 9), middle cingulate cortex (BA 24 and BA 31) and SMA proper (BA 6), posterior cingulate cortex (BA 31) and precuneus. Bilateral activity clusters were revealed in temporal cortices covering the primary and secondary auditory cortex (BA 41, BA 42, and BA 22). In the left hemisphere, activation of the auditory cortex extended into the adjacent superior temporal gyrus (STG, BA 22) and middle temporal gyrus (BA 21). In the right hemisphere, the activation of the auditory cortex extended into the right posterior insular cortex. Additionally, bilateral activity was observed in the middle temporal gyrus extending into the inferior temporal gyrus and in the right hemisphere into the inferior frontal gyrus (IFG, pars orbitalis). Furthermore, activation clusters in the bilateral inferior parietal lobe (IPL, angular and supramarginal gyri, BA 39 and BA 40), right posterior middle temporal gyrus, left insular cortex, bilateral IFG (with a cluster on pars triangularis, BA 45, in the right hemisphere and two clusters in the left hemisphere on pars orbitalis and pars triangularis, BA 44), unilateral activity in left pre-and postcentral gyrus [primary motor areas, BA 4, dorsal premotor cortex (PMd), BA 6], left supramarginal gyrus (BA 1) and the right cerebellum (lobule VIIa/CrusI) also covaried with the degree of prediction.

Negative correlates of temporal prediction during SMS
Our parametric analysis also revealed areas that covaried negatively with prediction, i.e., these brain areas displayed higher activity when participants showed relatively weak prediction tendencies during SMS (see Figure 4A, and Table 3, all activations are significant at p < 0.05, FDR corrected). This network comprises clusters in the bilateral medial (vermis of lobule VI) and left lateral cerebellum (lobules VI and VIIa/CrusI/II), the left fusiform gyrus and inferior occipital lobe, the bilateral IPL extending into the superior parietal lobes and the right precuneus. Furthermore, the bilateral anterior insular cortex, the preSMA reaching anterior into the medial superior frontal gyrus, the bilateral frontal gyri (on the lateral surface anterior to the Frontiers in Human Neuroscience www.frontiersin.org August 2013 | Volume 7 | Article 380 | 7 premotor area) and the right middle frontal gyrus reaching into BA 45 and the left IFG (BA 45) showed stronger activation when participants displayed weaker prediction tendencies in finger tapping.

Finger-tapping
The contrast "finger tapping and 2-back object comparisons vs. 2-back object comparisons without finger tapping (Tap2B-2B)" is informative about brain regions that are activated during SMS with a tempo-changing pacing signal. Coordinates of peak activations are listed in Table 4 and activation clusters are displayed in Figure 4B, all activations are significant at p < 0.05, FDR corrected. Consistent with previous observations on SMS with auditory signals, activation was exhibited bilaterally in temporal lobes covering the auditory cortex and in a network of distributed motor-related areas (for a review see Witt et al., 2008). In the left hemisphere, superior temporal lobe activation extended into operculum and insular cortex, as well as subcortically into left thalamus and putamen. A separate cluster of left-hemispheric activity was localized on the primary motor cortex (BA 4) extending into the PMd (BA 6) and somatosensory cortex (BA 1, BA 2, BA 3). In the right hemisphere, a large cluster of activation was revealed in the cerebellum (lobule VI), while smaller clusters were localized in rolandic operculum extending into insular cortex, STG, and precentral gyrus (BA 6). Finally, medial activation was exhibited bilaterally in the SMA (preSMA and SMA proper, BA 6) reaching into the middle cingulate cortex.

2-back object comparisons (2B)
The 2B experimental condition was introduced as a control condition, to reveal brain regions involved in working-memory computations (while listening to an auditory pacing signal, but without overt synchronization). Coordinates of peak activations are listed in Table 5 and activation clusters are displayed in Figure 4C, all activations are significant at p < 0.05, FDR corrected. In line with previous n-back studies requiring object identity comparisons (for an overview see Owen et al., 2005), activations were found in a bilateral frontoparietal network that spanned IFG (BA 44/45), middle frontal gyri, precentral gyrus and preSMA (BA 6), and the inferior parietal lobes. Furthermore, areas in bilateral temporal cortices and insular cortex were activated. The largest cluster of activations spanned bilateral occipital lobes, extending into parietal lobes, the fusiform gyrus, and cerebellum (lobules VI and VIIa/CrusI in both hemispheres).

Common and distinct activations during temporal prediction and 2-back object comparisons (2B)
A conjunction analysis that was conducted to identify neural regions that were commonly activated during both conditions revealed no significant activation clusters at p < 0.05, FDR corrected. In addition to this analysis, we calculated the contrast "prediction-2B" to explore further whether the brain network underlying temporal prediction was distinct from the activation pattern related to the visual working-memory task (compare Figures 3 and 4C). This analysis confirmed that all areas reported for prediction remain significant in that contrast (p < 0.05, FDR corrected), i.e., show stronger activation for prediction than in 2B (see last two columns of Table 2). A second conjunction analysis was conducted in order to test whether there is overlap between the network negatively related to prediction and the visual working-memory task (2B). Results are listed in Table 6 (all significant at p < 0.05, FDR corrected) and confirm a highly overlapping network for both conditions.

DISCUSSION
The present fMRI study investigated the neural correlates of auditory temporal predictions during SMS with gradual tempo modulations. In our SMS task, participants tapped a finger in synchrony to an auditory pacing signal that contained tempo changes designed to resemble tempo modulations displayed in real performed music. Prediction ability during SMS was manipulated by means of a simultaneous visual n-back working-memory task comprising three levels of increasing difficulty: observation only (Tap), 1-back (Tap1B), and 2-back (Tap2B) object comparisons. Analyses of finger-tapping performance revealed that the experimental manipulation introduced by the simultaneous visual n-back task had a significant overall effect on the degree of prediction during SMS: prediction/tracking ratios decreased with increasing working-memory load. Additional analyses on the two components of the prediction/tracking ratio revealed that the working-memory manipulation significantly affected only the prediction index (i.e., the lag-0 CCs), while leaving the tracking Frontiers in Human Neuroscience www.frontiersin.org August 2013 | Volume 7 | Article 380 | 9 index (i.e., the lag-1 CCs) unchanged. A parametric analysis conducted on the hemodynamic responses recorded by fMRI revealed that higher prediction/tracking ratios during SMS were positively related to stronger activation in a distributed network of cortico-cerebellar motor-related brain areas, medial cortical areas, and the auditory cortex together with adjacent temporal areas. In addition, negative covariation with prediction was found in brain areas that corresponded very closely to the frontoparietal network that was also activated during the working-memory task without finger tapping (condition 2B).

FIGURE 4 | Clusters of increased brain activity related to (A) weak temporal prediction (i.e., negative covariation with prediction/tracking ratios) and activation clusters revealed during (B) finger tapping (contrast "Tap2B-2B") and (C) 2-back object comparisons and auditory perception of a pacing signal (2B
Our behavioral results confirmed that the experimental manipulation, which was introduced to increase cognitive load by applying a visual n-back task simultaneously with SMS, had the intended effect on finger-tapping performance: not only prediction/tracking ratios decreased with increasing workingmemory load, but also synchronization accuracy and precision. The manipulation had a significant overall effect on all measures of SMS performance, but differences between the two single working-memory conditions (Tap1B and Tap2B) were statistically not significant. Nevertheless, the Tap2B condition was perceived as significantly more difficult and individuals made more errors in the 2-back working-memory task compared to the Tap1B condition. Altogether, our experimental manipulation proved effective for increasing variability in individual prediction/tracking ratios, which are typically highly stable under single-task conditions (see Pecenka and Keller, 2011). Thus, intraindividual variability in prediction/tracking ratios can be used to analyze the neural correlates of prediction tendencies during SMS. Consistent with our previous studies (Pecenka and Keller, 2009a,b), strong prediction tendencies during SMS were also significantly related to better synchronization performance (as indexed by higher SMS accuracy and precision).

POSITIVE CORRELATES OF TEMPORAL PREDICTION DURING SMS
The parametric analysis addressing brain regions in which activation is positively related to prediction in SMS revealed a distributed cortico-cerebellar network that is traditionally associated with motor control, including such areas as dorsal premotor and motor cortices, SMA proper, IPL, and lateral cerebellum. These identified clusters correspond closely to the activation foci commonly reported in finger-tapping studies (see meta-analysis by Witt et al., 2008 for an overview). Our results suggest that the Frontiers in Human Neuroscience www.frontiersin.org August 2013 | Volume 7 | Article 380 | 10 areas that we identified as being related to prediction may not only be important for motor control in general but may play a specific role in aspects of prediction related to motor timing and temporal adaptation during SMS. This assumption is consistent with previous studies: The PMd has not only been linked to simple and complex motor processes (e.g., Rao et al., 1993;Sadato et al., 1996;Hlustik et al., 2002;Ullen et al., 2003;Debaere et al., 2004) but has also been suggested to play a role in mediating auditory-motor interactions during finger tapping by extracting higher-order information from an auditory stimulus and integrating this information with motor processes so that appropriately timed actions can be executed (Chen et al., 2006(Chen et al., , 2008a. SMA is also known to play a role in basic aspects of motor timing (e.g., Colebatch et al., 1991;Matelli et al, 1993;Rao et al., 1993;Ullen et al., 2003), but is also activated by perceptual timing tasks, independently of motor implementation (e.g., Macar et al., 2006;Bengtsson et al., 2009). In these studies, activation of SMA proper was positively related to sequence predictability (Bengtsson et al., 2009) and attention paid to time (Macar et al., 2006). Furthermore, the cerebellum is considered to be important for timing in motor, sensory, and cognitive domains (Ivry and Keele, 1989;Ivry et al., 2002;Bueti et al., 2008; O'Reilly  , 2008) and it has been identified as playing a key role in the generation of internal forward and inverse models that support fine motoric timing (Wolpert et al., 1998;Ito, 2008). The posterior cerebellum, which was activated as part of the network related to prediction, has been linked to temporal processing underlying working-memory for rhythm (Jerde et al., 2011). Finally, the IPL (together with the premotor cortex) is part of a circuit involved in action-perception matching (e.g., Molenberghs et al., 2012) and is therefore important for the integration of sensory signals, for internal models and action execution (Miall, 2003), as well as for the perceptual prediction of upcoming events (Schubotz, 2007). Altogether, the present study adds to the current knowledge about these traditional motor areas by showing that they do not only play a role in explicit timing (cf. Coull et al., 2011;Witt et al., 2008), but may fulfill a specific function in situations that require temporal prediction during motor synchronization. However, because perceptual and motor processes are closely intertwined in the parametric design of our study, an assignment of separate components of the prediction-related brain network to specific processes pertaining to temporal prediction, sensorimotor integration, or motor timing and temporal adaptation must remain speculative. Clusters in bilateral temporal cortices were also revealed to be part of the neural network related to prediction. In general, activation of auditory areas is not surprising as auditory stimulation (and scanner noise) was present during all conditions of the current experiment. However, it is remarkable that activation in the auditory cortex covaried with the degree of prediction even though auditory input was identical across all finger-tapping conditions. On the one hand, activation in auditory cortices may be related to (anticipatory) auditory imagery processes (Leaver et al., 2009;see Hubbard, 2010 for a review) that presumably support prediction during SMS (Keller, 2008). On the other hand, this modulation may reflect resources available for auditory processing, which varied depending on the cognitive load imposed by the working-memory dual task. Previous investigations using dual-task paradigms showed that workingmemory load in one task reduces resources available for a second task, even when the two tasks draw upon different sensory modalities (e.g., De Fockert et al., 2001;Klemen et al., 2009). Thus, it has been suggested that working-memory load can impact upon attentional control, which, in turn influences perceptual (and post-perceptual) processes (Klemen et al., 2009). Studies on auditory attention have furthermore revealed that activity in primary and secondary auditory cortex is modulated by attention (e.g., Jäncke et al., 1999;Petkov et al., 2004). Regarding our dual-task paradigm, these observations are consistent with the idea that high visual working-memory load may have reduced the attentional resources available for auditory processing (and presumably also for sensorimotor integration and motor timing), as indicated by lower activation in STG during decreased prediction. Finally, variations in auditory cortical activation with differing degrees of prediction may reflect auditory-motor interactions during SMS. Previous studies on rhythm perception and production have reported functionally coupled neural activity in auditory and premotor cortex (Chen et al., 2006(Chen et al., , 2008b, as well as SMA (Grahn and Rowe, 2009). A further remarkable result is that clusters along the medial wall (medial prefrontal cortex and ACC, middle cingulate cortex, posterior cingulate cortex and precuneus) were found to be more activated when participants employed greater degrees of prediction. The medial prefrontal cortex (mPFC), together with ACC, is central to an array of higher cognitive functions including social-cognitive processes (e.g., Amodio and Frith, 2006;Gilbert et al., 2006;Frith, 2007;Carrington and Bailey, 2009) as well as processes underlying action control and monitoring, such as error detection and response conflict monitoring (e.g., Cohen et al., 2000;Ullsperger and von Cramon, 2004). Contemporary approaches that have attempted to identify a common function have suggested that mPFC and ACC subserve general performance monitoring, in the sense that expectations about action outcomes are formed and moment-by-moment discrepancies between actual and expected outcomes are detected Brown, 2010, 2011). Interestingly, a recent study revealed that mPFC/ACC play a role in predicting not only action outcomes (e.g., in terms of valence), but were also sensitive to violations of the timing of these expected outcomes in a two-choice prediction task (Forster and Brown, 2011). A different study that investigated the influence of temporal predictability of auditory rhythm sequences in a passive listening task reported that activation in bilateral middle frontal gyrus (BA 10; although slightly more lateral than our mPFC activation cluster) and medial motor areas (SMA and preSMA) covaried positively with the rhythm's predictability (Bengtsson et al., 2009). In line with these findings, our results are consistent with the hypothesis that enhanced activity in medial areas (and particularly in mPFC) could reflect the formation of precise sensory predictions of temporal sequences (cf. Bengtsson et al., 2009).
Furthermore, mPFC (among other areas) has been identified as one of the core regions in a network involved in making inferences about others' beliefs and predictions about other people's actions on the basis of their mental states (for reviews on mentalizing see Frith and Frith, 2003;Saxe, 2006;Frith, 2007;Carrington and Bailey, 2009). Specifically, mPFC was commonly activated when individuals were engaged in real-time social interactions with either a real human co-actor (McCabe et al., 2001;Decety et al., 2004), a virtual partner that behaved similar to a human (Schilbach et al., 2006;Fairhurst et al., 2013), or when they were led to believe that they were interacting with a real human actor as opposed to a computer (Gallagher et al., 2002;Ramnani and Miall, 2004). Remarkably, our medial prefrontal cortex cluster overlaps with or is in very close proximity to peak activations reported in all of the social interaction studies mentioned above. This result is consistent with the possibility that our participants may have effectively interpreted the computer-generated pacing signal in the SMS task as the action outcome of another human agent. This might appear surprising, as the auditory signal was rather artificial in comparison to the sounds that are normally produced during joint music making (i.e., the signal lacked variations in microtiming, pitch, timbre and intensity). However, the gradual tempo changes implemented in our pacing signal were deliberately designed to resemble changes in real performed music (i.e., ritardando and accelerando). Furthermore, studies on action simulation and theory of mind have demonstrated that even very reduced stimuli such as point-light figures (e.g., Dittrich et al., 1996;Sevdalis and Keller, 2011) and animated shapes (e.g., Heider and Simmel, 1944;Castelli et al., 2000) can convey social meaning, particularly when participants believed that the stimulus (e.g., the music they heard) was a human product (Steinbeis and Koelsch, 2009). In addition, the activation of brain areas related to such higher-level social-cognitive functions is in line with our previous finding that the degree of prediction during SMS with tempo-changing pacing sequences is positively correlated with self-reported musical perspective taking (i.e., strong agreement to the statement "To achieve successful ensemble coordination I pay attention to weaker musicians and adapt my playing accordingly", see Pecenka and Keller, 2011). Taken together, our findings in combination with the above-mentioned evidence provide further support for the assumption, that music-even in a very reduced form-is not merely perceived as any auditory signal, but as a socially salient stimulus that conducts the presence of another agent whose intentions and actions can be interpreted and predicted (Molnar-Szakacs and Overy, 2006;Leman, 2007;Overy and Molnar-Szakacs, 2009).
Comparing the brain network underlying strong prediction with the activation pattern revealed during the visual workingmemory task (2B), we have shown that the neural underpinnings of temporal prediction are distinct from the brain network implicated in working memory. This is not surprising, as strong prediction was possible mainly in the tapping only condition (Tap), wherein no working-memory task was performed. Thus, the absence of an overlap between the two networks does not imply that the two tasks did not interfere with each other, as this was clearly evident in the behavioral data. Specifically, we hypothesized that the introduction of a visual n-back task results in a decrease in the degree of prediction during SMS because both tasks draw on shared working-memory resources. With respect to our working-memory task, we have some indication that it may have drawn on a subvocal rehearsal process, instantiated in the phonological loop system of working memory (Baddeley, 2003). In order to interfere as little as possible with motor processes during finger tapping, the visual n-back task was designed to be nonverbal (as novel objects that do not have common names, the so-called "Fribbles", were used). However, several of our participants reported having used a verbal strategy (by assigning names to the abstract objects) and subvocal rehearsal to facilitate n-back object comparisons. These reports are consistent with our finding that the neural working-memory network (associated with the 2B condition) included preSMA, an area that has been more typically observed during verbal compared to nonverbal working-memory tasks (and particularly identity comparisons; cf. Owen et al., 2005) and has been revealed as a key structure for subvocal rehearsal (e.g., Paulesu et al., 1993;Smith and Jonides, 1998;Baddeley, 2003). Therefore, interference between the two tasks in our dual-task paradigm may have occurred because both tasks draw on a shared process, namely subvocal rehearsal, instantiated in the phonological loop. The viability of this interpretation is furthermore supported by reports of interference effects of verbal processing tasks with internal timekeeping mechanisms during SMS (Sergent et al., 1993;cf. Rao et al., 1997). It has thus been suggested that internal timing during finger tapping may rely on the retrieval and rehearsal of internal auditory representations of time intervals (Rao et al., 1997;Witt et al., 2008), especially during conditions without an external pacing signal (e.g., continuation tapping). Finally, there are anecdotal reports that the learning of complex motor and rhythmic sequences (e.g., in drumming lessons) is facilitated by subvocal rehearsal.

NEGATIVE CORRELATES OF TEMPORAL PREDICTION DURING SMS
The parametric analysis on neural substrates related negatively to prediction tendencies during SMS revealed a close overlap with the frontoparietal network of brain areas activated during our visual working-memory condition (2B). The visual working-memory task was deliberately chosen to interfere with the degree of prediction during SMS and it resulted in lower prediction/tracking ratios in the Tap1B and Tap2B conditions compared to tapping only (Tap). Therefore, close overlap between brain areas that covary negatively with prediction and the brain network underlying the working-memory condition is plausible. This overlap, together with our finding that workingmemory performance did not differ between the two 2-back conditions (Tap2B and 2B), is consistent with the assumption that individuals focused their attention (and presumably also working-memory resources) predominantly on the n-back task under dual-task conditions, while simultaneous finger tapping was performed rather automatically (and resulted in a decline in synchronization performance).

CONCLUSION
In the current study, we have identified an extensive brain network related to prediction during SMS with auditory pacing sequences containing tempo changes that are similar to those that characterize expressive music performance. This network comprises motor-related cortico-cerebellar brain areas that mediate general motor control and, more specifically, perceptual prediction, sensorimotor integration, motor timing and temporal adaptation. Furthermore, our findings suggest that individuals may use higher-level cognitive functions related to mentalizing and perspective taking during action coordination with an auditory signal that bears resemblance to the action outcome of a human agent.