Keeping an eye on the conductor: neural correlates of visuo-motor synchronization and musical experience

For orchestra musicians, synchronized playing under a conductor’s direction is necessary to achieve optimal performance. Previous studies using simple auditory/visual stimuli have identified cortico-subcortical networks underlying synchronization and have shown that training improves the accuracy of synchronization. However, it is unclear whether people who regularly play under a conductor and nonmusicians activate the same networks when synchronizing with a conductor’s gestures. We conducted a functional magnetic resonance imaging (fMRI) experiment testing nonmusicians and musicians who regularly play music under a conductor. Participants were required to tap the rhythm they perceived from silent movies displaying either a conductor’s gestures or a swinging metronome. Musicians tapped under a conductor with more precision than nonmusicians. The fMRI results showed greater activity in the anterior part of the left superior frontal gyrus (SFG) in musicians with more frequent practice under a conductor. Conversely, tapping with the metronome did not show any difference between musicians and nonmusicians, indicating that the expertise effect in tapping under the conductor does not result in a general increase in tapping performance for musicians. These results suggest that orchestra musicians have developed an advanced ability to predict a conductor’s next action from the gestures.


Introduction
When listening to a symphony in a concert hall, we enjoy the music and admire the ability of the musicians to stay in excellent synchrony. How do orchestra musicians achieve such a high level of synchrony? Here we focus on the role of the conductor in a large orchestra. To produce a satisfactory performance, musicians follow the temporal cues provided by the conductor's gestures. Do orchestra musicians develop a special ability to read the conductor's intentions, or are they simply good at synchronized action in general? In this introduction, we first discuss current findings from simple tapping tasks with mechanical pacemakers. Second, we briefly review the field of joint action and interpersonal synchrony, and the brain regions that are activated during a tapping task. Finally, we motivate the choice of our experimental setup.
Simple tapping tasks have been used in previous research on sensory-motor synchronization (SMS). Participants were asked to follow a constant rhythmic stimulus sequence, mostly by finger tapping (for review, see e.g., Repp, 2005). Tapping performance is typically measured as the mean asynchrony, i.e., the time difference between the finger tap and the rhythmic stimulus. The difference is negative if the taps precede the stimuli. Using such tapping tasks with rhythmic stimuli, previous studies have often reported negative mean asynchronies, although participants typically reported the subjective feeling of synchrony (Repp, 2005; Repp and Su, 2013). For tapping with auditory stimuli in a regular rhythm, it is assumed that synchrony is established at higher cognitive ("central") levels, and that the negative values are due to different processing times for the different sensory modalities, here the auditory pacer stimulus and the tap (Aschersleben and Prinz, 1995; Aschersleben, 2002). However, this account still needs to be refined, as the observed asynchronies depend on the modality and the duration of the pacers (Aschersleben, 2002). Interestingly, tapping with rhythmic visual stimuli often shows larger variance than auditory-motor synchronization (Repp and Penel, 2002; Repp, 2003; Pollok et al., 2009). In addition, the lower limit of successful synchronization is about 400 ms for visual stimuli, compared to 150--200 ms for auditory stimuli (Repp, 2003). These modality-dependent differences were first attributed to the lower temporal resolution of the visual system, but recent studies using a moving visual cue, e.g., a bouncing ball or the up-down movement of a finger, observed tapping performance comparable to that with auditory clicks and better than that with visual flashes (Hove et al., 2010, 2013a). Musical training is known to reduce the mean asynchrony in auditory SMS (Franěk et al., 1991; Drake et al., 2000b; Krause et al., 2010). However, it is unclear whether it also affects visuo-motor synchronization.
Accurate synchronization between a conductor and musicians in an orchestra is a joint action, which requires integration of simultaneous self- and other-related behavior, leading to a certain action-perception coupling in a musician's brain. This coupling may serve at least three cognitive functions: the first is to generate predictions about the outcome of one's own and others' movements (Sebanz et al., 2005; Atmaca et al., 2008; Sebanz and Knoblich, 2009), the second is to form a representation of the actions of others (Keller et al., 2007; Novembre et al., 2012; Loehr et al., 2013), and the third is to integrate the co-actor's action with the self-generated action (Novembre et al., 2014). In addition, staying in synchrony with others (interpersonal synchrony) has also been discussed as a way for individuals to show their affiliation with a group (Pecenka and Keller, 2011; Cacioppo et al., 2014). These results suggest that predicting what a partner will do is a cue for synchronized action. Interestingly, several studies in sports have further reported that expertise improves the ability to perceive and understand the behavior of opponents (Abernethy, 1990; Singer et al., 1996; Helsen and Starkes, 1999; Savelsbergh et al., 2002; Shim et al., 2005). A review paper also showed that experienced athletes are better than amateurs at detecting perceptual cues for predicting others' actions (Mann et al., 2007). Based on this evidence, we hypothesize that orchestra musicians are superior to nonmusicians in synchronization, especially when under the guidance of a conductor.
Neuroimaging studies have reported that synchronized tapping engages subcortical and cortical areas whose functions range from basic timing processes to motor planning and action, such as the basal ganglia, the cerebellum, the thalamus, the motor cortex, and the supplementary motor area (SMA; Lewis and Miall, 2003; Rubia and Smith, 2004; Witt et al., 2008; Mendoza and Merchant, 2014; Merchant et al., 2015). Note that studies of synchronized tapping in non-human primates show, first, that monkeys can also perform such tasks, ideally with visual pacing stimuli, and second, that their medial premotor areas host timer-like neurons that measure both the time elapsed since the last stimulus and the expected time to the next one. For a deeper discussion, see the review by Merchant and Honing (2014). Although auditory and visual tapping tasks activate common brain areas such as the motor cortex, the SMA, and the cerebellum, the visual task recruits additional areas, including the ventral premotor cortex (vPMC), the insula, the putamen, and the inferior frontal gyrus (IFG; Jäncke et al., 2000; Jantzen et al., 2005; Pollok et al., 2009; Repp and Su, 2013). While musical experience increases the functional connectivity between the PMC and the thalamus in auditory-motor synchronization (Krause et al., 2010), it is unknown whether musical experience, especially the frequency of playing music under a conductor, affects the brain regions related to visuo-motor synchronization.
The current literature on the neural correlates of interpersonal synchrony reports several brain regions involved in successful synchronization. Neuroimaging studies have demonstrated that gesture recognition and imitation activate fronto-parietal areas, including the IFG and the inferior parietal lobe (IPL; Iacoboni et al., 1999; Hermsdörfer et al., 2001; Buccino et al., 2004; Chaminade et al., 2005; Mühlau et al., 2005; Pazzaglia et al., 2008; Villarreal et al., 2008; Green et al., 2009). These regions are known as the core of the mirror neuron network (Iacoboni and Dapretto, 2006; Cattaneo and Rizzolatti, 2009). The mirror neuron network is involved in both action observation and execution, leading to the concept that we interpret the actions of others by mimicking them mentally. A further region is the medial prefrontal cortex (mPFC), which is consistently activated when we think about other people's mental states (Frith and Frith, 1999; Amodio and Frith, 2006). In particular, the anterior medial part of the superior frontal gyrus (SFG) is activated by mental simulation of a partner's action (Decety et al., 1994, 1997; Grezes, 1998; Amodio and Frith, 2006). This region is also active during gestural communication and when being in synchrony (Sebanz et al., 2007; Schippers et al., 2010; Fairhurst et al., 2013; Cacioppo et al., 2014). These results suggest that activity in the mPFC reflects successful mental simulation and more effective synchronized action.
Based on this evidence, we hypothesized that the effect of experience on predicting a partner's action would be reflected in the activity of the mPFC, particularly the SFG, because experienced individuals perform more precise mental simulation than their inexperienced counterparts. This would also be the case for synchronization between a conductor and orchestral musicians. To test this, we measured brain activity using functional magnetic resonance imaging (fMRI) while orchestral musicians and nonmusicians performed a synchronized tapping task under the guidance of a conductor's gestures. Silent movies of conductors' gestures were chosen as stimuli because we wanted the stimuli to be as realistic as possible for musicians. One of our concerns was that musicians might show their expertise only when they followed a conductor's gestures, but not during a simple tapping task with mechanical stimuli. Therefore, we also designed a synchronized tapping task with a swinging metronome to investigate whether expertise effects in synchronized tapping are use-dependent or reflect a general improvement of sensitivity in timing processing. In addition, a perturbation of rhythm was included in the task to evaluate how the brain areas associated with sensory-motor coordination respond to temporal modulation. We were interested in comparing experts and novices using two groups of stimuli: the conductors as the stimulus taken from the field of expertise, and the metronome as a somewhat related, though mechanical, replacement.

Participants
Eleven participants who regularly played musical instruments in an orchestra (musicians: 6 males and 5 females) and 14 participants who had neither experience in playing music under a conductor nor training on a musical instrument (nonmusicians: 11 males and 3 females) took part in the experiment. All participants were right-handed according to the Edinburgh Handedness Inventory (Oldfield, 1971) and had a mean age of 25 ± 3 years. They were paid for their participation and gave prior written informed consent. Procedures were conducted in accordance with the Declaration of Helsinki and the guidelines were approved by the Ethics Committee of the University of Leipzig. Musicians were members of an amateur orchestra and had played music under a conductor regularly for 1--9 h per week (Mean ± SD: 4 ± 2) over the past 5 years, having at least 9 years of experience (M ± SD: 14 ± 4 years) in playing a musical instrument: violin, cello, contrabass, flute, trumpet, or trombone. Following the experiment, the musicians were asked how frequently they use a metronome during practice. Eight musicians used a digital metronome that only produced click sounds, but none of the musicians used an analog metronome with a swinging bar. None of the musicians practiced frequently with their metronome (M ± SD: 1.9 ± 0.9 on a 5-point scale; 1 = "do not use at all" and 5 = "everyday use").

Stimuli
Conducting performances of three different conductors (one male and two females) and a swinging metronome ( Figure 1A) were filmed without sounds using a digital video camera (Sony HDR-HC1E). The conductors were instructed to perform conducting gestures as they normally do. We selected 120 beats per minute (bpm) (500 ms inter-onset interval (IOI), fast condition) and 90 bpm (667 ms IOI, slow condition) as starting speeds as these are within the range of rates (400--800 ms IOI) known to yield reliable beat perception and are optimal for synchronized tapping (Drake et al., 2000a;McAuley et al., 2006). Each conductor was recorded performing two different conducting styles (constant tempo and deceleration: Figure 1B) at 120 and 90 bpm. For the constant tempo style, they were asked to maintain the speed until the last beat. For the deceleration style, they were asked to decelerate the speed during the last four beats either from 120 bpm to 90 bpm or from 90 bpm to 60 bpm, like a ritardando as they usually do in live performance. After some practice with a metronome, they conducted in each style without any external reference and filming took place. The movies were edited using Final Cut Pro (ver. 6, Apple Inc.) and the timing of each beat was calculated. A conductor's gestures generally follow a certain pattern and each beat is normally presented when the arm reaches the lowest point of each arm movement (Farberman, 1997;Luck and Nte, 2008). Thus, we defined the lowest point of the arm movements as the representation of each beat and estimated the latencies of that as reference times for each beat representation ( Figure 1C).
For the metronome stimuli, short movies of a swinging metronome ( Figure 1A, upper left) were initially filmed at nine different speeds (from 120, 112.5, 105 . . . to 60 bpm, using steps of 7.5 bpm). Since we recorded the metronome without sound, we were free to define the representation of beats as those moments in time when the bar of the metronome was located at the extreme left and right. All movies started from either 120 or 90 bpm. For the deceleration style, movies were kept at a starting speed for the first nine beats and the speed was gradually decelerated from the 10th beat to the last (13th). The impression of metronome deceleration was created as follows: The movies of the metronome at different speeds were first split into fragments showing one beat and arranged sequentially from the starting speed down to the final speed. The final movie for the deceleration style starting at 120 bpm was created as follows: the first nine beats were presented at 120 bpm, the 10th beat at 112.5 bpm, the 11th beat at 105 bpm, the 12th beat at 97.5 bpm and the final beat at 90 bpm.
The timing of each beat presented under all conditions (from the 1st to the last) was calculated from the number of frames (40 ms/frame) from the beginning of the movie. Please note that the steepness of deceleration differed between the fast condition (starting from 120 bpm) and the slow condition (starting from 90 bpm): the tempo decelerated from 120 bpm (500 ms IOI) to 90 bpm (667 ms IOI) under the fast condition, while it decelerated from 90 bpm (667 ms IOI) to 60 bpm (1000 ms IOI) under the slow condition. That is, the deceleration under the slow condition was steeper than that under the fast condition. Figures 1D,E show the time course of the inter-beat interval (IBI) of the stimuli under the constant tempo and deceleration conditions. While the IBI under the constant tempo condition appears to be similar between the conductors and the metronome, the deceleration condition showed variability among the conductors. We did not match the IBI between the conductors, as we considered this variability critical for our investigation into the effects of expertise when tapping under the guidance of a conductor.
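The tempo arithmetic behind the metronome stimuli can be checked with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the default arguments encode the 13-beat sequence with nine constant beats and 7.5 bpm steps described for the metronome movies, and nothing here reproduces the authors' actual editing workflow.

```python
FRAME_MS = 40  # frame duration stated in the text (40 ms/frame)


def bpm_to_ioi_ms(bpm):
    """Inter-onset interval in milliseconds for a tempo in beats per minute."""
    return 60_000.0 / bpm


def metronome_tempi(start_bpm, n_beats=13, n_constant=9, step_bpm=7.5):
    """Tempo assigned to each beat of a deceleration movie (hypothetical helper).

    The first `n_constant` beats keep the starting tempo; every later beat
    is one `step_bpm` step slower than the previous one.
    """
    return [start_bpm - max(0, beat - n_constant) * step_bpm
            for beat in range(1, n_beats + 1)]


def ioi_increase_ms(start_bpm):
    """Total lengthening of the IOI between the starting and the final tempo."""
    tempi = metronome_tempi(start_bpm)
    return bpm_to_ioi_ms(tempi[-1]) - bpm_to_ioi_ms(tempi[0])


def frames_to_ms(frame_index):
    """Time of a frame counted from the beginning of the movie."""
    return frame_index * FRAME_MS


for start in (120, 90):
    tempi = metronome_tempi(start)
    print(start, "bpm start; last five tempi:", tempi[-5:],
          "; IOI increase (ms):", round(ioi_increase_ms(start)))
```

Under these assumptions the IOI lengthens by roughly twice as much in the slow condition as in the fast condition over the same four beats, which is the sense in which the slow deceleration is steeper. The 40 ms frame duration also sets the temporal resolution of the beat reference times.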

Procedure
During the fMRI scan, participants were required to synchronize taps of their right index finger with the timing of each beat presented by either the metronome or the conductors in the silent movie stimuli. The length of each stimulus was about 10 s in the fast condition and 12 s in the slow condition. Before entering the scanner, participants were trained to familiarize themselves with the task. When training with the conductors' gestures, they were instructed to tap when the arm of the conductor was at the lowest point of each arm movement. When training with the swinging metronome, they were instructed to tap when the bar of the metronome reached the extreme left and right positions. All participants understood the instructions without any problems. We checked their tapping performance visually and told them whether their tapping action was correct or not. The training lasted for 15 min in total and all participants performed as instructed. During the scan, participants lay supine on the MRI scanner bed with their right index finger placed on a custom-made tapping pad. The movies were projected onto a back projection screen via a video projector (Panasonic PT-D7700E). The timing of taps was measured using a custom-made air pressure sensor, which was connected to the tapping pad in the scanner room. The plastic tube connecting the sensor and the pad was about 10 m long and caused a delay of 67 ms. The tapping pad consists of two moving bars with an air-pressure sensor in between. Taps on the upper bar produce a barely perceptible noise when the finger lands on the bar. Additionally, the participants wore noise-attenuating headphones because the scanner noise is normally over 100 dB. Thus, the participants were unable to hear their own taps during the experiment. Tapping performance was corrected by subtracting the delay from the timing of taps. In addition, a white cross was presented on a gray background for 12 s as the baseline (null) condition, and the participants were requested to remain still during this time. During the recording session, all stimuli were presented with 15 repetitions in a random order with a mean interval of 3 s, jittered from 2 s to 4 s. Scans were conducted using an event-related design. The design of the experiment was a four-way mixed design with a between-subject factor of Group (musicians, nonmusicians) and three within-subject factors of Stim (conductor, metronome), Style (constant tempo, deceleration), and Speed (fast: starting from 120 bpm, slow: starting from 90 bpm).
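As a quick check of the factorial structure just described, the design cells can be enumerated. This is a minimal sketch; the variable names and label strings are illustrative and are not taken from the authors' analysis scripts.

```python
from itertools import product

# Factor levels as named in the design description.
GROUP = ("musicians", "nonmusicians")         # between-subject factor
STIM = ("conductor", "metronome")             # within-subject factor
STYLE = ("constant tempo", "deceleration")    # within-subject factor
SPEED = ("fast (120 bpm)", "slow (90 bpm)")   # within-subject factor

# Conditions that every participant performs.
within_cells = list(product(STIM, STYLE, SPEED))

# Cells of the full four-way mixed design.
all_cells = list(product(GROUP, STIM, STYLE, SPEED))

for cell in within_cells:
    print(" | ".join(cell))
print(len(within_cells), "within-subject conditions;", len(all_cells), "design cells")
```

Each participant therefore contributes data to eight within-subject conditions, and the four-way model has sixteen cells in total.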

Behavioral Data Analysis
Participants' tapping performance was assessed using the temporal asynchrony, defined as the time of each tap minus the time of the corresponding beat presented by either the metronome or the conductors. A negative value represents a tap earlier than the beat. The analysis focused on the temporal asynchrony during the last four beats (from the 10th to the 13th) of each sequence, as the first nine beats were presented in the same way in both style conditions. The mean and the standard deviation (SD) of the temporal asynchrony were estimated for each participant under each condition and exported to R software (ver. 2.15.02). For the ANOVA, we used an R package named "anovakun" produced by Dr. Ryuta Iseki. We conducted a four-way ANOVA with factors Group, Stim, Style, and Speed. Post hoc analyses were conducted using ANOVAs with pooled variances of the error terms from the original four-way model and Shaffer's modified Bonferroni corrected t-tests (Shaffer, 1986).
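The asynchrony measure can be sketched as follows, using the sign convention stated above (negative when the tap precedes the beat) and the 67 ms tube delay reported in the Procedure. The function names are hypothetical, and the sketch assumes exactly one recorded tap per beat, which real data may not satisfy; how taps were matched to beats is not described in the text.

```python
from statistics import mean, stdev

TUBE_DELAY_MS = 67.0  # delay of the pneumatic tube reported in the Procedure


def asynchronies_ms(tap_times_ms, beat_times_ms, delay_ms=TUBE_DELAY_MS):
    """Delay-corrected tap time minus beat time, one value per beat.

    Assumes taps and beats are already paired one-to-one (an assumption
    made for this sketch).
    """
    if len(tap_times_ms) != len(beat_times_ms):
        raise ValueError("taps and beats must be paired one-to-one")
    return [(tap - delay_ms) - beat
            for tap, beat in zip(tap_times_ms, beat_times_ms)]


def summarize_last_beats(tap_times_ms, beat_times_ms, first_beat=10, last_beat=13):
    """Mean and SD of the asynchrony over beats 10 to 13 (1-based, inclusive)."""
    window = asynchronies_ms(tap_times_ms, beat_times_ms)[first_beat - 1:last_beat]
    return mean(window), stdev(window)


# Made-up example: 13 beats at 120 bpm, each tap registered 37 ms after the beat.
beats = [500 * i for i in range(13)]
taps = [b + 37 for b in beats]
print(summarize_last_beats(taps, beats))
```

In this made-up example every registered tap lags the beat by 37 ms, but after subtracting the 67 ms tube delay the true tap falls 30 ms before the beat, so the mean asynchrony is negative.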

fMRI Scan Acquisition
Data were acquired using a 3 Tesla Bruker Medspec 30/100 system with a standard birdcage head coil. Functional scans were collected as gradient echo, echo-planar imaging (EPI) with a blood oxygenation level dependent (BOLD) contrast.

Data Pre-Processing
MRI data processing was conducted using Statistical Parametric Mapping (SPM8, Wellcome Trust Centre for Neuroimaging, University College, London, UK). Using the first slice as the reference, EPI images were corrected for slice timing and realigned spatially to the first image in the series using a 6-parameter affine transformation for motion correction (3 parameters for translation and rotation, respectively). The T1 image was co-registered to the mean EPI image. Then, the T1 image was normalized (using affine and smooth nonlinear transformations) to the brain template in Montreal Neurological Institute (MNI) space. The resulting normalization parameters were applied to the co-registered EPI images. Images were smoothed using an 8 mm³ full-width half-maximum Gaussian kernel.

Individual First-Level Analysis
First-level analysis was conducted using the general linear model. A statistical model for each participant was computed by applying a boxcar model convolved with SPM's canonical hemodynamic response function (HRF). Motion correction parameters were entered into the model as covariates and the low frequency noise was removed with a 128 s high-pass filter. For each participant, statistical parametric maps of the t-statistic (SPM [T]) were generated by comparing each condition against the null condition. These t-maps were taken to the second-level analysis.

Second Level Analysis
Contrast images of each participant were subjected to second-level random effect analyses. In order to visualize commonly activated brain areas during both constant conditions (starting from 120 bpm and 90 bpm), a conjunction analysis was performed. Each participant's contrasts for both conditions against the null condition were used as the inputs to a second-level full factorial model. The obtained images were visualized with a threshold of cluster level FDR p < 0.05 and a cluster size of >100 voxels.
In order to identify regional brain activity modulated by the experience of playing music under the guidance of a conductor during synchronized tapping, we conducted separate three-way ANOVAs with factors Group, Style, and Speed for the conductor and metronome conditions. Follow-up comparisons were conducted using t-contrasts. To further investigate the interaction between the brain activity and musical experience, we conducted whole brain regression analyses using two kinds of musical experience as covariates: the number of years of playing musical instruments and the number of hours per week of playing under a conductor. A threshold was set for all statistical maps with a cluster level FDR p < 0.05. The surviving voxels were superimposed onto the MNI brain template. The voxel coordinates were converted to Talairach space using the GingerALE software (Laird et al., 2005). Anatomical labeling was provided by Talairach Client software (Lancaster et al., 2000).
To further investigate the musicians' expertise effect on tapping performance, we conducted separate correlation analyses for each Stim condition (conductor and metronome) between the temporal asynchronies and two kinds of musical experience: the number of years playing musical instruments and the number of hours per week playing music under a conductor. The number of years playing musical instruments did not correlate with the temporal asynchrony, neither for the conductors nor for the metronome. The number of hours per week playing music with a conductor, however, showed a positive correlation in the deceleration conditions when tapping with the conductor (fast speed: r = 0.42, t (23) = 2.23, p = 0.036, 95% CI [0.01, 0.71]; slow speed: r = 0.64, t (23) = 4.02, p < 0.001, 95% CI [0.31, 0.83]). This indicates better synchronization with more frequent practice (Figure 4). The number of hours per week playing music with a conductor did not show a significant correlation when tapping with the metronome.
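The reported test statistics can be related to the correlation coefficients with the standard conversion from r to t and a Fisher z confidence interval. The sketch below assumes n = 25 (11 musicians plus 14 nonmusicians, consistent with the 23 degrees of freedom reported); the choice of the Fisher z interval is an assumption for illustration, since the text does not state how the intervals were computed, and the function names are hypothetical.

```python
import math


def r_to_t(r, n):
    """t statistic with n - 2 degrees of freedom for a Pearson correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for r using the Fisher z transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


N = 25  # 11 musicians + 14 nonmusicians (assumed to be the sample for t(23))

for label, r in (("fast", 0.42), ("slow", 0.64)):
    low, high = fisher_ci(r, N)
    print(f"{label}: t({N - 2}) = {r_to_t(r, N):.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Because the published r values are rounded to two decimals, the recomputed statistics should only be expected to agree with the reported ones to within rounding.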
To summarize the behavioral analysis, synchronized tapping was more challenging under the slow and deceleration conditions. Nevertheless, musicians showed higher accuracy of synchronization under the conductor than nonmusicians, which also correlates with the frequency of playing music with a conductor. In contrast, tapping with the metronome did not show any difference between musicians and nonmusicians.
fMRI Data
Figure 5 displays the activated areas in the constant tempo condition while the participants kept in synchrony either with the conductor or with the metronome. A number of brain areas were found active in the conductor condition: the middle occipital gyrus (MOG), the motor areas including the pre-/post central gyrus and the SMA, widely distributed fronto-parietal areas, including the IFG and the IPL, and the cerebellum. There was also activation in the subcortical areas, including the thalamus, the insula, and the basal ganglia, although these areas are not shown in Figure 5. By contrast, the activated areas in the metronome condition were relatively small, but included areas similar to those found in the conductor condition, namely the occipital lobe, the pre-/post central gyrus, the cerebellum, and the subcortical areas, including the thalamus, the insula, and the basal ganglia.
Under the conductor condition, the three-way ANOVA only revealed a main effect of Group. Thus, we created t-contrasts between musicians and nonmusicians to identify brain regions more strongly activated in either group. The left SFG was identified with stronger activity for musicians than nonmusicians (Figure 6A). There was no brain region with stronger activity for nonmusicians than musicians. Under the constant tempo condition, planned whole brain regression analyses with two kinds of musical experience did not show any correlated brain areas. In the deceleration condition, however, the regression analysis with the number of hours per week playing music with a conductor showed a positive correlation in the anterior part of the SFG/MFG (Figure 6B). These results indicate that playing music more frequently under the guidance of a conductor leads to stronger SFG/MFG activity, at least under the condition in which the conductors decelerated the tempo. On the other hand, the three-way ANOVA in the metronome condition only showed a main effect of Style. The t-contrasts between the constant tempo and deceleration conditions showed stronger activity in the right IFG, the IPL, and the fusiform gyrus (FG) for the deceleration condition than for the constant tempo condition (Figure 7). The peak coordinates of the t-contrasts, shown in Figures 6A, 7, are listed in Table 1.

FIGURE 5 | The brain areas that were activated by the conjunction analysis of the constant tempo condition (120 bpm and 90 bpm). (A) The activated areas under the conductor condition. (B) The activated areas under the metronome condition. A threshold was set at cluster level FDR of p < 0.05 and a cluster size of more than 100 voxels for all activated voxels. Activation in musicians (red) and nonmusicians (blue) are superimposed on the MNI template brain.

Discussion
The present study investigated visuo-motor synchronization in musicians and nonmusicians using movies of a conductor's gestures and a swinging metronome. Behavioral performance showed that musicians' tapping following a conductor's gestures was synchronized more precisely than tapping with the metronome. The superiority of musicians' tapping was observed in the conductor condition, especially when the conductors decelerated the tempo. Furthermore, fMRI results showed that the frequency of playing music with a conductor had a significant influence on the activity in the anterior part of the SFG/MFG, especially when the conductor changed the tempo during the tapping task. In contrast, when tapping with the metronome, neither behavioral performance nor brain activity showed significant differences between musicians and nonmusicians.

Temporal Asynchrony and Effects of Experience Playing Music Under a Conductor
In the present study, using complex human and mechanical motions as visual stimuli, we observed a negative mean asynchrony, as in previous studies (Repp, 2005; Luck and Sloboda, 2008; Repp and Su, 2013). Additionally, musicians produced a small positive asynchrony (less than 50 ms) while tapping with the conductor in the fast constant tempo condition (Figures 2A, 3A left). This may be an interesting observation for research on models of SMS, but we suggest that this positive asynchrony is still evidence of predictive tapping because it is shorter than the shortest possible reaction time (about 100 ms). Tapping performance under the conductor condition is consistent with our hypothesis that orchestra musicians are better at synchronizing with a conductor than nonmusicians. Previous studies in sports suggested that better performance in experts is based on better prediction of opponents' actions (Abernethy, 1990; Singer et al., 1996; Helsen and Starkes, 1999; Savelsbergh et al., 2002; Shim et al., 2005; Mann et al., 2007). In the deceleration conditions when tapping under the conductor (Figure 2A), it appears that the temporal asynchrony in musicians increased after the 10th beat and remained at this level, while the asynchrony in nonmusicians increased. Together with the expertise effects in sports, this difference between musicians and nonmusicians may reflect musicians' higher proficiency in predicting the conductor's gestures. Correlations between the temporal asynchrony and the frequency of playing music under the guidance of a conductor may support this interpretation. Although the effect of musical experience on tapping under a conductor is comparable with previous tapping studies (Franěk et al., 1991; Chen et al., 2008; Luck and Nte, 2008; Repp, 2010), the results under the metronome condition did not show any effect of musical experience. The difference between the conductor and metronome conditions may be due to the lack of familiarity and experience in the case of the metronome. Generally, an analog metronome produces click sounds, which are normally used as the primary cue to synchronize action. Thus, synchronized action with the silent metronome as in the present study was completely new for both musicians and nonmusicians. The lack of an expertise effect for the metronome condition shows that musicians' superiority in tapping with the conductor is not due to an improvement of visuo-motor coordination in general. Rather, the improvement was achieved in a task-specific manner, and the experience of playing music with a conductor is a crucial factor for precise synchronization with the conductor's gestures.

Impact of Frequency of Playing Music Under a Conductor on Brain Activity
The present study found that tapping under the conductor activates the left SFG/MFG in musicians more strongly than in nonmusicians. This supports our hypothesis about the experience of playing music under a conductor and the activity in the mPFC. These regions are known as a part of the network for social interaction. In particular, several studies reported that these regions are activated when an observer mentally simulates a partner's actions (Decety et al., 1994;Grafton et al., 1996;Grezes, 1998) and predicts the intention of a partner's gestures (Grèzes et al., 2004;de Lange et al., 2008;Centelles et al., 2011;Liew et al., 2011;Spunt et al., 2011). In addition, being in synchrony activates these regions more than being out-of-synchrony in a tapping task (Fairhurst et al., 2013;Cacioppo et al., 2014). Similar processes should occur during tapping with the conductor. Better tapping performance and stronger activity in the SFG/MFG in musicians suggest that musicians had more precise mental simulation for tapping under the conductor condition than nonmusicians. This is represented by positive correlations between the activity in the SFG/MFG and the frequency of playing music under a conductor ( Figure 5B). Interestingly, the mPFC, including the SFG/MFG, is also related to ''mentalizing'', which is the ability to represent another person's psychological perspective (Frith and Frith, 1999;Amodio and Frith, 2006). Frith and Frith (1999) suggested three components of the mentalizing function with corresponding brain areas: (1) the superior temporal sulcus (STS) for detection of the behavior of agents; (2) the inferior frontal areas for representations of goals; and (3) the anterior part of the SFG for simulation of another's behavior with the representation of our own mental states. The STS is also involved in joint attention, such as following the gaze of a partner (Redcay et al., 2010). 
As the design of the present study does not allow us to specify any relationship between synchronized action with a conductor and the mentalizing function, the distinct roles of these areas remain an interesting question for future research.

Brain Activity when Tapping with the Metronome and Effect of Tempo Change
Under the metronome condition, musicians and nonmusicians showed similar activity patterns. These mainly included the motor-related areas, visual areas, cerebellum, and subcortical structures, as shown in previous studies (Rubia and Smith, 2004; Wiener et al., 2010; Merchant et al., 2013a, 2015). Interestingly, non-human primates also showed spike activity in the corresponding areas of the SMA, the putamen, and the premotor cortex during rhythmic tapping with a sequence of auditory/visual stimuli, possibly suggesting similar neural networks for synchronized action across species (Merchant et al., 2011, 2013b; Bartolo et al., 2014; Crowe et al., 2014; Merchant and Honing, 2014). In addition, the activity in the FG, the precentral gyrus, and the IPL increased with the tempo change. With regard to the processing of time, two distinct systems have been suggested: automatic and cognitively controlled timing systems (Lewis and Miall, 2003). The automatic timing system involves brain regions within the motor network, including the motor cortex, SMA, and cerebellum. In contrast, the cognitively controlled timing system involves brain regions that contribute to cognitive abilities, such as working memory or attention, within the prefrontal and parietal cortices. Following the beats in the deceleration condition of the present study requires more cognitive resources than in the constant tempo condition. The observed difference between the deceleration and constant tempo conditions may therefore reflect the contribution of the cognitively controlled timing system.
Although behavioral performance showed an effect of deceleration under both the conductor and metronome conditions, brain activity did not show corresponding changes under the conductor condition. As far as we are aware, no study has addressed which regions of the brain are related to tempo change in human sequential action. Therefore, we are only able to speculate as to why these results were obtained in the present study. There are several differences between deceleration by the metronome and deceleration by the conductor. One possible interpretation relates to the difference in familiarity with the tempo change between a conductor and a metronome. Before the experiment, no participant had ever seen a metronome decelerate. Thus, this unique experience in the present study might have strongly stimulated the brain regions related to the cognitive processing of temporal information. In addition, deceleration only occurred during the last 2 s of the movie stimuli. Considering the general delay of the BOLD response after stimulation, our fMRI measurement might have detected only the initial rise of the BOLD change caused by the deceleration. Although the effect of deceleration did not reach significance under the conductor condition, a small cluster was found in the right IFG at a relaxed threshold (uncorrected p < 0.005). This might indicate the initial rise in activity caused by deceleration under the conductor condition.

Conclusion
The present study demonstrated that the frequency of playing music under the guidance of a conductor has an impact on visuo-motor synchronization with a conductor's gestures. The results indicated better tapping performance in musicians while tapping under the conductor, which corresponded with a wide distribution of brain activity, including the fronto-parietal areas. The fMRI results also indicated that the anterior part of the left SFG specifically was more engaged in musicians than in nonmusicians while tapping under a conductor. One possible interpretation is that musicians predicted the timing of the beats by mental simulation from the conductor's gestures. In contrast, tapping with the metronome showed effects relating to the temporal modulation in both musicians and nonmusicians. This might be consistent with the theory of the cognitively controlled timing system. These results suggest that frequent practice in playing music under a conductor improves orchestra musicians' ability to mentally simulate a conductor's gestures, leading to superior performance in synchronized tapping and stronger activity in the SFG than in nonmusicians.