Reduced Performance During a Sentence Repetition Task by Continuous Theta-Burst Magnetic Stimulation of the Pre-supplementary Motor Area

The pre-supplementary motor area (pre-SMA) is engaged in speech comprehension under difficult circumstances such as poor acoustic signal quality or time-critical conditions. Previous studies found that left pre-SMA is activated when subjects listen to accelerated speech. Here, the functional role of pre-SMA in accelerated speech comprehension was tested by inducing a transient “virtual lesion” using continuous theta-burst stimulation (cTBS). Participants were tested (1) prior to (pre-baseline), (2) 10 min after (test condition for the cTBS effect), and (3) 60 min after stimulation (post-baseline) using a sentence repetition task (formant-synthesized at rates of 8, 10, 12, 14, and 16 syllables/s). Speech comprehension was quantified by the percentage of correctly reproduced speech material. For high speech rates, subjects showed decreased performance after cTBS of pre-SMA. Regarding the error pattern, the number of incorrect words without any semantic or phonological similarity to the target context increased, while the number of related errors decreased. Thus, the transient impairment of pre-SMA seems to affect its inhibitory function that normally eliminates erroneous speech material prior to speaking or, in the case of perception, prior to encoding into a semantically/pragmatically meaningful message.


INTRODUCTION
The supplementary motor area (SMA) can be subdivided into SMA proper and a more anterior part, the pre-SMA (Picard and Strick, 2001; Nachev et al., 2008). SMA proper, whose border with pre-SMA is defined by the vertical line through the anterior commissure (Picard and Strick, 2001; Kim et al., 2010), seems to be primarily involved in motor control, acting as an interface for movement initiation and temporal triggering, e.g., in syllable repetition (Brendel et al., 2010). By contrast, pre-SMA is assumed to support cognitive control functions beyond the motor domain (Chouinard and Paus, 2010). For instance, pre-SMA has been found to be involved in task switching (Kennerley et al., 2004), inhibitory control, e.g., in stop-signal tasks (Chen et al., 2009; Kwon and Kwon, 2013), response selection processes Gracco, 2006, 2010), and complex sequencing, e.g., the coordination of memory representations with temporal event structure (Kotz and Schwartze, 2010). Pre-SMA and SMA proper have so far not been considered part of the language system, but seem to play an important role in speech processing (Geranmayeh et al., 2017). As concerns cortico-cortical connectivity, pre-SMA is linked to regions in prefrontal cortex, inferior frontal cortex, angular gyrus, and anterior cingulate cortex (ACC) (Kim et al., 2010). Both pre-SMA and ACC are connected to Broca's region via the frontal aslant tract (Lawes et al., 2008; Oishi et al., 2008; Ford et al., 2010), which was found to be lateralized to the language-dominant hemisphere (Catani et al., 2012). Further, SMA has extensive white matter connections with motor and language areas as well as with the limbic system (Vergani et al., 2014). Parallel fiber bundles project from SMA and ACC to inferior frontal regions, and both structures were found to contribute to the initiation of vocalization (Penfield and Welch, 1951; Simonyan and Horwitz, 2011). Lima et al. (2016) argued for the engagement of SMA across a variety of sound categories such as speech, nonspeech vocalizations, and music. Even the processing of socio-emotional information could be supported by anatomical connections between SMA and the limbic system (Rodigari and Oliveri, 2014). Left SMA and IFG (sub-)modules are suggested to form a network (SMA-IFG complex) relevant to speech reconstruction (Hertrich et al., 2016).
Regarding speech and language processing, pre-SMA was found to be relevant especially under high task demands, e.g., under time-critical conditions. A recent functional magnetic resonance imaging (fMRI) study suggested that the "bottleneck" for understanding accelerated speech lies in frontal cortex functions rather than in auditory processing, as indicated by activation of pre-SMA and inferior frontal gyrus (IFG) when speech rate approaches the limits of intelligibility (Vagharchakian et al., 2012). Consistent with this, a further fMRI study found increased left pre-SMA activation in individuals trained to comprehend ultra-fast speech at rates of 16-18 syllables per second (syl/s) (Dietrich et al., 2013a). The authors suggested that pre-SMA coordinates phonological-phonetic representations (left hemisphere) with syllable-prosodic event timing (right hemisphere), adjusting inner speech components to the timing of the incoming speech signal during listening (Dietrich et al., 2013a; Hertrich et al., 2013). In other words, pre-SMA has been suggested to trigger mechanisms that predict upcoming speech items or reconstruct unintelligible linguistic items (Hertrich et al., 2016). Such reconstruction of missing information should be compatible with the information already understood, i.e., it should fit the semantic and phonological context. Previous studies describing the role of pre-SMA in stop-signal tasks (Chen et al., 2009; Kwon and Kwon, 2013) and response selection processes Gracco, 2006, 2010) point to inhibitory mechanisms by which pre-SMA could manage these top-down aspects: implausible top-down derived material is inhibited, whereas semantically plausible items are facilitated. Thus, pre-SMA is considered a superordinate control structure that exerts inhibition on top-down processing during speech perception.
To test this hypothesized functional role of pre-SMA, continuous theta-burst transcranial magnetic stimulation (cTBS) was applied to healthy subjects. This TMS protocol has been shown to induce inhibitory effects (a "virtual lesion") beginning shortly (ca. 10 min) after stimulation and lasting ∼30 min (Huang et al., 2005; Murakami et al., 2015). The present study used a pre/post1/post2 design, predicting a TMS effect at post1 relative to the two baseline measurements (pre, post2). The post-stimulation baseline was introduced to control for potential training effects during the experiment. The exact stimulation site was determined on the basis of previous fMRI data showing increased activity after training of ultra-fast speech comprehension (18 syl/s) (Dietrich et al., 2013a), which identified pre-SMA as a region relevant to accelerated speech comprehension. Left middle occipital gyrus (MOG) was chosen as a control stimulation site: MOG is involved in visual processing but cannot be considered part of the auditory speech processing network (Restle et al., 2012; Murakami et al., 2015). Speech comprehension was measured with a sentence repetition task comprising sentences at distinct syllable rates ranging from moderately fast (8 syl/s) to ultra-fast speech (16 syl/s); the latter rate is almost unintelligible to untrained listeners. We hypothesized that TMS-induced suppression of pre-SMA activity transiently reduces speech comprehension at high speech rates by impairing the top-down reconstruction of unintelligible information. Furthermore, the assumed inhibitory function of pre-SMA was addressed by a qualitative analysis of errors in subjects' repetitions, which indicate unsuccessful attempts to reconstruct missing material. As a second hypothesis, suppression of pre-SMA was expected to impair the inhibition of implausible errors.
Plausibility of reconstructed alternatives was considered in terms of semantic and/or phonological similarity between the reproduced material and the correct target words.

Participants
Thirty-six adult volunteers participated in the study. Half of them (n = 18) received cTBS over left pre-supplementary motor area (pre-SMA; experimental group, mean age = 30.4 years, SD = 9.04); the other half (n = 18) served as a control group and performed the repetition task with cTBS over left middle occipital gyrus (MOG; mean age = 29.2 years, SD = 11.18). All participants were right-handed (Edinburgh Handedness Inventory, laterality index of the experimental group: 88.9, SD = 13.23; control group: 83.5, SD = 16.56) male native speakers of German. None showed signs of neurological or psychiatric disorders, and all had normal hearing thresholds between −10 and +15 dB in each ear, tested at frequencies between 250 Hz and 4 kHz. Women were excluded because the menstrual cycle can alter neuronal network excitability (Smith et al., 2002). All subjects provided written informed consent prior to participation, and the experimental procedures were approved by the ethics committee of the Medical Faculty of the University of Tübingen.

Experimental Design and Procedure
Participants performed a sentence repetition task comprising 135 sentences of 18 syllables each (∼10 words, to limit memory load). The stimuli were based on newspaper texts and converted to speech by formant synthesis ["Eloquence" as implemented in the screen-reader software JAWS (Freedom Scientific, USA)] at five speech rates: 8, 10, 12, 14, and 16 syl/s. Materials comprised three subtests for the (pre-/post-) baseline and test measurements, with nine items per speech rate within each subtest (45 sentences per subtest, see Supplementary Table 1). Different sentences were used at each time point, and speech rates were presented in randomized order. Sentences were played via headphones in a sound-attenuated room. Subjects were asked to repeat each sentence "as accurately as possible" and "as fast as possible" after sentence offset, even when they failed to grasp all words. Their repetitions were digitally recorded (M-audio Microtrack 2496, 16 bit, 44,100 samples/s) for subsequent quantitative and qualitative evaluation of speech comprehension. Participants performed the repetition task prior to cTBS (pre), 10 min after cTBS (within the assumed time window of the maximum virtual lesion effect, post1), and 60 min after cTBS (post2) (Figure 1A). The TMS protocol was adopted from Huang et al. (2005), who documented the return of motor-evoked potential (MEP) amplitudes to baseline levels within 60 min; their data from six subjects showed suppression at 25 and 45 min but no effect at 61 and 65 min (Huang et al., 2005). Since "training" effects from pre to post1 and post2 could occur as subjects adapt to listening to synthetic speech, the post2 baseline was added to disentangle "training" (increase in performance) from TMS inhibition (decrease in performance).
Prior to the experimental session, a set of 18 practice trials was presented to familiarize subjects with the test situation and the sound of the speech synthesizer. The three stimulus subsets were rotated across participants with regard to the baseline (pre, post2) and test (post1) runs. The time intervals between pre, post1, and post2 testing were identical for the two subject groups (cTBS over pre-SMA or MOG). Each repetition run (pre, post1, or post2) lasted ∼10 min.
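The stimulus-set arithmetic and the counterbalancing described above can be sketched as follows. This is a minimal illustration, not the authors' script: the subtest labels "A" to "C", the cyclic Latin-square rotation, and the seeded shuffle are assumptions, since the text states only that subtests were rotated across participants and that speech rates were randomized.

```python
import random

RATES = [8, 10, 12, 14, 16]        # speech rates in syllables per second
SENTENCES_PER_RATE = 9             # items per rate within one subtest
SYLLABLES_PER_SENTENCE = 18
SUBTESTS = ["A", "B", "C"]         # hypothetical labels for the three subtests
RUNS = ["pre", "post1", "post2"]


def subtest_order(participant_index):
    """Cyclic Latin-square rotation (assumed scheme): each participant gets every
    subtest once, and across any 3 consecutive participants each subtest
    appears once in each run."""
    shift = participant_index % len(SUBTESTS)
    rotated = SUBTESTS[shift:] + SUBTESTS[:shift]
    return dict(zip(RUNS, rotated))


def run_playlist(subtest, seed):
    """All trials of one run as (subtest, rate, item) tuples, shuffled so that
    speech rates are presented in randomized order."""
    trials = [(subtest, rate, item)
              for rate in RATES for item in range(SENTENCES_PER_RATE)]
    random.Random(seed).shuffle(trials)
    return trials


sentences_per_run = len(RATES) * SENTENCES_PER_RATE                # 45
total_sentences = len(SUBTESTS) * sentences_per_run                # 135
syllables_per_rate = SENTENCES_PER_RATE * SYLLABLES_PER_SENTENCE   # 162
syllables_per_run = sentences_per_run * SYLLABLES_PER_SENTENCE     # 810
```

For example, `subtest_order(1)` returns `{'pre': 'B', 'post1': 'C', 'post2': 'A'}`. The 162 and 810 syllable totals are the expected per-rate and per-run sums referred to for Table 1.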

Transcranial Magnetic Stimulation
Prior to the main experiment, all participants underwent MRI to obtain a high-resolution T1-weighted anatomical dataset (3T Siemens Prisma scanner, resolution = 1 × 1 × 1 mm³) for neuro-navigation. The pre-SMA stimulation site (MNI group coordinates x = −6, y = 9, z = 60, Figure 1B) was determined from the group average of fMRI activity in a previous study (performed with different subjects) showing that pre-SMA is involved in accelerated speech processing (Dietrich et al., 2013a). The control stimulation site in left MOG (MNI coordinates x = −44, y = −74, z = 6, Figure 1B) was adopted from studies that similarly used MOG as a control area in speech experiments (Restle et al., 2012; Murakami et al., 2015). These authors investigated the effect of cTBS on speech-related MEPs during a passive listening task, stimulating target sites of the left dorsal and ventral auditory streams within IFG and temporal lobe. To assess the topographical specificity of cTBS effects, they targeted left MOG as a control site, which is involved in visual processing but is not considered part of the auditory speech-processing network (Restle et al., 2012; Murakami et al., 2015).
These coordinates were identified on each subject's brain using a TMS neuro-navigation system (Localite GmbH, Sankt Augustin, Germany). The individual 3D high-resolution T1-weighted image was imported, and the MNI target coordinates were transformed into individual coordinates. For spatial calibration, five skull landmarks (nasion, bilateral corners of the eyes, and bilateral pre-auricular points) and 200 points on the scalp surface of each subject were fitted to the 3D image, accepting registration errors between the subject's scalp and the image of up to 3 mm. The stimulating coil was visually navigated to the stimulation site and held there on the basis of real-time feedback of the coil position throughout cTBS application. cTBS was delivered with a biphasic magnetic stimulator (Magstim Super Rapid) and a 70 mm figure-of-eight coil (Magstim Company, UK). Following Huang et al. (2005), 600 pulses were applied in a theta-burst pattern (bursts of three pulses at 50 Hz, repeated at 5 Hz for 40 s). Adopting the approach of Mars et al. (2009), stimulation intensity was set to 120% of the resting motor threshold (RMT) of the abductor pollicis brevis muscle of the right hand. RMT was defined as the minimum stimulator output required to produce motor-evoked potentials (MEP) >50 µV peak-to-peak amplitude in at least 5 out of 10 consecutive trials at the optimal scalp position over left motor cortex (Rossini et al., 2015). According to a previous study (Grossheinrich et al., 2009), a stimulation intensity of 80% RMT of the tibialis anterior muscle for cTBS over the medial prefrontal cortex (electrode position Fz = midline frontal) can be considered safe.
Given that the RMT of lower-limb muscles exceeds that of small hand muscles by a factor of about 1.6 (Chen et al., 1998), 80% of the tibialis anterior RMT corresponds to roughly 0.8 × 1.6 ≈ 128% of the hand-muscle RMT. A stimulation intensity of 120% RMT of the abductor pollicis brevis muscle applied over pre-SMA therefore stays below this level and was considered safe for cTBS in the present study. The average stimulation intensity (120% of RMT) was 46.2% of maximum stimulator output (range 40%-58%) in the experimental group (pre-SMA). In the control group (MOG), a stimulation intensity of 40% of maximum stimulator output (the lowest value in the experimental group) was used for all subjects. This low intensity was chosen because, during MOG stimulation at the individually determined intensity, some subjects reported strange sensations and/or pain around the stimulated region (temporal muscle). The higher stimulus intensity for pre-SMA than for MOG might confound the results to some extent. However, irritating nerve stimulation at high MOG intensities would have been expected to be a more serious confound. The coil was placed tangentially over left pre-SMA, with the handle pointing backwards at an angle of 45° to the anterior-posterior axis. For left MOG, the coil handle pointed downwards, parallel to the vertical line through the anterior commissure, which avoided direct stimulation of the temporal muscle.
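A short sketch of the theta-burst timing and of the safety arithmetic above. It assumes bursts start every 200 ms (5 Hz) with pulses 20 ms apart within a burst (50 Hz), as stated in the protocol description; the function name `ctbs_pulse_times` is hypothetical, and this is an illustration of the schedule, not stimulator control code.

```python
def ctbs_pulse_times(burst_rate_hz=5.0, intra_burst_hz=50.0,
                     pulses_per_burst=3, duration_s=40.0):
    """Onset times (in s) of every pulse in a continuous theta-burst train."""
    n_bursts = round(duration_s * burst_rate_hz)   # 40 s x 5 Hz = 200 bursts
    burst_period = 1.0 / burst_rate_hz             # 0.2 s between burst onsets
    pulse_period = 1.0 / intra_burst_hz            # 0.02 s between pulses in a burst
    return [b * burst_period + p * pulse_period
            for b in range(n_bursts) for p in range(pulses_per_burst)]


times = ctbs_pulse_times()
n_pulses = len(times)                              # 200 bursts x 3 pulses = 600

# Safety comparison in multiples of the hand-muscle RMT, assuming leg RMT is
# about 1.6 x hand RMT (the factor cited above).
LEG_TO_HAND_RMT = 1.6
reference_level = 0.80 * LEG_TO_HAND_RMT           # 80% leg RMT ~ 1.28 x hand RMT
applied_level = 1.20                               # 120% hand RMT
within_reference = applied_level < reference_level
```

The last pulse falls at 199 × 0.2 s + 2 × 0.02 s = 39.84 s, so the 600 pulses fit within the 40 s train.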

Evaluation of Behavioral Data
Based on the audio recordings, the repetition trials were orthographically transcribed. Word by word, responses were categorized as correct when the production was identical or nearly identical to the target, e.g., with deviant singular/plural endings ("Flugzeug"-"Flugzeuge"), deviant tense ("soll"-"sollte"), or a deviant (short) prefix ("um-geworfen"-"geworfen"). In these cases, the syllables of the target were scored as correct (see Supplementary Table 2 for examples). Incorrect responses were categorized as errors that were either "related" or "unrelated" to the target words: errors showing semantic and/or phonological similarity to the target word were categorized as related, whereas all other errors were considered unrelated. Semantic relatedness comprised synonyms (e.g., "Fest"-"Feier"), words from the target word's semantic field (e.g., "Pferde"-"Schafe"), substitution of definite and indefinite articles or possessive pronouns (e.g., "eine"-"die"-"ihre"), separated or fused prepositions (e.g., "bei dem"-"beim"), semantically plausible prefix addition or substitution (e.g., "um-geworfen"-"herunter-geworfen"), substitution of auxiliary verbs (e.g., "muss"-"soll"), and deviant gender of pronouns (e.g., "er"-"sie"). Phonological relatedness was determined by overall phonological similarity, e.g., shared phonological features with the target (p/b, p/f, l/r, p/t) or substitution of single vowels while at least half of the syllables of the response word were identical or nearly identical to the target (e.g., "Künstlerin"-"Kanzlerin"). If syllables of a target word were missing in the reproduced word while the remaining part was phonologically similar (e.g., "Marathon"-"Mal"), the reproduced syllables were counted as related (for further examples, see Supplementary Table 2).
Missing words that were not replaced by an incorrect word were counted as silent events, weighted by their number of syllables (silent = missing − incorrect syllables). If this difference was negative (more incorrect than missing target syllables), the silent value was set to zero. Finally, as a qualitative measure, the percentage of unrelated errors (relative to the total number of incorrect syllables) was analyzed as an index of monitoring for plausibility.
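The per-sentence scoring rules can be expressed as a small data structure. This sketch assumes that "missing" syllables are the target syllables not scored as correct (18 minus correct), which makes correct, incorrect, and silent syllables add up to 18 whenever incorrect does not exceed missing; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SentenceScore:
    correct: int        # target syllables reproduced (nearly) identically
    related: int        # error syllables semantically/phonologically related
    unrelated: int      # error syllables without such similarity
    target: int = 18    # syllables per sentence

    @property
    def incorrect(self) -> int:
        return self.related + self.unrelated

    @property
    def missing(self) -> int:
        # assumption: every target syllable not scored as correct counts as missing
        return self.target - self.correct

    @property
    def silent(self) -> int:
        # silent = missing - incorrect, set to zero when negative
        return max(0, self.missing - self.incorrect)


def percent_unrelated(scores):
    """Unrelated errors as a percentage of all incorrect syllables, pooled
    over the given sentences (e.g., all five speech rates)."""
    unrelated = sum(s.unrelated for s in scores)
    incorrect = sum(s.incorrect for s in scores)
    return 100.0 * unrelated / incorrect if incorrect else float("nan")
```

For instance, `SentenceScore(correct=12, related=2, unrelated=1)` yields 6 missing, 3 incorrect, and 3 silent syllables (12 + 3 + 3 = 18). When incorrect syllables exceed missing ones, silent is clipped to zero and the three categories sum to more than 18.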
The rater evaluating the responses was blinded to the time point of measurement (pre/post1/post2). To assess inter-rater reliability, a second rater additionally evaluated the responses of 11 subjects from the experimental group (pre-SMA). Inter-rater reliability was acceptable, as determined by Cronbach's alpha computed on the number of syllables assigned to each category within each sentence (correct repetitions α = 0.986, silent events α = 0.984, incorrect repetitions α = 0.967, unrelated errors α = 0.933, related errors α = 0.843).
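Cronbach's alpha for two raters treats each rater as an "item". The sketch below uses the textbook formula α = k/(k − 1) · (1 − Σ var(rater) / var(sum over raters)); the function name and the input layout (one score per sentence per rater) are assumptions about how the reliability analysis was organized.

```python
from statistics import variance


def cronbach_alpha(*raters):
    """Cronbach's alpha with each rater treated as an item.

    raters: equal-length sequences, one score per rated unit (e.g., syllables
    assigned to one category in each sentence)."""
    k = len(raters)
    if k < 2:
        raise ValueError("need at least two raters")
    sum_item_var = sum(variance(r) for r in raters)      # sample variances
    totals = [sum(unit) for unit in zip(*raters)]          # per-unit sum over raters
    return k / (k - 1) * (1 - sum_item_var / variance(totals))
```

Note that alpha reflects consistency, not absolute agreement: two raters whose scores differ by a constant offset still reach α = 1.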

Statistical Analysis
First, to describe changes in speech comprehension as a function of speech rate, the distributions of the parameters (correct and incorrect responses, silent events, unrelated and related errors) were plotted against the five speech rates (8, 10, 12, 14, 16 syl/s). Data of each speech rate were pooled across all measurement time points (pre, post1, post2) and across the experimental (pre-SMA) and control (MOG) groups. Repetition performance was quantified as the proportions of correct, incorrect (related plus unrelated), and silent events, which sum to 100% (= 18 syllables per sentence). Numeric values are listed in Table 1.
To obtain a single estimate of overall performance (correct repetitions) across all speech rates (quantitative analysis), a psychometric function (Wichmann and Hill, 2001a,b) was fitted to the percentage of correct syllables across the five speech rates, and the syllable rate yielding 80% correct reproduction was determined from this function (Figure 2). The 80% level was chosen because at this point speech is still largely understood, but processing is time-critical and should therefore require the hypothesized function of pre-SMA.
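A minimal sketch of deriving the 80% rate from a fitted psychometric function. It swaps in a simpler technique than the maximum-likelihood fitting described by Wichmann and Hill: a two-parameter decreasing logistic, p(r) = 1 / (1 + exp(k(r − m))), is fitted by linear regression on logit-transformed proportions, without guess or lapse parameters. Function names, the clipping constant `eps`, and the synthetic example values are assumptions for illustration only.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_logistic(rates, prop_correct, eps=1e-3):
    """Fit p(r) = 1 / (1 + exp(k * (r - m))) by ordinary least squares on
    logit-transformed proportions; returns (k, m). Proportions are clipped to
    [eps, 1 - eps] so that 0% and 100% stay finite."""
    ys = [logit(min(max(p, eps), 1.0 - eps)) for p in prop_correct]
    n = len(rates)
    mean_x, mean_y = sum(rates) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in rates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rates, ys))
    slope = sxy / sxx                 # logit(p) = k*m - k*r, so slope = -k
    intercept = mean_y - slope * mean_x
    k = -slope
    return k, intercept / k


def rate_at(p_target, k, m):
    """Syllable rate at which the fitted curve equals p_target (e.g., 0.8)."""
    return m - logit(p_target) / k


# Illustration on noise-free synthetic values generated with k = 0.8, m = 13
# (made-up parameters, not data from this study).
rates = [8, 10, 12, 14, 16]
props = [1 / (1 + math.exp(0.8 * (r - 13))) for r in rates]
k_hat, m_hat = fit_logistic(rates, props)
threshold_80 = rate_at(0.80, k_hat, m_hat)   # = 13 - ln(4) / 0.8, about 11.27 syl/s
```

With real data, ceiling values near 100% at low rates are where the clipping matters most; a likelihood-based fit (as in Wichmann and Hill) handles such points more gracefully than least squares on logits.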
Statistical assessment of the cTBS effect was performed by means of a repeated-measures ANOVA with the within-subject factor TIME (pre/post1/post2) and the between-subject factor SITE (pre-SMA/MOG). This analysis was applied to overall performance (= syllable rate at 80% correctly reproduced words). It was based on three a priori assumptions: (i) Since the pre/post1/post2 design predicted a cTBS effect at post1 relative to the baseline measurements (pre/post2), a quadratic relationship was strongly expected. (ii) Since an inhibitory TMS protocol was applied, a reduction of performance (a "dip" of the quadratic function) was predicted, justifying one-tailed testing of the quadratic relationship of TIME. (iii) A transient reduction of speech performance was expected exclusively for the pre-SMA stimulation site, while stimulation of MOG, selected as a speech-irrelevant control area (Restle et al., 2012; Murakami et al., 2015), should not reduce speech performance. Therefore, statistical analyses focused on the interaction TIME × SITE, expected to be significant with a quadratic relationship of TIME. Since the design (post1), protocol ("dip"), and control area (MOG) provided clear a priori justifications, we skipped the analysis of global effects (tests not of interest, whose many degrees of freedom could conceal the effects of interest) and directly considered the interaction of interest. Additionally, to compare pre- and post1-measurements directly after pre-SMA and MOG stimulation, a repeated-measures ANOVA with the within-subject factor TIME (pre/post1) and the between-subject factor SITE (pre-SMA/MOG) was conducted, with the pre-post2 difference added as a covariate to control for baseline variability. Furthermore, to investigate the hypothesis that the TMS-induced suppression of overall performance depends on speech rate, a repeated-measures ANOVA with the factors TIME, SITE, and RATE (levels: low rates = 8, 10 syl/s; high rates = 12, 14 syl/s) was applied. For this analysis, percentages of syllables within correctly reproduced words (based on a maximum of 18 syllables per sentence) were averaged across low (8, 10 syl/s) and high (12, 14 syl/s) speech rates. We expected a significant three-way interaction TIME × SITE × RATE, indicating a stronger TMS-induced reduction of performance at high than at low speech rates.

TABLE 1 | Numbers of syllables per speech rate and time point (mean, SD in parentheses) for the experimental (pre-SMA) and control (MOG) stimulation sites. Syllables from the nine sentences per speech rate (45 sentences in total) were summed per subject and then averaged across subjects (n = 18). Slight deviations from the expected sums of correct, silent, and incorrect events (162 syllables per rate, 810 across all rates) arise because incorrect responses sometimes contained more syllables than the target words they replaced.

FIGURE 2 | Percentage of correctly reproduced speech material as a function of syllable rate (8, 10, 12, 14, 16 syl/s), fitted with a psychometric function (exemplified for a single subject, blue curve). Blue dots mark each subject's syllable rate at which comprehension reaches 80%, as determined by the psychometric function (shown for the pre-baseline condition of the experimental group).
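One way to see what the quadratic TIME × SITE test evaluates: with two groups, the F for the interaction of a single-degree-of-freedom within-subject contrast with the group factor equals the squared pooled-variance t statistic computed on each subject's contrast score, with df (1, N − 2), here (1, 34). The sketch below uses the weights (1, −2, 1) for (pre, post1, post2), so a dip at post1 gives a positive score; rescaling the weights (e.g., to orthonormal values) does not change F. Function names and the toy inputs are hypothetical and are not data from this study.

```python
from statistics import mean, variance

QUADRATIC = (1, -2, 1)   # weights for (pre, post1, post2); a dip at post1 scores > 0


def contrast_scores(subjects, weights=QUADRATIC):
    """One contrast score per subject from its (pre, post1, post2) values."""
    return [sum(w * v for w, v in zip(weights, s)) for s in subjects]


def pooled_t(a, b):
    """Independent-samples t with pooled variance, df = len(a) + len(b) - 2."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def quadratic_time_by_site(group1, group2):
    """Returns (t, F); F = t**2 is the TIME(quadratic) x SITE interaction F."""
    t = pooled_t(contrast_scores(group1), contrast_scores(group2))
    return t, t * t


# Hypothetical toy values (pre, post1, post2), not data from this study.
pre_sma = [(12, 11, 12), (13, 12, 13), (14, 12, 14)]
mog = [(12, 12, 12), (13, 13, 13), (14, 14, 13)]
t_stat, f_stat = quadratic_time_by_site(pre_sma, mog)
```

Because the direction of the dip was predicted, a one-tailed p-value corresponds to half the two-sided p when t has the predicted (positive) sign.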
To obtain a second, qualitative parameter of performance, a differential analysis of erroneous speech material was performed. To this end, the percentage of unrelated errors relative to the total number of incorrect syllables was analyzed (unrelated plus related errors equal the number of incorrect repetitions). This analysis was performed on data pooled across all five speech rates, since incorrect repetitions were rare within individual speech rates.
For the qualitative analysis (percentage of unrelated errors), we expected an increase of unrelated errors after cTBS (post1) relative to the baseline measurements (pre, post2) in the pre-SMA group. Additionally, to compare pre- and post1-measurements directly after pre-SMA and MOG stimulation, a repeated-measures ANOVA with the within-subject factor TIME (pre/post1) and the between-subject factor SITE (pre-SMA/MOG) was conducted, with the pre-post2 difference added as a covariate to control for baseline variability. Effect sizes for the pre-post1 comparisons are given as Cohen's d.
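The text does not state which variant of Cohen's d was used for the repeated-measures comparisons. As an assumption, this sketch divides the mean pre-post change by the pooled SD of the two time points (root mean square of the two SDs); other variants, e.g., those based on the SD of the difference scores, give different values. The function name is hypothetical.

```python
from statistics import mean, stdev


def cohens_d_pooled(pre, post):
    """Mean change (post - pre) divided by the root mean square of the two
    time points' SDs. One of several repeated-measures variants of Cohen's d."""
    sd_pooled = ((stdev(pre) ** 2 + stdev(post) ** 2) / 2) ** 0.5
    return (mean(post) - mean(pre)) / sd_pooled
```

The sign indicates the direction of change; effect sizes are usually reported as magnitudes, as in the results below.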
Each parameter (syllable rate at 80% correct repetitions; percent correct repetitions [based on max. 18 syllables per sentence] at low (8, 10 syl/s) and high (12, 14 syl/s) rates; percentage of unrelated errors [relative to the total number of incorrect repetitions]) was tested for normality (Shapiro-Wilk test). If normality could not be assumed, non-parametric tests were used. Furthermore, the repeated-measures ANOVA was checked for sphericity of the within-subject factor TIME (Mauchly's test) and for homogeneity of variances across the between-subject factor SITE (Levene's test).
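For reference, Levene's statistic is a one-way ANOVA on the absolute deviations of each value from its group mean. The sketch assumes the mean-centred form (the median-centred Brown-Forsythe variant is also common); the function name is hypothetical. With two sites and 36 subjects, W would be compared against F(1, 34).

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W with mean-centred absolute deviations; under equal variances
    it approximately follows F(k - 1, N - k)."""
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z_grand = mean([v for zg in z for v in zg])
    between = sum(len(zg) * (mean(zg) - z_grand) ** 2 for zg in z)
    within = sum((v - mean(zg)) ** 2 for zg in z for v in zg)
    return (n_total - k) / (k - 1) * between / within
```

Groups that differ only in location, such as `[1, 2, 3]` and `[11, 12, 13]`, give W = 0.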

Descriptive Analyses and Baseline Effects
As shown in Figure 3, overall performance (percentage of correctly reproduced material) decreased strongly with increasing speech rate, while incorrectly reconstructed words and silent events increased. On average (across all conditions, speech rates, and stimulation sites), correct repetitions amounted to ca. 78%, silent events to ca. 18%, and incorrect (related and unrelated) reproductions to ca. 4%. Absolute numbers of syllables are listed in Table 1 for each of the three measurement time points (pre/post1/post2) and for the experimental and control stimulation sites.
Normality was assessed with the Shapiro-Wilk test: for each stimulation site (pre-SMA/MOG) and each time point (pre/post1/post2), normality could be assumed (p > 0.05) for the syllable rate at 80% correct repetitions, the percentage of correct repetitions at high speech rates, and the percentage of unrelated errors. The percentage of correct repetitions at low speech rates was not normally distributed owing to a ceiling effect (almost 100% performance in each subject). Variances were homogeneous across stimulation sites (p > 0.05), and sphericity of the factor TIME was given for all parameters (p > 0.05).

cTBS Effects on Correct Repetitions
As expected, speech comprehension declined at high syllable rates (Figure 4 left). Descriptively, stimulation of pre-SMA caused a transient reduction of performance at syllable rates of 12 syl/s or faster, visible as a "dip" at post1 relative to the pre and post2 runs, which was absent in the control group (Figure 4 left). Performance of speech comprehension (= syllable rate at which 80% of the stimulus text could be reproduced) revealed a significant interaction TIME × SITE with respect to the quadratic relationship of TIME [F(1, 34) = 3.614, p = 0.033, …]. Although Figure 4 (middle) suggests a group difference, the main effect of SITE was not significant [F(1, 34) = 0.318, p = 0.577]. Furthermore, the interaction TIME × SITE with a linear trend of TIME (slight increase in performance from pre to post2) was not significant [F(1, 34) = 0.025, p = 0.875]. Direct comparisons between pre- and post1-measurements (with the pre-post2 difference as covariate) revealed a significant interaction TIME × SITE [F(1, 33) = 3.605, p = 0.033, one-tailed] as well as a significant interaction TIME × COVARIATE [F(1, 33) = 6.666, p = 0.014], indicating that the covariate ran counter to the pre-post1 effect. Post hoc, the factor TIME and the interaction TIME × COVARIATE were significant after pre-SMA stimulation [TIME: F(1, 16) = 5.657, p = 0.030, d = 0.3; TIME × COVARIATE: F(1, 16) = 7.486, p = 0.015], but not after MOG stimulation [TIME: F(1, 16) = 0.624, p = 0.441, d = 0.2; TIME × COVARIATE: F(1, 16) = 1.501, p = 0.238]. The main effect of COVARIATE was not significant.
Regarding the differential TMS effect on high vs. low speech rates, the three-way interaction TIME × SITE × RATE was significant with a quadratic trend of TIME [F(1, 34) …]. Since data from the low-rate conditions were not normally distributed owing to a ceiling effect (subjects understood almost 100%), effects were additionally analyzed with non-parametric tests. For low speech rates, the three time points (pre/post1/post2) did not differ after pre-SMA stimulation (Friedman test, χ² = 2.735, p = 0.255), whereas a significant effect was observed in the control (MOG) condition (Friedman test, χ² = 6.206, p = 0.045). The latter was related to pre-post2 (baseline) differences (Wilcoxon test, post2 vs. pre: Z = −2.155, p = 0.031, not surviving Bonferroni correction). For high speech rates, however, the three time points differed significantly after pre-SMA (Friedman test, χ² = 7.111, p = 0.012, one-tailed), but not after MOG stimulation (χ² = 0.592, p = 0.744). Post hoc, after pre-SMA stimulation, significant differences were found between pre and post1 as well as between post1 and post2 (Wilcoxon test, pre vs. post1: Z = −1.677, p = 0.047, d = 0.3, one-tailed; post2 vs. post1: Z = −2.636, p = 0.008, d = 0.5). Pre-post1 differences at high speech rates were larger after pre-SMA than after MOG stimulation (Mann-Whitney U test: U = 91, p = 0.025, d = 0.7).
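The Friedman statistic used above ranks the three time points within each subject and compares the rank sums. This sketch omits the tie correction that statistics packages usually apply, and the function names are hypothetical. For k = 3 conditions the chi-square has 2 degrees of freedom, whose upper-tail probability is simply exp(−χ²/2).

```python
import math


def average_ranks(values):
    """Ranks 1..n within one subject; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1   # positions i..j -> average 1-based rank
        i = j + 1
    return ranks


def friedman_chi2(rows):
    """Friedman statistic for rows of (pre, post1, post2) values, one row per
    subject; no correction for ties."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3 * n * (k + 1)


def chi2_sf_df2(x):
    """Upper-tail p for a chi-square with 2 df (k = 3 conditions): exp(-x / 2)."""
    return math.exp(-x / 2)
```

As a consistency check, `chi2_sf_df2(2.735)` is about 0.255, matching the p-value reported for the low-rate pre-SMA comparison.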

cTBS Effects on a Specific Error Type-Unrelated Errors
As concerns the percentage of unrelated errors (relative to the total number of incorrect repetitions), a significant two-way interaction TIME × SITE with a quadratic relationship of TIME was found [F(1, 34) …]. Direct comparisons between pre- and post1-measurements (with the pre-post2 difference as covariate) revealed a significant interaction TIME × SITE [F(1, 33) = 12.422, p = 0.001] as well as a significant interaction TIME × COVARIATE [F(1, 33) = 19.131, p < 0.001]. The main effect of COVARIATE was not significant. Post hoc, unrelated errors increased significantly from pre to post1 after pre-SMA stimulation (T = −2.470, p = 0.024, d = 0.9), whereas MOG stimulation had no significant effect (T = 1.855, p = 0.081, d = 0.6).

DISCUSSION
As hypothesized, a transient "virtual lesion" of pre-SMA reduced sentence repetition performance. The transient reduction was significant for the syllable rate at 80% correct repetitions, which dropped from 13.2/13.4 syl/s (pre/post2) to 12.9 syl/s (post1). The 80% threshold captures speech perception under high demand while enough material is still understood. Further, the cTBS effect was found for fast (12, 14 syl/s), but not for moderate syllable rates (8, 10 syl/s). As concerns qualitative aspects, unrelated errors increased significantly after cTBS over pre-SMA. Performance did not change significantly between the pre- and post2-baselines.

Task Difficulty-Reconstruction and Prediction
We used cTBS to induce a transient disruption of cortical processing in pre-SMA to gain insight into the functional role of pre-SMA in speech comprehension under time-critical conditions. Listening to single sentences of accelerated formant-synthesized speech is a rather artificial condition. However, the mechanisms of phonological/lexical encoding have been suggested to be similar to those for normal speech, and task difficulty (in terms of speech rate) was easy to manipulate under these conditions. Although natural speech represents "real-life speech comprehension" and synthesized speech sounds somewhat unfamiliar, formant synthesis, owing to its simple rule-based structure, may even offer advantages for intelligibility at high syllable rates. This has been shown for blind subjects who use accelerated speech for text reception, by comparing formant synthesis with accelerated natural speech (Moos and Trouvain, 2007) or with natural-sounding diphone synthesis (Trouvain, 2007).
In the present data, transient impairment of speech comprehension was only observed for high speech rates of 12 syl/s or faster. In line with these findings, fMRI studies found pre-SMA to be more strongly activated during presentation of ultra-fast as compared to moderately fast speech (Dietrich et al., 2013a,b) and to be particularly active near the limit of intelligibility (Vagharchakian et al., 2012). An effect of task difficulty on pre-SMA activation was also found for degraded speech in a sentence matching task (Clos et al., 2014a,b) and in experiments on switching between native and foreign speech perception (de Bruin et al., 2014). Thus, pre-SMA seems to be generally involved in speech processing under high task demands. The pre-SMA stimulation site used here was taken from a previous study (Dietrich et al., 2013a), which was conducted on five late-blind participants and one sighted subject trained to comprehend ultra-fast speech at 16-18 syl/s. After training, all subjects showed extended frontal and premotor activation, i.e., in pre-SMA and left IFG (Dietrich et al., 2013a). Blind subjects were found to use additional strategies for accelerated speech comprehension at the sensory level that were not available to sighted subjects (recruitment of primary visual cortex in order to detect speech features). However, with respect to the "frontal bottleneck" of speech perception, blind and sighted subjects seem to be similar. This bottleneck, including functions of the pre-SMA, comprises the coordination of memory representations with the temporal event structure (Kotz and Schwartze, 2010) and seems to be involved in the buffering of phonological material (Dietrich et al., 2013a).
Although the functional role of pre-SMA seems evident from the present data as well as from previous fMRI data, its differential contribution to the various stages of speech processing should be considered in more detail. Presumably, time-critical speech perception cannot be performed entirely in a bottom-up mode. Pre-SMA involvement during perception of sentence materials and under adverse listening conditions could be explained by the assumption that procedural representations help to disambiguate linguistic information when lexical/semantic access is difficult. These procedural representations are hypothesized to be linked to predictive top-down mechanisms (Hertrich et al., 2016). Exploiting the general redundancy of speech and language, the speech generation mechanism can predict upcoming speech material in order to save time during lexical access. Similarly, when part of the speech signal is unintelligible, the missing information has to be reconstructed. In both cases, the top-down generated data stream has to be synchronized with the bottom-up information stream of the incoming speech signal. This temporal adjustment can be based on the prosodic structure of speech, e.g., syllable rhythm, which is predominantly represented in the right hemisphere, but also on phonological and semantic content kept in verbal working memory, which is represented in the left hemisphere (Ross, 1981; Gorelick and Ross, 1987; Ross and Monnot, 2008; Friederici and Gierhan, 2013).
Various studies document pre-SMA involvement in repair mechanisms under high-demand conditions (Scott et al., 2004; Adank and Devlin, 2010; see Lima et al., 2016 for a review). Similarly, the current results showed an effect of pre-SMA stimulation only when the task required more effort/attention (high speech rates). Indeed, reconstruction of missing material (in the auditory representation) becomes necessary only under high-demand conditions. Thus, the increased effort (requiring pre-SMA) seems to reflect the need for predictive top-down mechanisms. Regarding the timing of motor events, the right hemisphere exerts an inhibitory control function on left-dominant forward action control, working as a kind of "brake" (Aron et al., 2014). Similar control mechanisms may be present for predictive inner speech generation in the absence of any overt motor activity. Evidence for the involvement of pre-SMA in predictive language mechanisms has been provided by reviews emphasizing the role of pre-SMA as an output region of cerebellar-thalamic and basal ganglia-thalamic circuits enhancing temporal processing such as interval estimation and extraction of temporal regularity (Schwartze et al., 2012a,b). Once temporal regularity is perceived, context-based predictions would allow the system to reconstruct omitted events (Kotz and Schwartze, 2010). Previous studies showed that syllable onsets are tracked at the sensory level (Hertrich et al., 2013), presumably in order to predict the timing of the next incoming item. To facilitate lexical access to the incoming signal, the anticipated timing of syllable onsets has to be imposed on predicted phonological chunks. Thus, conceivably, in addition to timing information, content-related predictions based on articulatory-phonological as well as lexico-semantic information might help to overcome the difficulty of accelerated speech comprehension.
Because pre-SMA function was disrupted, consistent predictions/reconstructions (i.e., successful adjustment of timing and content) could no longer be signaled to the speech generation system (left IFG) in order to reproduce the sentence correctly.
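The idea that an extracted temporal regularity allows the system to predict the next syllable onset and to reconstruct an omitted one can be illustrated with a toy computation. This is purely illustrative and is not a model of pre-SMA or of the mechanism proposed here. The median inter-onset interval stands in for "extraction of temporal regularity", and the function names and onset times (roughly 12 syl/s, with one syllable missing) are hypothetical.

```python
from statistics import median


def estimate_period(onsets):
    """Median inter-onset interval; robust to an occasional omitted syllable."""
    intervals = [b - a for a, b in zip(onsets, onsets[1:])]
    return median(intervals)


def reconstruct_and_predict(onsets):
    """Fill long gaps with evenly spaced onsets and predict the next onset."""
    period = estimate_period(onsets)
    filled = [onsets[0]]
    for a, b in zip(onsets, onsets[1:]):
        # How many periods fit into this gap (at least one)
        n_steps = max(1, round((b - a) / period))
        step = (b - a) / n_steps
        # Insert n_steps - 1 reconstructed (omitted) onsets inside the gap
        filled.extend(a + k * step for k in range(1, n_steps))
        filled.append(b)
    predicted_next = filled[-1] + period
    return filled, predicted_next


# Hypothetical syllable onsets (ms) at roughly 12 syl/s, one onset missing
onsets = [0, 83, 167, 250, 417, 500]
filled, nxt = reconstruct_and_predict(onsets)
print(filled)  # [0, 83, 167, 250, 333.5, 417, 500]
print(nxt)     # 583
```

The median is used instead of the mean so that a single long gap does not inflate the period estimate. Content-related predictions, which the text also invokes, are not modeled here.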
Monitoring or anticipation of articulatory gestures, i.e., access to motor representations without execution (inner speech), seems to be an effective way of perceiving speech under high demand such as high speaking rates (Hertrich et al., 2016). In the present study, perception could not be separated from production, since perception was tested by a repetition task. However, the cTBS effect occurred at high but not at low speech rates. If pre-SMA inhibition had affected speech production rather than perception, the TMS effect should have been observable at high as well as low speaking rates. Since this was not the case, inhibition of pre-SMA was suggested to be relevant to speech perception. However, the repetition task forces participants to perform a sensory-to-motor transformation, so that syllable perception might also include storage of the articulatory gestures. At low syllable rates, articulatory gestures may thus be clearly represented during perception, requiring no further inhibitory control from pre-SMA during production. If the motor plans are not clearly represented during perception (high rates), pre-SMA needs to inhibit wrong motor plans more strongly during production. Based on the current results, this possibility cannot be excluded. In other words, when sentences are more difficult to understand, planning speech production will also be more challenging, e.g., participants may be unsure about which words to produce. Thus, the repetition task chosen in the current study makes it difficult to separate perception from production with respect to the "planning" function. As concerns direct motor execution (not planning) of repetition, SMA proper would be an appropriate candidate for controlling this function (Picard and Strick, 1996).
However, sentence comprehension generally requires inner speech mechanisms, including monitoring and phonological planning stages, irrespective of whether a motor response (repetition task) is required. Lima et al. (2016) discussed strong functional (and structural) connections between pre-SMA and SMA proper enabling auditory perception and auditory imagery. Inner speech can be considered as imagery of speech sounds initiated by the mental representation of articulatory gestures. Thus, the present findings are in line with the hypothesis of Lima et al. (2016) that pre-SMA is involved in planning and/or monitoring inner speech. It cannot be completely excluded that the present pre-SMA stimulation also affected the motor action of the repetition task (rather than reconstruction and monitoring). This might be tested by analyzing speech motor characteristics such as articulator velocities, which were not measured in the present study. Nevertheless, since previous studies reported strong activation of pre-SMA during passive listening to accelerated speech (Vagharchakian et al., 2012; Dietrich et al., 2013a,b), concomitant with activation of left IFG, the present effect of pre-SMA stimulation may indicate an involvement of the speech generation system in speech perception rather than an impairment of motor output.

Inhibitory Control Mechanism
Pre-SMA activity was observed in monkeys when they had to discard a current motor plan and acquire a new plan for future performance (Shima et al., 1996; Tanji, 1996). In humans, a frontal inhibitory control ("no go") network has been outlined, comprising, in particular, pre-SMA and right-hemispheric inferior frontal cortex (Sharp et al., 2010; Swann et al., 2012; Aron et al., 2014). In case of erroneous response selection (i.e., detection of an implausible item that does not fit into the context), the ongoing top-down modulated information stream must be interrupted and restarted in order to avoid further misunderstandings when the system is "on the wrong track". In the absence of such an inhibitory mechanism, i.e., after suppression of pre-SMA, implausible alternatives are no longer inhibited, consistent with the present results. Evidence for the inhibitory control function of pre-SMA was also provided by intra-individual comparisons of reaction times in a stop-signal task (Chao et al., 2009). In line with these findings, anodal transcranial direct current stimulation of pre-SMA enhanced inhibitory control in a stop movement task (Kwon and Kwon, 2013).
To integrate the proposed inhibitory mechanism during speech perception into a broader account of pre-SMA function, future work should focus on tasks requiring monitoring (active construction of perceived information), particularly with respect to the inhibition of implausible alternatives that fall outside the range of expectations. Anticipation of sequences (e.g., music) or scenarios (e.g., face-to-face communication) might enable effective/complete and fast/automated comprehension of the whole over a longer time window using internal statistics. Therefore, pre-SMA may be considered a supra-modal region translating the results of these internal statistics (verification/falsification) into potential action patterns (inhibition/passing).

Pre-SMA and the Network for Speech Perception
Previous studies reported a division of SMA into higher cognitive (pre-SMA) and motor-related (SMA proper) functions (Chee et al., 1999; Moore-Parks et al., 2010). As concerns the perisylvian language network, various studies indicate a modular structure with respect to sub-functions such as articulatory-phonological vs. semantic processing (Anwander et al., 2007; Friederici, 2009; Murakami et al., 2015). The posterior-anterior organization of SMA into motor-related functions (SMA proper) and higher-order cognitive processing (pre-SMA) runs largely parallel, also in structural connectivity patterns, to the ventral and dorsal premotor regions (Anwander et al., 2007), with their articulatory-phonological (dorsal stream) and semantic (ventral stream) sub-functions, respectively (Hickok and Poeppel, 2007; Saur et al., 2010). Speculatively, pre-SMA might feed into these premotor or inferior frontal nodes (BA44, BA45) of the dual pathways in a selective way: anterior parts of pre-SMA might regulate lexico-semantic reconstruction, while posterior parts link auditory speech information with related (articulatory-phonological) motor programs in order to optimize speech processing (Lima et al., 2016). Erroneous speech material of the present study was classified with respect to phonological or semantic similarity to the target. Semantically related responses were synonyms or words belonging to the same semantic field, whereas phonologically related responses had a similar surface structure, irrespective of their meaning. However, semantic vs. phonological errors were not analyzed statistically due to the small number of syllables. Furthermore, the stimulation site of the present study was not chosen to selectively influence phonological vs. semantic processing. For future studies, it might be hypothesized that stimulation of a more anterior (MNI coordinate y > 9) vs. a more posterior region (MNI coordinate y < 9), compared to the present stimulation site (MNI coordinate y = 9), will reveal differential error patterns regarding semantic vs. phonological top-down strategies, respectively. For example, stimulation of a more anterior region might reduce semantically plausible alternatives while leaving phonologically plausible items unaffected.
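The error taxonomy described above (semantically related, phonologically related, unrelated) and the percentage of unrelated errors among all incorrect repetitions can be sketched as follows. The study's criteria are described only verbally, so the operationalization here is an assumption. Semantic relatedness is looked up in a hypothetical `semantic_field` table, and normalized orthographic edit distance with an arbitrary 0.5 cutoff serves as a proxy for phonological similarity. All function names and word pairs are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def classify_error(target, response, semantic_field, phon_threshold=0.5):
    """Label an incorrect word as 'semantic', 'phonological', or 'unrelated'."""
    # Assumed semantic criterion: response listed in the target's semantic field
    if response in semantic_field.get(target, set()):
        return "semantic"
    # Assumed phonological proxy: normalized orthographic similarity
    longest = max(len(target), len(response), 1)
    similarity = 1 - levenshtein(target, response) / longest
    if similarity >= phon_threshold:
        return "phonological"
    return "unrelated"


def percent_unrelated(errors, semantic_field):
    """Share of unrelated errors among all incorrect (target, response) pairs."""
    if not errors:
        raise ValueError("no incorrect repetitions to classify")
    labels = [classify_error(t, r, semantic_field) for t, r in errors]
    return 100 * labels.count("unrelated") / len(labels)


# Hypothetical lookup table and incorrect repetitions
semantic_field = {"house": {"home", "building"}}
errors = [("house", "home"), ("house", "mouse"), ("tree", "free"), ("car", "banana")]
print(percent_unrelated(errors, semantic_field))  # 25.0
```

Checking semantic relatedness first means a word that is both a synonym and phonologically similar counts as semantic. A different ordering would shift counts between the two related categories, but not the share of unrelated errors.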
In order to gain a more complete insight into sub-functions of the speech network, other stimulation sites should be considered, such as frontotemporal parts of the language network comprising the various nodes of the dorsal (phonological) and ventral (semantic) pathways. Stimulation of each of these regions might lead to a specific error pattern. Nevertheless, the paradigm used in the present study has shown that cognitive processes during speech perception can be manipulated and that the analysis of incorrect repetitions provides some insight into the function of the stimulated region.

MOG: A Speech-Irrelevant Control Area?
The MNI coordinates of the control area (MOG) used in the present study correspond to occipital lobe area V5, a region that was found to be sensitive to visual motion processing (Zeki, 2015). This region is laterally surrounded (closer to the stimulation coil) by the occipital lateral area V4. Functionally, V4 is part of the ventral visual "what" pathway [running from V1 to the temporal lobe (Desimone, 1991; Tanaka, 1997)], involved in visual object identification and recognition (Zeki, 1983; Desimone and Schein, 1987). Inhibitory stimulation of V5 (MOG) and V4 might, on the one hand, reduce "visual noise" (which is normally regulated/decreased by higher cognitive mechanisms) so that the auditory system (and also speech processing) is facilitated. On the other hand, cTBS over V5/V4 might reduce speech comprehension by interrupting audio-visual interactions resonating within the mental lexicon ("ventral pathway" V4) and the visual imagery of articulation (V5). Regarding overall performance, neither facilitation nor impairment was observed after MOG stimulation compared to the baseline conditions (pre, post2).
A limitation of the current study is that, in order to avoid uncomfortable side effects of nerve stimulation, TMS intensity was lower over the control area (MOG) than over the test region (pre-SMA). Thus, it cannot be excluded that higher-intensity MOG stimulation would have affected speech comprehension. However, in line with previous reports in the literature (Restle et al., 2012; Murakami et al., 2015), the present results did not even show a tendency toward decreased performance after MOG stimulation.

CONCLUSION
Taken together, cTBS of pre-SMA reduced speech comprehension performance, indicating an engagement of pre-SMA in language functions. However, significant effects of cTBS of pre-SMA occurred only under time-critical circumstances. This might be explained by the assumption that increased task demands engage additional pre-SMA-dependent top-down mechanisms enabling prediction and reconstruction of partially unintelligible speech material. Pre-SMA might thereby contribute to an integration of right-hemispheric (syllable timing) and left-hemispheric (phonological sequencing, semantic mapping) functions, possibly mediated by subcortical structures. As concerns the kind of errors made after cTBS-induced pre-SMA suppression, implausible errors increased, suggesting that implausible alternatives are no longer inhibited when pre-SMA is suppressed.

AUTHOR CONTRIBUTIONS
HA, IH, SD, UZ, and FM-D delineated the rationale and developed the design of the study. IH, SD, PB, DD, FM-D, and VS were engaged in data collection and development of analysis methods. SD and VS performed the behavioral, MRI, and TMS data analyses and drafted the first version of the paper. All authors contributed to the final version of the manuscript and approved its content.

ACKNOWLEDGMENTS
This study was supported by the German Research Foundation (DFG Project HE 1573/6-2) and by the Hertie Institute for Clinical Brain Research (Tübingen, Germany). The authors would like to thank Fotini Scherer for excellent technical assistance. It should be noted that a preliminary version of the study was presented at the 18th International Congress of Phonetic Sciences (Dietrich et al., 2015). We acknowledge support by the Open Access Publishing Fund of the University of Tübingen.