The functional anatomy of self-generated and predictable speech

Sensory attenuation refers to the cortical suppression of self-generated sensations relative to externally-generated sensations. This attenuation of cortical responsiveness is the result of internal forward models which make precise predictions about forthcoming sensations. Forward models of sensory attenuation in the auditory domain are thought to operate along auditory white matter pathways such as the arcuate fasciculus and the frontal aslant. The aim of this study was to investigate whether brain regions that are structurally connected via these white matter pathways are also effectively connected during overt speech, as well as when listening to externally-generated speech that is temporally predictable via a visual cue. Using electroencephalography (EEG) and dynamic causal modelling (DCM), we investigated network models that link the primary auditory cortex to Wernicke’s and Broca’s area either directly or indirectly through Geschwind’s territory, which are structurally connected via the arcuate fasciculus. Connections between Broca’s area and the supplementary motor area, which are structurally connected via the frontal aslant, were also included. Our results revealed that bilateral areas interconnected by the indirect and direct pathways of the arcuate fasciculus, in addition to regions interconnected by the frontal aslant, best explain the EEG responses to both self-generated speech and speech that is externally generated but temporally predictable. These findings indicate that structurally connected brain regions thought to be involved in auditory attenuation are also effectively connected. Critically, our findings expand on the notion of internal forward models, whereby sensory consequences of our actions are internally predicted and reflected in reduced cortical responsiveness to these sensations.
Significance statement
Auditory attenuation refers to the reduction of cortical responsiveness to sensations resulting from self-generated actions relative to identical sensations that are externally generated. This attenuation is thought to be caused by internal forward models whereby self-generated sensations such as speech and thought are predicted via an efference copy of the motor command. These efference copies have been suggested to be sent along auditory white matter pathways which connect brain regions involved in speech production and processing. The findings from the present study indicate that structurally connected brain areas involved in auditory attenuation are also effectively connected, which is in line with the notion of internal forward models of auditory attenuation to self-generated and predictable speech.


Introduction
The ability to predict imminent auditory sensations that are either self-generated in the form of speech, or based on past experiences such as hearing a familiar song, is crucial for processing the abundance of auditory stimulation we experience at any moment, as it helps us to adapt to unexpected auditory events in the environment. Sensory attenuation refers to the reduction in the neurophysiological response to sensations that are generated by our own actions relative to identical sensations that are generated in the external environment. Wolpert et al. (1995) suggested that sensory attenuation is the result of internal forward models, whereby the sensory consequences of our own actions are predicted on the basis of an efference copy of the motor command. Because the sensory consequences of internally generated actions are predicted, the central nervous system tends to be less responsive to self-generated sensations than to identical sensations that are externally generated.
Healthy individuals exhibit significantly reduced cortical responsiveness to sounds that are self- versus externally generated (Schafer and Marcus, 1973). This is reflected in a reduced amplitude of the N1, an auditory evoked potential component which peaks at approximately 100ms after the onset of a sound and is reportedly elicited in the auditory cortex (Zouridakis et al., 1998). Auditory attenuation has been observed for willed vocalizations (Curio et al., 2000; Heinks-Maldonado et al., 2005), button-press elicited sounds (Schafer and Marcus, 1973; Martikainen et al., 2005; Aliu et al., 2009; Baess et al., 2011) and temporally predictable sounds which are not self-generated but temporally cued via a visual stimulus (Ford et al., 2007; Sowman et al., 2012; Oestreich et al., 2015).
The underlying functional anatomy engaged during auditory attenuation is yet to be determined. The arcuate fasciculus provides a direct connection between speech production (Broca's) and speech perception (Wernicke's) areas and is therefore a plausible white matter connection for the conveyance of efference copies according to internal forward models of auditory attenuation during willed speech. In addition to direct, long segment fibers connecting Broca's and Wernicke's area, the arcuate fasciculus also has shorter, indirect connections consisting of an anterior pathway which connects Broca's area to Geschwind's territory, and a posterior pathway which connects Geschwind's territory and Wernicke's area. These long and short distance pathways of the arcuate fasciculus possess different functional roles, whereby the direct pathway is thought to be involved in phonological functions and the indirect pathways in semantic functions. Specifically, the posterior indirect pathway is thought to be involved in auditory comprehension and the anterior indirect pathway in the vocalization of semantic information. The arcuate fasciculus has been suggested as the most likely connection to be utilized during auditory attenuation to willed speech (Pynn and DeSouza, 2013).
Additionally, the frontal aslant, which directly connects Broca's area with the supplementary motor area (Catani et al., 2012), might also play a role in auditory attenuation, as it is involved in verbal fluency (Catani et al., 2013) and speech initiation (Fujii et al., 2016). It is therefore conceivable that these white matter pathways are functionally engaged and effectively connected during auditory attenuation.
In this study, we formulated a set of dynamic causal models (DCMs), which map onto the plausible functional anatomy of auditory attenuation to self-generated and temporally predictable speech. These DCMs included brain regions interconnected via the arcuate fasciculus and the frontal aslant. According to the forward model, efference copies are transmitted via backward connections along the arcuate fasciculus (Pynn and DeSouza, 2013). In keeping with this theory, it was hypothesized that models with both forward and backward connections, which convey sensory input and prediction, respectively, are better at explaining auditory attenuation than models with forward connections alone. Furthermore, we explored whether auditory attenuation was better explained by alternative models that included or excluded the above-mentioned regions along the arcuate fasciculus (Geschwind's territory) and the frontal aslant (supplementary motor area).

Participants
Seventy-five healthy participants (38% males, aged 18-44 years, 95% right-handed) were recruited through the online recruitment systems SONA-1 and SONA-P at the University of New South Wales, Australia. Participants were either monetarily reimbursed for their time or received course credit. One participant was excluded from the analyses due to a self-reported diagnosis of an Axis I disorder (American Psychiatric Association, 2000). Event-related potential (ERP) analyses and a detailed description of the demographic data have been reported previously (Oestreich et al., 2015). All participants gave written informed consent. This study was approved by the UNSW Human Research Ethics Advisory Panel (Psychology) and the University of Queensland Research Ethics Committee.

Procedure
Participants completed a number of questionnaires about their demographics, alcohol, nicotine, caffeine and recreational drug use, as well as history of Axis I disorders.
Participants then underwent electroencephalographic (EEG) recordings while performing an experimental task in a quiet, dimly lit room. The experiment consisted of three conditions, namely the Talk, Passive Listen, and Cued Listen conditions (Ford et al., 2007; Oestreich et al., 2015). Before the experiment, an instruction video was played, which demonstrated how to vocalize the syllable 'ah' in a clear manner while maintaining the gaze on a fixation cross.
Following the instruction video, participants were trained to vocalize the syllable 'ah' with a duration of less than 300ms and an intensity between 75dB and 85dB. During the Talk condition, participants vocalized a series of 'ah's into a desk-mounted microphone, every one to three seconds until 3 minutes had elapsed. In the Cued Listen condition, participants were instructed to listen to a recording of their own willed vocalizations whilst watching a video of the vocalization waveforms. Participants were therefore able to make exact temporal predictions about the onset of a speech sound. Lastly, during the Passive Listen condition, participants listened to their own willed vocalizations played back without a cue. During the Passive Listen condition, participants were therefore unable to make temporal predictions about the onset of the next speech sound.

Data Acquisition and preprocessing
EEG was recorded with a 64-channel BioSemi ActiView system at a sampling rate of 2048Hz, 18dB/octave roll-off and 417Hz bandwidth (3dB). External electrodes were placed on the mastoids, the outer canthi of both eyes and below the left eye, and the EEG data were referenced to the average of the mastoid electrodes. Preprocessing was performed using SPM12 (Wellcome Trust Centre for Neuroimaging, London; http://www.fil.ion.ucl.ac.uk/spm/) with MATLAB (MathWorks). Triggers were inserted at the onset of each 'ah' and the EEG data were then segmented into 800ms intervals with 200ms pre- and 600ms post-stimulus onset. Eye blinks and movements were corrected with a regression-based algorithm using the vertical and horizontal electrooculogram (VEOG, HEOG; Gratton et al., 1983). The low and high frequency components of the EEG signal were attenuated using a 0.5-15Hz bandpass filter (Ford and Mathalon, 2004) and trials containing artifacts exceeding ±50µV were rejected. The remaining artifact-free trials were averaged per condition for each participant in order to obtain event-related potentials (ERPs).
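The segmentation, rejection and averaging steps can be illustrated with a minimal sketch. It is not the SPM12 pipeline used in the study: it handles a single channel stored as a plain list of microvolt values, omits ocular correction and filtering, and all function names are hypothetical. Because 200ms and 600ms do not correspond to whole numbers of samples at 2048Hz, the sketch rounds to the nearest sample, which is an assumption rather than a documented choice.

```python
# Illustrative sketch only; function names are hypothetical, not SPM12 calls.
SAMPLING_RATE_HZ = 2048       # from the text
PRE_STIMULUS_S = 0.2          # 200 ms before onset (from the text)
POST_STIMULUS_S = 0.6         # 600 ms after onset (from the text)
REJECTION_THRESHOLD_UV = 50   # +/-50 microvolts (from the text)


def epoch_bounds(onset_sample):
    """Return (start, stop) sample indices of the epoch around one trigger.

    Rounding to the nearest sample is an assumption of this sketch.
    """
    start = onset_sample - round(PRE_STIMULUS_S * SAMPLING_RATE_HZ)
    stop = onset_sample + round(POST_STIMULUS_S * SAMPLING_RATE_HZ)
    return start, stop


def extract_epochs(signal, onsets):
    """Cut one channel into epochs, skipping triggers too close to the edges."""
    epochs = []
    for onset in onsets:
        start, stop = epoch_bounds(onset)
        if start >= 0 and stop <= len(signal):
            epochs.append(signal[start:stop])
    return epochs


def reject_artifacts(epochs, threshold=REJECTION_THRESHOLD_UV):
    """Keep only epochs whose absolute amplitude never exceeds the threshold."""
    return [e for e in epochs if max(abs(v) for v in e) <= threshold]


def average_erp(epochs):
    """Average the surviving epochs sample by sample to obtain the ERP."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]
```

In this sketch a trial is discarded as soon as a single sample exceeds the threshold, and the ERP is the sample-wise mean of the trials that remain.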

Dynamic Causal Modelling (DCM)
DCM relies on a generative spatiotemporal model for EEG responses evoked by experimental stimuli. It uses neural mass models (David and Friston, 2003) to infer source activity of dynamically interacting excitatory and inhibitory neuronal subpopulations (Jansen and Rit, 1995), and the connectivity established amongst different brain regions. DCM sources are interconnected via forward, backward and lateral connections (Felleman and Van Essen, 1991), and are arranged in a hierarchical manner (David et al., 2005; Kiebel et al., 2007). DCMs are designed to test specific connectional hypotheses that are motivated by alternative theories. Every connectivity model defines a network that attempts to predict (i.e. generate) the ERP signal.
Differences in the ERPs to different experimental stimuli are modelled in terms of synaptic connectivity changes within and between cortical sources. Several plausible cortical network connections are compared by estimating the probability of the data given a particular model within the space of models compared, using Bayesian Model Selection (BMS; Penny et al., 2004). BMS provides estimates of the posterior probability of the DCM parameters given the data, as well as the posterior probability of each model (Penny et al., 2004). The winning model is the one that maximizes the fit to the data while simultaneously minimizing the complexity of the model. The posterior probability of each model was computed over all participants using a random effects (RFX) approach.
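How model evidence translates into model and family probabilities can be sketched as follows. This is a deliberately simplified, fixed-effects illustration with a uniform prior over models; it is not the random effects procedure used in the study, and the function names and example values are hypothetical. It assumes that each model has already been assigned a log evidence.

```python
import math


def posterior_model_probabilities(log_evidences):
    """Fixed-effects posterior p(m | y) under a uniform prior over models.

    Subtracting the maximum log evidence before exponentiating avoids
    numerical underflow; the normalised result is unchanged.
    """
    max_log = max(log_evidences)
    weights = [math.exp(le - max_log) for le in log_evidences]
    total = sum(weights)
    return [w / total for w in weights]


def family_probabilities(model_probs, family_of_model):
    """Sum the model probabilities within each family of models."""
    totals = {}
    for prob, family in zip(model_probs, family_of_model):
        totals[family] = totals.get(family, 0.0) + prob
    return totals


# Hypothetical log evidences for three models in two families.
probs = posterior_model_probabilities([-1200.0, -1198.0, -1203.0])
families = family_probabilities(probs, ["Forward", "Forward and Backward", "Forward"])
```

Only differences between log evidences matter here, so a model whose log evidence is higher by about 1.1 (the natural logarithm of 3) is three times as probable as its competitor in a two-model comparison.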

Model specification
The models compared in this study include up to 10 brain regions hierarchically organized in one to five levels. These alternative models are motivated by speech related brain regions that are connected via the auditory white matter pathways of the arcuate fasciculus and the frontal aslant, which are thought to be involved in auditory attenuation.
The bilateral primary auditory cortex (A1) was defined as the cortical input node for auditory
Since the effective connectivity of auditory attenuation has not been studied before, we considered a comprehensive model space including a total of 96 models comprising symmetric and non-symmetric hierarchical models, with forward connections only and combined forward and backward connections, with and without indirect connections between Wernicke's area (W) and Broca's area (B) via Geschwind's territory (G), as well as models with and without the frontal aslant, which connects B to the supplementary motor area (SMA) (for a full description of the model space see Figure 1). All models allowed for changes of intrinsic connectivity at the level of A1. All 96 models were estimated and individually compared to each other using BMS. The 96 models were then partitioned into a number of different families.

INSERT FIGURE 1 ABOUT HERE
We investigated whether auditory attenuation is driven by feedback loops, through both forward and backward connections, or by bottom-up inputs alone, via forward connections between brain regions along the arcuate fasciculus, and possibly also through the frontal aslant. Models with feedback loops would support the theory of internal forward models whereby self-generated actions and predictable sensory consequences are cortically attenuated. To this end, a family consisting of all 48 models with forward connections only (Forward family) was compared to a family consisting of all 48 models with both forward and backward connections (Forward and Backward family).
We then grouped our models into families that included specific regions and tracts as follows: 1) the Null family consisted of 8 models that included A1 only and models connecting A1 to W, 2) the Arcuate direct pathway family included 10 models, with connections between A1 and W as well as W and B, 3) the Arcuate direct and indirect pathways family consisted of 28 models including connections between A1 and W, W and G, G and B, as well as W and B, 4) the Aslant-Arcuate direct pathways family included 14 models that connected A1 and W, W and B, as well as B and SMA and 5) the Aslant-Arcuate direct and indirect pathways family, comprising 36 models, included connections between A1 and W, W and G, G and B, W and B as well as B and SMA (see Figure 1 and 2).

INSERT FIGURE 2 ABOUT HERE
To follow up whether models with or without the frontal aslant (i.e. connections to SMA) better explained auditory attenuation, we first combined the Arcuate direct pathway family (10 models with connections linking A1, W, and B directly; see Figure 1 and 2) and the Arcuate direct and indirect pathways family (28 models linking A1, W, G and B) into a single family, the Arcuate family (38 models). We then compared this to the Arcuate-Aslant family, which combined the Arcuate-Aslant direct pathways family (14 models) and the Arcuate-Aslant direct and indirect pathways family (36 models) and therefore comprised all 50 models with connections to SMA (see Figure 1 and 2).
Lastly, to investigate whether Geschwind's territory is part of the circuit engaged in auditory attenuation of speech, we compared families of models with and without Geschwind. To this end, we combined all the models excluding Geschwind into one family, the no Geschwind family, by grouping the Arcuate direct pathway family (10 models) and the Arcuate-Aslant direct pathways family (14 models; see Figure 1 and 2). We compared the no Geschwind family to the Geschwind family, which combined the Arcuate direct and indirect pathways family (28 models) and the Arcuate-Aslant direct and indirect pathways family (36 models), that is, all the models that included Geschwind.
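The family sizes given above can be checked with a few lines of arithmetic. The sketch below only restates the counts from the text (with 36 models in the Arcuate-Aslant direct and indirect pathways family, as stated for the combined families); the dictionary keys are shorthand labels chosen for this illustration.

```python
# Model counts per family as stated in the text; keys are shorthand labels.
FAMILY_SIZES = {
    "Null": 8,
    "Arcuate direct": 10,
    "Arcuate direct and indirect": 28,
    "Arcuate-Aslant direct": 14,
    "Arcuate-Aslant direct and indirect": 36,
}

# The five families partition the full model space.
total_models = sum(FAMILY_SIZES.values())

# Aslant comparison: models without versus with connections to SMA.
arcuate_family = FAMILY_SIZES["Arcuate direct"] + FAMILY_SIZES["Arcuate direct and indirect"]
arcuate_aslant_family = (
    FAMILY_SIZES["Arcuate-Aslant direct"] + FAMILY_SIZES["Arcuate-Aslant direct and indirect"]
)

# Geschwind comparison: models without versus with Geschwind's territory.
no_geschwind_family = FAMILY_SIZES["Arcuate direct"] + FAMILY_SIZES["Arcuate-Aslant direct"]
geschwind_family = (
    FAMILY_SIZES["Arcuate direct and indirect"]
    + FAMILY_SIZES["Arcuate-Aslant direct and indirect"]
)

print(total_models, arcuate_family, arcuate_aslant_family,
      no_geschwind_family, geschwind_family)
```

The five families add up to the 96 models of the full model space. Both follow-up comparisons cover 88 models, because the 8 models of the Null family belong to neither side of either comparison.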
Each of the 96 models was fitted to each individual participant's mean response for the contrast between the Passive Listen and Talk conditions (i.e. effective connectivity of self-generated speech), as well as to the contrast between the Passive Listen and Cued Listen conditions (i.e. effective connectivity of predictable speech), whereby the Passive Listen condition was used as the baseline condition for both DCM contrasts.

Results
In a first step, all 96 models, comprising models with forward connections only and models with forward and backward connections, were individually compared to each other. The DCM analysis of the self-generated speech condition (compared to the Passive Listen condition) indicated that the best model included reciprocal connections between bilateral A1, W, G, B and SMA, as well as direct connections between W and B in both hemispheres (expected probability = .04, exceedance probability = .21; BOR = .01, see Figure 3). This indicates that bilateral connectivity along the arcuate fasciculus and the frontal aslant best explains attenuation of self-generated speech. The second best model, which was also fairly probable, was identical to the winning model except that it did not include any connections to SMA via the aslant (expected probability = .03, exceedance probability = .17; see Figure 3). The DCM analysis of the predictable speech condition (compared to the Passive Listen condition) revealed that the winning model was the same as the second most likely model for self-generated speech, including reciprocal connections linking bilateral A1, W, G and B, as well as direct connections between W and B in both the left and the right hemispheres (expected probability = .04, exceedance probability = .32; BOR < .01, see Figure 3). The second best model for predictable speech was identical to the winning model except that it included connections to SMA via the aslant in the left hemisphere (expected probability = .03, exceedance probability = .17; see Figure 3).
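To put the small expected probabilities into perspective, they can be compared with the share each model would receive if probability were spread evenly over the model space. The sketch below assumes that the expected probabilities sum to one across the 96 models, as is the case for random effects BMS; the variable names are hypothetical.

```python
N_MODELS = 96  # size of the model space, from the text

# Share of each model if probability were spread evenly over all models.
uniform_share = 1 / N_MODELS

# Expected probabilities reported in the text for the two best models
# (the same values were reported for both contrasts).
reported = {"best model": 0.04, "second best model": 0.03}

# How many times larger each reported value is than the uniform share.
ratio_to_uniform = {name: value / uniform_share for name, value in reported.items()}

for name, ratio in ratio_to_uniform.items():
    print(f"{name}: {ratio:.2f} times the uniform share of {uniform_share:.4f}")
```

An expected probability of .04 is therefore close to four times the even share of roughly .01, even though it looks small in absolute terms.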

INSERT FIGURE 3 ABOUT HERE
When comparing a family with modulations of forward connections only (i.e. Forward family) to a family with modulations of both forward and backward connections (i.e. Forward and Backward family), we found that the Forward and Backward family better explained auditory attenuation than the Forward family for both self-generated speech (expected probability = .59, exceedance probability = .95) and temporally predictable speech (expected probability = .56, exceedance probability = .85; see Figure 4).

INSERT FIGURE 4 ABOUT HERE
To test specific hypotheses as to which brain regions interconnected by the arcuate fasciculus and the frontal aslant were engaged in auditory attenuation, five families of models were created as described in the methods section (see Figure 1 and 2). BMS of these families indicated that the Aslant-Arcuate direct and indirect pathways family was the winning family for both self-generated speech (expected probability = .55, exceedance probability = .98) and predictable speech (expected probability = .54, exceedance probability = .98; see Figure 4).
When comparing families with the arcuate fasciculus alone (i.e. Arcuate family) to families including both the arcuate fasciculus and the frontal aslant (i.e. Arcuate-Aslant family), BMS revealed that the winning Arcuate-Aslant family was much more likely than the Arcuate family during self-generated speech (expected probability = .63, exceedance probability = .99) and predictable speech (expected probability = .60, exceedance probability = .95; see Figure 4).
Lastly, we compared families of models with and without Geschwind (Geschwind family vs no Geschwind family) to determine whether Geschwind plays a role in the functional circuit engaged in auditory attenuation. Our results indicated that the family of models including connections to Geschwind outperformed the family of models without Geschwind during self-generated speech (expected probability = .91, exceedance probability = 1) and predictable speech (expected probability = .88, exceedance probability = 1; see Figure 4).

Discussion
This study investigated the functional anatomy underlying auditory attenuation to self-generated and temporally predictable speech sounds using DCM. Model comparison revealed that modulations in both forward and backward connections better explained auditory attenuation than forward connections alone, which is in line with the theory of internal forward models of auditory attenuation (Ford and Mathalon, 2004), whereby an efference copy, or prediction, is conveyed via backward (i.e. top-down) connections.
Connectivity models linking primary auditory cortex, Wernicke's area, Geschwind's territory and Broca's area via the arcuate fasciculus and the supplementary motor area, through the frontal aslant tract, outperformed models without connections to the supplementary motor area and Geschwind's territory. These findings indicate that the circuitry underlying auditory attenuation to self-generated and temporally predictable speech sounds most likely involves brain regions interconnected by both the short distance, indirect pathways, and the long distance, direct pathway of the arcuate fasciculus in addition to brain regions interconnected by the frontal aslant.
The finding that a combination of forward and backward connections better explained auditory attenuation than forward connections alone is in line with the theory of internal forward models, whereby a prediction, in the form of an efference copy, is conveyed through backward connections. Forward connections can be conceptualized as bottom-up processes (Friston, 2005; Chen et al., 2009), which convey environmental sensory information from the primary auditory cortex to higher cortical levels. In contrast, backward connections represent top-down (Chen et al., 2009), predictive processes based on self-monitoring or past experiences. In this study, we used a Talk condition during which participants vocalized speech sounds and a Cued Listen condition whereby participants were cued to the exact onset of each speech sound while listening to the previously recorded vocalizations from the Talk condition. We compared these conditions to the baseline Passive Listen condition, during which participants passively listened to the series of previously recorded vocalizations from the Talk condition. During the Talk and Cued Listen conditions, participants were able to make predictions about each speech sound via top-down, backward connections, which are then compared with actual sensory inputs sent upwards via forward connections. In contrast, during the Passive Listen condition participants were unable to make temporal predictions about the onset of a speech sound. The theory of internal forward models is therefore supported by the findings from this study, whereby changes in effective connectivity from the baseline (i.e., Passive Listen condition) to the Talk condition are best explained by a feedback loop comprising conjoint forward (bottom-up) and backward (top-down) connections.
Similarly, feedback loops were found to underlie connectivity differences between the Cued Listen compared to the Passive Listen condition, suggesting that a forward model for predictable sounds is also internally generated and conveyed via backward connections.
The arcuate fasciculus has been proposed as the most likely route for the efference copy of a motor act during internal forward models of auditory attenuation (Whitford et al., 2011; Pynn and DeSouza, 2013). In contrast, the frontal aslant seems to be a likely connection for the initiation of the motor act to trigger willed speech as it has a connection to the supplementary motor area. The results from the individual model comparisons support this theory, as the winning model during self-generated speech included bilateral connections along the arcuate fasciculus and the frontal aslant, thereby facilitating the transmission of an efference copy and a motor efference. During temporally predictable speech, on the other hand, the most probable model included connections along the arcuate fasciculus only.
In order to determine whether the frontal aslant adds to the functional anatomy of auditory attenuation or whether connections along the arcuate fasciculus alone are sufficient, we compared families of all models with and without connections along the frontal aslant (while keeping the arcuate fasciculus pathways intact). The findings indicated that during both self-generated speech and predictable speech, models with connections along the arcuate fasciculus and the frontal aslant better explained auditory attenuation than models including the arcuate fasciculus only. For the Talk condition, this can be explained by the motor efference, whereby speech sounds are actively generated and the motor efference is sent along the frontal aslant. However, the Cued Listen condition did not involve a motor act, which means that the frontal aslant tract is not being utilized for the transmission of a motor efference. A possible explanation for the involvement of connections to the supplementary motor area during predictable speech, and therefore the engagement of the frontal aslant, is the notion of an efference copy for thought. Indeed, a proposal put forward by Jackson (1958) states that since internal forward models are working reliably during processes of sensorimotor control, the same internal forward models, developed later in evolution, might also be utilized during higher cognitive processes such as thought or inner speech, which can be seen as our most complex motor act without actions. In the context of the present study, while participants are not actively generating the vocalization in the Cued Listen condition, watching the waveforms of the speech sounds might lead them to internally simulate the next vocalization, which might explain the activation of the supplementary motor area without a motor act.
The arcuate fasciculus consists of long distance fibers which connect Broca's and Wernicke's area as well as short distance fibers which connect Broca's area and Geschwind's territory via an anterior pathway, and Geschwind's territory and Wernicke's area via a posterior pathway. The results of the present study indicate that models including long distance connections in addition to short distance connections, via Geschwind's territory, better explain auditory attenuation than models including long distance connections only. The direct, long distance pathway is thought to be involved in phonological repetitions and therefore represents a plausible connection to be utilized during this experimental task, whereby the same speech sound was vocalized and played repetitively. The indirect, short distance pathways of the arcuate fasciculus are thought to be involved in semantic functions. The engagement of these connections during auditory attenuation might be explained by the nature of the speech sounds used in the present study. Since phonemes are the building blocks of language which are used to distinguish one word from another, it is possible that participants assigned semantic meaning to these sounds, which would likely not occur if the sounds were simple tones.
The involvement of brain areas interconnected via the arcuate fasciculus during auditory attenuation is in line with findings from studies of auditory attenuation in schizophrenia. There is substantial evidence that patients with schizophrenia exhibit abnormal auditory attenuation to self-generated speech (Ford et al., 2001; Ford and Mathalon, 2004; Ford et al., 2007), button-press elicited sounds (Whitford et al., 2011; Ford et al., 2014), and temporally cued sounds (Ford et al., 2007). Individuals at high-risk for developing a psychotic disorder exhibit auditory attenuation that is intermediate between healthy participants and patients with schizophrenia (Perez et al., 2012). Moreover, healthy individuals with psychotic-like experiences show less auditory attenuation compared to healthy individuals without psychotic-like experiences (Oestreich et al., 2015, 2016). The underlying mechanisms inducing these auditory attenuation deficits in schizophrenia and psychosis are still unclear. However, several studies have reported changes to the white matter structure and specifically to the myelin sheath of the axons constituting the arcuate fasciculus in patients with schizophrenia. This is important insofar as it suggests that signal transmission along the arcuate fasciculus during auditory attenuation may be delayed due to a loss of conduction velocity induced by demyelination. Support for this contention comes from a study by Whitford et al. (2011), which reported that auditory attenuation abnormalities typically exhibited by patients with schizophrenia could be completely eliminated by imposing a 50ms delay between a self-generated button press and the delivery of a sound. This was interpreted to indicate that efference copies travelling along the arcuate fasciculus during auditory attenuation were delayed by 50ms in the group of schizophrenia patients.
Furthermore, the study reported that the degree to which auditory attenuation improved as a result of the delay between button press and tone delivery was linearly correlated with white matter abnormalities in the arcuate fasciculus. The findings from the present study add further support for the role of the arcuate fasciculus during auditory attenuation by showing that the brain regions that are structurally interconnected by the arcuate fasciculus are also effectively connected.
In summary, our study shows that auditory attenuation for self-generated and predictable speech involves brain regions such as Wernicke's area, Broca's area, and Geschwind's territory, interconnected through the arcuate fasciculus via both short and long distance fibers, as well as the supplementary motor area, which is linked to Broca's area via the frontal aslant. Critically, we found that auditory attenuation to self-generated and temporally predictable speech sounds engages feedback loops with conjoint forward (bottom-up) and backward (top-down) connections. This is consistent with internal forward models, whereby the sensory consequences of our actions and thoughts are internally predicted, which is reflected in reduced cortical responsiveness to these sensations.

Figure 3. Attenuation of self-generated speech was best explained by a model with recurrent connections between bilateral A1, W, G, B and SMA, as well as direct bilateral connections between W and B. This model was followed by a model that was identical to the winning model except that it did not include bilateral connections from B to SMA. Attenuation of predictable speech was best explained by a model with recurrent connections between bilateral A1, W, G and B, as well as direct bilateral connections between W and B. This model was followed by a model that was identical to the winning model except that it included a connection from B to SMA in the left hemisphere.