Prior Precision Modulates the Minimization of Auditory Prediction Error

The predictive coding model of perception proposes that successful representation of the perceptual world depends upon canceling out the discrepancy between prediction and sensory input (i.e., prediction error). Recent studies further suggest a distinction to be made between prediction error triggered by non-predicted stimuli of different prior precision (i.e., inverse variance). However, it is not fully understood how prediction error with different precision levels is minimized in the predictive process. Here, we conducted a magnetoencephalography (MEG) experiment which orthogonally manipulated prime-probe relation (for contextual precision) and stimulus repetition (for perceptual learning which decreases prediction error). We presented participants with cycles of tone quartets which consisted of three prime tones and one probe tone of randomly selected frequencies. Within each cycle, the three prime tones remained identical while the probe tones changed once at some point (e.g., from repetition of 123X to repetition of 123Y). Therefore, the repetition of probe tones can reveal the development of perceptual inferences in low and high precision contexts depending on their position within the cycle. We found that the two conditions resemble each other in terms of N1m modulation (as both were associated with N1m suppression) but differ in terms of N2m modulation. While repeated probe tones in low precision context did not exhibit any modulatory effect, repeated probe tones in high precision context elicited a suppression and rebound of the N2m source power. The differentiation suggested that the minimization of prediction error in low and high precision contexts likely involves distinct mechanisms.


INTRODUCTION
Our brain constantly predicts forthcoming sensory inputs. The predictive coding model of perception postulates that perception entails two distinct neurocomputational components, the top-down propagation of prediction and the bottom-up propagation of prediction error (Rao and Ballard, 1999;Friston, 2005Friston, , 2009Summerfield et al., 2008; for a review see Clark, 2013). The flow of information takes place between multiple hierarchical levels harboring both representational units and error units (Egner et al., 2010). While the representational units encode prediction about the causal structure of the environment and feed it backward to the next lower level, the error units encode the discrepancy between prediction and sensory input as prediction error and communicate it forward to the next higher level. The message-passing between hierarchical cortical levels iterates to match prediction and sensory inputs as much as possible to minimize prediction error in the system.
Recent research further suggested the necessity to distinguish between two conditions inducing prediction error: the unpredicted condition (where there is no precise prediction) and the mispredicted condition (where there is a precise prediction being violated). Conceptually, the unpredicted condition is mainly associated with prediction error generated by sensory input that is not anticipated, whereas the mispredicted condition triggers not only prediction error generated by sensory input that is not anticipated but also prediction error generated by prediction that is not fulfilled (Arnal and Giraud, 2012). The dissociation was supported by electroencephalography (EEG) evidence demonstrating that unpredicted and mispredicted stimuli are associated with different amounts of cortical activity (Hsu et al., 2015(Hsu et al., , 2018. Relative to predicted stimuli, unpredicted stimuli are associated with smaller neuronal responses whereas mispredicted stimuli are associated with larger neuronal responses on the N1 event-related potential (ERP) component, which is typically considered an electrophysiological indicator for automatic predictive processing (for a review see Bendixen et al., 2012).
The result pattern can be interpreted in terms of how the prediction error is adjusted depending on the precision of the sensory input (Friston, 2005(Friston, , 2009, which is suggested to be encoded by gain in superficial pyramidal cells in sensory cortices (Feldman and Friston, 2010;Auksztulewicz et al., 2017;Fardo et al., 2017). Precision refers to the inverse of a signal's variance, which quantifies the degree of certainty about the signals in general statistical usage (Feldman and Friston, 2010;Ransom et al., 2017). Unpredicted stimuli, relative to mispredicted stimuli, are embedded in contexts of larger variance (i.e., lower precision); therefore, prediction error is weighted less in the former than the latter (Schröger et al., 2015). The idea conforms to previous research on the mismatch negativity (MMN) which reported a significant difference when contrasting between a deviant sound embedded in an equiprobable sequence (i.e., a low precision context) and a deviant sound embedded in a standard sequence (i.e., a high precision context; e.g., Jacobsen and Schröger, 2001; for a review see Näätänen et al., 2005; but see Ahmed et al., 2011;Astikainen et al., 2011;Nakamura et al., 2011vs. Farley et al., 2010Fishman and Steinschneider, 2012;Kaliukhovich and Vogels, 2014; for an ongoing debate in animal research). The effect of contextual expectancies was also demonstrated by comparing mismatch responses to sounds with frequencies sampled from broad and narrow Gaussian distributions (Garrido et al., 2013). It was reported that sounds in the tail of the distribution evoked larger mismatch responses than those that fell at the center. Moreover, responses to physically identical outliers were greater when the distribution was narrower, indicating that the brain can implicitly track the certainty in distributions of events. It was suggested that manipulating contextual expectancies is equivalent to manipulating the precision of prediction errors higher in the processing hierarchy, which in turn has a modulatory effect on neuronal responses similar to that of attention (Auksztulewicz and Friston, 2015).
The differentiation raised the question whether contextual precision also affects how prediction error is minimized in perceptual learning, a major function of the predictive brain. Here, we conducted a magnetoencephalography (MEG) experiment with a novel design which orthogonally manipulated prime-probe relation (for contextual precision) and stimulus repetition (for perceptual learning which decreases prediction error). Specifically, we presented participants with cycles of tone quartet which consisted of three prime tones and one probe tone of randomly selected frequencies. Within each cycle, the three prime tones remained identical while the probe tones changed once at some point (e.g., from repetition of 123X to repetition of 123Y). Therefore, the repetition of probe tones can reveal the development of perceptual inferences in low and high precision contexts depending on FIGURE 1 | (A) Schematic representation of a cycle. In this example, the first tone quartet (F4-E4-G4-A4) was repeated four times before the second tone quartet (F4-E4-G4-D4) was repeated four times. (B) Schematic representation of a tone quartet.
Frontiers in Human Neuroscience | www.frontiersin.org their position within the cycle ( Figure 1A). In the beginning of a cycle where the three prime tones lack indicative value, probe tone X triggers prediction error in a low precision context (because listeners would predict a probe tone to be presented but cannot be quite sure of its frequency, resembling the aforementioned unpredicted condition). Its repetition thus reveals how a prediction is established from scratch, or how prediction error is minimized in low precision context for perceptual learning. In the middle of a cycle where the three prime tones are already associated with probe tone X, probe tone Y triggers prediction error in a high precision context (because listeners would tend to predict that after this particular tone trio a probe tone X would follow but such expectation is violated, resembling the aforementioned mispredicted condition). Its repetition thus reveals how a violated prediction is re-established, or how prediction error is minimized in high precision context for perceptual learning. Notably, previous neurocomputational modeling already proposed that stimulus repetition results in perceptual learning which might increase the precision weighting of prediction error (Garrido et al., 2009; for a review see Auksztulewicz and Friston, 2016). However, it is undetermined whether the putative increase in precision weighting of prediction error depends on its initial precision status. If the minimization of prediction error in low and high precision contexts involves distinct mechanisms, the repetition of tone quartets 123X and 123Y should modulate MEG responses in different manners. Specifically, the current research focused on examining the N1m and N2m modulations, given that responses in the N1m time window are sensitive to context precision (Näätänen and Picton, 1987;Hsu et al., 2015) whereas responses in the N2m time window show responsiveness to stimulus probability (represented in the MMN; Näätänen et al., 2005) and representation build-up (Karhu et al., 1997). These long-latency components are also reported to be mediated by top-down effects in cortical networks and therefore rest on backward connections (Garrido et al., 2007).

Participants
Eighteen healthy adults aged between 20 and 26 years (average age: 24; six males; 14 right-handed) with no history of neurological, psychiatric, or visual/hearing impairments as indicated by self-report participated in the experiment. This study was carried out in accordance with the recommendations of the ethics committee of National Taiwan Normal University (Taiwan) and the University of Jyväskylä (Finland). All subjects gave written informed consent in accordance with the Declaration of Helsinki. Four participants were excluded from data analysis for excessive measurement noise, leaving 14 participants in the final sample (average age 24; three males; 12 right-handed).

Stimuli
Sinusoidal tones with a loudness of 80 phons (i.e., 80 dB for tones of 1,000 Hz) were generated using Matlab. The duration of each tone was 50 ms (including 5 ms rise/fall times). The frequency of each tone was within the range of 261.626-493.883 Hz, matching the absolute frequency of a series of seven natural keys on a modern piano (i.e., C4 D4 E4 F4 G4 A4 B4).
A total of 90 pairs of tone quartets (consisting of three prime tones and one probe tone) were created. Each pair of tone quartets was identical in the prime tones but different in the probe tone in terms of frequency (e.g., F4-E4-G4-A4 and F4-E4-G4-D4). The frequency of the prime tones was determined by a random sampling without replacement, with the exception of any continuously rising or falling sequence to avoid the step inertia expectation (i.e., the expectation that the frequencies of upcoming tones continue in the same direction when the frequencies of previous tones are presented as a scale; Lange, 2009). The frequency of the probe tone can be anything except that of the prime tones.

Procedures
A total of 10 blocks of nine cycles were presented. Each cycle consisted of the repetition of a pair of tone quartets, where the first tone quartet was repeated 4-6 times before the second tone quartet was repeated 4-6 times. The reason we presented each tone quartet 4-6 times was to prevent participants from learning high-order regularities (e.g., correctly anticipating a change in probe tone). Therefore, a cycle could contain 8-12 tone quartets. While the repetition of the first tone quartet turned the initially non-predicted probe tone into a predicted tone in a low precision context (resembling the minimization of unpredicted error), the repetition of the second tone quartet turned the initially non-predicted probe tone into a predicted tone in a high precision context (resembling the minimization of mispredicted error; Figure 1A). Figure 1B illustrates a tone quartet, which started with a silent interval of 500 ms. Each tone was separated by a 500 ms stimulus onset asynchrony (SOA). 10% of the probe tones were of attenuated loudness of 20 dB (which were excluded from data analysis). Participants were required to press a key as soon as they detected a softer probe tone to maintain their attention. The offset of the probe tone was followed by a jittered inter-trial interval (ITI) of 700-800 ms. There was no separation between cycles distinct from the ITI. A fixation cross remained on the screen for the duration of the block. The whole experiment took around 42 min (i.e., 900 trials × 2,800 ms). Presentation (Neurobehavioral Systems, Inc., Berkeley, CA, USA) was used for stimulus presentation. Stimulation was randomized individually for each participant and delivered through two panel speakers situated to the left and right of the participant.

Data Recording and Analysis
MEG data was collected using a 306 channel whole-head device (Elekta Neuromag, Finland) in a two-layered magnetically shielded room at the University of Jyväskylä (Finland). The sampling rate was 1,000 Hz. A high-pass filter of 0.03 Hz and a low-pass filter of 200 Hz were used. Continuous head position monitoring was used based on five Head-Position Indicator (HPI) coils, with three at the forehead and two behind the ears. Electrooculography (EOG) was recorded using electrodes lateral to each eye and above and below the left eye. Offline, head movements were corrected and external noise sources were attenuated using the temporal extension of the source subspace separation algorithm (Taulu et al., 2005) in the MaxFilter program (Elekta Neuromag, Finland).
After the initial head movement correction, the data was analyzed using BrainStorm 3.2 (Tadel et al., 2011). Signal subspace projection was used to correct for eye blinks. The MEG signal was filtered at 1-40 Hz and segmented from −100 to 500 ms relative to the onset of the stimulus using a 100 ms pre-stimulus baseline. Segments with over 5,000 fT/cm peak-topeak values in gradiometers or 7,000 fT peak-to-peak values in magnetometers were rejected. As all tone quartets were repeated at least four times, segments to the 5th and 6th presentations of tone quartets were also rejected to ensure our analysis is based on equal numbers of trials. The trial numbers after artifact rejection in each condition are listed in Table 1.
The experimental effects were examined in source space. As individual magnetic resonance images (MRIs) of the participants were not available, the ICBM152 MRI template was used. The weighted minimum norm estimates (wMNE) were calculated using the unconstrained option to allow free orientation of the dipoles in relation to the cortical surface. A three shell spherical head model was used. The wMNE solution was restricted to the cortex. Noise covariance matrix was calculated from the baseline of the averaged responses.
To extract the N1m and N2m measures, we first identified the N1m and N2m from the grand average global field power (GFP) of the gradiometers (across 14 participants and 32 conditions; Figure 2A). The topographies for the magnetometers at N1m and N2m peaks are plotted in Figure 2B showing the magnetic field distribution. Then, we identified brain regions from the Dessikan-Killiany parcellation, which showed the largest source activity around the auditory cortices at the N1m and N2m, including transverse temporal, superior temporal, middle temporal, supramarginal, and postcentral regions ( Figure 2C). The grand average source solution (across two hemispheres, five brain regions, 14 participants, and 32 conditions) was used to identify the N1m and N2m time windows for statistical analysis. N1m peak was at ca. 110ms (time window 85-135ms) and N2m peak was at ca. 220ms (time window 170-270ms; Figure 3).
The source powers in the N1m and N2m time windows of the probe tones were submitted to the 2 (precision: low/high precision context) × 4 (repetition: 1st/2nd/3rd/4th presentation)  repeated measures analysis of variance (ANOVA). Greenhouse-Geisser correction was applied when appropriate (and will be indicated in the following section with epsilon values).

RESULTS
The ANOVA on the N1m source power showed only a main effect of repetition (F (3,39) = 6.33, p < 0.01, partial eta squared = 0.33; Figure 4, left). The effect was due to the larger response to the 1st presentation compared to all the other presentations ( Table 2). No significant differences were found between the response strength for the other presentations.

DISCUSSION
The current research used MEG to examine whether prior precisions modulate the cortical dynamics of the making of perceptual inferences. We presented participants with cycles of tone quartets which consisted of three prime tones and one probe tone, where the repetition of the probe tone can reveal the development of perceptual inferences in low and high precision contexts depending on their position within the cycle. We found that the two conditions modulate the N1m source power in a similar manner. However, there was a significant precision × repetition interaction on the N2m source power. While repeated probe tones in low precision context did not exhibit any modulatory effect, repeated probe tones in high precision context were associated with a suppression and a rebound of the N2m source power. The results confirm the necessity to dissociate the processing of non-predicted stimuli of different prior precision (Friston, 2005(Friston, , 2009Feldman and Friston, 2010;Arnal and Giraud, 2012;Hsu et al., 2015Hsu et al., , 2018Schröger et al., 2015). Moreover, it is likely that the minimization of prediction error in low and high precision contexts involves distinct mechanisms. In electrophysiology literature, N1/N1m is known to reflect multiple processes of signaling unspecific changes in the auditory environment (Näätänen and Picton, 1987;Crowley and Colrain, 2004). Mounting evidence of the N1/N1m predictability effect further supports that it indicates the operation of an internal predictive mechanism, as predicted stimuli were associated with robust N1/N1m suppression (Schafer and Marcus, 1973;Schafer et al., 1981;Lange, 2009;Todorovic et al., 2011;Todorovic and de Lange, 2012;SanMiguel et al., 2013;Timm et al., 2013;Hsu et al., 2014aHsu et al., ,b, 2016. Our findings confirm previous research by showing that such predictability effects resemble each other in low and high precision contexts, where prediction error is weighted differently. This suggests that changes in N1m source power cannot distinguish between the minimization of prediction error due to prediction formation (as in low precision context) and prediction alteration (as in high precision context). Instead, N1m seems to reflect the overall reduction in prediction error.
According to the predictive coding model of perception, prediction error can be adjusted depending on the precision of the sensory input (Friston, 2005(Friston, , 2009Feldman and Friston, 2010). Prediction error is weighted less in low than high precision contexts (Schröger et al., 2015), leading to smaller N1 responses to target tones following random then regular tone sets in EEG (Hsu et al., 2015). We speculate that the difference between low and high precision contexts might be less conspicuous here so that we did not obtain a main effect of precision on the N1m source power in MEG. In particular, in our previous experiment (Hsu et al., 2015), the difference between the two conditions depended on the regularity of their prime tones. That is, stimuli were preceded by either random or rising tones. However, in the current research, the difference between the two conditions depended on their position within the cycle of stimulus presentation. That is, stimuli were presented in either the beginning or the middle of each cycle. Such procedural differences between investigations might influence how much the two conditions differ in prior precision. This possibility should be systematically tested in future research.
Nevertheless, the differentiation between prediction error processing in low and high precision contexts was evident on the N2m source power. Specifically, tones in low precision context triggered smaller source power than tones in high precision context upon the 1st presentation and the 4th presentation. More importantly, stimulus repetition triggered different response patterns in low and high precision contexts. While repeated tones in low precision context did not exhibit any modulatory effect, repeated tones in high precision context were associated with a suppression and a rebound of the N2m source power. Previous neurocomputational modeling already proposed that stimulus repetition results in perceptual learning which might increase the precision weighting of prediction error (Garrido et al., 2009; for a review see Auksztulewicz and Friston, 2016). Our result pattern further suggests that such increase in precision weighting of prediction error can manifest differently at the cortical level depending on its initial precision status.
Specifically, novel probe tones presented in the beginning of each cycle are associated with lower prior precision, as listeners had a general expectation that a probe tone would appear but had little if any idea concerning its frequency. It is possible that, when the initial precision status is low, the modulatory effect of stimulus repetition is limited to the earlier processing stage. On the other hand, novel probe tones presented in the middle of each cycle are associated with higher prior precision, as listeners already formed predictions on its frequency during the previous stimulation within the cycle, but the prediction was not fulfilled. Cortical responses to these novel probe tones resemble more the MMN, which is interpreted as a failure to inhibit prediction error due to deviation from a learned regularity (Friston, 2005;Garrido et al., 2007). Thus, the U-shaped profile on the N2m due to stimulus repetition can be understood as follows. The N2m suppression (from the 1st presentation) is in line with the finding that the MMN vanishes with few stimulus repetitions (Garrido et al., 2009), indicating that the brain can efficiently adjust a perceptual model. Meanwhile, the N2m rebound (toward the 4th presentation) resonates with the finding of enhanced sustained field at around 200 ms to stimulus repetition (Näätänen and Rinne, 2002;Bendixen et al., 2007;Ylinen and Huotilainen, 2007;Recasens et al., 2015). It supports the notion that factors other than mere probability should be considered in order to account for the way perceptual model modification is implemented in the brain (Hsu et al., 2016). For example, there might be a gradual decrease in the bandwidth of the prediction tuning curve in the high precision context (cf. sharpening model for repetition effect; for a review see Grill-Spector et al., 2006). The sparser representation of prediction can paradoxically elicit an increase in prediction error. Alternatively, there might be a build-up of representations in prediction alteration. This can introduce a learning function of escalating sensitization which upweights neuronal responses (Karhu et al., 1997;Barascud et al., 2016;Southwell et al., 2017). Finally, there might be heightened expectations for the onset of a novel pair of tone quartets after the repetition increases in the current pair of tone quartets. Such preparation for change can also account for the rebound in the high precision context. Nevertheless, one should notice that the rebound effect achieved statistical significance at a rather liberal level. Its replicability should be established in future research.
Another possible explanation for the U-shaped profile on the N2m is that, among these brain regions around the auditory cortices, some showed repetition suppression and others showed repetition enhancement. However, due to the distributed nature of the MNE source estimate and the lack of individual MRIs (which increased the coregistration error between MEG and the template MRI), in the current research it is difficult to separate the activity from adjacent brain regions. This possibility would be better addressed with neuroimaging techniques with higher spatial resolution such as functional MRI (fMRI) combined with EEG or MEG in future research.
It is unlikely that the dissociation of probe tones in low and high precision context was due to confounding factors. Admittedly, although measures were taken to prevent participants from learning high-order regularities in the current research, it cannot be excluded that participants might become aware of the stimulus structure (i.e., the probe tones would change after 4-6 repetitions). However, if this happened, participants would expect changes of probe tones in both the low and high precision contexts. Therefore, it would only downsize the difference between low and high precision contexts and cannot account for the difference between conditions reported here. It is also improbable that the dissociation of probe tones in low and high precision contexts resulted from how much the probe tones differ from their preceding tones (i.e., the three prime tones) in terms of frequency. It is because the frequency of these tones was determined by random sampling. The allocation of these tones to low/high precision context was dependent on their position within a cycle (i.e., whether they were presented in the beginning/middle of a cycle) rather than their frequency. Meanwhile, it is noteworthy that in the current research participants were required to maintain their attention on the stimuli. It remains an interesting question whether the result pattern holds without participants' overt attention.
The dissociation of probe tones in low and high precision contexts is closely related to the mixed results in the literature of repetition-related effects as previously suggested in Hsu et al. (2015). Although repetition-related effects are commonly explained in the language of the Bayesian models of predictive coding (Summerfield et al., 2008), there is a puzzling juxtaposition of repetition enhancement and repetition suppression across fMRI research. While the repetition of unfamiliar stimuli was associated with enhanced neuronal responses, the repetition of familiar stimuli was associated with suppressed neuronal responses (Henson et al., 2000;Fiebach et al., 2005;Gruber and Müller, 2005;Gagnepain et al., 2008;Soldan et al., 2008;Subramaniam et al., 2012;Müller et al., 2013). It was proposed that stimuli of different familiarity differ in whether there is a pre-existing representation (Turk-Browne et al., 2008), which can be understood as the top-down activation of predictions (Cheung and Bar, 2012;Grotheer and Kovács, 2014). The repetition of unfamiliar stimuli (initially associated with no pre-existing representation) would resemble the development of perceptual inferences in low precision context. The repetition of familiar stimuli (initially associated with certain pre-existing representation) would resemble the development of perceptual inferences in high precision context. Interestingly, the dissociation of repetition-related effects in the current research did not manifest as repetition enhancement and repetition suppression. Rather, it was expressed as repetition suppression on N1m followed by a lack of modulatory effect on N2m in low precision context vs. repetition suppression on N1m followed by a U-shaped profile on N2m in high precision context. Future research is needed to develop theories that relate hemeodynamic responses to electrophysiological data.