Making Sense of Mismatch Negativity

Fitzgerald, Kaitlin; Todd, Juanita

doi:10.3389/fpsyt.2020.00468

REVIEW article

Front. Psychiatry, 11 June 2020

Sec. Schizophrenia

Volume 11 - 2020 | https://doi.org/10.3389/fpsyt.2020.00468

This article is part of the Research Topic Sensory Information Processing Abnormalities in Schizophrenia and Related Neuropsychiatric Disorders


Kaitlin Fitzgerald*

Juanita Todd

School of Psychology, University of Newcastle, Callaghan, NSW, Australia

Evoked potentials provide valuable insight into brain processes that are integral to our ability to interact effectively and efficiently in the world. The mismatch negativity (MMN) component of the evoked potential has proven highly informative about the ways in which sensitivity to regularity contributes to perception and cognition. This review offers a compendium of research on MMN with a view to scaffolding an appreciation of its use as a tool to explore how regularities contribute to predictions about the sensory environment over many timescales. In compiling this work, we address interest in MMN as an index of sensory encoding and memory, as well as of attention. We review perspectives on the possible underlying computational processes, as well as recent observations that invite consideration of how MMN relates to how we learn, what we learn, and why.

Objective

In this special issue, the reader is invited to consider “sensory information processing abnormalities in Schizophrenia and related neuropsychiatric disorders”. No issue on this topic would be complete without addressing apparent anomalies in the auditory event-related potential (ERP) component known as mismatch negativity (MMN). However, the growth in papers on MMN in schizophrenia since the first observation of MMN anomalies in this disorder in 1991 (1) is formidable, and it is exceeded by growth in the various applications for, and changes in the understanding of, MMN more generally. In this paper, we review MMN from its fundamental background through to controversial new applications and, in doing so, we endeavor to strike a balance between a comprehensive and a comprehensible scaffold for making sense of MMN.

Background

To perceive, interact with, and learn about our world is perhaps the most impressive of everyday feats. We access the world through little more than the cumulative activation of sensory neurons, which is used to build a useful representation of an endlessly complex environment. In doing so, we are limited by the fact that our environment is richer in information than a limited and noisy sensory system could ever fully attend to, and the information itself is often imperfect. Properly understanding sensation and perception therefore requires understanding both how sensation is produced in the world and how our sensory systems could construct a meaningful representation of the world from these sensations, especially when the information they carry is uncertain. Bregman (2) defined the “job” of perception as “to take sensory input and to derive a useful representation of reality from it”. Rapidly perceiving and organizing sensations in order to guide adaptive behavior is a challenging yet vital task.

Studies of brain function have revealed strategies that may help simplify sensory processing by reducing the resources required for adequate perception. These strategies involve “short cuts” or heuristics, where assumptions are made which invite some possibility of error. One example is in the processing of repetition, where brain responses are observed to be smaller to a repeated stimulus compared to an equivalent novel stimulus. Predictive coding is a dominant theoretical model for this process, which sits among several alternative accounts that will be contrasted later in this review. These models recognize that our world is both ever-changing and constrained by regularity, and it is of little benefit to process a repeated stimulus as if we are encountering it for the first time on each repetition. Predictive coding in particular suggests that the brain is sensitive to the rate at which stimuli have occurred in the recent past and uses this information to actively infer the future state of the world (3, 4). That is, repeated and predicted stimuli require little effort to process, while neural resources are prioritized for processing novel events which are more likely to carry new and behaviorally relevant information. While this process is seemingly labor-intensive, the result is ultimately a more parsimonious use of neural resources which facilitates the complex task of translating sensation to perception.

This review culminates in a discussion of another possible heuristic: a first-impression bias in predictive coding, where initial learning about the probability, transitions, and importance of a sound influences how that sound is later processed, even after conditions change (5–8). This effect has been shown by using electroencephalography (EEG), a common neuroimaging technique, to study the MMN component of the ERP. MMN is well supported as an index of automatic change detection; it is elicited following any change to an established regularity in sensory stimuli, including sound, and its amplitude provides some quantification of the salience of the unexpected stimulus for processing [see (9–12) for reviews]. From a predictive coding perspective, MMN is viewed as a “prediction error signal” that can be used to study how the brain monitors environmental statistics to detect regularity and change, and generates top-down predictions that facilitate stimulus processing (3, 12, 13). This assertion was based on the notion of a system that adjusts rapidly to change in order to maximize predictive accuracy. However, the first-impression bias shows that, to the contrary, the categorization of a stimulus when first encountered can be perseverative [e.g., (7, 8, 14, 15)].
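The contrast drawn above, between a system that adjusts rapidly to change and one whose first impressions persist, can be made concrete with a toy delta-rule learner. The Python sketch below rests entirely on our own assumptions and is not a model from the cited studies: a learner tracks the probability of a tone, registers a prediction error of one minus that probability whenever the tone occurs, and is run either with a fixed learning rate or with a rate of 1/n (which makes the estimate a running mean of all trials, so early experience is never discounted). The function name `track_probability`, the starting belief, and all parameter values are hypothetical.

```python
def track_probability(sequence, target, learning_rate=None):
    """Toy delta-rule estimate of P(target), returning prediction errors.

    On each trial the estimate p moves toward the observation (1 if the
    tone is `target`, else 0) by `learning_rate`. If learning_rate is None,
    a rate of 1/n is used, which makes p the running mean of all trials so
    far: the first trials are never discounted.
    The returned list holds the prediction error (1 - p) registered each
    time `target` occurs, computed before p is updated.
    """
    p = 0.5  # hypothetical neutral starting belief
    errors = []
    for n, tone in enumerate(sequence, start=1):
        observed = 1.0 if tone == target else 0.0
        if observed:
            errors.append(1.0 - p)  # surprise when the target arrives
        rate = learning_rate if learning_rate is not None else 1.0 / n
        p += rate * (observed - p)  # move the estimate toward the observation
    return errors


# Phase 1: "B" is the rare deviant; phase 2: the roles are reversed.
phase1 = ["B" if i % 10 == 9 else "A" for i in range(90)]
phase2 = ["A" if i % 10 == 9 else "B" for i in range(90)]
sequence = phase1 + phase2

recent_weighted = track_probability(sequence, "B", learning_rate=0.1)
first_impression = track_probability(sequence, "B")  # 1/n learning rate
```

By the end of phase 2, `recent_weighted` reports a small prediction error for "B" because recent trials dominate its estimate, whereas `first_impression` still registers a much larger error because the early block in which "B" was rare continues to weigh on the running mean. This is only one of many ways perseveration could arise and is not offered as the mechanism behind the findings cited above.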

The Case for Auditory Processing Heuristics

Auditory signals are often immediately informative for behavior (consider the urgency with which we respond to fire alarms, car horns, and cooking timers), yet they are complex sensory signals to process. Auditory input can be endlessly layered: a single signal consisting of the summed output of every sound-producing object present in the environment at any given time (2). Representations of the sound environment and its constituent objects must be derived solely from temporal changes in pressure in this single composite signal arriving at the ear. The transient nature of auditory input also leaves a limited time window for this complex process to occur. Auditory processing therefore poses a particular challenge: a complex signal must be translated under significant time pressure and answered with adaptive behavior.

A solution to the problem of complex processing in any limited system is the implementation of heuristics or short cuts that serve to reduce and expedite processing. In an adaptive organism, these mechanisms should reflect an optimal accuracy-effort trade-off, where the chance of a negligible degree of error is accepted in exchange for an overall reduction in processing effort. Fortunately for perception, we can often successfully apply the assumption that the macroscopic world has order. Patterns or regularities in the environment therefore provide a useful basis for heuristic processing in perception. There are many examples of the brain's use of these patterns to manage complexity and uncertainty. For example, in auditory scene analysis, the auditory system parses the single chaotic auditory signal into meaningful representations of discrete auditory objects on the assumption of ecologically valid regularities (2). Sounds are grouped on the basis of shared characteristics that increase the probability that they originate from the same source, such as consistent timbre, continuation of a pattern or feature (e.g., step-wise ascending frequency), or termination at the same point in time. Heuristics also provide a means to infer sensory information that is lacking.

Bayesian perspectives provide a formal account of how regularities in the world are exploited to support perception. These models assert that complexity and uncertainty are optimally resolved through the use of probability (16). When sensory data are missing or unreliable, the state of the world is inferred from the relative prior probabilities of all possible states (e.g., possible distances from an object) and the likelihood that each state would produce the current data [e.g., that an auditory source at a given distance would produce a sound of the current intensity (17)]. In contrast to frequentist statistics, where probability is defined as the long-run frequency of an event over many trials, a Bayesian approach represents probability as a degree of belief that an event will occur, based on prior experience as well as previously held beliefs or information. Bayes' theorem specifies that the conditional probability of an event A given event B can be estimated as P(A|B) = P(B|A) P(A)/P(B), where P(A) is the prior probability of A based on previously held beliefs, P(B|A) is the likelihood of observing the new data B if A were true, and P(B) is the overall probability of observing B. Each new observation yields an updated posterior probability P(A|B), which can then serve as the prior for the next observation; in this way, prior beliefs about A are revised as new evidence accumulates. Optimal perception is achieved when these estimates are then given appropriate weighting based on their uncertainty (16).
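A small worked example may make the roles of the terms concrete. The Python sketch below applies Bayes' theorem over a discrete set of hypotheses, using the distance example from the text; the three distance categories, the prior values, and the likelihoods of hearing a loud sound are invented for illustration and are not taken from (16) or (17). The helper `combine_gaussian` (a hypothetical name) shows the standard inverse-variance rule for fusing two independent Gaussian estimates, which is one common way of formalizing the weighting of estimates by their uncertainty.

```python
def posterior(prior, likelihood):
    """Bayes' theorem over a discrete set of hypotheses A.

    prior[a]      = P(A = a), the belief before the observation
    likelihood[a] = P(B | A = a), how well hypothesis a explains the data B
    Returns P(A = a | B) = P(B | a) P(a) / P(B), where the evidence
    P(B) = sum over a of P(B | a) P(a) normalizes the result.
    """
    evidence = sum(likelihood[a] * prior[a] for a in prior)  # P(B)
    return {a: likelihood[a] * prior[a] / evidence for a in prior}


def combine_gaussian(mean1, var1, mean2, var2):
    """Reliability-weighted fusion of two independent Gaussian estimates.

    Each estimate is weighted by its precision (1 / variance), so the less
    uncertain cue dominates; the fused variance is smaller than either input.
    """
    precision = 1.0 / var1 + 1.0 / var2
    mean = (mean1 / var1 + mean2 / var2) / precision
    return mean, 1.0 / precision


# Hypothetical priors over source distance, and hypothetical likelihoods
# of hearing a loud sound from each distance (illustrative numbers only).
prior = {"near": 0.2, "mid": 0.5, "far": 0.3}
p_loud = {"near": 0.8, "mid": 0.3, "far": 0.05}

belief = posterior(prior, p_loud)    # after one loud sound
belief2 = posterior(belief, p_loud)  # the posterior becomes the next prior
fused_mean, fused_var = combine_gaussian(10.0, 1.0, 20.0, 4.0)
```

Mid-range distance starts as the most probable state, but a single loud sound shifts the posterior toward "near", and feeding that posterior back in as the prior for a second loud sound strengthens the belief further. In the fusion example, the first cue is four times more precise than the second, so the fused mean of 12 sits much closer to it than the unweighted average of 15.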

Importantly, the use of probability naturally entails that perception is not infallible but reflects an optimal effort/accuracy trade-off, in which some likelihood of error is accepted in exchange for the conservation of neural resources. Sensory illusions provide examples of the type of negligible errors that can occur when this shallower processing approach is adopted. To maintain an optimal level of accuracy, it is vital that the system can detect and differentiate potentially meaningful errors in order to adjust its estimates of the world accordingly. MMN, an ERP component elicited only by a detected change to an established regularity in the environment, has been isolated as a distinct neural marker of such error and change detection (3, 12, 13).

Introduction to the MMN

General Characteristics

The auditory MMN is an evoked response that appears in neurophysiological recordings as a brief negative deflection in amplitude following a sound that deviates from some established repetition or consistency in the recent past (18). In a laboratory setting, MMN is typically studied using an oddball paradigm, where it is observed following each occurrence of a low-probability “deviant” sound irregularly interspersed among a series of highly repetitive “standard” sounds from which it differs on some dimension (18–20). This additional negative component is most easily observed in a difference waveform produced by subtracting the response to the standard from the response to the deviant. Onset is observed as early as 50 ms, with a peak 100–250 ms post-stimulus, though latency and amplitude vary with the specific characteristics of the sound sequence [(11, 21); see (22, 23) for reviews]. In ERP scalp recordings with the nose as reference, MMN is maximal at fronto-central electrode sites, often with a right-hemisphere preponderance, and its polarity inverts at sites located at and around the mastoid bone (24). Table 1 presents a summary of many of the variables observed to impact MMN, as reviewed below.
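The oddball logic and the subtraction that yields the difference waveform can be sketched in a few lines. The Python example below is an illustration under stated assumptions, not an analysis pipeline: the hypothetical function `make_oddball` generates a standard/deviant sequence with a chosen deviant probability and a minimum number of standards between deviants (a common design constraint, although the exact value varies between studies), and the ERPs are synthetic Gaussian deflections rather than recorded data. Only the deviant-minus-standard subtraction and the search for the most negative point in a 100–250 ms window mirror the description in the text.

```python
import math
import random


def make_oddball(n_trials, p_deviant, min_gap=2, seed=0):
    """Return a list of 'S' (standard) and 'D' (deviant) labels.

    A deviant is drawn with probability p_deviant only on trials preceded by
    at least `min_gap` standards (a hypothetical design constraint), so the
    realized deviant rate is somewhat lower than p_deviant. The sequence
    therefore always opens with standards that establish the regularity.
    """
    rng = random.Random(seed)
    sequence, standards_since_deviant = [], 0
    for _ in range(n_trials):
        if standards_since_deviant >= min_gap and rng.random() < p_deviant:
            sequence.append("D")
            standards_since_deviant = 0
        else:
            sequence.append("S")
            standards_since_deviant += 1
    return sequence


def gaussian(t_ms, peak_ms, width_ms, amplitude):
    """Synthetic ERP deflection (a negative amplitude is negative-going)."""
    return amplitude * math.exp(-((t_ms - peak_ms) ** 2) / (2 * width_ms ** 2))


def difference_wave(deviant_erp, standard_erp):
    """Deviant-minus-standard subtraction, sample by sample."""
    return [d - s for d, s in zip(deviant_erp, standard_erp)]


def peak_latency(wave, times_ms, window=(100, 250)):
    """Latency (ms) of the most negative sample inside the search window."""
    candidates = [(v, t) for v, t in zip(wave, times_ms) if window[0] <= t <= window[1]]
    return min(candidates)[1]


sequence = make_oddball(1000, p_deviant=0.1, seed=1)
times = list(range(0, 401))  # 1-ms samples from 0 to 400 ms
n1 = [gaussian(t, 100, 15, -2.0) for t in times]  # synthetic N1-like wave shared by both
standard_erp = n1
deviant_erp = [v + gaussian(t, 170, 25, -3.0) for v, t in zip(n1, times)]  # N1 plus extra negativity
mmn = difference_wave(deviant_erp, standard_erp)
```

Because the synthetic N1 is identical for both conditions, it cancels in the subtraction and `peak_latency(mmn, times)` recovers the latency of the added negativity. Real analyses average many epochs per condition and involve baseline correction and artefact rejection, none of which is modelled here.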

Table 1. A number of variables observed to affect MMN amplitude*.

Discriminability

Early studies confirmed the separability of MMN from the highly similar N1 and N2b negative components on which it is often superimposed (18, 21, 39). The N1 is an exogenous response to the change in energy posed by a stimulus and is therefore observed to both standard and deviant tones (40). The N1 shows directional modulation, decreasing in amplitude with decreasing stimulus intensity, whereas MMN reflects only the absolute value of the difference between standard and deviant stimuli [(19); see (11) for a review]. The N2b follows MMN [also referred to as the auditory N2a; e.g., (39, 41)] as a later subcomponent that occurs only when the deviant stimulus is consciously and voluntarily processed, whereas MMN persists in the absence of conscious attention (20, 42, 43). Anatomically, MMN is unique in that it is more anterior than both N1 and N2b, with modality-specific variations in topography [see (44) for a review], and it is likely produced by distinct cortical sources (45–47). The reversal of polarity at mastoid sites in nose-referenced recordings is also unique to MMN and provides a useful way to isolate a measure of “pure” MMN from the overlapping N2b subcomponent (21, 48, 49).

Functionally, MMN is defined by two key characteristics: it is context-dependent, and it does not rely on conscious attention to the stimulus. Whereas both N1 and N2b can be elicited by a deviant stimulus alone, MMN occurs only when the sound is interspersed among a series of repetitive standards (27, 45). While N2b is elicited only when deviant stimuli are consciously attended, and N1 is highly prone to modulation by attention, MMN is observed to deviations in both attended and unattended stimulus streams and is far less permeable to attention effects [(21, 39); but see also (50)]. MMN is also independent of the later P3 component, which reflects stimulus significance and attentional capture (51, 52). Cleverly designed control paradigms have ruled out the possibility that MMN could be an artefact of effects on these other components, cementing MMN as a distinct component that uniquely reflects stimulus discrimination and change detection processes [see (53) for a review]. MMN has since become the most widely used method of studying these processes.

MMN and Sensory Memory

MMN generation is assumed to rest on the comparison of the incoming deviant stimulus with a stored neural representation of the standard, and MMN can thereby provide a putative index of sensory memory formation and decay. MMN will only be elicited by a deviant sound presented in a stream of repetitive standards when the deviant is sufficiently rare [probability of 0.30 or below (54)]. This sensitivity reveals two features of this change detection mechanism: (1) the ability to detect the actual physical difference in sensations, and (2) the extraction of patterns in sound and their relative probabilities. Both processes depend on the formation and short-term maintenance of a memory trace for the standard and deviant sounds, rendering MMN a useful probe of the formation of sensory memory representations and their discrimination [(19); see (55) for a review]. In this review, the term sensory memory refers to the brief retention of information about a sound that has just occurred, and we assume it adheres to estimated limits associated with passive memory decay [e.g., (56)]. Meanwhile, the term memory trace refers more broadly to any activated (or reactivated) state, which includes, but is not restricted to, sensory memory. A predictive model, at minimum, is assumed to entail the additional property that this memory trace is associated with probability estimates regarding the likely “next state” (i.e., transition probabilities).
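The distinction drawn above, between a memory trace and a predictive model that also carries transition probabilities, can be illustrated with a first-order (Markov) estimate. The sketch below is a toy built on our own assumptions, not a model of MMN: it counts how often each tone follows each other tone, converts the counts to conditional probabilities P(next | current), and scores an incoming tone by its surprisal, -log2 P(next | current). The names `transition_probabilities` and `surprisal` are hypothetical, and the add-one (Laplace) smoothing is our choice so that never-seen transitions still receive a finite score.

```python
import math
from collections import defaultdict


def transition_probabilities(sequence, alphabet, smoothing=1.0):
    """Estimate P(next | current) from adjacent pairs in `sequence`.

    Additive smoothing keeps unseen transitions above zero probability.
    Every symbol in `sequence` must appear in `alphabet`.
    """
    counts = {a: defaultdict(float) for a in alphabet}
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1.0
    probs = {}
    for current in alphabet:
        total = sum(counts[current].values()) + smoothing * len(alphabet)
        probs[current] = {
            nxt: (counts[current][nxt] + smoothing) / total for nxt in alphabet
        }
    return probs


def surprisal(probs, current, nxt):
    """Information content (bits) of hearing `nxt` right after `current`."""
    return -math.log2(probs[current][nxt])


# Two strictly alternating tones: each occurs on half of the trials, so
# simple frequency counts mark neither tone as rare.
alternating = list("AB" * 50)
probs = transition_probabilities(alternating, ["A", "B"])
```

Here an unexpected repetition ("A" after "A") carries a surprisal of several bits, whereas the expected transition ("B" after "A") carries almost none, even though the two tones are equally frequent overall. This echoes the alternating-tone example discussed under Encoding below, where deviance can only be defined in terms of transitions.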

Encoding

MMN will be elicited following a deviation in any sound characteristic [e.g., frequency, intensity, duration, location; see (11, 19) for reviews]. Deviations may be simple departures from a single static feature of a repeated sound, or violations of more complex regularities formed across multiple features of single tones, or across repeated tone pairs or groupings [e.g., changes in a repeated 5-tone serial sequence with a short stimulus-onset asynchrony (57), or an unexpected repetition within a series of two consistently alternating tones (58)]. The change may also be built into the timing of the sequence, such as changes in the interstimulus interval (ISI) in a stream of physically identical tone bursts (59). MMN also shows sensitivity to “abstract” deviations, such as a change in the relative interval or direction of differences between adjacent tones [e.g., an occasional descending-frequency tone pair among a series of ascending-frequency tone pairs, where no absolute characteristics of the tones are shared to form a physical or “first-order” standard (60); see (61) for a review], as well as to unexpected stimulus omissions (62–64). It is therefore important to note that the terms “standard” and “deviant” do not necessarily refer to individual tones, but rather to the neural representations of a regularity and of a violating event, which can vary in complexity (12).
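Because an abstract deviant shares no absolute features with the standards, detecting it requires encoding a relation rather than a value. The sketch below assumes tone pairs given as frequencies in Hz (the values are invented): it classifies each pair by the direction of its frequency change and flags pairs that break the majority direction, even though no frequency is ever repeated. The helper names are hypothetical, and the majority-vote criterion is a simplification chosen for brevity.

```python
from collections import Counter


def pair_direction(pair):
    """Relational feature of a tone pair: the direction of frequency change."""
    first, second = pair
    if second > first:
        return "ascending"
    if second < first:
        return "descending"
    return "flat"


def abstract_deviants(pairs):
    """Indices of pairs whose direction departs from the majority direction."""
    directions = [pair_direction(p) for p in pairs]
    rule, _ = Counter(directions).most_common(1)[0]
    return [i for i, d in enumerate(directions) if d != rule]


# Every pair uses different absolute frequencies, so no single tone can act
# as a physical standard, yet "second tone higher" holds for all but one pair.
pairs = [(440, 520), (610, 700), (380, 450), (900, 820), (500, 590), (720, 800)]
```

Calling `abstract_deviants(pairs)` flags only the descending pair at index 3. An online version would build up the rule incrementally as pairs arrive; the offline majority vote here is only a compact stand-in.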

Discrimination

MMN latency is also highly variable and is considered to index the nature and difficulty of the standard-deviant comparison process, as it is assumed that MMN will only be elicited after some “decision point” at which an uncommon change is detected (65). This decision point may be affected by the actual point of difference between the stimuli [e.g., MMN will occur later for a longer-duration deviant than for a shorter-duration deviant (11)], as well as by discrimination difficulty. MMN latency is shorter when the two tones are more clearly distinct [(21, 51, 66); see (10) for a review] and extends to as long as 200–300 ms in the case of barely discriminable differences (67).

MMN amplitude is also taken to reflect some quantification of discrimination difficulty (10). Broadly speaking, measured MMN can increase with two factors likely related to the clarity or certainty of a change: (1) the degree of physical difference between the repetitive and deviant stimulus, and (2) some quantification of the “strength” with which the regularity is encoded [the exact interpretation of this variable varies among models of MMN, as will be discussed later (12)]. MMN is larger when the difference between the standard and deviant is more marked, whether this is due to a greater degree of physical difference between the tones (19, 68) or concurrent deviation on multiple stimulus dimensions (10, 25). MMN amplitude appears to reflect the strength of the memory trace for the standard: it increases with the number of consecutive standards (26, 27, 69, 70) and with a shorter ISI between sounds (71), and it is reduced by backward masking of the standard (28). Meanwhile, modulations associated with the degree of variability in sound have led to the assertion that MMN may additionally reflect some estimate of the certainty or accuracy of this memory trace. MMN amplitude increases with decreased variability in the characteristics of the standard (29), a smaller local probability of the deviant (19, 30), and the overall period of time that a regularity has been stable [e.g., (8); but see later discussion of this study].

MMN and Attention

Another important feature of MMN is that it can be observed without conscious attention to the sound stream, suggesting that sophisticated sensory discrimination processes are initiated at the pre-attentive level (72–74). Observations of MMN have been made across passive listening conditions (21, 75), states of reduced consciousness such as coma and sleep (76–78), and in the absence of behavioral discrimination ability (31, 79). These observations have led to the conclusion that MMN is pre-attentive and reflects some “primitive intelligence” within the auditory cortex (18, 80, 81).

However, modulations of MMN amplitude with attention challenge the extent to which MMN can be considered truly pre-attentive. While a number of studies have displayed no difference in amplitude across ignored and attended sound streams (73, 74, 82, 83), an equally strong body of research has shown systematic increases in MMN amplitude with the level of conscious attention to a deviant (84–87). In an attempt to reconcile these findings, it has been suggested that attention effects reflect biased encoding of the memory trace for the standard, but deviant detection itself remains impermeable to attention (35, 88). MMN may therefore be best conceptualized as an index of sensory memory representations which is not dependent on attention, but can be manipulated by the effect of attention on how sensory memory representations are formed.

Automatic deviance detection is, conversely, thought to have implications for attention by serving as an information filter: a bottom-up signal of new information that can redirect attention toward the deviant sound. Source localization has consistently identified a frontal contribution to MMN generation which is thought to be responsible for this proposed attention switch [(24, 89, 90); cf. (91)]. Frontal cortices have a specialized role in selective attention and orienting (92, 93) and typically show the same right-hemisphere preponderance that has been observed for MMN at frontal electrode sites [(91, 94, 95); however, see (96) for discussion of left-lateralized MMN to speech and language deviants]. In accordance with this idea, MMN is regularly followed by the P3a component, which is considered a neural indicator of involuntary attention capture with origins in frontal cortex (97–100). The three-stage model of involuntary attention (23, 101) assumes that MMN is responsible for initiating a series of higher-order processes related to further evaluation of the deviant event (102). Specifically, this involves an involuntary direction of attention toward, and subsequent evaluation of, this change, indexed by P3a (99), and the redirection of attention back to the task at hand, indexed by the reorienting negativity (25).

An important aspect of involuntary attention is the ability to appropriately filter relevant change such that only events of sufficient importance trigger an attention switch and the resulting distraction. Accordingly, MMN and P3a amplitudes appear to correlate with some quantification of the surprisingness or perceived importance of a deviant stimulus, increasing with the discriminability (103–105), task relevance (104), and rarity of the deviant sound (106–108). Further, both components attenuate with repetition, consistent with a reduction in perceived stimulus importance as the stimulus becomes familiar and a subsequent filtering of this information (51, 109–111).

Importantly, MMN and P3a are dissociable: MMN is not invariably followed by a P3a, nor do their amplitudes reliably correlate (112–114). As a result, not every deviant event results in an attention switch (115, 116). Instead, it is more likely that, for the involuntary redirection of attention indexed by P3a to occur, MMN amplitude must exceed some variable threshold signifying the deviant's likely importance for behavior [e.g., MMN amplitude increases with deviant rarity (19, 117)]. These features serve the adaptive processing of new events: the ability to detect change and direct attention to it for rapid evaluation, followed by habituation of this response in order to conserve resources once the stimulus has been adequately assessed.

Scalp Topography and Brain Networks

The distinct relationship of MMN to both sensory memory and attention lends legitimacy to a dual-generator model of MMN generation. Näätänen and Michie (118) first interpreted the large MMN amplitudes observed at temporal and frontal sites as indicative of two generators, likely to be separately responsible for pre-attentive change detection and for directing neural resources toward the change (i.e., attention), in line with the previously assumed functions of these respective cortices. Separate temporal and frontal generators have been consistently identified using various source localization methods [e.g., (24, 90, 91, 119)]. More recently, dynamic causal modeling (DCM) has repeatedly favored a network of hierarchical cortical sources comprising the primary auditory cortex (A1), superior temporal gyrus (STG), and inferior frontal gyrus (IFG), as will be discussed in detail later (120–122). Cumulative observations have built a strong case for the early suggestion that the temporal and frontal components are differentially responsible for sensory memory comparison and attention allocation, respectively (72, 118).

A temporal generator for MMN has been localized in primary auditory cortex and is considered the primary generator responsible for MMN elicitation [(46, 123); see (89, 101) for reviews]. This temporal contribution was first observed in magnetoencephalography (MEG) studies identifying an equivalent current dipole on the supratemporal plane of the auditory cortex [(46, 123); see (89) for a review], and subsequent support has accumulated across electrophysiological, hemodynamic, animal, and lesion studies [see (44, 101) for reviews]. This generator is believed to be responsible for the sensory memory component of MMN elicitation, given its direct receipt of sensory input and unique sensitivity to stimulus features. Temporal activation systematically increases with the degree of deviation on a single given dimension (91, 119), shows additivity in the case of multiple deviant features (124), and is impaired under increasing competition for resources when deviants are present across multiple sound streams (125). The precise area of activation within the supratemporal cortex is feature-specific, varying with deviant type (24) and tone complexity (57).

An additional source in prefrontal cortex has been proposed to be uniquely sensitive to the assumed behavioral relevance of the stimulus and to support the redirection of attention toward this change. Frontal activation during MMN production was first identified in scalp current density maps (24) and subsequently confirmed in positron emission tomography [PET; (126, 127)], MEG (128), fMRI (90, 91, 129, 130), and optical imaging studies (131). Both frontal lesions (33, 132) and transcranial direct current stimulation of frontal sites (34, 133) have been associated with a general attenuation of MMN amplitude, highlighting this generator as a necessary contributor to adequate MMN production. Whereas temporal activation is highly sensitive to specific stimulus features, activity at frontal sites appears to rely more on an overall evaluation of global stimulus relevance, which occurs at a later, higher-order stage than initial sensory discrimination processes (91, 119, 124, 134). Consistent with this, activation follows a rostro-caudal gradient comprising an “early MMN” component in the STG and a later component in the IFG (95, 119, 135). This frontal component is believed to be responsible for the proposed “attention switch” toward the deviant stimulus, on the basis that it shows the same right-hemisphere asymmetry observed in the fronto-parietal network underlying spatial attention and orienting (136–139). While the literature emphasizes these two distinct temporal and frontal contributions to MMN generation, it is important to acknowledge that numerous source analyses, dipole models, and depth recordings in both human and animal studies reveal that these contributions occur within a complex network of activation, including sub-regions of both temporal and frontal sources [see (140) for a review].

The placement of these generators within a hierarchically organized system has led to debate about whether MMN generation should be considered a purely bottom-up process [i.e., initiated in lower-level, pre-attentive, sensory cortices, with a processing cascade to increasingly higher (more frontal) areas], or whether it may be subject to top-down modulation [i.e., by higher-order (more cognitive) processes originating in frontal cortices]. Early observations supporting the involvement of dual generators suggested that the initial activation of lower-level, temporal areas preceded any input from higher-order regions [e.g., (91, 95, 141)]. Observations of impaired behavioral task performance during presentation of non-attended deviants, even in the absence of deviant awareness, suggest that this is indeed the case (142). However, MMN has also shown an early permeability to top-down effects, which supports the reciprocity of these components. For example, explicit knowledge of the global sound sequence will determine whether MMN is elicited (35). An early top-down influence is also necessary to explain the shorter latency observed to omission deviants by Wacongne and colleagues (143). Further support for a concurrent top-down modulation stream is provided by the observed effects of prediction and expectation discussed in later sections.

Theoretical Accounts of MMN Generation

Näätänen (19, 72) acknowledged two possible theoretical interpretations of MMN: as either a legitimate memory-based ERP component or an artefact of differences in the adaptation of neurons tuned to the standard and deviant tones. Both perspectives offer an account of MMN that is substantial but non-exhaustive, and, owing to key differences, they are largely regarded as mutually exclusive. While the vast majority of studies of MMN since the 1970s have favored a memory-based account, debate persists owing to the explanatory power of the adaptation account and unanswered criticisms of memory-based perspectives (144, 145). These two lines of argument are briefly expanded and the evidence for each is reviewed below, before the alternative possibility that these accounts could be unified as complementary components of MMN generation is presented.

Memory-Based Hypotheses

The “sensory memory” or “memory mismatch” account views MMN as a distinct cognitive component of the auditory ERP which arises from the active comparison of current input with a memory trace for recently encountered sounds (58, 89, 102). MMN shares a number of characteristics with memory processes. The temporal window of integration for MMN elicitation is estimated at between 7 and 20 s (142, 146, 147), which is consistent with the 5- to 20-s capacity previously observed for auditory sensory memory stores (56, 148). Meanwhile, elicitation of MMN to a previous deviant after a long period of intervening sound patterns suggests that multiple memory traces can lie dormant in longer-term memory and be reactivated when the stimuli are re-encountered (27, 32).

A popular explanation attributes MMN to a specialized change-detection or “feature-detector system” which actively analyzes and encodes physical features for storage in sensory memory (19, 53, 72). While it was initially asserted that the temporal scale of MMN necessarily separated any such system from the exogenous differences in neuronal activity which produce N1 to simple afferent changes, recent single-unit studies extending the time course of stimulus-specific adaptation (SSA) to as long as 60 s (149) suggest that a contribution to deviance detection at the cellular level cannot be excluded (150). In any case, given the sensitivity of more frontal brain areas to longer-timescale information (151, 152), these observations are also consistent with the temporo-frontal network of activation previously discussed (24, 91, 95, 119). This memory-based account therefore considers the response to the deviant as the sum of the exogenous N1 response and an additional MMN component (53).

Following the observation of MMN to deviations of increasingly complex abstract rules [see (61) for a review], it was concluded that deviance detection cannot adequately rest on the direct comparison of current input to an afferent memory trace, and must instead involve a more sophisticated stored abstraction of the world constructed over longer time periods (12, 153, 154). The elicitation of MMN in the absence of any afferent basis for deviation highlighted a predictive component to deviance detection—discrepancy arises not from the features of sensory input per se, but rather the unfulfilled expectation of that stimulus. This is best evidenced by the elicitation of MMN by an unexpected sound omission (155), or violations of relative properties between sounds where discrepancy cannot be deduced by the simple comparison of absolute physical characteristics [e.g., a descending frequency interval within a consistently increasing-frequency scale (19, 156)]. This revised “regularity violation” or “model adjustment” hypothesis assumes that future input is actively extrapolated from the current memory store, and “absorbs” input consistent with this estimate, leaving only the remainder for processing (101, 153). Subsequently, this model is adjusted to better extrapolate future events (12), and some have argued that it is this maintenance of regularity representations which is the key function of MMN rather than the detection of deviance (154).
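The “regularity violation” or “model adjustment” idea, in which input matching an extrapolated prediction is absorbed and only the remainder is passed on, can be sketched with a toy predictor for an ascending scale. The Python example below rests entirely on our own simplifying assumptions: a constant frequency step, a fixed tolerance, `None` standing for an omitted tone, and a hypothetical function name. The model predicts the next frequency from its current position plus the learned step, absorbs tones within tolerance (refining the step), flags the rest as violations, and re-anchors its position on whatever was actually heard. An omission is flagged because the predicted event fails to arrive, even though there is no sensory input to compare.

```python
def run_regularity_model(tones, tolerance=5.0):
    """Return (index, kind, residual) for each violation of an extrapolated scale.

    tones: frequencies in Hz, with None marking an omitted tone.
    The prediction is last + step. Tones within `tolerance` of the prediction
    are absorbed and refine the step; larger residuals are flagged as
    violations. After any heard tone, the model re-anchors on that tone.
    """
    violations = []
    last, step = None, None
    for i, tone in enumerate(tones):
        predicted = last + step if last is not None and step is not None else None
        if tone is None:
            if predicted is not None:
                violations.append((i, "omission", None))
                last = predicted  # assume the scale continued in silence
            continue
        if predicted is None:
            if last is not None:
                step = tone - last  # the first interval defines the rule
        else:
            residual = tone - predicted
            if abs(residual) <= tolerance:
                step = tone - last  # input fits: absorb it and refine the rule
            else:
                violations.append((i, "violation", residual))
        last = tone  # model adjustment: re-anchor on what was actually heard


    return violations


# An ascending scale with one descending step (index 4) and one omission (index 6).
scale = [100, 110, 120, 130, 125, 135, None, 155, 165]
result = run_regularity_model(scale)
```

For this input the model flags only the descending tone and the omission; the tones after each event are absorbed again because the interval rule survives the violation. Whether the brain re-anchors in this way is not claimed here; the choice simply keeps the illustration short.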

The model-adjustment hypothesis is supported by the observed flexibility of MMN and its sensitivity to recent exposure. MMN can be observed after as few as 2–3 repetitions of a new sound, and MMN amplitude decreases rapidly as a new tone is repeated. The predictive representations are thus quickly formed, highly dynamic, and highly sensitive to current contingencies in the world (157). This memory-based interpretation therefore assumes a distinct population of neurons capable of producing MMN, which contribute to higher-order perceptual-cognitive operations and embed a type of “primitive intelligence” within the auditory cortex (19, 53, 81, 101).

The Adaptation Hypothesis

The adaptation hypothesis asserts that SSA of primary auditory cortex (A1) neurons tuned to the repeated standard sound would cause an attenuated N1 response, much smaller than that produced by the “fresh afferents” tuned to the less probable deviant (144, 158). When compared, these responses would yield an additional negativity to the deviant sound in the 100- to 200-ms latency range of MMN. Taken together, these ideas have led some to assert that MMN represents a subtraction artefact rather than a distinct memory-based component. While this perspective accepts that long-latency and stimulus-specific A1 SSA may have a distinct and possibly specialized role in novelty detection (157), this role is not commensurate with the functionally and anatomically distinct population of “comparator” neurons inferred by memory-based accounts (145). Rather, it is argued that A1 SSA is the single-unit correlate of MMN and that the summed activity of A1 neurons is sufficient to account for the observed differences in the human ERP without any higher-order operation. The sensitivity of A1 SSA to multiple timescales, apparent in fast time constants of adaptation during short sequences and slow constants over long sequences similar to MMN, further demonstrated the ability of these simple, low-level mechanisms to mimic more sophisticated perceptual-cognitive effects (159, 160). On this basis, adaptation and memory-based hypotheses have been considered by some as mutually exclusive accounts of MMN production [e.g., (144, 145)].
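The adaptation account can be made concrete with a toy simulation. The sketch below is a hypothetical illustration, not a model from the cited studies: each tone drives its own population whose gain is depleted by every presentation and recovers exponentially between presentations, and the stimulus onset asynchrony (SOA), depletion fraction, and recovery time constant are arbitrary assumptions. With no memory comparison at all, the rarely stimulated deviant population responds more strongly than the repeatedly stimulated standard population, which is the core of the subtraction-artefact argument.

```python
"""Toy stimulus-specific adaptation (SSA) model of an oddball sequence.

Hypothetical illustration: each tone drives its own population of afferents.
Every presentation depletes that population's gain, which then recovers
exponentially toward 1. No memory comparison is modelled.
"""
import math
import random


def simulate_ssa(sequence, soa_s=0.5, depletion=0.3, tau_s=2.0):
    """Return a (tone, response) pair for every stimulus in `sequence`.

    soa_s     -- assumed stimulus onset asynchrony in seconds
    depletion -- assumed fraction of gain lost per presentation
    tau_s     -- assumed recovery time constant in seconds
    """
    gain = {tone: 1.0 for tone in set(sequence)}
    remaining = math.exp(-soa_s / tau_s)  # fraction of a gain deficit left after one SOA
    responses = []
    for tone in sequence:
        # All populations recover toward full gain during the preceding SOA.
        for t in gain:
            gain[t] = 1.0 - (1.0 - gain[t]) * remaining
        # The N1-like response scales with the gain of the stimulated population.
        responses.append((tone, gain[tone]))
        # Only the stimulated population is depleted.
        gain[tone] *= 1.0 - depletion
    return responses


def mean_response(responses, tone):
    values = [amp for t, amp in responses if t == tone]
    return sum(values) / len(values)


rng = random.Random(0)
sequence = ["dev" if rng.random() < 0.125 else "std" for _ in range(2000)]
responses = simulate_ssa(sequence)
std_mean = mean_response(responses, "std")
dev_mean = mean_response(responses, "dev")
difference = dev_mean - std_mean  # larger response from the rarely stimulated population
print(f"standard {std_mean:.2f}  deviant {dev_mean:.2f}  difference {difference:.2f}")
```

Because the amplitudes here are unsigned gains, the deviant-minus-standard difference comes out positive; on the negative-going N1 it would appear as the additional negativity described above.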

Contention between the adaptation and memory-based interpretations of MMN is ongoing, given the outstanding criticisms and shortcomings of both hypotheses. Memory-based perspectives use observed differences in the morphology, topography, and sensitivity of N1 and MMN as evidence that MMN arises from a distinct cognitive contribution to the deviant response [e.g., (116, 161, 162)]. Yet empirical support for this idea is weakened by criticisms of the extent to which these differences reflect a pure measure of deviance, the absence of any direct evidence for the proposed population of neurons capable of this higher-order change detection, and a lack of consistent support from animal and intracranial studies [e.g., (163–165); cf. (166)]. The adaptation account rests on studies that have failed to identify any unique change-specific activation in the response to a deviant sound [e.g., (167, 168)], and on convincing demonstrations that neural refractoriness can produce MMN-like responses [e.g., (144)] with higher-order sensitivities [e.g., (160)]. These studies argue that the lower-level attributes of sensory neurons are in fact sufficient to account for any differences observed in the response to the deviant sound, including differences in both amplitude and topography [e.g., (169)].

However, neural adaptation also falls short of an exhaustive account of MMN amplitude modulation. Adaptation fails to account for the large MMN elicited by repetition deviants (170–172), stimulus omissions (63, 64, 143), and unpredicted versus predicted deviant tones (173, 174). Further, the additional negativity observed to a deviant tone using the previously discussed “controlled standard” or “many standards” paradigms reveals a modulation of responses that cannot be attributed to SSA (36, 161, 175). Here, the difference waveform is obtained by subtracting the response to the same tone when it is encountered within a block of equiprobable control tones, which controls for both the physical properties of the stimulus and the rate at which it was previously encountered. More recently, this paradigm has been widely adopted in animal studies and has provided compelling support for populations of cells along the auditory hierarchy that demonstrate genuine change detection as opposed to simple SSA (176–178). A further important contribution to resolving such issues is strong evidence that whether MMN is observed depends predominantly on transitional probability rather than on overall probability, a result inconsistent with MMN simply reflecting stronger adaptation to frequent than to infrequent sounds (179).
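The subtraction logic of the controlled-standard design, and the distinction between overall and transitional probability, can be sketched in a few lines. The waveform values below are invented for illustration only. The sketch relies on the identity that the classic deviant-minus-standard wave equals (deviant − control) + (control − standard), so the control block separates a deviance effect from repetition suppression of the standard. The second part uses a hypothetical sequence in which both tones have an overall probability of .5 but every transition is fully predictable, so a repeated tone would violate the pattern even though neither tone is rare.

```python
"""Controlled-standard difference waves, plus overall vs transitional probability.

All waveform values are made-up illustrative numbers, not data.
"""
from collections import Counter


def difference(a, b):
    """Point-by-point a - b for two equal-length averaged waveforms."""
    return [x - y for x, y in zip(a, b)]


def decompose(deviant, standard, control):
    """Split the classic deviant-minus-standard wave into two parts.

    deviant  -- average ERP to a tone when it is the rare deviant
    standard -- average ERP to the same tone when it is the frequent standard
    control  -- average ERP to the same tone in an equiprobable control block
    """
    classic = difference(deviant, standard)     # traditional MMN difference wave
    deviance = difference(deviant, control)     # effect beyond rate and physical features
    repetition = difference(control, standard)  # suppression of the repeated standard
    return classic, deviance, repetition


def transitional_probabilities(sequence):
    """Estimate P(next tone | current tone) from adjacent pairs."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    first_counts = Counter(sequence[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


# Hypothetical averaged amplitudes (microvolts) at 100, 150 and 200 ms.
standard = [-1.0, -2.0, -0.5]
control = [-1.5, -3.0, -1.0]
deviant = [-2.0, -4.5, -2.0]
classic, deviance, repetition = decompose(deviant, standard, control)
print(classic, deviance, repetition)

# Two tones that are equally probable overall but strictly alternate.
seq = list("ABABABABAB")
overall = {tone: n / len(seq) for tone, n in Counter(seq).items()}
tp = transitional_probabilities(seq)
print(overall, tp)
```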

The Predictive Coding Framework

More recently, a memory-based account of MMN generation has been formalized within the framework of predictive coding, a general theory of brain function which frames perception as the integration of sensory input with predictions about the likely characteristics of this input based on prior exposure (4, 180–182). From this perspective, MMN is considered the neural substrate of “prediction error” elicited when there is a discrepancy between current input and the prediction [such as when an unexpected deviant sound is encountered (3, 12, 183)]. Prediction error is a proxy for surprise which serves to (1) alert the system and direct neural resources toward the unexpected event [consistent with (23, 72, 81, 101)] and (2) trigger an update to the existing “prediction model” to integrate discrepant input [consistent with (12, 153, 184)]. Critically, predictive coding models rest on the assumption that neural populations dynamically adjust responding to minimize prediction error and optimize predictions over repeated exposure to a stimulus (3, 120, 121). By specifying parameters for model updating and a neurobiological scheme in which they might be implemented, predictive coding allows for structured models in which memory-based mechanisms can be tested.

Predictive coding models MMN generation within a cortical hierarchy which uses reciprocal forward and backward connections to integrate input with predictions (3, 184). Afferent input is communicated “bottom-up” via forward connections from sensory cortices, while predictions about this input are communicated “top-down” via backward connections from higher brain areas (4, 181). The prediction error quantified by MMN is determined by the relative strength of intrinsic (within-area) and extrinsic (between-area) connections to modulate responding via changes in synaptic efficacy and sensitivity (3, 184). Higher cortical areas work to “explain away” predicted input via top-down suppression of error units to redundant sounds, while lower level areas feed forward an exuberant bottom-up prediction error to any aspects of input which are not predicted (3, 181).

Computational models have had some success in explaining the activity of neural populations during predictive coding via empirical Bayesian methods of prediction generation and updating (3, 180, 185). Empirical Bayesian approaches estimate posterior probability using a prior probability distribution derived from observation, in contrast to standard Bayesian approaches in which the prior distribution is pre-defined. Sensory information is often limited, and a Bayesian perspective affords computations by which the brain effectively fills in the gaps for perception (17, 186). To maximize the accuracy of these estimates, the “internal model” (180) or “prediction model” (12) is specified by Bayesian estimates of likelihood (the probability that the given sensation would be produced by a particular cause) and a prior (the probability that the cause would be encountered), which is based on previous observation and continually updated in line with the current observation. Where prediction error occurs, these estimates are updated to reflect the most recent state of the world.
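A minimal sketch of the likelihood-times-prior computation described above, under several assumptions that are not from the source: two hypothetical causes (tones at 1000 and 1100 Hz), Gaussian sensory noise with a 40-Hz standard deviation, and a prior estimated from soft counts of previously inferred causes with add-one smoothing. The negative log of the model evidence is used as a surprise score, in the spirit of treating prediction error as a proxy for surprise; because it is computed from a probability density, it is only meaningful for comparisons within the same model.

```python
"""Inference over discrete sound causes with an empirically updated prior.

Hypothetical sketch: two causes (tone "A" at 1000 Hz, tone "B" at 1100 Hz)
produce noisy frequency sensations with a Gaussian likelihood. The prior over
causes is estimated from soft counts of previously inferred causes.
"""
import math

CAUSE_MEANS_HZ = {"A": 1000.0, "B": 1100.0}  # assumed causes
SENSORY_SD_HZ = 40.0                          # assumed sensory noise


def likelihood(sensation_hz, cause):
    """p(sensation | cause): probability density under Gaussian noise."""
    z = (sensation_hz - CAUSE_MEANS_HZ[cause]) / SENSORY_SD_HZ
    return math.exp(-0.5 * z * z) / (SENSORY_SD_HZ * math.sqrt(2.0 * math.pi))


class EmpiricalPrior:
    """Prior over causes built from observation rather than fixed in advance."""

    def __init__(self, causes):
        self.counts = {cause: 1.0 for cause in causes}  # add-one smoothing (assumption)

    def prior(self):
        total = sum(self.counts.values())
        return {cause: n / total for cause, n in self.counts.items()}

    def observe(self, sensation_hz):
        """Return (posterior, surprise), then fold the posterior into the prior."""
        prior = self.prior()
        joint = {c: likelihood(sensation_hz, c) * p for c, p in prior.items()}
        evidence = sum(joint.values())  # p(sensation) under the current model
        posterior = {c: j / evidence for c, j in joint.items()}
        for cause, p in posterior.items():
            self.counts[cause] += p      # soft count: the prior tracks recent input
        return posterior, -math.log(evidence)


model = EmpiricalPrior(CAUSE_MEANS_HZ)
for _ in range(20):
    model.observe(1000.0)                        # a run of sensations from tone A
post_dev, surprise_dev = model.observe(1100.0)   # an unexpected tone-B sensation
post_std, surprise_std = model.observe(1000.0)   # the expected tone again
print(f"P(B | deviant) = {post_dev['B']:.2f}, surprise {surprise_dev:.2f} vs {surprise_std:.2f}")
```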

The relative influence of prediction errors at any one time is further weighted by estimates of “confidence” (12) or “precision” (3), which reflect the expected accuracy of the prediction model and are embodied in the post-synaptic sensitivity or gain of populations encoding prediction error units (122, 185, 187, 188). The more accurate a prediction has been in the recent past, the stronger its top-down suppression and the less permeable it is to immediate revision following prediction error. Conversely, in more variable and unpredictable environments, prediction errors have a larger impact on the prediction model, which is more readily adjusted. This variable weighting of observed data is further represented as a hierarchical implementation of Bayesian methods, where the estimated probability is derived from estimates of several inter-dependent values. This updating of stored representations over multiple encounters with a stimulus, referred to as perceptual learning, shares commonalities with more general optimal learning algorithms such as the Kalman filter (189). Empirical Bayesian methods of estimation provide constraints for predictive coding that can feasibly be implemented in neuronal populations to ensure the optimal minimization of perceptual uncertainty at all levels of the cortical hierarchy [see (185) for discussion].
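The Kalman filter mentioned above makes the role of precision explicit. In this sketch (assumptions: a one-dimensional hidden state following a Gaussian random walk with process variance `q`, observations with noise variance `r`, and arbitrary parameter values), the Kalman gain acts as the learning rate applied to each prediction error. When the environment is stable (small `q`), uncertainty shrinks, the gain falls, and the model becomes less permeable to revision; when it is volatile (large `q`), the gain stays high and the model is readily adjusted.

```python
"""One-dimensional Kalman filter viewed as a precision-weighted learner.

Assumptions: the hidden quantity follows a Gaussian random walk with process
variance q (volatility); observations add Gaussian noise with variance r.
"""


def kalman_run(observations, q, r, mean0=0.0, var0=1.0):
    """Filter `observations`; return final mean, final variance, and gains."""
    mean, var = mean0, var0
    gains = []
    for y in observations:
        var += q                  # predict: uncertainty grows with volatility
        gain = var / (var + r)    # learning rate = prior uncertainty / total uncertainty
        error = y - mean          # prediction error
        mean += gain * error      # precision-weighted update of the belief
        var *= 1.0 - gain         # the updated belief is more precise
        gains.append(gain)
    return mean, var, gains


observations = [1000.0] * 50
stable_mean, _, stable_gains = kalman_run(observations, q=0.01, r=1.0)
volatile_mean, _, volatile_gains = kalman_run(observations, q=1.0, r=1.0)
print(f"final gain, stable: {stable_gains[-1]:.3f}; volatile: {volatile_gains[-1]:.3f}")
```

The gains converge to values set only by `q` and `r` (for these assumed values, roughly 0.1 versus 0.6), which is the sense in which the learning rate encodes how much the environment is expected to change.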

A hierarchical Bayesian model of predictive coding as described above is theoretically sufficient to account for numerous aspects of change detection including enhanced gamma-band (190, 191), blood-oxygen-level-dependent (91, 119), and electrophysiological responses to a deviant sound (3, 121, 143, 183), the prediction-dependent suppression of responses to a standard sound (192–194) and reductions in MMN onset latency with repetition as a result of top-down facilitation (87, 193). The proposed hierarchical structure is in accordance with a temporo-frontal network of MMN generation [e.g., (24, 95)] where more frontal areas display longer latencies of activation [e.g., (195)] and there is a disinhibition of responses to the standard when these frontal areas, responsible for top-down suppression of error signals, are lesioned (33, 196, 197).

At the neural level, the N-methyl-D-aspartate (NMDA)-dependent plasticity of cortical connections provides a feasible basis for predictive coding, given that NMDA receptors have been implicated in both synaptic learning and MMN generation [(3, 185, 198, 199); see (171) for a recent model] and MMN itself has been proposed as an index of NMDA-receptor (NMDAR) function (200). More recently, DCM has provided more direct empirical support for predictive coding by demonstrating that changes in cortical connectivity during deviant versus standard sound processing are best explained by a hierarchical model with nodes in primary auditory, temporal, and prefrontal cortices comprising forward (bottom-up), backward (top-down), and lateral (within-area) connections (3). Taken together, these results provide cumulative support for a hierarchical generative model of MMN generation, in which synaptic plasticity between a hierarchy of brain areas is used to generate and optimize predictive inferences about sensory input to facilitate perception in line with empirical Bayes. On this view, MMN is a functional neural substrate of prediction error, reflecting a synergy of smaller-scale sensory processes within and between cortical areas that construct higher-order memory representations.

Uniting Predictive Coding and Adaptation Accounts

Computational models of predictive coding also have the capacity to unify the conflicting adaptation and memory-based accounts of MMN generation (3, 121). Predictions are modeled as adjustments of the post-synaptic sensitivity of intrinsic and extrinsic connections which are optimized over repeated exposures to a stimulus to minimize prediction error. Reduced sensitivity at the neuronal level within these models resembles the SSA of A1 neurons which forms the basis for the adaptation hypothesis [e.g., (201)]. DCM studies have consistently demonstrated that a network comprising only intrinsic connections, representing an adaptation-only account, is inferior to more distributed network models in explaining MMN generation (3, 120, 122). These computational models therefore support the earlier suggestion that neither of the competing accounts alone is sufficient (3) and that the explanatory power of one account does not necessarily render the other obsolete (53).

One criticism of the adaptation account has been the interchangeable use of terms relating to active adaptation and passive refractoriness in the MMN literature, which can lead to interpretive error (202). The predictive coding models constructed by Garrido (120–122) emphasize the purposeful adjustment of the post-synaptic sensitivity or gain of error units [i.e., what O'Shea (202) argues is true adaptation, as opposed to passive “sluggish” refractoriness] as crucial to optimizing predictive processes. This is consistent with a conceptualization of MMN as reflecting a compound mismatch process, of which both a sensitized response to a deviant sound and suppression of the response to a repeated sound are necessary components, both adequately captured by predictive coding (203). A move toward a unified account of MMN is also evident in animal models. The similar sensitivity of auditory SSA and MMN to deviant probability and degree of difference has led to the suggestion that auditory SSA likely represents an early single-neuron correlate in auditory cortex that is necessary but not sufficient to explain the longer-latency MMN response, which arises from a compound of primary auditory and higher cortical areas (204–206). This is consistent with more recent research distinguishing the repetition-related reduction of early-latency (40–60 ms) components, which is presumed to arise from SSA, from that of later-latency (100–200 ms) components, which depends exclusively on prediction (194).

A First-Impression Bias in Auditory Processing

While the utility of MMN rests on this ability to flexibly represent up-to-date probability statistics, a growing body of research suggests that MMN amplitude modulation does not always consistently reflect environmental change. First-impression or primacy bias refers to the novel observation that MMN amplitudes to two tones show differential patterns of modulation over the course of a changing sound sequence, depending on the tones' relative probabilities when first encountered at sequence onset. This lasting effect of initial learning on subsequent processing demonstrates that while MMN amplitude may be highly dynamic, it can be biased by prior experience. These studies therefore suggest that MMN does not necessarily provide a veridical representation of the current probability statistics at any given time.

Experience Matters: An Order-Driven Effect

The first-impression bias is revealed and studied using an augmentation of a traditional oddball sound sequence termed the multiple-timescale paradigm, depicted in Figure 1, in which two tones alternate in the roles of standard and deviant across two block types (represented as dark versus light boxes in Figure 1 and hereafter referred to as first and second context), with the rate of alternation differing between sequences. The term multiple-timescale reflects the fact that the sequences contain both local regularities (within blocks) and longer-term regularities (in the regular block length).

FIGURE 1

Figure 1 Representation of original multiple-timescale sequence. Depiction of sound sequence design in the multiple-timescale paradigm used by (8). Dark blocks represent “first context” blocks where one tone is presented with standard probability (p = .875) and the other tone with deviant probability (p = .125). Light blocks represent “second context” blocks where these tone probabilities are reversed (i.e., the originally standard tone becomes the deviant and the originally deviant tone becomes the standard). Sound sequences were created using these block types with different lengths, forming a “slow change” sequence consisting of 2.4-min blocks, and a “fast change” sequence consisting of 0.8-min blocks.
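The block structure in Figure 1 can be sketched as a small stimulus generator. This is a hypothetical illustration, not the stimulus code used in (8): the tone labels, the 0.3-s SOA used to convert block duration into a tone count, and the unconstrained shuffling of deviants within a block are all assumptions. Deviant probability is fixed at .125 within every block, and tone roles reverse from one block to the next.

```python
"""Hypothetical generator for a two-tone multiple-timescale sequence."""
import random


def make_block(standard, deviant, n_tones, rng, p_dev=0.125):
    """One oddball block with an exact deviant proportion, randomly ordered."""
    n_dev = round(n_tones * p_dev)
    block = [deviant] * n_dev + [standard] * (n_tones - n_dev)
    rng.shuffle(block)  # no constraints on deviant spacing (assumption)
    return block


def make_sequence(first_deviant, first_standard, block_minutes, n_blocks, soa_s, seed=0):
    """Alternate first-context and second-context blocks of equal length."""
    rng = random.Random(seed)
    n_tones = round(block_minutes * 60 / soa_s)
    blocks = []
    for index in range(n_blocks):
        if index % 2 == 0:  # first context: the first-heard deviant is rare
            deviant, standard = first_deviant, first_standard
        else:               # second context: tone roles reversed
            deviant, standard = first_standard, first_deviant
        blocks.append(make_block(standard, deviant, n_tones, rng))
    return blocks


# Assumed 0.3-s SOA; tone labels refer to tone duration.
slow = make_sequence("60ms", "30ms", block_minutes=2.4, n_blocks=4, soa_s=0.3)
fast = make_sequence("60ms", "30ms", block_minutes=0.8, n_blocks=12, soa_s=0.3)
print(len(slow[0]), slow[0].count("60ms"), len(fast[0]), fast[0].count("60ms"))
```

With the assumed SOA, each 2.4-min block holds 480 tones (60 deviants) and each 0.8-min block holds 160 tones (20 deviants), so four long blocks and twelve short blocks span the same total duration.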

Traditional accounts of MMN as a highly dynamic, confidence-weighted error signal might lead us to suppose that MMN amplitude will increase consistently and parametrically with the stability of current patterns, and will adjust rapidly when these patterns change. It follows that MMN should be larger in blocks of longer duration for both first and second context blocks. However, the multiple-timescale paradigm has revealed that, throughout the sequence, MMN to the two tones remains differentially sensitive to the stability of current patterns depending on the probabilities of the two tones at sequence onset. MMN was larger in longer, more stable (2.4 min) blocks than in shorter, comparatively less stable (0.8 min) blocks only for the tone that was initially the deviant [i.e., in first context blocks (8)]. MMN to the deviant in the second context (i.e., the tone that initially occurred with standard probability) did not differ in amplitude across periods of longer or shorter pattern stability (8, 15). To illustrate these effects, data from (15) are reproduced in Figure 2A, where the black dots depict MMN amplitude to deviants that were 60 ms in duration among common tones that were 30 ms in duration, and the white diamonds depict MMN amplitude to deviants that were 30 ms in duration among common tones that were 60 ms in duration. The data on the left depict the MMN amplitudes when these sounds were deviant in the dark blocks of Figure 1 (i.e., the first context), while those on the right depict the MMN amplitudes to the same tones when they were deviant in the light blocks of Figure 1 (i.e., the second context). Figure 2A shows that MMN is larger in longer blocks only for the sounds that were deviant in the first-heard context, irrespective of tone feature; that is, this is an order-driven effect.
Under these experimental conditions, MMN in these sequences did not provide the veridical representation of probability statistics in both contexts that traditional accounts would predict. The data from (15) shown in Figure 2A (black dots and white diamonds) were acquired from participants naïve to the sequence, in that they had not participated in any previous multiple-timescale study and did not know about the sequence structure. Each participant was told that the brain activity being measured occurred automatically and was best measured when they ignored the sounds and focused on watching a DVD with subtitles. This finding is inconsistent with the idea that the confidence weightings underlying MMN generation are governed solely by current (local) probability statistics.

FIGURE 2

Figure 2 Data from published multi-timescale studies. Detailed descriptions of the studies are provided in text. (A) Mean MMN amplitudes obtained from studies in which the long-block sequence was presented before the short-block sequence. Black dots and white diamonds represent mean MMN amplitudes obtained in (15) where participants heard the sequences first with the long tone as the deviant in the first context and then the short tone as the deviant in the first context. The red squares represent mean MMN amplitudes obtained in (207) when the long tone was the deviant in the first context, but participants were first informed about the structure and the composition of the sequences before hearing them. (B) Mean MMN amplitudes obtained in (208) where the long tone was the deviant in the first context. Data show amplitudes obtained from the whole sequence (black dots), the first and second encounter with a given long block context (white diamonds), and the early and later half of the long blocks (red squares).

In a subsequent study designed to investigate the mechanisms underlying these order effects, the data were divided to examine separately what happened to MMN early in blocks, when a local model had just been established (1st half), versus later in blocks, once the model had been stable for a while (2nd half; see the graphic in Figure 2B, right). Within blocks, the differential effect of stability in first and second contexts was most pronounced in the first half of blocks, immediately after the tones changed roles, and effectively “washed out” in the latter half of blocks, where MMN to the two tones as deviants no longer differed (209). The first-impression bias therefore appeared to arise from an order-driven bound on the accumulation of predictive confidence, formed at sequence onset, which skewed pre-attentive sensory processing toward confirming what was first learnt until sufficient evidence had accumulated to override this initial learning. This difference between block halves has been replicated in subsequent studies, and an example of these half-effects for the longer blocks is presented in Figure 2B using data reproduced from (208). In Figure 2B, the black dots depict MMN to deviants calculated from all relevant blocks of the sequences, and the red squares depict MMN amplitude calculated from the early period of the two long blocks (1st half, left of the black dots) and from the later period of the two long blocks (2nd half, right of the black dots). Figure 2B shows that MMN to deviants in the first context is large throughout the long blocks of the sequence, whereas MMN to deviants in the second-context long blocks starts smaller and increases in amplitude as the block continues.
These differences between block halves contrast with the relative equivalence of the MMN amplitudes in averages taken from the first and second encounters with the long blocks (see the graphic in Figure 2B, right). In Figure 2B, the white diamonds depict MMN amplitude for the first encounter with each of the first and second context blocks, presented to the left of the black dots, and for the second encounter with each context, presented to the right.
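The block-half analysis amounts to a grouping step followed by averaging, which can be sketched as below. The input format is hypothetical: each trial carries its context, its role (standard or deviant), its position within its block, the block length, and its epoch as a list of samples. Epochs are averaged separately for each context, block half, and role, and MMN is taken as the deviant-minus-standard difference of those averages. The toy values are invented only to show the bookkeeping.

```python
"""Hypothetical block-half MMN bookkeeping (input format is an assumption)."""
from collections import defaultdict


def average(epochs):
    """Sample-by-sample mean of equal-length epochs."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def mmn_by_context_and_half(trials):
    """trials: iterable of (context, role, position, block_length, epoch)."""
    groups = defaultdict(list)
    for context, role, position, block_length, epoch in trials:
        half = "1st half" if position < block_length / 2 else "2nd half"
        groups[(context, half, role)].append(epoch)
    result = {}
    for (context, half, role), epochs in list(groups.items()):
        if role != "deviant" or (context, half, "standard") not in groups:
            continue
        deviant_avg = average(epochs)
        standard_avg = average(groups[(context, half, "standard")])
        # MMN difference wave for this context and block half.
        result[(context, half)] = [d - s for d, s in zip(deviant_avg, standard_avg)]
    return result


# Invented two-sample epochs (microvolts) for blocks of 8 tones.
trials = [
    ("first", "standard", 0, 8, [-1.0, -1.0]),
    ("first", "deviant", 2, 8, [-3.0, -2.0]),
    ("first", "standard", 5, 8, [-1.0, -1.0]),
    ("first", "deviant", 6, 8, [-3.0, -2.0]),
    ("second", "standard", 1, 8, [-1.0, -1.0]),
    ("second", "deviant", 3, 8, [-1.5, -1.0]),
    ("second", "standard", 4, 8, [-1.0, -1.0]),
    ("second", "deviant", 7, 8, [-3.0, -2.0]),
]
mmn = mmn_by_context_and_half(trials)
print(mmn)
```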

The novelty of this first-impression bias raises a series of important questions that must be addressed in service of a comprehensive understanding of the form and function of MMN and its contribution to perceptual-cognitive processes. The first is to establish the extent to which this modulation is attributable to temporal order effects over and above any other characteristic of the sound sequence (e.g., the physical properties of the tones). Should the observed effect be confidently attributed to tone order, the question follows of the extent to which it generalizes across sequence structures, tone types, and deviations. As noted, Figure 2A is derived from sequences in which the two sounds differ in duration (30 and 60 ms), and the same pattern of MMN amplitude modulation is obtained for the two block types whether the long tone or the short tone is rare in the first context; that is, the effect is order-dependent, not feature-dependent [(15); see also (210)]. There is also evidence that similar modulation patterns can be observed using frequency deviants (7) and spatial deviants (208), supporting the notion that this is a general order-driven effect.

Order-driven effects on MMN amplitude have also been observed in a study of shorter sound sequences in which tones of different frequency switched roles as standard and deviant only once (160), where the authors attributed them to longer-timescale adaptation effects exerting a bottom-up influence on ERP amplitudes. The study was designed to replicate earlier work demonstrating the impact of long-term SSA of single neurons on standard and deviant ERPs (159). The authors concluded that an initial “suppression” of MMN amplitude to a deviant with a long history of repetition after tones change roles could reflect similar SSA mechanisms operating simultaneously over multiple timescales in the human auditory cortex (160). Longer-timescale adaptation lasting up to 10 s was demonstrated alongside a faster adaptation time constant of 1.5 s to local patterning, and appeared to develop similarly to that seen in single-neuron studies of the cat auditory cortex (159). Costa-Faidella and colleagues (160) further demonstrated that MMN amplitude modulations could be successfully predicted by a linear model of local and global adaptation effects, and argued that order-driven effects such as those observed by Todd and colleagues (8) can arise from basic, bottom-up properties of the auditory system. One difference between the two studies was the repeated alternation of tone arrangements in (8). In Figure 2B, the breakdown of the data gives us an opportunity to examine the response to a deviant sound that has never been common (block 1, graphic on the right) versus the response to the same sound when it has just been common (block 3, graphic on the right). Based on SSA effects, we would expect the response to deviants in block 3 to be much smaller than in block 1, and the difference between deviants in blocks 3 and 4 to be diminished relative to the difference between blocks 1 and 2. This is clearly not the case in the data represented by the white diamonds.
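The SSA prediction in the preceding paragraph can be illustrated with a toy model. This is not the linear model published in (160); it only borrows the idea of combining a fast (1.5-s) and a slow (10-s) adaptation trace for each tone, while the trace weights, SOA, block length, and fixed one-in-eight deviant spacing are assumptions. Under pure adaptation, a deviant that was just the standard (block 3) inherits a large slow trace and responds less than the same deviant did in block 1, which is the pattern the white diamonds in Figure 2B do not show.

```python
"""Toy two-timescale adaptation model (hypothetical; not the model in (160))."""
import math


def adapted_responses(sequence, soa_s=0.5, tau_fast=1.5, tau_slow=10.0,
                      w_fast=0.1, w_slow=0.02):
    """Response to each tone = 1 minus weighted fast and slow traces of its own history."""
    decay_fast = math.exp(-soa_s / tau_fast)
    decay_slow = math.exp(-soa_s / tau_slow)
    traces = {}  # tone -> [fast trace, slow trace]
    responses = []
    for tone in sequence:
        for trace in traces.values():  # all traces decay over one SOA
            trace[0] *= decay_fast
            trace[1] *= decay_slow
        fast, slow = traces.setdefault(tone, [0.0, 0.0])
        responses.append(max(0.0, 1.0 - w_fast * fast - w_slow * slow))
        traces[tone][0] += 1.0  # each presentation adds to both traces
        traces[tone][1] += 1.0
    return responses


def oddball_block(standard, deviant, n_tones=80, every=8):
    """Deterministic block with a deviant on every `every`-th tone."""
    return [deviant if (i + 1) % every == 0 else standard for i in range(n_tones)]


def mean_for_tone(responses, sequence, start, stop, tone):
    values = [r for r, t in zip(responses[start:stop], sequence[start:stop]) if t == tone]
    return sum(values) / len(values)


# Block 1: B is deviant; block 2: B is standard; block 3: B is deviant again.
blocks = [oddball_block("A", "B"), oddball_block("B", "A"), oddball_block("A", "B")]
sequence = [tone for block in blocks for tone in block]
responses = adapted_responses(sequence)
block1 = mean_for_tone(responses, sequence, 0, 80, "B")
block3 = mean_for_tone(responses, sequence, 160, 240, "B")
print(f"mean response to deviant B: block 1 {block1:.3f}, block 3 {block3:.3f}")
```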

Interestingly, the MMN amplitude modulations are very different if a group of participants is first shown the Figure 1 diagram and told about the sequence structure before being given the same instructions about automaticity and ignoring the sounds while watching a DVD with subtitles. Under these “informed” conditions, the long > short block MMN amplitude modulation is absent for both contexts (see red squares, Figure 2A). Finally, if the same sequences are heard by participants performing a more cognitively demanding visual task, the results differ again, with MMN amplitude larger in long than in short blocks for both contexts (207).

Every dataset displayed in Figure 2 emerged from the same two sounds with the same local probabilities (60- and 30-ms, 1,000-Hz pure tones at p = 0.875 when common and p = 0.125 when rare), and yet the MMN amplitude modulation patterns are quite different. This compilation of data illustrates that the MMN amplitudes produced by these simple sequences are highly dependent on the longer-term sequence structure and on the learning environment in which they are heard. This pattern seems difficult to account for by SSA, at least if SSA is considered an inevitable suppression of response based on a recent history of frequent presentation. In the following section, we explore a more complex account of order effects that might accommodate these puzzling observations.

A Hierarchical Bayesian Perspective

The first-impression bias can be captured by the implementation of predictive coding within a hierarchical Bayesian learning scheme, where the processing of sensory input at each level is modulated by top-down priors which are weighted by estimates of confidence or accuracy and based on information collected over longer time periods (3). The influence of these backward connections embodies predictions, enforcing the suppression of prediction error units to a predicted sound in a manner that reflects expected precision based on the previous stability or predictability of the environment [i.e., gain control (211)]. The interpretation offered for the bias is that in the absence of a pre-existing prior for the two sounds at sequence onset, there is a rapid accumulation of precision for the initial deviant as rare and informative and the initial standard as redundant and uninformative (8, 212). These high confidence weightings equate to strong top-down predictions which are highly effective in suppressing prediction error to the uninformative standard tone. When tone roles change, the ability to accept this initially uninformative tone as a potentially important deviant is then limited as this highly suppressed error signal has a minimal impact on learning rate, leading to marked differences in how the two tones are processed as deviants.

The differential effects observed to the two tones are dominated by modulations of the deviant ERP, suggesting that it is principally the processing of surprise rather than redundancy that is biased [however, see (213) for more subtle order-dependent modulation of the standard ERP]. Modulation of the response to the deviant tone is consistent with predictive coding, where precision or gain is specifically reflected in how effectively prediction error to the deviant sound is suppressed. While neural adaptation has previously been shown to influence MMN (160), this explanation alone is insufficient to explain why bias patterns persist throughout the sequence. Under an adaptation account, a similar suppression would be expected for subsequent presentations of the first context blocks after the first deviant has spent a period of time in the role of standard; Figure 2B shows that this does not explain the data in this case. Neural adaptation would also struggle to account for why this difference in weightings for the two contexts does not occur when a prior exists (informed condition, red squares, Figure 2A) (207). These modulation patterns may instead be linked to some form of higher-order representation that is effectively re-activated each time the first context block is re-encountered. Prediction models have previously been shown to have a degree of context specificity, given that no tone can behave as both standard and deviant in a given context (214). In this way, predictive coding could offer a sufficient mechanistic explanation of the first-impression bias, with the bias serving as evidence that tightly held, top-down representations of sounds, involving some form of higher-level, semantic categorization, influence future sound processing.
Accordingly, the different data acquired from informed participants (Figure 2A) may indicate that foreknowledge enables more flexibility in model updating as a function of knowing in advance that category memberships will change.

Multiple Timescales of Statistical Learning

The presence of regular block lengths is central to the observed patterns of bias. The differential modulation patterns to first and second deviant tones do not occur if the four longer blocks are intermixed with the 12 shorter blocks such that there is no predictable longer-term temporal structure (213). In this study, sequences always started with a long block of the 60-ms tone as first deviant, and blocks always alternated tone probabilities, but there was no regularity in the block alternation rate. Under these circumstances, MMN amplitudes were larger with longer local regularity for both contexts. However, larger MMN amplitudes for longer blocks are not observed if the four longer blocks occur after a regular pattern of twelve shorter blocks. In this case, MMN amplitudes in longer blocks are either equivalent throughout the entire sequence for both contexts (215), or indeed larger for the shorter blocks than the longer blocks for the first context (14). It has therefore been suggested that the high precision associated with the first context remains influential if the longer-term environment is predictable, but is lost if the environment changes in an unexpected way (e.g., blocks are shorter or longer than expected). This would explain why the expected short < long block pattern is observed for the first context when the long blocks come first, but is counteracted when the long blocks come second.

The importance of first impressions is perhaps even more convincingly demonstrated in a recent three-tone multiple-timescale sequence (37). In this study, three sounds were arranged in blocks in which two were equally common and one was rare, and the probabilities rotated, creating three different block types (i.e., probabilities A < B = C, B < A = C, and C < A = B). Each sequence included two of each block type, and there were three versions: one starting with A < B = C, one with B < A = C, and one with C < A = B. While MMN was generated to the rare tone in each block of all sequence arrangements, the MMN generated to any deviant in any block was always significantly smaller if the sequence began with the two common sounds that had the highest spatial separation (90° left or right). In other words, despite equivalent sound compositions within blocks across the different sequences, the auditory system assessed configurations in which the two common tones were adjacent in space (within the three locations used) as less volatile than those in which they were widely separated. Remarkably, however, the effect of this increase in volatility was only evident when the more volatile environment was encountered at the beginning of the sequence. A volatile first impression at sequence onset led all deviance-related responses to be significantly lower in amplitude for the ensuing 12-min period.

Implications for Cognitive Neuroscience

First-impression bias has been interpreted to reflect a sensitivity to information collected across multiple timescales that alters model updating [e.g., (7, 15)]. High confidence in initial tone roles leads to slower accumulation of confidence in the new roles once these change, although this effect appears to be prevented by sequence foreknowledge (207). At a local level, the inversion of tone probabilities overwrites the current prediction model, but some memory of the first impression remains and seems to be reactivated when the initial block structure is encountered again (15). This may suggest, for example, that the reversal of probabilities is treated as a temporary departure from the initial prediction model, which would reveal an ability of the auditory system to maintain information beyond even the longest (30-s) temporal limit previously proposed to apply to MMN (216).

The interpretations offered above are controversial because they oppose much of what is considered “known” about the mechanisms and meaning of MMN. Instead, they appeal to more sophisticated models of learning that have gained favor elsewhere. The Hierarchical Gaussian Filter (HGF), for example, provides a revision of Rescorla and Wagner's (217) model of associative learning in which, rather than updates being directly proportional to prediction error via a fixed learning rate, they are hierarchically weighted according to various degrees of uncertainty, which more closely imitates a stochastic real-world environment (218, 219). The HGF assumes that learning proceeds in a Bayes-optimal fashion, similar to the hierarchical precision weighting described above. In the same way that a single repetition of a sound is not sufficient for MMN to be elicited (220), this weighting ensures that new learning is not triggered by chance fluctuations. These similarities may speak to the generality of these fundamental learning mechanisms, extending to perception. The first-impression bias therefore has the potential not only to advance our understanding of MMN as a neural marker of basic brain processes but also to serve as a tool for probing more complex neural computations.
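The contrast between a fixed learning rate and an uncertainty-weighted one can be sketched in a few lines. The second function below is a one-level Kalman filter, used here as a deliberately simplified stand-in for precision weighting; it is not the HGF, which adds further levels that learn the volatility itself. All parameter values and function names are illustrative assumptions.

```python
def rescorla_wagner(outcomes, alpha=0.1, v0=0.5):
    """Fixed learning rate: each update is alpha times the prediction error."""
    v, trace = v0, []
    for x in outcomes:
        v += alpha * (x - v)
        trace.append(v)
    return trace


def precision_weighted(outcomes, v0=0.5, var0=1.0, obs_noise=0.25, volatility=0.0):
    """Learning rate = prior variance / (prior variance + observation noise).

    A one-level Kalman filter used as a simplified stand-in for hierarchical
    precision weighting (not the HGF itself). `volatility` is an assumed,
    fixed amount of uncertainty added before every observation.
    """
    v, var, trace, rates = v0, var0, [], []
    for x in outcomes:
        var += volatility                 # assumed environmental volatility inflates uncertainty
        k = var / (var + obs_noise)       # precision-weighted learning rate
        v += k * (x - v)                  # update scaled by the current learning rate
        var = (1 - k) * var               # confidence grows after each observation
        trace.append(v)
        rates.append(k)
    return trace, rates


# A long first context followed by a reversal (illustrative outcome coding).
sequence = [1] * 50 + [0] * 10
stable, stable_rates = precision_weighted(sequence)
volatile, volatile_rates = precision_weighted(sequence, volatility=0.1)
```

With `volatility=0.0` the learning rate shrinks as confidence accumulates, so the reversal is absorbed slowly, a loose analogy to the slowed updating after a confident first impression; a nonzero volatility term keeps the learning rate high and the estimate tracks the reversal quickly. The analogy is illustrative only and is not a model of the cited data.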

Order effects on MMN are part of an evolving literature that encourages a revisiting of assumptions about the processing underlying MMN. Studies that emphasize the centrality of transitional probability rather than probability per se [e.g., (179)] are consistent with the notion that predicting future states is a priority for sensory information processing (3, 184), even in task-independent listening where sound has no direct implications for behavior. The order effects reviewed here prompt a reconsideration of the timescales over which predictive processing operates. The critical influence of volatility estimates demonstrated here necessarily reflects longer-timescale attributes of sequential sound presentation. In conclusion, these observations open up new applications of MMN as a tool in cognitive neuroscience and expand the questions and interpretations that might be put forward in its use to explore “sensory information processing abnormalities in schizophrenia and related neuropsychiatric disorders”.

Application in Clinical Cognitive Neuroscience

MMN, in general, is a useful candidate for exploring basic cognitive neuroscience and psychopathology. First, MMN has good test-retest reliability at the individual level and is sensitive to inter-individual differences in a number of domains (71). Because MMN elicitation relies on the detection of a discrepancy between a deviant and a standard sound, it provides a means to measure individual auditory discrimination ability (18). Studies have demonstrated the use of MMN to infer individual performance on related processes, including memory trace formation (82, 221), auditory stream segregation (174), and regularity extraction (79). Variation in MMN has also been used to track intra-individual changes, which can be useful both for general observation and for assessing intervention effects. MMN has previously been used to measure the effects of pharmacotherapy [e.g., (222)] and auditory training [e.g., (223, 224)], and it holds promise as an endophenotypic marker of dysfunction in certain conditions, including pathological aging [e.g., (225, 226)] and psychotic disorders [e.g., (222, 227, 228)]. Another advantage of MMN as a clinical measure is that the change detection process underlying it appears to be initiated, at least in part, at an early, pre-attentive level of the cortical hierarchy (19). Because MMN elicitation does not require a participant to consciously attend to stimuli, it provides a means to study a wide range of cognitive operations in populations in which attention or motivation may be lacking, such as children or severely impaired individuals (38, 229). In both clinical and healthy populations, MMN can be used to elucidate automatic or pre-attentive cognitive processes and their downstream effects on voluntary and controlled processing (230).
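Because MMN is conventionally quantified from the difference between averaged responses to deviant and standard sounds, a minimal sketch of that computation may help when comparing individuals or sessions. It assumes single-electrode epochs stored as NumPy arrays of shape (trials, samples) in microvolts, and a 100 to 250 ms measurement window; the window, array layout, and function name are illustrative assumptions rather than a fixed standard.

```python
import numpy as np


def mmn_amplitude(deviant_epochs, standard_epochs, times, window=(0.10, 0.25)):
    """Mean amplitude of the deviant-minus-standard difference wave in a latency window.

    deviant_epochs, standard_epochs: arrays of shape (n_trials, n_times) from one
    electrode, in microvolts. times: shape (n_times,), seconds relative to sound onset.
    The default window is an illustrative assumption.
    """
    difference = deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)
    mask = (times >= window[0]) & (times <= window[1])
    return difference, difference[mask].mean()


# Synthetic example: a -2 µV negativity peaking at 150 ms in deviant trials only.
times = np.linspace(-0.1, 0.4, 501)
standard = np.zeros((100, times.size))
deviant = np.tile(-2.0 * np.exp(-((times - 0.15) ** 2) / (2 * 0.02 ** 2)), (20, 1))
diff, amp = mmn_amplitude(deviant, standard, times)
```

A more negative `amp` indicates a larger MMN. In practice, the window and electrode would be chosen in advance or from the grand-average peak, since reliability estimates depend on these choices.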

Variables listed in Table 1 include many that can be experimentally manipulated to investigate a rich array of questions in clinical groups, and the influence of multiple-timescale patterning within sequences and the order effects of volatility can now be added to this suite. The caveat highlighted here, however, is that MMN is elicited within a specific learning environment created by the experimenter. The behavior of the inferential system under investigation will likely be shaped by the predictions the system is attempting to optimize [see (213) for discussion]. Even within studies of schizophrenia, arguably the most mature clinical application of MMN, there are inconsistencies among studies attempting to identify the core anomaly in the underlying system. Although reduced MMN amplitude is a highly replicable finding with large effect sizes [e.g., (231)], evidence on whether this reflects purely an impaired response to deviation (232, 233) or impaired encoding of regularity as well as deviance (26, 234) is mixed. Similarly, attempts to localize the deficit within the inferential network have yielded inconsistent findings [see (235, 236, 237) for reviews]. While deficient formulation and encoding of valid predictions is considered a central feature of psychotic phenomena (238), and auditory inference offers an excellent means of studying the integrity of such predictions, a full mechanistic understanding of the underlying causes remains elusive. Ultimately, a deeper understanding of differences within an inferential system relies on a deeper understanding of the system itself. Here, we propose that considering the timescales of learning may help explain how paradigms differ, and how group differences might vary across paradigms.

Author Contributions

KF completed the first draft of this document. JT edited the document and added significant contributions to content.

Funding

KF acknowledges receipt of Australian Postgraduate Award scholarships. This research was supported by funds provided by the National Health and Medical Research Council of Australia (APP1002995).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This manuscript was based on an introductory section to the PhD thesis submitted by author KF under the supervision of author JT. The work benefitted significantly from constructive feedback from thesis markers Professor Dean Salisbury and Professor Greg Light. We also gratefully acknowledge additional feedback from Dr. Bryan Paton on this manuscript.

References

1. Shelley AM, Ward PB, Catts SV, Michie PT, Andrews S, McConaghy N. Mismatch negativity: an index of a preattentive processing deficit in schizophrenia. Biol Psychiatry (1991) 30(10):1059–62. doi: 10.1016/0006-3223(91)90126-7

2. Bregman AS. Auditory scene analysis: the perceptual organization of sound. Cambridge, MA: The MIT Press. (1990). Retrieved from https://books.google.com.au/books/about/Auditory_Scene_Analysis.html?id=jI8muSpAC5AC&redir_esc=y.

3. Friston K. A theory of cortical responses. Philos Trans R Soc Lond B Biol Sci (2005) 360(1456):815–36. doi: 10.1098/rstb.2005.1622

4. Rao RPN, Ballard DH. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci (1999) 2(1):79–87. doi: 10.1038/4580

5. Frost JD, Winkler I, Provost A, Todd J. Surprising sequential effects on MMN. Biol Psychol (2016) 116:47–56. doi: 10.1016/j.biopsycho.2015.10.005

6. Todd J, Provost A, Whitson L, Mullens D. Initial uncertainty impacts statistical learning in sound sequence processing. J Physiol Paris (2018) 389:41–53. doi: 10.1016/j.neuroscience.2018.05.011