Dynamics of the Auditory Continuity Illusion

Cao, Qianyi; Parks, Noah; Goldwyn, Joshua H.

doi:10.3389/fncom.2021.676637

ORIGINAL RESEARCH article

Front. Comput. Neurosci., 08 June 2021

Volume 15 - 2021 | https://doi.org/10.3389/fncom.2021.676637

Qianyi Cao^†

Noah Parks^†

Joshua H. Goldwyn^*

Department of Mathematics and Statistics, Swarthmore College, Swarthmore, PA, United States

Illusions give intriguing insights into perceptual and neural dynamics. In the auditory continuity illusion, two brief tones separated by a silent gap may be heard as one continuous tone if a noise burst with appropriate characteristics fills the gap. This illusion probes the conditions under which listeners link related sounds across time and maintain perceptual continuity in the face of sudden changes in sound mixtures. Conceptual explanations of this illusion have been proposed, but its neural basis is still being investigated. In this work we provide a dynamical systems framework, grounded in principles of neural dynamics, to explain the continuity illusion. We construct an idealized firing rate model of a neural population and analyze the conditions under which firing rate responses persist during the interruption between the two tones. First, we show that sustained inputs and hysteresis dynamics (a mismatch between tone levels needed to activate and inactivate the population) can produce continuous responses. Second, we show that transient inputs and bistable dynamics (coexistence of two stable firing rate levels) can also produce continuous responses. Finally, we combine these input types together to obtain neural dynamics consistent with two requirements for the continuity illusion as articulated in a well-known theory of auditory scene analysis: responses persist through the noise-filled gap if noise provides sufficient evidence that the tone continues and if there is no evidence of discontinuities between the tones and noise. By grounding these notions in a quantitative model that incorporates elements of neural circuits (recurrent excitation, and mutual inhibition, specifically), we identify plausible mechanisms for the continuity illusion. Our findings can help guide future studies of neural correlates of this illusion and inform development of more biophysically-based models of the auditory continuity illusion.

1. Introduction

How do listeners in crowded and noisy environments create stable auditory streams in the face of interruptions and “background” noise? How do listeners identify the stops and starts of overlapping and interwoven sounds to correctly parse an auditory scene? Answering these questions is fundamental to understanding auditory perception and neural processing of sounds. A perceptual illusion that sheds light on dynamic processing of multiple sounds is the auditory continuity illusion (Bregman, 1990) (also called temporal induction; Warren, 2008). The continuity illusion can be elicited when noise interrupts a variety of sounds including: tones, frequency glides, and sentences (Bregman, 1990; Warren, 2008); complex tones (Plack and White, 2000); and sound textures (McWalter and McDermott, 2019). The common aspect of this illusion is that, when the noise is sufficiently loud and shares spectral content with the interrupted signal, listeners perceive the signal as continuous and uninterrupted. This illusion reveals a tendency for the auditory system to maintain perceptual continuity when confronted with sudden changes in the auditory scene and to sustain perception of sounds that are present prior to some masking or distracting sound. The continuity illusion has been thoroughly studied since its discovery (Miller and Licklider, 1950; Warren, 1970).

The version of the continuity illusion we investigate is an interrupted tone that can be perceived as continuous if the interruption (a short interval in which no tone is presented) is filled with broadband noise, as depicted in Figure 1. Results of listening experiments inform conceptual models for the illusion such as Bregman's theories of auditory scene analysis (Bregman, 1990). Fundamental questions remain about the perceptual origin and neural basis for the illusion [that is, whether it is a cortical phenomenon (Husain et al., 2005; King, 2007; Petkov et al., 2007) or is created subcortically (Bidelman and Patro, 2016), and whether it is required that peripheral responses signal discontinuities in the sound (Riecke et al., 2012)]. Perhaps due to these uncertainties, there have been few efforts to model how the dynamic activity of neural populations can generate the continuity illusion. Previous works point toward several possible neural mechanisms that can give rise to the illusion: feedforward intra-cortical connections (Husain et al., 2005), nonlinear dynamic self-excitation (Noto et al., 2016), short-term synaptic plasticity (Vinnik et al., 2010), and a hierarchy of neural subpopulations integrating information on multiple time scales that are modulated by top-down feedback (Balaguer-Ballester et al., 2009). That these studies are few and disparate motivates further model-based investigations of the continuity illusion.

Figure 1. Illustration of the continuity illusion. (A) A tone interrupted by a silent gap (top panel) is perceived (correctly) as two tones discontinuous in time (bottom panel). (B) If a weak noise burst is inserted into the gap between tones, the tone remains perceived as discontinuous. (C) A sufficiently loud noise that shares frequency content with the tone can induce the illusion that the tone is uninterrupted and persists through the noise-filled interruption.

Our goal is to identify dynamical mechanisms that implement fundamental principles of the illusion. We are informed by the work of Petkov and colleagues who obtained evidence that two types of neural responses participate in the continuity illusion: sustained responses that signal ongoing sounds and transient responses that signal acoustic edges in time (onsets and offsets of sounds) (Petkov et al., 2007). While they specifically identified these response types in auditory cortex, we view these as common response motifs that are found throughout the auditory pathway (see Kopp-Scheinpflug et al., 2018 for a review of offset auditory neurons). In addition, we seek to connect Bregman's principles, gained over decades of careful observation, to fundamental features of neural dynamics. Specifically, we present dynamical explanations for two of Bregman's “rules” that define the circumstances under which the illusion will occur (Bregman, 1990). We paraphrase these rules as:

Sufficiency of Evidence Rule: There must be some neural activity during the interruption that is indistinguishable from what would have occurred if the tone had continued through the noise-filled interruption.

No Discontinuity Rule: There must be no evidence that the tone shuts off during the noise-filled interruption.

We use a nonlinear, dynamical firing rate model (Miller, 2018) with sustained inputs to implement the Sufficiency of Evidence rule. The dynamical principle at work is hysteresis: the interrupting noise (a broadband sound) provides a partial amount of excitatory input to an idealized neural population. Recurrent excitation enables the noise to maintain firing activity if the population is already active, even though the noise alone cannot activate an inactive population. We then adjust excitability in the model so that it has two stable states that coexist in the absence of sustained inputs (bistability). We use this configuration to show that transient inputs at tone onsets and offsets implement the No Discontinuity rule. Continuity occurs when noise suppresses the offset response at the end of the first tone. Without this sufficiently strong offset response, the population remains active despite the absence of any ongoing input during the interruption between tones. Finally, we configure the model to receive both input types. In this setting, the model utilizes both hysteresis and bistability to create dynamics consistent with the continuity illusion. This formulation of using persistent neural activity as an indicator of perceptual state, instantiated as an attractor in a dynamical system, resembles ideas common to studies of the neural basis for working memory (Brody et al., 2003).

We illustrate our dynamical explanation for the continuity illusion with simulation results and we emphasize, throughout, analytical and geometric insights gained by computing equilibrium solutions to the nonlinear differential equation that governs the firing rate dynamics. By identifying dynamical principles that are consistent with the continuity illusion, we illuminate how a long-standing perceptual theory (Bregman's “rules”) can be embodied with an idealized neural population. Furthermore, the postulated roles of bistability and hysteresis in the continuity illusion can guide how future, more biophysically-detailed modeling studies of this phenomenon should be constructed.

2. Materials and Methods

2.1. Firing Rate Model of a Neural Population

We model sound-evoked activity of a neural population using a firing rate equation with recurrent excitation (Miller, 2018). The differential equation that describes the dynamics of the neural activity is

$$ \tau x' = -x + f(a_E x + I(t)) \tag{1} $$

where x(t) is the firing rate variable that takes unitless values between 0 (population inactive) and 1 (population active). The value of x(t) can be interpreted in a mean-field sense as the proportion of active neurons in a population or as the instantaneous probability of firing for neurons in the population. Frequency-tuning, known as tonotopy, is a central organizing feature of the auditory pathway (Pickles, 2012). Since we are studying responses to pure tone inputs (or a pure tone interrupted by broadband noise), we define x(t) as the activity of a population tuned to the tone frequency. We associate values of x(t) near 1 (active population) with perception of the tone at time t. The parameter τ is a time constant of the firing rate dynamics and a_E is the strength of recurrent excitation within the population. The nonlinear input-output function f has the sigmoidal form

$$ f(u) = \frac{1}{1 + e^{-(u - m)/k}} . \tag{2} $$

Since we are working with unitless quantities, we use k = 1 throughout without loss of generality. We explain our choices for the half-maximum parameter m in more detail below.
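As a minimal illustration of Equations (1) and (2), the sketch below evaluates the sigmoid and the right-hand side of the firing rate equation. It is written in Python rather than the MATLAB used for the published simulations, and the function names are hypothetical. The values of a_E and m in the example call are the Model 1 values given in section 3.1; the time constant τ = 0.01 is an assumed value chosen only for illustration.

```python
import math

def f(u, m, k=1.0):
    """Sigmoidal input-output function of Equation (2)."""
    return 1.0 / (1.0 + math.exp(-(u - m) / k))

def dxdt(x, I, a_E, m, tau):
    """Right-hand side of Equation (1), solved for x'."""
    return (-x + f(a_E * x + I, m)) / tau

# f equals one half when its argument equals the half-maximum parameter m.
half = f(3.6, m=3.6)

# Rate of change for an inactive population (x = 0) with no input.
# tau = 0.01 is an assumed value, not taken from the text.
rate = dxdt(0.0, I=0.0, a_E=5.9, m=3.6, tau=0.01)
```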

The input term I(t) represents sound-driven inputs to the population. It consists of four components, some of which are sustained inputs (piecewise constant for the duration of the sound) and some are transient inputs (exponentially-decaying following tone onsets or offsets). Our classification of inputs as either sustained or transient, and our description of their dynamics as either constant or exponentially-decaying, are caricatures of actual inputs. Nevertheless, these response types are observed in auditory cortex (Wang et al., 2005) and in other nuclei (e.g., Pfeiffer, 1966). Moreover, we are motivated by observations of Petkov et al. (2007) who proposed that sustained and transient response types in primary auditory cortex are possible neural correlates of the continuity illusion.

The general form of I(t) is

$$ I(t) = I_{\text{sustain}}(t) + I_{\text{onset}}(t) - I_{\text{offset}}(t) - I_{\text{inhib}}(t) . \tag{3} $$

We consider three input scenarios: tone-only (tone on for 1 s and no noise), masking (tone and noise presented together for 1 s), and continuity (a pair of 1-s long tones separated by a half-second noise-filled gap). Figure 2 illustrates the forms of each of these inputs across these different scenarios. To be clear, tones and noise are represented by idealized waveforms (piecewise constant for sustained inputs, exponential decay for transient inputs). We do not use sinusoidal waveforms for tones or stochastic processes for noise. We next describe the forms and physiological rationales for each of these components. As we will make clear in section 3.1, some of these terms are omitted depending on the model configuration.

Figure 2. Inputs to the firing rate model. Sustained tone-driven excitatory input (row 1), tone onset excitatory input (row 2), tone offset inhibitory input (row 3), and sustained noise-driven inhibitory input (row 4). The three input scenarios considered are tone alone (I_N = 0, column 1), masking (tone and noise presented simultaneously, column 2), and continuity (tones separated by a noise-filled gap, column 3). Effects of noise are to increase I_sustain (see B1 and C1), suppress the transient responses I_onset and I_offset if noise is on at the relevant onset and/or offset (see B2, B3, C2, C3), and increase I_inhib (see B4, C4). The plot of I_inhib uses x = 0; the amplitude of this input decreases for larger values of x (as indicated by the arrows in B4 and C4; see Equation 8). Parameter values used are I_T = 2 and I_N = 2 (for masking and continuity), α = 0.168, and β = 2/3. These are the α and β values used in Models 1 and 2, respectively. See text for details.

The sustained input I_sustain(t) is a piecewise-constant function that is positive when a tone is on and zero otherwise. This term represents the frequency-tuned excitatory input to the tone population. We use a unitless parameter I_T to represent tone strength and set I_sustain(t) equal to I_T in the absence of noise (tone-only input, Figure 2A1). As discussed in section 3.1, we construct all models so that I_T = 1 is the threshold for activation of the firing rate variable (in the absence of noise). Although I_T is unitless, sound levels only vary over a finite range in behaviorally-relevant situations, so we restrict I_T to not exceed a maximum tone level that we set (arbitrarily) at I_T,max = 5. As a broadband signal, noise also has energy in the frequency channel to which the tone population is tuned. We denote noise strength by I_N (takes values between 0 and I_{N, max} = 10) and allow the sustained excitatory term I_sustain(t) to increase with I_N, see Figures 2B1,C1 for illustrations of I_sustain(t) in the masking and continuity scenarios. This input is

$$ I_{\text{sustain}}(t) = I_T \mathbf{1}_{\text{tone}} + \alpha I_N \mathbf{1}_{\text{noise}} \tag{4} $$

where 𝟙_tone is an indicator function that takes the value one if a tone is on at time t and zero otherwise, 𝟙_noise is defined similarly for the noise input, and α scales the contribution of noise to this excitatory, frequency-tuned input to the tone population.

The onset input I_onset(t) and offset input I_offset(t) are illustrated in the second and third rows of Figure 2. These are exponentially-decaying terms triggered by tone onsets and offsets, respectively. The minus sign before I_offset in Equation (3) indicates that offset responses are inhibitory. Inhibitory offset neurons have been found in the auditory system including neurons in the superior paraolivary nucleus (Oliver, 2005; Kopp-Scheinpflug et al., 2018) and among parvalbumin-expressing interneurons in auditory cortex (Keller et al., 2018). We use these transient inputs to represent the salience of the onsets and offsets of a tone or, in other words, the sharpness (in time) of acoustic edges. We therefore use tone strength (I_T) to set the initial amplitudes of these transient inputs. We suppose that noise obscures the sharpness of these acoustic edges (in time) and thus noise strength (I_N) reduces I_onset(t) and I_offset(t) (if the noise is on at the time of tone onset and/or offset). The form of the onset input for a tone that starts at time t_on is

$$ I_{\text{onset}}(t) = \gamma_{\text{on}} A_{\text{on}} e^{-(t - t_{\text{on}})/\tau} H(t - t_{\text{on}}) \tag{5} $$

The offset input is defined analogously. We choose the decay time constant τ for transient responses to match the time scale of the firing rate dynamics. The function H is the Heaviside function (also known as a step function) that is 0 if its argument is <0 and 1 if its argument is >0. We use it here to indicate that the transient onset response begins at tone onset (t_on). The constant A_on gives the initial amplitude of the onset input. As described above, it increases with I_T and decreases with I_N and takes the form:

$$ A_{\text{on}} = \begin{cases} \gamma_{\text{on}} I_T & \text{if noise is off at tone onset} \\ \gamma_{\text{on}} [I_T - \beta I_N]_+ & \text{if noise is on at tone onset} \end{cases} \tag{6} $$

where [·]₊ is the rectifier operation defined as [u]₊ = max(0, u). The parameter γ_on scales the onset response so that tone threshold is always I_T = 1. We detail the calculation of γ_on in section 3.1. In our analysis of this model we will make use of the fact that this exponentially-decaying input can also be described by I_onset(t) = γ_on s, where s is a dynamical variable that follows a first-order, linear, ordinary differential equation:

$$ \begin{aligned} &\text{for } t < t_{\text{on}}: && s = 0 \\ &\text{for } t \geq t_{\text{on}}: && \tau s' = -s, \text{ with } s(t_{\text{on}}) = A_{\text{on}} \end{aligned} \tag{7} $$
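For reference, the initial value problem in Equation (7) can be solved in closed form by separation of variables, which makes the link to the exponential factor in Equation (5) explicit:

$$ \tau s' = -s, \quad s(t_{\text{on}}) = A_{\text{on}} \quad \Longrightarrow \quad s(t) = A_{\text{on}} e^{-(t - t_{\text{on}})/\tau} \quad \text{for } t \geq t_{\text{on}} . $$

Together with s = 0 for t < t_on, this is the exponentially-decaying time course, gated by the Heaviside function H(t − t_on), that appears in Equation (5).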

The final input term in Equation (3) is the sustained inhibitory term I_inhib(t). This term represents a noise-driven subpopulation (I_inhib increases with I_N) and it directly inhibits the tone population (note the minus sign in Equation 3). This term can reflect processes as early as the auditory nerve, where it is known that noise suppresses responses to tones (Costalupes et al., 1984). It can also be thought to represent lateral inhibition (“lateral” in the sense that neural responses can be suppressed by sounds with spectral content away from their best frequency). Lateral inhibition is evident in early auditory stages (Ehret and Merzenich, 1988; Rhode and Greenberg, 1994) and has also been observed in auditory cortex (Kato et al., 2017) and suggested to exist there by modeling work (de la Rocha et al., 2008). We suppose, further, that there is mutual inhibition between the tone-driven population and the noise-driven inhibitory population that we implement with a multiplicative factor (1 − x). For analytical convenience, we make the approximation that the dynamics of the noise-driven inhibitory population are much faster than x(t) so that I_inhib(t) can be described as evolving instantaneously to an x-dependent steady state of the form:

$$ I_{\text{inhib}}(t) = a_I I_N (1 - x) \mathbf{1}_{\text{noise}} \tag{8} $$

where the parameter a_I determines the strength of this inhibitory input to the tone population. Time courses of I_inhib(t) for the three input scenarios are shown in the fourth row of Figure 2.
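The sustained terms in Equations (3), (4), and (8) can be sketched as follows for the continuity scenario, with the transient terms omitted. This is an illustrative Python sketch, not the published MATLAB code: the function names are hypothetical, and starting the first tone at t = 0 is an assumption. The values α = 0.168 and a_I = 1.124 used in the example are those reported for Model 1 in the figure captions.

```python
def indicator(t, intervals):
    """Return 1.0 if t lies in any half-open interval [start, stop), else 0.0."""
    return 1.0 if any(start <= t < stop for start, stop in intervals) else 0.0

# Continuity scenario: two 1-s tones separated by a 0.5-s noise-filled gap.
# Placing the first tone onset at t = 0 is an assumption of this sketch.
TONE_ON = [(0.0, 1.0), (1.5, 2.5)]
NOISE_ON = [(1.0, 1.5)]

def I_sustain(t, I_T, I_N, alpha):
    """Equation (4): tone-driven plus noise-driven sustained excitation."""
    return I_T * indicator(t, TONE_ON) + alpha * I_N * indicator(t, NOISE_ON)

def I_inhib(t, x, I_N, a_I):
    """Equation (8): noise-driven inhibition, which weakens as x approaches 1."""
    return a_I * I_N * (1.0 - x) * indicator(t, NOISE_ON)

def total_input(t, x, I_T, I_N, alpha, a_I):
    """Equation (3) with the transient terms set to zero."""
    return I_sustain(t, I_T, I_N, alpha) - I_inhib(t, x, I_N, a_I)

# During the gap, the net input depends on whether the population is active.
gap_input_active = total_input(1.2, 1.0, I_T=1.5, I_N=8.0, alpha=0.168, a_I=1.124)
gap_input_inactive = total_input(1.2, 0.0, I_T=1.5, I_N=8.0, alpha=0.168, a_I=1.124)
```

Because of the factor (1 − x), the same noise burst delivers net excitation to an active population and net inhibition to an inactive one in this example.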

2.2. Numerical Simulations

All calculations were carried out using the scientific computing software MATLAB (The MathWorks, Inc.). The firing rate differential equation (Equation 1) was solved numerically using ode15s. Simulation code is available for download and use at https://github.com/jhgoldwyn/ContinuityIllusion. Nonlinear equations (to compute equilibrium states, for example) were solved using root-finding functions in MATLAB such as fzero and fsolve.

3. Results

We begin, in section 3.1, by characterizing different dynamical regimes of the model and determining appropriate parameter choices (a_E and m in Equations 1 and 2). We identify three regimes that can exhibit the continuity illusion and analyze models drawn from each of these regimes. In section 3.2, we study a model with firing rate dynamics that exhibit hysteresis and can be activated by sustained inputs alone. In section 3.3, we study a model with bistable firing rate dynamics that can be activated by transient inputs alone. In section 3.4, we study a model in which the hysteresis and bistable features are both operative and the firing rate can only be activated by combinations of sustained and transient inputs. In each subsection, we describe how these models respond to a tone alone (without noise), a tone and noise presented together (masking), and two tones separated by a noise-filled gap (continuity). We derive activation threshold criteria for each model and stimulus type and describe how parameter values affect activation thresholds. Our primary observation is that all models can support continuity dynamics (firing activity persists through a noise-filled gap between tones), but that this is accomplished differently using the mechanisms of hysteresis and bistability. As described above, and following Petkov et al. (2007), we propose that sustained inputs to the model implement Bregman's Sufficiency of Evidence rule and transient inputs implement the No Discontinuity rule. While two of the models can implement these rules in isolation (sustained inputs only in section 3.2, transient inputs only in section 3.3), the significance of the final model (section 3.4) is that it succeeds at implementing both of these rules together. Firing activity in that model can persist during a noise-filled gap between tones because noise has two effects: it contributes excitatory inputs during the interruption (sufficient evidence), and prevents the offset response from inactivating the population (no discontinuity).

3.1. Model Classification by Firing Rate Equilibria

A general analysis of the equilibrium states of the model for the case of sustained tone input only (I_onset(t) = I_offset(t) = 0 for all t and I_N = 0) informs our parameter choices and is a starting point for subsequent analyses. Setting x′ = 0 and I(t) = I_T in Equation (1) and solving for I_T, we find the equilibrium relation

$$ I_T(x) = m - \ln\left(\frac{1}{x} - 1\right) - a_E x . \tag{9} $$

The parameters a_E and m determine the shape of the firing rate equilibrium curve. We show three representative examples in Figure 3A, with equilibrium firing rates plotted on the vertical axis and I_T plotted on the horizontal axis. The key features of these curves are: whether they are S-shaped with left and right “knees” and, if so, the values of I_T at these knees. Recall that the firing rate variable x takes values between zero and one. We interpret equilibrium solutions x ≈ 0 as inactive states (no perception of tone) and equilibrium solutions x ≈ 1 as active states (perception of tone).

Figure 3. Parameter space is partitioned according to shape of firing rate equilibrium curves. (A) Equilibrium firing rates for tone-only inputs (I_N = 0). Two stable branches (solid) are separated by a branch of unstable equilibria (dotted). Gray line at right indicates the maximum tone level used in simulations and analysis (I_T,max = 5). (B) Parameter space for parameters in Equations (1) and (2). Colored dots correspond to the curves in (A). Parameter space is partitioned according to the positions of the left and right knees of the firing rate equilibrium curves (see Table 1 for details).

We summarize the possible scenarios in Table 1 and show how they partition the a_E-m plane in Figure 3B. The scenarios that interest us must satisfy three criteria. First, to clearly distinguish between active and inactive states, the firing rate equilibrium curve must be S-shaped. This rules out Region I. Second, the left knee must be located at an I_T value less than the maximum tone strength (I_T,max = 5) so that activation is possible. This rules out Region II. Lastly, the right knee must be located at a positive I_T value so that inactivation is possible. This rules out Region V, in which the population could remain active at all times, even without inputs.

Table 1. Regions of a_E -m parameter space classified by positions of left and right knees of firing rate equilibrium curves.

As detailed in Table 1, the number and locations of the knees of the firing rate equilibrium curve delineate these regions. The knees are saddle node bifurcation points in the firing rate dynamics (points at which stable and unstable equilibria appear or disappear). They are located at critical points of I_T(x), so we identify these points by differentiating Equation (9) with respect to x and setting the resulting expression to zero. The x values at the left and right knees are

$$ \begin{aligned} x_L &= \frac{1}{2}\left(1 + \sqrt{1 - 4/a_E}\right) \\ x_R &= \frac{1}{2}\left(1 - \sqrt{1 - 4/a_E}\right) . \end{aligned} \tag{10} $$
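For reference, the differentiation step can be written out explicitly. Differentiating Equation (9) with respect to x and setting the result to zero gives

$$ \frac{dI_T}{dx} = \frac{1}{x(1 - x)} - a_E = 0 \quad \Longrightarrow \quad x^2 - x + \frac{1}{a_E} = 0 \quad \Longrightarrow \quad x = \frac{1}{2}\left(1 \pm \sqrt{1 - 4/a_E}\right) . $$

The derivative is positive for x near 0 and near 1 and negative between the two roots, so the smaller root is a local maximum of I_T(x) (the right knee, on the lower branch) and the larger root is a local minimum (the left knee, on the upper branch).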

We parameterize a_E and m by selecting values for I_T(x_L) and I_T(x_R) and then solving the two nonlinear equations that result from Equation (9) for a_E and m. No real solutions exist for a_E < 4, thus a_E = 4 marks the boundary of models without S-shaped firing rate equilibrium curves (region I in Figure 3B). The regions of interest for this study (III and IV) are determined by the positions of the knees in the firing rate equilibrium curve. In particular, region IV consists of (a_E, m) parameter pairs for which the model is bistable in the absence of any inputs (the left knee is at a negative I_T value and the right knee is at a positive I_T value). Region III consists of models that are monostable with no inputs (left knee is at a positive I_T value). We distinguish between two subregions of Region III. In Region IIIa, I_T at the right knee is less than the maximum tone value, and thus sustained inputs alone can activate the neural population. In contrast, in Region IIIb the I_T value at the right knee is larger than the maximum tone strength and thus models in this region can only be activated by a combination of sustained and transient inputs.
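One way to carry out this parameterization is sketched below in Python. It is not the authors' procedure (the published code uses MATLAB root-finding functions); it relies on two observations that follow from Equations (9) and (10): m only shifts the equilibrium curve along the I_T axis, so the separation between the knees depends on a_E alone, and that separation grows with a_E. The function names and the bisection bracket [4, 50] are assumptions of this sketch.

```python
import math

def knee_x(a_E):
    """Equation (10): firing rates (x_L, x_R) at the left and right knees."""
    d = math.sqrt(1.0 - 4.0 / a_E)
    return 0.5 * (1.0 + d), 0.5 * (1.0 - d)

def I_T_eq(x, a_E, m):
    """Equation (9): tone level at which x is an equilibrium."""
    return m - math.log(1.0 / x - 1.0) - a_E * x

def knee_gap(a_E):
    """I_T(x_R) - I_T(x_L); m cancels because it only shifts the curve."""
    x_L, x_R = knee_x(a_E)
    return I_T_eq(x_R, a_E, 0.0) - I_T_eq(x_L, a_E, 0.0)

def fit_parameters(I_left, I_right, lo=4.0, hi=50.0, tol=1e-10):
    """Bisect on a_E to match the knee separation, then choose m to place the right knee."""
    target = I_right - I_left
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if knee_gap(mid) < target:
            lo = mid
        else:
            hi = mid
    a_E = 0.5 * (lo + hi)
    _, x_R = knee_x(a_E)
    m = I_right - I_T_eq(x_R, a_E, 0.0)
    return a_E, m

# Knees at I_T = 0.2 (left) and I_T = 1 (right).
a_E, m = fit_parameters(I_left=0.2, I_right=1.0)
```

For knees at 0.2 and 1, this returns values that round to a_E = 5.9 and m = 3.6.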

From these considerations, we choose three models (one from each region of interest) and use these in all further analysis and simulations.

Model 1 (a_E = 5.9, m = 3.6, in Region IIIa). The left knee is at I_T = 0.2 and the right knee is at I_T = 1. As we describe below, the essential dynamical feature of this model is hysteresis: in response to sustained inputs the system requires a higher tone level to activate than deactivate (the S-shape of the firing rate equilibrium curve creates a mismatch between the two knees). We use sustained inputs only for this model (I_onset(t) = I_offset(t) = 0 in Equation 3 for all t).

Model 2 (a_E = 10.5, m = 5.2, in Region IV). The left knee is at I_T = −2 and the right knee is at I_T = 2. As we describe below, the essential dynamical feature of this model is bistability. We use transient inputs to move the system between active and inactive states and do not include sustained inputs (I_sustain(t) = I_inhib(t) = 0 in Equation 3 for all t).

Model 3 (a_E = 12.7, m = 9.5, in Region IIIb). The left knee is at I_T = 0.2 and the right knee is at I_T = 6. Neither sustained inputs alone nor transient inputs alone can activate models in Region IIIb, so we use all input types in Equation 3 for this model.
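The knee positions quoted for the three models can be checked directly from Equations (9) and (10). The Python sketch below (hypothetical function names, not the published MATLAB code) computes the tone levels at both knees and assigns each model to one of the regions of interest described above; parameter pairs outside Regions III and IV are simply labeled "other".

```python
import math

I_T_MAX = 5.0  # maximum tone level stated in the text

def I_T_eq(x, a_E, m):
    """Equation (9): tone level at which x is an equilibrium (tone-only input)."""
    return m - math.log(1.0 / x - 1.0) - a_E * x

def knees(a_E, m):
    """Tone levels at the left and right knees (requires a_E >= 4)."""
    d = math.sqrt(1.0 - 4.0 / a_E)
    x_L, x_R = 0.5 * (1.0 + d), 0.5 * (1.0 - d)
    return I_T_eq(x_L, a_E, m), I_T_eq(x_R, a_E, m)

def region(a_E, m):
    """Classify into the regions of interest described in the text."""
    I_L, I_R = knees(a_E, m)
    if I_L < 0.0 < I_R:
        return "IV"
    if 0.0 < I_L < I_T_MAX:
        return "IIIa" if I_R < I_T_MAX else "IIIb"
    return "other"

MODELS = {"Model 1": (5.9, 3.6), "Model 2": (10.5, 5.2), "Model 3": (12.7, 9.5)}
results = {name: (knees(a, m), region(a, m)) for name, (a, m) in MODELS.items()}
for name, ((I_L, I_R), reg) in results.items():
    print(f"{name}: left knee {I_L:.2f}, right knee {I_R:.2f}, region {reg}")
```

Because the quoted parameter values are rounded, the computed knee positions differ slightly from the nominal ones (by less than 0.1 in I_T).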

3.2. Model 1: Sustained Inputs and Hysteresis Dynamics Implement the Sufficiency of Evidence Rule

3.2.1. Response Dynamics to Tone-Only Inputs

Recall that Model 1 includes sustained inputs only (I_onset(t) = I_offset(t) = 0 for all t). The equilibrium solutions for this model for tone inputs only (I_N = 0) are shown in Figure 4A1 with I_T as the bifurcation parameter (horizontal axis). When there is no tone input (I_T = 0), this system has a single stable equilibrium in the inactive state. For stronger tone inputs, the system passes through a saddle node bifurcation point at which a second stable equilibrium is created in the active state. Activation of the population from rest requires the tone to be larger, namely that I_T > I_R where I_R is the tone level at the right knee of the firing rate equilibrium curve. At this second saddle node bifurcation point, the inactive state is abolished and only the active state remains as the unique stable and globally attracting fixed point for the system. Responses to subthreshold and suprathreshold tone inputs are shown in Figure 4A2.

Figure 4. Dynamics of Model 1 responses to tone and noise inputs. (A) Tone inputs (no noise) with tone level I_T = 0.5 (blue) and I_T = 1.5 (red). Equilibrium firing rates shown in (A1) and time-courses of firing rate variable x(t) in (A2). Firing rate equilibrium in (A1) (tone only) is re-plotted in gray in (B1,C1). (B) Simultaneous tone and noise inputs (masking condition) with I_N = 1. Noise shifts right knee of firing rate equilibrium curve to larger I_T levels (B1) and prevents activation by tone inputs (I_T = 0.5 and I_T = 1.5 shown). (C) Interrupted tone with noise-filled gap (continuity condition) with I_T = 1.5. Noise shifts left knee of firing rate equilibrium curve to smaller I_T levels (C1). Continuity occurs if left knee crosses I_T = 0 axis. Time-courses of firing rate variable in (C2) for no noise (blue) and noise above continuity threshold (I_N = 8, red). (D) Masking threshold and continuity thresholds. (E) Masking and continuity thresholds at maximum tone level, for varying values of the parameters a_I and α. C(I_T,max) shown as a color map, contour lines show M(I_T,max) (values labeled above contours). Parameter values used in (A–D): a_I = 1.124, α = 0.168, shown as black dot in (E).

The feature of this model that is essential in our study of the continuity illusion (discussed below) is that it exhibits hysteresis dynamics. By this we mean that the tone level that activates an inactive population is larger than the tone level that maintains an already active population in the active state. Hysteresis is seen geometrically in the S-shaped firing rate equilibrium curve I_T(x) (Figure 4A1). The activation threshold is the tone strength at the right knee. The deactivation threshold—the minimum tone level that maintains activity—is the tone strength at the left knee. For Model 1, we positioned these knees at I_T = 1 (activation) and I_T = 0.2 (deactivation). As an example of the hysteresis effect: tone input with I_T = 0.5 does not activate x(t) from rest (blue curve in Figure 4A2) but it would maintain x(t) at a level near 1 if x(t) were active prior to this input (not shown, but notice the upper branch of equilibria in Figure 4A1 extends to I_T values <0.5).
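The hysteresis effect described above can be reproduced with a short simulation of Equation (1) for Model 1 with a constant tone input. The Python sketch below uses forward-Euler integration for simplicity, which is not the stiff solver used for the published simulations, and the time constant τ = 0.01 s is an assumed value. The same tone level I_T = 0.5 is applied to a population starting at rest and to one starting in the active state.

```python
import math

A_E, M = 5.9, 3.6   # Model 1 parameters from section 3.1
TAU = 0.01          # time constant in seconds; an assumed value for this sketch

def f(u):
    """Sigmoid of Equation (2) with k = 1."""
    return 1.0 / (1.0 + math.exp(-(u - M)))

def simulate(x0, I_T, duration=1.0, dt=1e-4):
    """Forward-Euler integration of Equation (1) with a constant tone input I_T."""
    x = x0
    for _ in range(int(round(duration / dt))):
        x += dt * (-x + f(A_E * x + I_T)) / TAU
    return x

x_rest_weak = simulate(x0=0.0, I_T=0.5)     # starts inactive, tone below activation threshold
x_active_weak = simulate(x0=1.0, I_T=0.5)   # starts active, same tone level
x_rest_strong = simulate(x0=0.0, I_T=1.5)   # starts inactive, tone above activation threshold
```

In this sketch the first run stays near the inactive state, the second remains near the active state, and the third activates from rest, which is the mismatch between activation and deactivation thresholds that defines hysteresis.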

3.2.2. Response Dynamics to Tone Masked by Noise

The two effects of noise are that it provides a “partial” input that enters as the additive term αI_N in I_sustain(t) and it drives inhibition through the term I_inhib = a_I I_N(1 − x) (recall Equations 4 and 8). The effect of the inhibitory term is to shift the right knee of the firing rate equilibrium curve to larger I_T levels (Figure 4B1, compare black curve with noise to gray curve without noise). That is, the threshold for activation increases with I_N, as desired for noise to have a masking effect. As a demonstration, a tone level of I_T = 1.5 would activate the population from rest in the absence of noise, but with I_N = 1 this input does not activate the population (red curve in Figure 4B2).

The equilibrium solutions for this model, using I_T as the bifurcation parameter and now including the effect of noise, are

$$ I_T(x, I_N) = m - \ln\left(\frac{1}{x} - 1\right) - a_E x + a_I I_N (1 - x) - \alpha I_N . \tag{11} $$

The critical points at which stable equilibria are created and abolished are the knees of the S-shaped curve. We obtain results similar to Equation (10), but now with noise included:

$$ \begin{aligned} x_L(I_N) &= \frac{1}{2}\left(1 + \sqrt{1 - 4/(a_E + a_I I_N)}\right) \\ x_R(I_N) &= \frac{1}{2}\left(1 - \sqrt{1 - 4/(a_E + a_I I_N)}\right) . \end{aligned} \tag{12} $$

The x_R(I_N) point locates the threshold at which a tone activates a population from rest, in the presence of noise. We write I_R(I_N) = I_T(x_R(I_N), I_N) for the tone level at this right knee. The threshold for masking, then, is the noise level that solves I_T = I_R(I_N). Any value of I_T smaller than I_R(I_N) would fail to activate the population due to the noise-driven inhibition. We denote the masking threshold noise level, as a function of tone level, as M(I_T). The relation between tone level and noise level at threshold is nonlinear because x_R depends on I_N. For Model 1, however, x_R is relatively constant with respect to I_N and we find masking threshold as a function of I_T is approximately linear (gray curve in Figure 4D). The slope of this nearly linear relation can be approximated by $[a_I (1 - x_R(0)) - α]^{-1}$ (found by differentiating I_T with respect to I_N in Equation 11, and neglecting any change in x_R with respect to I_N). This approximation shows the opposing effects of noise on masking threshold: M(I_T) increases with increasing α (the amount of excitatory noise input) and decreases with increasing a_I (the amount of inhibitory noise-driven input; see Figure 4E). This relation also imposes a constraint on model parameters. We must have α < a_I [1 − x_R(0)]. If this condition is not satisfied, the excitatory effect of noise [the αI_N term in I_sustain(t)] would dominate the inhibitory effect of noise (I_inhib) and masking would not be possible.
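The masking threshold calculation can be sketched numerically from Equations (11) and (12). The Python code below is illustrative rather than the published MATLAB implementation: the function names and the bisection bracket are assumptions, and the parameters are the Model 1 values a_E = 5.9, m = 3.6, a_I = 1.124, and α = 0.168. It evaluates the tone level at the right knee, inverts it by bisection to obtain the masking noise level for a given tone level, and compares a secant slope with the slope approximation given above.

```python
import math

A_E, M_HALF = 5.9, 3.6        # Model 1 (section 3.1)
A_I, ALPHA = 1.124, 0.168     # values listed in the Figure 4 caption

def x_R(I_N):
    """Equation (12): firing rate at the right knee in the presence of noise."""
    return 0.5 * (1.0 - math.sqrt(1.0 - 4.0 / (A_E + A_I * I_N)))

def I_R(I_N):
    """Tone level at the right knee: Equation (11) evaluated at x_R(I_N)."""
    x = x_R(I_N)
    return (M_HALF - math.log(1.0 / x - 1.0) - A_E * x
            + A_I * I_N * (1.0 - x) - ALPHA * I_N)

def masking_threshold(I_T, lo=0.0, hi=10.0, tol=1e-10):
    """Noise level at which I_R(I_N) = I_T, by bisection.

    Assumes I_R increases with I_N on [lo, hi], which holds for these parameters.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if I_R(mid) < I_T:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# A tone with I_T = 1.5 is masked by noise with I_N = 1 if it lies below the right knee.
masked = 1.5 < I_R(1.0)

# Slope approximation from the text versus a secant slope between I_T = 2 and I_T = 5.
approx_slope = 1.0 / (A_I * (1.0 - x_R(0.0)) - ALPHA)
secant_slope = (masking_threshold(5.0) - masking_threshold(2.0)) / 3.0
```

With these rounded parameter values the two slopes agree only roughly, since x_R does change somewhat as I_N increases.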

3.2.3. Response Dynamics to Tones Interrupted by Noise

The x_L point locates the threshold for inactivation in the presence of noise. A population in the active state will remain active during the gap between tones if the noise strength causes I_T(x_L) to cross over to negative values (see Figure 4C1, and also time-courses of x(t) in Figure 4C2). Thus, we calculate the continuity threshold equation by solving (numerically) the root-finding problem I_T(x_L, I_N) = 0, where x_L is also a function of noise level (Equation 12). Observe that, since I_T = 0 during the gap between tones, the continuity threshold is constant with respect to tone level, as shown by the horizontal black line in Figure 4D.
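
The root-finding problem I_T(x_L, I_N) = 0 can be solved with any bracketing method. The sketch below uses plain bisection and the same hypothetical parameter values as an illustration (a_E = 6, m = 3.6, a_I = 1, α = 0.5; these are assumptions, not the values used for Figure 4). The helper names are likewise introduced here for illustration.

```python
import math

# Hypothetical parameters for illustration (not the paper's Model 1 values).
A_E, M_HALF, A_I, ALPHA = 6.0, 3.6, 1.0, 0.5


def tone_level_at_left_knee(i_n):
    """I_T(x_L(I_N), I_N): Equation 11 evaluated at the left knee (Equation 12)."""
    x_l = 0.5 * (1.0 + math.sqrt(1.0 - 4.0 / (A_E + A_I * i_n)))
    return (M_HALF - math.log(1.0 / x_l - 1.0) - A_E * x_l
            + A_I * i_n * (1.0 - x_l) - ALPHA * i_n)


def bisect(func, lo, hi, tol=1e-10):
    """Plain bisection; func(lo) and func(hi) must have opposite signs."""
    f_lo, f_hi = func(lo), func(hi)
    if f_lo * f_hi > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def continuity_threshold(i_n_max=10.0):
    """Noise level at which the left knee of the equilibrium curve reaches I_T = 0."""
    return bisect(tone_level_at_left_knee, 0.0, i_n_max)


if __name__ == "__main__":
    print(f"continuity threshold I_N = {continuity_threshold():.3f}")
```

Because the tone level does not appear in this calculation, the result is a single noise level, in line with the horizontal continuity threshold described above.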

The masking and continuity thresholds for this model are separate. This imposes additional constraints on our parameter choices. Specifically, in accordance with the hypothesis that the continuity illusion is a compensation for masking (Warren et al., 1972; Warren, 2008), we require that continuity can only occur at noise levels at least as high as the masking threshold. Additionally, we are only interested in parameter sets for which masking and continuity can both be achieved (continuity and masking thresholds must not exceed I_N,max = 10, even for the maximum tone level I_T,max = 5). A view of the a_I − α parameter region that satisfies these requirements is shown in Figure 4H, with labeled contour lines indicating the corresponding masking and continuity thresholds at the maximum tone level.

3.3. Model 2: Transient Inputs and Bistable Dynamics Implement the No Discontinuity Rule

3.3.1. Response Dynamics to Tone-Only Inputs

Model 2 receives transient inputs only (I_sustain(t) = I_inhib(t) = 0 for all t). These transient inputs occur at tone onsets and offsets and signal discontinuities at the “edges” of a tone. To analyze dynamics of this model at tone onset we find it useful to formulate it as a system of two ordinary differential equations:

\begin{array}{l} τ x^{'} = - x + f (a_{E} x + γ_{o n} s) \\ τ s^{'} = - s . & (13) \end{array}

The additional variable s describes the exponentially-decaying transient input, as introduced in Equation (7). At tone onset, this variable is instantaneously displaced to s(t_on) = [I_T − βI_N]₊. Similar dynamics occur at tone offset, with γ_on replaced by γ_off and s(t_off) = −[I_T − βI_N]₊. Notice s(t_off) is negative-valued because offset responses are inhibitory inputs to x.

We configured Model 2 so that it is bistable in the absence of any inputs. In the x-s phase plane, bistability comprises stable equilibria at inactive and active firing rates (x_I and x_A, respectively) separated by an unstable saddle point (x_S). All equilibria are located along the x-axis. Activation of the population from rest requires that the transient onset response is sufficiently large to transition x(t) from the basin of attraction of x_I to the basin of attraction of x_A. This condition can be visualized in the phase plane by considering the separatrix curve S(x) that divides these two basins of attraction. The firing rate will activate from rest if s(t_on) > S(x_I), that is if the response variable at tone onset exceeds the height of the separatrix curve evaluated at the inactive firing rate equilibrium. We observed that the separatrix curve can be adequately approximated as a line connecting the saddle point (x_S, 0) to the point (x_I, 1). The choice of s = 1 at threshold enforces our convention that activation for tone-only inputs occurs at I_T = 1. From this geometric argument, the linear approximation to the separatrix is

\begin{array}{l} S (x) = \frac{x_{S} - x}{x_{S} - x_{I}} & (14) \end{array}

Trajectories in the phase plane at tone onset are shown in Figure 5A1 and their full time-courses are shown in Figure 5A2. If the tone level is sufficiently high (I_T = 1.2 in this simulation, red curve), the system crosses the separatrix and transitions to the stable equilibrium in the active state.
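
The onset dynamics of Equation (13) are straightforward to reproduce in simulation. The following is a minimal sketch using forward-Euler integration, not the authors' implementation. It assumes the logistic firing rate function implied by Equation (11), f(u) = 1/(1 + e^{−(u − m)}), and hypothetical parameter values a_E = 10.5, m = a_E/2 and τ = 1 (time measured in units of τ); the choice m = a_E/2 places the saddle at x = 0.5. The onset gain is computed from γ_on = a_E(x_S − x_I), the value implied by the convention that the tone threshold is I_T = 1.

```python
import math

# Hypothetical parameters for illustration: m = a_E / 2 places the saddle at
# x = 0.5, and tau = 1 measures time in units of the time constant.
A_E, M_HALF, TAU = 10.5, 5.25, 1.0


def f(u):
    """Logistic firing rate function with half-activation at u = m."""
    return 1.0 / (1.0 + math.exp(-(u - M_HALF)))


def inactive_equilibrium(n_iter=200):
    """Lower stable equilibrium x_I of x = f(a_E x), by fixed-point iteration."""
    x = 0.0
    for _ in range(n_iter):
        x = f(A_E * x)
    return x


X_S = 0.5                        # saddle point of the symmetric model
X_I = inactive_equilibrium()     # inactive equilibrium at rest
GAMMA_ON = A_E * (X_S - X_I)     # makes I_T = 1 the linearized threshold


def separatrix(x):
    """Equation 14: linear approximation to the separatrix."""
    return (X_S - x) / (X_S - X_I)


def simulate_onset(i_t, t_end=20.0, dt=0.001):
    """Integrate Equation 13 from rest after a tone onset; return final x."""
    x, s = X_I, max(i_t, 0.0)    # s(t_on) = [I_T]_+ when I_N = 0
    for _ in range(int(t_end / dt)):
        dx = (-x + f(A_E * x + GAMMA_ON * s)) / TAU
        ds = -s / TAU
        x, s = x + dt * dx, s + dt * ds
    return x


if __name__ == "__main__":
    for i_t in (0.8, 1.2):
        print(f"I_T = {i_t}: final firing rate x = {simulate_onset(i_t):.3f}")
```

With these illustrative values, a tone of level 0.8 produces only a brief excursion before x(t) returns to the inactive state, whereas a tone of level 1.2 carries the system past the saddle and into the active state.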

Figure 5. Dynamics of Model 2 responses to tone and noise inputs. (A) Tone inputs (no noise). Phase plane at tone onset, showing stable equilibria (black dots), unstable saddle (gray circle), linear approximation to separatrix (gray line), and trajectories for I_T = 0.8 (blue curve) and I_T = 1.2 (red curve). Time-course of firing rate variable x(t) shown in (A2). (B) Simultaneous tone and noise inputs (masking condition) with I_T = 3. Phase plane shown at tone and noise onset. Sufficiently large noise suppresses the transient onset response and can prevent activation (I_N = 3.2, red curve). (C) Interrupted tone with noise-filled gap (continuity condition) with I_T = 3. Phase plane shown at noise onset. Sufficiently large noise suppresses tone offset and can prevent return to inactive state (I_N = 3, red curve). (D) Masking threshold and continuity thresholds are equal for our choice of parameters (stable equilibria are equidistant to unstable saddle). Tone and noise levels shown in legend at bottom right. Parameter values used: β = 2/3, γ_on = γ_off = 5.2. Filled circles are thresholds computed from simulations and the solid line is the approximate threshold computed from linearized system (Equation 18).

This approximation to the separatrix can also be found by linearizing the dynamical system in Equation (13) about the saddle point and determining the eigenvector associated with its stable manifold. The Jacobian matrix for the system is

\begin{array}{l} J (x, s) = [\begin{matrix} (- 1 + a_{E} f^{'} (a_{E} x + γ_{o n} s)) / τ & γ_{o n} f^{'} (a_{E} x + γ_{o n} s) / τ \\ 0 & - 1 / τ \end{matrix}] & (15) \end{array}

The negative eigenvalue for the saddle point is the lower right entry of this matrix. The associated eigenvector satisfies (J₁₁ − J₂₂)x + J₁₂s = 0, where J_ij is the (i, j) entry of the Jacobian matrix, so we conclude that the linear approximation to the separatrix is

\begin{array}{l} S (x) = (\frac{J_{22} - J_{11}}{J_{12}}) (x - x_{S}) . & (16) \end{array}

A useful consequence of our assumption that x and s have the same time constant τ is that this expression can be simplified substantially to:

\begin{array}{l} S (x) = \frac{a_{E}}{γ_{o n}} (x_{S} - x) . & (17) \end{array}
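
For readers who want the intermediate step, the simplification can be checked directly from the entries of the Jacobian in Equation (15). Here f′ is shorthand (introduced for this note) for the derivative of f evaluated at the saddle point:

\begin{array}{l} J_{22} - J_{11} = - \frac{1}{τ} - \frac{- 1 + a_{E} f^{'}}{τ} = - \frac{a_{E} f^{'}}{τ} , \quad J_{12} = \frac{γ_{o n} f^{'}}{τ} \\ \frac{J_{22} - J_{11}}{J_{12}} = - \frac{a_{E}}{γ_{o n}} \quad \Rightarrow \quad S (x) = - \frac{a_{E}}{γ_{o n}} (x - x_{S}) = \frac{a_{E}}{γ_{o n}} (x_{S} - x) \end{array}

The factors of f′ and τ cancel only because both equations share the same time constant. If x and s had distinct time constants, the constant terms in J₂₂ − J₁₁ would no longer cancel and the slope would retain a dependence on f′ at the saddle point.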

Comparing this equation to the result obtained by geometric considerations (Equation 14), we observe that γ_on is determined by the parameters a_E and m, and our convention that tone threshold is I_T = 1. In particular, we set γ_on = a_E(x_S − x_I).

Deactivation is the mirror image of activation and occurs if the “downward” perturbation of s is sufficiently strong at the tone offset. For the values of a_E and m that we use, the stable equilibria are symmetric around the saddle point at x = 0.5 and it is convenient to set γ_on = γ_off so that the thresholds for activation and deactivation are the same. More generally, the offset parameter should always be set so that activation thresholds are not less than deactivation thresholds, to avoid the scenario in which a tone onset can activate the firing rate variable but the tone offset response is too weak to return the population to rest. In this unrealistic case, x(t) could remain in the active state in perpetuity.

3.3.2. Response Dynamics to Tone Masked by Noise

If a sufficiently strong noise is presented at the same time as a tone, then the noise can prevent activation of the tone population by reducing the transient response at the start of the tone. This is the masking condition. Simulations exhibiting masking dynamics are in Figure 5B. Recall the effect of noise is to reduce the onset response to s(t_on) = [I_T − βI_N]₊ where β is a parameter that controls how much noise suppresses the tone onset (given in Equation 6). As we explained above, in our discussion of tone activation, the model is parameterized so that I_T = 1 is the threshold for activation in the absence of noise. Thus, a noise will mask a tone if I_T − βI_N < 1. We denote the threshold for masking as M(I_T) and conclude that it is related to tone strength via a linear equation:

\begin{array}{l} M (I_{T}) = \frac{1}{β} (I_{T} - 1) . & (18) \end{array}

This equation is valid for values of I_T above the noise-free threshold (I_T = 1) and below the maximum tone strength in the model (I_T,max = 5). It provides a direct relationship between masking threshold and the degree to which noise masks tone onsets (represented by the parameter β). In the simulations shown in Figure 5, we use β = 2/3. The masking threshold curve is shown in Figure 5D.

3.3.3. Response Dynamics to Tones Interrupted by Noise

The feature of this model that is essential for our study of the continuity illusion is that the inactive and active states coexist and are stable in the absence of any inputs. The tone population can, therefore, remain in the active state even after a tone is turned off if the offset signal is weak and does not send x(t) across the separatrix. Simulations exhibiting continuity dynamics are in Figure 5C.

Whereas masking depends on suppression of the onset response (as described above), the continuity illusion depends on suppression of the offset response. Computation of the continuity threshold is analogous to our derivation of the masking threshold, but with onset terms replaced by offset terms. The criteria for continuity are that I_T > 1 (so that the first tone activates the population) and that I_N is sufficiently large to reduce the tone offset response and allow x(t) to remain near its upper equilibrium state. To satisfy this second condition, we must have that the offset response does not cross the separatrix curve, with the key difference being that we are now analyzing the system at the start of the noise-filled gap. This means x(t_off) = x_A (first tone has activated the population) and s(t_off) = −[I_T − βI_N]₊ (noise reduces offset amplitude). Adapting the separatrix equation in Equation (17) for the offset response, and noting that the separatrix height a_E(x_S − x_A)/γ_off is negative at x = x_A, we have that continuity requires the offset perturbation to stay above the separatrix: −[I_T − βI_N]₊ > a_E(x_S − x_A)/γ_off, or equivalently [I_T − βI_N]₊ < a_E(x_A − x_S)/γ_off. The continuity threshold equation is, therefore,

\begin{array}{l} C (I_{T}) = \frac{1}{β} (I_{T} - \frac{a_{E}}{γ_{o f f}} (x_{A} - x_{S})) . & (19) \end{array}

In the particular case of Model 2, we have chosen parameters that make it symmetric (x_I and x_A are equidistant from the saddle point). We set γ_off = γ_on = a_E(x_S − x_I), so that the continuity and masking thresholds are identical (compare C(I_T) to Equation 18, and see threshold lines in Figure 5D).

More generally, we would require γ_off ≥ γ_on to avoid persistent activation (discussed above). If γ_off > γ_on, the continuity threshold would shift upward in Figure 5D (γ_off affects the intercept of C(I_T) but not its slope). In the intermediate I_N values between C(I_T) and M(I_T), we observe responses in which the first tone activates x(t), then x(t) deactivates during the noise-filled gap (no continuity, offset response too strong), and the second tone does not reactivate x(t) (second onset response too weak). In other words, the model without symmetry results in a region in stimulus space in which the noise burst between the two tones is too weak to induce the continuity illusion but sufficiently strong to prevent perception of the second tone by forward masking.
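
The relation between the two thresholds can be made concrete with a few lines of code. The sketch below is an illustration under stated assumptions, not the authors' implementation. The continuity threshold is written in terms of the positive distance x_A − x_S, obtained from the condition that the offset perturbation −(I_T − βI_N) just reaches the linear separatrix at x = x_A, so that the symmetric case reproduces the masking threshold. The function names and the values of a_E, x_I, x_S and x_A are hypothetical; β = 2/3 is the value quoted for Figure 5.

```python
def masking_threshold(i_t, beta):
    """Equation 18: noise level above which the onset response is too weak."""
    return (i_t - 1.0) / beta


def continuity_threshold(i_t, beta, a_e, gamma_off, x_s, x_a):
    """Noise level above which the offset response stays in the active basin.

    Derived from -(I_T - beta * I_N) = a_E * (x_S - x_A) / gamma_off, i.e. the
    offset perturbation just reaching the linear separatrix at x = x_A.
    """
    return (i_t - a_e * (x_a - x_s) / gamma_off) / beta


if __name__ == "__main__":
    beta = 2.0 / 3.0                      # value quoted for Figure 5
    a_e, x_i, x_s = 10.5, 0.0055, 0.5     # hypothetical symmetric configuration
    x_a = 1.0 - x_i
    gamma_on = a_e * (x_s - x_i)
    for scale in (1.0, 1.5):              # gamma_off = scale * gamma_on
        m_thr = masking_threshold(3.0, beta)
        c_thr = continuity_threshold(3.0, beta, a_e, scale * gamma_on, x_s, x_a)
        print(f"gamma_off/gamma_on = {scale}: M = {m_thr:.3f}, C = {c_thr:.3f}")
```

In the symmetric case the two thresholds coincide. With γ_off larger than γ_on, the continuity threshold lies above the masking threshold, which opens the intermediate range of noise levels described above.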

3.4. Model 3: Combined Inputs Implement Both Rules for the Continuity Illusion

The last model configuration we consider is one that cannot be activated by transient inputs alone or sustained inputs alone. These requirements are met if I_T at the left knee of the firing rate equilibrium curve is positive (no bistability at rest) and the I_T at the right knee is to the right of the maximum allowable tone level I_T,max. Activation from rest can only occur using a combination of sustained and transient inputs. There must be a sufficiently strong sustained input to move the system past the saddle node bifurcation point at the left knee of the firing rate equilibrium curve. This creates a stable equilibrium in the activated state that can be accessed if the transient portion of the input is sufficiently strong to transition the system into the basin of attraction of this upper equilibrium (see Figure 6).

Figure 6. Activation dynamics in Model 3 requiring sustained and transient inputs. (A) Firing rate equilibrium curve. Defining feature of Model 3 is a left knee at positive I_T level and right knee at I_T level larger than I_T,max. (B) Time-courses of firing rate variable for sustained input only (blue), transient input only (red), and combination of both inputs (yellow). Tone level is I_T = 1.5 in all cases.

To understand firing rate responses for this model, we again formulate the dynamics in the x-s state space:

\begin{array}{l} τ x^{'} = - x + f (a_{E} x + {[I_{T} - β I_{N}]}_{+} - a_{I} (1 - x) I_{N} + γ s) \\ τ s^{'} = - s & (20) \end{array}

These equations govern the dynamics of the system in the time following a tone onset or offset. The parameter γ should be thought of as representing γ_on when describing onset responses and γ_off when describing offset responses. The initial values for these equations are given by the state of the system immediately prior to sound onset or offset. For the case of tone onset for a system starting from rest (no input), for instance, the initial values would be x(t_on) = x_I(0, 0) and s(t_on) = 1, where x_I(0, 0) is the inactive state in the case of I_T = I_N = 0. We will use similar notation throughout this section to indicate that equilibrium points are functions of the input levels.

The Jacobian matrix for these equations is

\begin{array}{l} J (x, s) = [\begin{matrix} (- 1 + (a_{E} + a_{I} I_{N}) f^{'} (u)) / τ & γ f^{'} (u) / τ \\ 0 & - 1 / τ \end{matrix}] & (21) \end{array}

where we have abbreviated the argument of f′ with u = a_Ex + [I_T − βI_N]₊ − a_I(1 − x)I_N + γs. As before, we evaluate the Jacobian at the saddle point (when it exists, for sufficiently large I_T) and use the eigenvector associated with the stable manifold of the saddle point to construct a linear approximation to the separatrix curve that defines the threshold for activation. The notable difference between the analysis in this section and the preceding section (for Model 2) is that the sustained input terms can affect the positions of the equilibrium solutions and the shape of the separatrix in the current model setting. Following our earlier calculation (recall Equation 16), we find the eigenvector by solving (J₁₁ − J₂₂)x + J₁₂s = 0. After simplifications, we find the linear approximation to the separatrix to be

\begin{array}{l} S (x; I_{T}, I_{N}) = (\frac{a_{E} + a_{I} I_{N}}{γ}) (x_{S} (I_{T}, I_{N}) - x) & (22) \end{array}
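
The simplification can be spelled out as follows, using the entries of the Jacobian in Equation (21). The only change relative to the Model 2 calculation is that the inhibitory term −a_I(1 − x)I_N contributes a_I I_N to the derivative of u with respect to x:

\begin{array}{l} \frac{\partial u}{\partial x} = a_{E} + a_{I} I_{N} , \quad \frac{\partial u}{\partial s} = γ \\ J_{22} - J_{11} = - \frac{(a_{E} + a_{I} I_{N}) f^{'} (u)}{τ} , \quad J_{12} = \frac{γ f^{'} (u)}{τ} \\ S (x; I_{T}, I_{N}) = (\frac{J_{22} - J_{11}}{J_{12}}) (x - x_{S}) = (\frac{a_{E} + a_{I} I_{N}}{γ}) (x_{S} - x) \end{array}

The tone level I_T does not appear in the slope; it enters only through the saddle position x_S(I_T, I_N).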

where we are assuming that I_T is sufficiently large so that the saddle point x_S(I_T, I_N) exists. In the remaining sections, we apply this result in the three cases we have been considering (tone only, simultaneous tone and noise, and tones with a noise-filled gap) to characterize activation by tone, masking, and continuity dynamics.

3.4.1. Response Dynamics to Tone-Only Inputs

Activation by a tone-only input (I_N = 0) occurs if the onset response causes the system to cross the separatrix defined in Equation (22). In particular, we consider the system starting from rest, with x(t_on) = x_I(0, 0) and input variable s instantaneously perturbed to s(t_on) = I_T. We then ask if this onset perturbation to s exceeds S(x(t_on)). We must also keep in mind that the position of the saddle point is determined by inputs, so in this case we use x_S(I_T, 0) in Equation (22). From these considerations we conclude that, to satisfy our convention that I_T = 1 is the tone threshold, we must set the onset parameter to γ_on = a_E [x_S(1, 0) − x_I(0, 0)]. Simulations showing activation by a tone are shown in Figure 7A. The phase portrait in Figure 7A1 illustrates the dynamics at tone onset. We remark that I_offset(t) is not necessary to move the system back to the inactive state. In this model, the return to the inactive state at the end of the tone is guaranteed in the tone-only case because the saddle point and upper equilibrium do not exist for I_T = 0. The firing rate variable must return to x_I(0, 0) because it is the unique, remaining stable equilibrium. This differs from Model 2, which required an offset response to deactivate x(t). We will see shortly, however, that the offset response does affect the dynamics of the continuity illusion, and we will explore γ_off further in that setting.
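
The requirement that Model 3 needs both input types can be illustrated with a short tone-only simulation (I_N = 0). The following is a minimal sketch under stated assumptions, not the authors' MATLAB code. It assumes the logistic firing rate function implied by Equation (11) and hypothetical parameter values a_E = 12, m = 9.1 and τ = 1 (time in units of τ), chosen only so that the left knee lies at a positive tone level and the right knee lies above I_T,max = 5. The onset gain is computed from γ_on = a_E[x_S(1, 0) − x_I(0, 0)], and the tone is held on for the whole simulation.

```python
import math

# Hypothetical parameters for illustration: with a_E = 12 and m = 9.1 the left
# knee sits at I_T > 0 and the right knee above I_T,max = 5 (for I_N = 0).
A_E, M_HALF, TAU = 12.0, 9.1, 1.0


def f(u):
    """Logistic firing rate function with half-activation at u = m."""
    return 1.0 / (1.0 + math.exp(-(u - M_HALF)))


def equilibrium_tone_level(x):
    """Tone level at which x is an equilibrium when I_N = 0."""
    return M_HALF - math.log(1.0 / x - 1.0) - A_E * x


def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection; func(lo) and func(hi) must have opposite signs."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


ROOT = math.sqrt(1.0 - 4.0 / A_E)
X_L, X_R = 0.5 * (1.0 + ROOT), 0.5 * (1.0 - ROOT)    # knees for I_N = 0
X_I00 = bisect(equilibrium_tone_level, 1e-12, X_R)   # x_I(0, 0), lower branch
X_S10 = bisect(lambda x: equilibrium_tone_level(x) - 1.0, X_R, X_L)  # x_S(1, 0)
GAMMA_ON = A_E * (X_S10 - X_I00)                     # sets tone threshold I_T = 1


def simulate(i_t, sustained=True, transient=True, t_end=50.0, dt=0.001):
    """Forward-Euler integration from rest; returns x at t_end (tone still on)."""
    x = X_I00
    s = i_t if transient else 0.0       # onset perturbation s(t_on) = I_T
    drive = i_t if sustained else 0.0   # sustained tone input
    for _ in range(int(t_end / dt)):
        dx = (-x + f(A_E * x + drive + GAMMA_ON * s)) / TAU
        ds = -s / TAU
        x, s = x + dt * dx, s + dt * ds
    return x


if __name__ == "__main__":
    for label, sus, tra in (("sustained only", True, False),
                            ("transient only", False, True),
                            ("both inputs", True, True)):
        x_end = simulate(1.5, sus, tra)
        print(f"{label}: final firing rate x = {x_end:.3f}")
```

With these illustrative values and a tone level of I_T = 1.5, only the combination of sustained and transient inputs leaves the population in the active state, which mirrors the qualitative behavior described for Figure 6B.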

Figure 7. Dynamics of Model 3 responses to tone and noise inputs. (A) Tone inputs (no noise) with tone level I_T = 1.2. Phase plane at tone onset, showing stable equilibria (black dots), unstable saddle (gray circle), linear approximation to separatrix (gray line) and trajectory (blue curve). Time-course of firing rate variable x(t) shown in (A2). (B) Simultaneous tone and noise inputs (masking condition) with I_T = 2 and I_N = 1.5. Phase plane shown at tone and noise onset. The active state still exists, but it is shifted rightward relative to tone only and slope of the separatrix is steeper. Noise prevents activation in this case and firing rate returns to inactive equilibrium. (C) Interrupted tone with noise-filled gap (continuity condition) with I_T = 2 and I_N = 1.5. Phase plane shown at noise onset. Noise preserves stable equilibrium. In this case, trajectory does not cross separatrix and system remains active during gap between tones. (D) Masking threshold and continuity thresholds. Parameter values used: a_I = 7, α = 0.5, β = 0.05, γ_on = 9.6, γ_off = 0.88. (E) Continuity threshold for varying γ_off parameter value. Black curves in (D,E) are identical. Filled circles in (D,E) are thresholds computed from simulations, solid lines are approximate threshold computed from linearized system (Equation 18).

3.4.2. Response Dynamics to Tone Masked by Noise

In the masking condition, tone and noise inputs are both present simultaneously and thus the saddle point about which we linearize the system depends on I_T and I_N. To determine whether a noise prevents activation by a tone, we are still interested in onset dynamics from rest. The initial firing rate is x(t_on) = x_I(0, 0). We use the separatrix equation (Equation 22) again to determine the masking threshold. We determined γ_on from the tone-only response and we use it here to update the separatrix at tone onset:

\begin{array}{l} S (x; I_{T}, I_{N}) = (1 + \frac{a_{I}}{a_{E}} I_{N}) (\frac{x_{S} (I_{T}, I_{N}) - x}{x_{S} (1, 0) - x_{I} (0, 0)}) . & (23) \end{array}

This equation reveals the differing effects of sustained tone and noise inputs. The sustained tone input does not alter the slope of the linear approximation to the separatrix (it is the same as the slope found for the model with transient inputs only, Equation 16), but it can change the intercept through the dependence of x_S on I_T in the numerator. The sustained noise input, in contrast, changes both the intercept and the slope of the separatrix (through the term a_I I_N/a_E).

Trajectories in the phase plane and as firing rate time-courses showing masking of a tone by noise are in Figure 7B. In this example, the onset response exceeds the threshold for tone-only inputs (s(t_on) is above one in Figure 7B1). Nevertheless, the population returns to the inactive state because the noise input has raised the threshold for activation (in this case, primarily by steepening the slope of separatrix).

With Equation (23) in hand, we compute the masking threshold as the minimum I_N level (for a given I_T level) at which the onset response fails to cross the separatrix. To do this, we set the amplitude of the transient onset response to the height of the separatrix at the initial value. Thus, the masking threshold M(I_T) is the noise level that solves the nonlinear equation I_T − βI_N = S(x_I(0, 0); I_T, I_N). We solve this equation numerically and display the resulting masking threshold curve in Figure 7D, for selected parameter values.

3.4.3. Response Dynamics to Tones Interrupted by Noise

In contrast to the two cases just considered (activation by a tone alone, masking by noise), understanding the continuity illusion requires analysis of the offset response. We approximate the threshold, as usual, with the linear approximation to the separatrix that is given in Equation (22). We interpret γ in that equation as γ_off since we are concerned with the dynamics at the offset of the first portion of the tone. There is no tone (I_T = 0) during the noise-filled interruption between the tones, so the saddle point position is only a function of I_N. This highlights the fact that there are two necessary conditions for continuity dynamics in this model: the noise level must be sufficiently strong to preserve the saddle point and the stable active equilibrium, and the offset response must be sufficiently weak (perhaps because it is masked by the noise) to prevent the system from crossing the separatrix. The second of these conditions is satisfied if the offset response, which has initial amplitude −(I_T − βI_N), does not cross S(x_A; 0, I_N). A firing rate response that remains active when the tone is removed (consistent with the continuity illusion) is shown in the x-s phase plane in Figure 7C1 and as a time-course in Figure 7C2. The continuity threshold curve, C(I_T), which we calculate by solving for I_N the nonlinear equation −(I_T − βI_N) = S(x_A; 0, I_N), is shown in Figure 7D.

The offset parameter γ_off shifts the system between the two extreme cases represented by Models 1 and 2. If the offset parameter is small, then the occurrence of the continuity illusion relies on the hysteresis effect (the fact that I_N can preserve the saddle point and active equilibrium). In this case, the continuity threshold C(I_T) changes slowly with I_T (and, in fact, can be constant over a range of I_T values). In the extreme case of no offset response, the continuity threshold would be constant with I_T as was the case for Model 1 (see Figure 4D). If the offset parameter is large, then the offset response makes a larger contribution to whether the firing persists during the gap. The continuity threshold varies more with I_T as γ_off increases. These effects of γ_off on continuity threshold are shown in Figure 7E.

4. Discussion

The continuity illusion is an intriguing example of the capacity for the brain to “fill in” missing information. In this case, the “filling in” process creates an illusion of a continuous tone that is, in fact, discontinuous. In the context of hearing in a complex listening environment, a bias toward linking related sounds across time to create longer-lasting auditory objects can be useful when contending with multiple interrupting or distracting sounds that could momentarily obscure a sound of interest. Our contribution has been to use principles of neural dynamics and to draw on previous conceptual frameworks and experimental studies of the continuity illusion to identify dynamical mechanisms that can account for the continuity illusion.

We used a firing rate model of population-level neural activity to explore possible routes toward continuity illusion-like dynamics. A first requirement was that recurrent excitation be strong enough to create a stable equilibrium at a high firing rate level (Figure 3). We stereotyped the inputs to the population as sustained and transient. This setup was based on physiological evidence that neurons in auditory cortex with these response types are possible neural correlates for the continuity illusion (Petkov et al., 2007). We showed, using Model 3, that a firing-rate model can be constructed to require both input types to implement the continuity illusion.

Although the model could also be configured so that sustained inputs alone (Model 1) or transient inputs alone (Model 2) produce dynamics consistent with the continuity illusion, there are shortcomings in both cases that were remedied in Model 3. In the case of sustained inputs alone, continuity dynamics require a hysteresis effect: input levels that are too weak to activate the population from rest can, nevertheless, sustain activity in an already active population. The tone input makes no contribution to the response dynamics during the noise-filled interruption between the tones. As a result, the continuity threshold is constant with tone level (Figure 4D). This is inconsistent with evidence that the probability of perceiving the continuity illusion increases as the noise level becomes louder relative to the tone (Riecke et al., 2008). In the case of transient inputs alone, continuity dynamics require a population that is bistable in the absence of any inputs. If additional mechanisms were not at work (synaptic adaption in the recurrent excitatory connections, for instance), a bistable population could remain in a high firing rate state for a long period of time even when no stimulus is present. An additional limitation of this model is the requirement for symmetric onset and offset response dynamics. If deactivation requires a stronger input than activation (i.e., if the saddle point is at some x < 1/2), then there will exist a range of input strengths for which a tone that activates the population will not return the population to rest at the end of the tone. For the same reason, if activation requires a stronger input than deactivation (i.e., if the saddle point is at some x > 1/2), then in the interrupted-tone case a tone that activates the population may be too weak to reactivate the population following the noise-filled gap. 
While this type of response dynamic could be consistent with forward masking (Moore and Glasberg, 1983), it is inconsistent with the stimuli used to test for the continuity illusion (both tones, before and after the noise-filled gap, are typically audible).

Interestingly, the model configurations revealed that continuity thresholds can depend on tone level (I_T) in distinct ways. As I_T increased, continuity thresholds increased less for models in which sustained inputs dominated (Model 1) and more for models in which transient inputs dominated (Model 2); see also Figure 7E, which shows similar results for different offset strengths γ_off in Model 3. This outcome of the model could be examined with experiments that assess listeners' perception of the continuity illusion across a range of tone levels using inputs that vary the salience of acoustic edges (e.g., tones that ramp down before the noise-filled interruption; Bregman and Dannenbring, 1977).

Following Petkov et al. (2007), we view the different input populations (sustained, onset, and offset) as possible neural correlates of two aspects of the descriptive theory of the continuity illusion proposed by Bregman (Bregman, 1990). First, sustained responses convey “evidence” that a tone is ongoing. If some sustained neural activity persists during the interruption between two tones, then the tone may be perceived as continuous. This is Bregman's “Sufficiency of Evidence rule.” In our model, this is implemented by assuming that noise (as a broadband signal) drives a partial excitatory input to the population whose activity signals the perception of the tone. Second, transient responses at tone onsets and offsets mark “discontinuities” in the tone (acoustic edges). If the noise during the gap between tones obscures this discontinuity, then tone may be perceived as continuous. This is Bregman's “No Discontinuity rule.” In our model, this is implemented by assuming noise can reduce the amplitude of the transient response. An additional effect of noise (in Model 3) is to increase the threshold for activation by a transient input (effected by altering the separatrix, see I_N term in Equation 23).

By analyzing how these input types drive firing rate activity, we identified two dynamical mechanisms and circuit properties that can support the continuity illusion: 1) a hysteresis effect enabled the noise to convey “sufficient evidence” to sustain firing activity in the gap between tones, 2) an offset response that signaled “discontinuity” in an acoustic signal could be suppressed by noise to confine the system to the basin of attraction of the high firing rate stable equilibrium. These dynamical mechanisms (hysteresis and bistability) required sufficiently strong recurrent excitation, specifically the requirement that a_E > 4 for S-shaped firing rate equilibrium curves (Figure 3).

In addition to providing a dynamical framework to accompany Bregman's theory, our approach offers a new perspective on previous models of the continuity illusion. The work of Noto et al. (2016) relied on nonlinear, self-excitation of neural populations which is consistent with our approach for creating hysteresis and bistable dynamics. The work of Vinnik et al. (2010) utilized dynamic synapses to create continuous responses to interrupted tones. Since dynamic synapses are an additional mechanism for creating hysteresis and bistable dynamics (Barak and Tsodyks, 2007), there is also consistency between their work and ours. An extra feature of our study, not considered in these previous models of the continuity illusion, is that we also required that our simulated dynamics exhibit masking for noise and tones presented simultaneously. Incorporating this is essential, in our view, to avoid “trivial” solutions in which noise acts simply (and solely) as an excitatory input that facilitates neural activity during the noise-filled gap between tones. Balancing these excitatory and inhibitory effects of noise required fine-tuning in Model 1 (Figure 4E).

Although the firing rate framework is a caricatured description of neural activity, insights into a number of sensory illusions have been made using similar approaches including for visual bistability (Laing and Chow, 2002; Shpiro et al., 2007, 2009) and auditory streaming (Rankin et al., 2015, 2017; Paredes-Gallardo et al., 2019). In our study, it has been useful as a minimal model that identifies mechanisms that can account for the continuity illusion. We focused on a classic version of the continuity illusion (tone interrupted by noise), but the illusion can be elicited by a variety of sound types (Bregman, 1990; Warren, 2008; McWalter and McDermott, 2019). The facts that the continuity illusion is widespread and that the dynamics of the illusion can be explained with a relatively simple model may indicate that perceptual continuity of interrupted sounds is constructed at more than one stage of auditory processing. Indeed, while we were motivated by observations in auditory cortex (Petkov et al., 2007), the input motifs we used in our model (sustained, onset, and offset responders) are present in other auditory nuclei, and there is evidence for involvement of the brainstem in the continuity illusion in human listeners (Bidelman and Patro, 2016). We suggest future work could focus on the inferior colliculus (IC) as a possible origin of the continuity illusion. The IC is a midbrain structure that receives inputs from numerous brainstem regions including excitatory projections from the ventral cochlear nucleus (a site of sustained and onset neurons; Pfeiffer, 1966; Rhode and Smith, 1986) and inhibitory projections from the superior paraolivary nucleus (a site of offset neurons; Oliver, 2005; Cant and Oliver, 2018; Kopp-Scheinpflug et al., 2018). A critical component of our model was recurrent excitation that creates persistent activity. 
The IC, accordingly, has numerous, local connections (Ito and Malmierca, 2018) that may prolong the duration of post-synaptic potentials (Sivaramakrishnan et al., 2013) and increase firing rate responses to high sound intensities (Grimsley et al., 2013). As work continues to identify neural correlates of the continuity illusion, idealized dynamical systems descriptions can inform what features should be included in future, more biophysically-detailed modeling approaches.

Data Availability Statement

MATLAB code for the firing rate model and for generating all figures in this study can be found in the GitHub repository at https://github.com/jhgoldwyn/ContinuityIllusion.

Author Contributions

JG conceived of the study. All authors designed, implemented, and carried out simulations and analysis. All authors contributed to writing and editing the manuscript.

Funding

This research has been supported by Swarthmore College through the Deborah A. DeMott '70 Student Research and Internship fund (QC) and the Eugene M. Lang Summer Research Fellowship (NP).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

Computations were performed using MATLAB (RRID:SCR_001622), version R2020b.

References

Balaguer-Ballester, E., Clark, N. R., Coath, M., Krumbholz, K., and Denham, S. L. (2009). Understanding pitch perception as a hierarchical process with top-down modulation. PLoS Comput. Biol. 5:e1000301. doi: 10.1371/journal.pcbi.1000301

Barak, O., and Tsodyks, M. (2007). Persistent activity in neural networks with dynamic synapses. PLoS Comput. Biol. 3:e30104. doi: 10.1371/journal.pcbi.0030104

Bidelman, G. M., and Patro, C. (2016). Auditory perceptual restoration and illusory continuity correlates in the human brainstem. Brain Res. 1646, 84–90. doi: 10.1016/j.brainres.2016.05.050

Bregman, A. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press. doi: 10.7551/mitpress/1486.001.0001

Bregman, A., and Dannenbring, G. (1977). Auditory continuity and amplitude edges. Can. J. Psychol. 31, 151–159. doi: 10.1037/h0081658

Brody, C. D., Romo, R., and Kepecs, A. (2003). Basic mechanisms for graded persistent activity: discrete attractors, continuous attractors, and dynamic representations. Curr. Opin. Neurobiol. 13, 204–211. doi: 10.1016/S0959-4388(03)00050-3

Cant, N., and Oliver, D. (2018). “Chapter 2: Overview of auditory projection pathways and intrinsic microcircuits,” in The Mammalian Auditory Pathways, Vol. 65 of Springer Handbook of Auditory Research, eds D. Oliver, N. Cant, R. R. Fay, and A. N. Popper (Springer), 7–40. doi: 10.1007/978-3-319-71798-2_2

Costalupes, J., Young, E., and Gibson, D. (1984). Effects of continuous noise backgrounds on rate response of auditory nerve fibers in cat. J. Neurophysiol. 51, 1326–1344. doi: 10.1152/jn.1984.51.6.1326

de la Rocha, J., Marchetti, C., Schiff, M., and Reyes, A. D. (2008). Linking the response properties of cells in auditory cortex with network architecture: counting versus lateral inhibition. J. Neurosci. 28, 9151–9163. doi: 10.1523/JNEUROSCI.1789-08.2008

Ehret, G., and Merzenich, M. M. (1988). Complex sound analysis (frequency resolution, filtering and spectral integration) by single units of the inferior colliculus of the cat. Brain Res. Rev. 13, 139–163. doi: 10.1016/0165-0173(88)90018-5

Grimsley, C., Sanchez, J., and Sivaramakrishnan, S. (2013). Midbrain local circuits shape sound intensity codes. Front. Neural Circuits 7:174. doi: 10.3389/fncir.2013.00174

Husain, F. T., Lozito, T. P., Ulloa, A., and Horwitz, B. (2005). Investigating the neural basis of the auditory continuity illusion. J. Cogn. Neurosci. 17, 1275–1292. doi: 10.1162/0898929055002472

Ito, T., and Malmierca, M. (2018). “Chapter 6: Neurons, connections, and microcircuits of the inferior colliculus,” in The Mammalian Auditory Pathways, Vol. 65 of Springer Handbook of Auditory Research, eds D. Oliver, N. Cant, R. R. Fay, and A. N. Popper (Springer), 127–167. doi: 10.1007/978-3-319-71798-2_6

Kato, H., Asinof, S., and Isaacson, J. (2017). Network-level control of frequency tuning in auditory cortex. Neuron 95, 412–423. doi: 10.1016/j.neuron.2017.06.019

Keller, C. H., Kaylegian, K., and Wehr, M. (2018). Gap encoding by parvalbumin-expressing interneurons in auditory cortex. J. Neurophysiol. 120, 105–114. doi: 10.1152/jn.00911.2017

King, A. J. (2007). Auditory neuroscience: filling in the gaps. Curr. Biol. 17, R799–R801. doi: 10.1016/j.cub.2007.07.013

Kopp-Scheinpflug, C., Sinclair, J. L., and Linden, J. F. (2018). When sound stops: offset responses in the auditory system. Trends Neurosci. 41, 712–728. doi: 10.1016/j.tins.2018.08.009

Laing, C. R., and Chow, C. C. (2002). A spiking neuron model for binocular rivalry. J. Comput. Neurosci. 12, 39–53. doi: 10.1023/A:1014942129705

McWalter, R., and McDermott, J. H. (2019). Illusory sound texture reveals multi-second statistical completion in auditory scene analysis. Nat. Commun. 10:5096. doi: 10.1038/s41467-019-12893-0

Miller, G., and Licklider, J. (1950). The intelligibility of interrupted speech. J. Acoust. Soc. Am. 22:167. doi: 10.1121/1.1906584

Miller, P. (2018). Chapter 6: An Introductory Course in Computational Neuroscience. Cambridge, MA: MIT Press, 211–256. doi: 10.7551/mitpress/9780262533287.003.0011

Moore, B., and Glasberg, B. (1983). Growth of forward masking for sinusoidal and noise maskers as a function of signal delay; implications for suppression in noise. J. Acoust. Soc. Am. 73, 1249–1259. doi: 10.1121/1.389273

Noto, M., Nishikawa, J., and Tateno, T. (2016). An analysis of nonlinear dynamics underlying neural activity related to auditory induction in the rat auditory cortex. Neuroscience 318, 58–83. doi: 10.1016/j.neuroscience.2015.12.060

Oliver, D. L. (2005). “Chapter 2: Neuronal organization of the inferior colliculus,” in The Inferior Colliculus, eds J. A. Winer and C. E. Schreiner (New York, NY: Springer), 137. doi: 10.1007/0-387-27083-3_2

Paredes-Gallardo, A., Dau, T., and Marozeau, J. (2019). Auditory stream segregation can be modeled by neural competition in cochlear implant listeners. Front. Comput. Neurosci. 13:42. doi: 10.3389/fncom.2019.00042

Petkov, C. I., O'Connor, K. N., and Sutter, M. L. (2007). Encoding of illusory continuity in primary auditory cortex. Neuron 54, 153–165. doi: 10.1016/j.neuron.2007.02.031

Pfeiffer, R. R. (1966). Classification of response patterns of spike discharges for units in the cochlear nucleus: tone-burst stimulation. Exp. Brain Res. 1, 220–235. doi: 10.1007/BF00234343

Pickles, J. (2012). An Introduction to the Physiology of Hearing, 4th Edn. Bingley: Emerald.

Plack, C. J., and White, L. J. (2000). Perceived continuity and pitch perception. J. Acoust. Soc. Am. 108, 1162–1169. doi: 10.1121/1.1287022

Rankin, J., Osborn Popp, P. J., and Rinzel, J. (2017). Stimulus pauses and perturbations differentially delay or promote the segregation of auditory objects: psychoacoustics and modeling. Front. Neurosci. 11:198. doi: 10.3389/fnins.2017.00198

Rankin, J., Sussman, E., and Rinzel, J. (2015). Neuromechanistic model of auditory bistability. PLoS Comput. Biol. 11:e1004555. doi: 10.1371/journal.pcbi.1004555

Rhode, W. S., and Smith, P. H. (1986). Encoding timing and intensity in the ventral cochlear nucleus of the cat. J. Neurophysiol. 56, 261–286. doi: 10.1152/jn.1986.56.2.261

Rhode, W. S., and Greenberg, S. (1994). Lateral suppression and inhibition in the cochlear nucleus of the cat. J. Neurophysiol. 71, 493–514. doi: 10.1152/jn.1994.71.2.493

Riecke, L., Micheyl, C., and Oxenham, A. J. (2012). Global not local masker features govern the auditory continuity illusion. J. Neurosci. 32, 4660–4664. doi: 10.1523/JNEUROSCI.6261-11.2012

Riecke, L., van Opstal, A. J., and Formisano, E. (2008). The auditory continuity illusion: A parametric investigation and filter model. Percept. Psychophys. 70, 1–12. doi: 10.3758/PP.70.1.1

Shpiro, A., Curtu, R., Rinzel, J., and Rubin, N. (2007). Dynamical characteristics common to neuronal competition models. J. Neurophysiol. 97, 462–473. doi: 10.1152/jn.00604.2006

Shpiro, A., Moreno-Bote, R., Rubin, N., and Rinzel, J. (2009). Balance between noise and adaptation in competition models of perceptual bistability. J. Comput. Neurosci. 27:37. doi: 10.1007/s10827-008-0125-3

Sivaramakrishnan, S., Sanchez, J., and Grimsley, C. (2013). High concentrations of divalent cations isolate monosynaptic inputs from local circuits in the auditory midbrain. Front. Neural Circuits 7:175. doi: 10.3389/fncir.2013.00175

Vinnik, E., Itskov, P., and Balaban, E. (2010). A proposed neural mechanism underlying auditory continuity illusions. J. Acoust. Soc. Am. 128, EL20–EL25. doi: 10.1121/1.3443568

Wang, X., Lu, T., Snider, R. K., and Liang, L. (2005). Sustained firing in auditory cortex evoked by preferred stimuli. Nature 435, 341–346. doi: 10.1038/nature03565

Warren, R. M. (1970). Perceptual restoration of missing speech sounds. Science 167, 392–393. doi: 10.1126/science.167.3917.392

Warren, R. M. (2008). Chapter 6: Perceptual Restoration of Missing Sounds, 3rd Edn. Cambridge, MA: Cambridge University Press, 150–173. doi: 10.1017/CBO9780511754777.007

Warren, R. M., Obusek, C. J., and Ackroff, J. M. (1972). Auditory induction: perceptual synthesis of absent sounds. Science 176, 1149–1151. doi: 10.1126/science.176.4039.1149

Keywords: auditory scene analysis, bistability, computational neuroscience, continuity illusion, hysteresis, neural dynamics

Citation: Cao Q, Parks N and Goldwyn JH (2021) Dynamics of the Auditory Continuity Illusion. Front. Comput. Neurosci. 15:676637. doi: 10.3389/fncom.2021.676637

Received: 05 March 2021; Accepted: 04 May 2021; Published: 08 June 2021.

Edited by:

Emili Balaguer-Ballester, Bournemouth University, United Kingdom

Reviewed by:

James Rankin, University of Exeter, United Kingdom
Rodica Curtu, The University of Iowa, United States

Copyright © 2021 Cao, Parks and Goldwyn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Joshua H. Goldwyn

^†These authors have contributed equally to this work
