Dynamics of the Auditory Continuity Illusion

Illusions give intriguing insights into perceptual and neural dynamics. In the auditory continuity illusion, two brief tones separated by a silent gap may be heard as one continuous tone if a noise burst with appropriate characteristics fills the gap. This illusion probes the conditions under which listeners link related sounds across time and maintain perceptual continuity in the face of sudden changes in sound mixtures. Conceptual explanations of this illusion have been proposed, but its neural basis is still being investigated. In this work we provide a dynamical systems framework, grounded in principles of neural dynamics, to explain the continuity illusion. We construct an idealized firing rate model of a neural population and analyze the conditions under which firing rate responses persist during the interruption between the two tones. First, we show that sustained inputs and hysteresis dynamics (a mismatch between the tone levels needed to activate and to inactivate the population) can produce continuous responses. Second, we show that transient inputs and bistable dynamics (coexistence of two stable firing rate levels) can also produce continuous responses. Finally, we combine these input types to obtain neural dynamics consistent with two requirements for the continuity illusion as articulated in a well-known theory of auditory scene analysis: responses persist through the noise-filled gap if the noise provides sufficient evidence that the tone continues and if there is no evidence of discontinuities between the tones and the noise. By grounding these notions in a quantitative model that incorporates elements of neural circuits (specifically, recurrent excitation and mutual inhibition), we identify plausible mechanisms for the continuity illusion. Our findings can help guide future studies of neural correlates of this illusion and inform the development of more biophysically based models of the auditory continuity illusion.


INTRODUCTION
How do listeners in crowded and noisy environments create stable auditory streams in the face of interruptions and "background" noise? How do listeners identify the stops and starts of overlapping and interwoven sounds to correctly parse an auditory scene? Answering these questions is fundamental to understanding auditory perception and neural processing of sounds. A perceptual illusion that sheds light on dynamic processing of multiple sounds is the auditory continuity illusion (Bregman, 1990), also called temporal induction (Warren, 2008). The continuity illusion can be elicited when noise interrupts a variety of sounds, including tones, frequency glides, and sentences (Bregman, 1990; Warren, 2008), complex tones (Plack and White, 2000), and sound textures (McWalter and McDermott, 2019). The common aspect of this illusion is that, when the noise is sufficiently loud and shares spectral content with the interrupted signal, listeners perceive the signal as continuous and uninterrupted. This illusion reveals a tendency for the auditory system to maintain perceptual continuity when confronted with sudden changes in the auditory scene and to sustain perception of sounds that are present prior to a masking or distracting sound. The continuity illusion has been studied extensively since its discovery (Miller and Licklider, 1950; Warren, 1970).
The version of the continuity illusion we investigate is an interrupted tone that can be perceived as continuous if the interruption (a short interval in which no tone is presented) is filled with broadband noise, as depicted in Figure 1. Results of listening experiments inform conceptual models of the illusion, such as Bregman's theories of auditory scene analysis (Bregman, 1990). Fundamental questions remain about the perceptual origin and neural basis of the illusion [that is, whether it is a cortical phenomenon (Husain et al., 2005; King, 2007; Petkov et al., 2007) or is created subcortically (Bidelman and Patro, 2016), and whether peripheral responses must signal discontinuities in the sound (Riecke et al., 2012)]. Perhaps because of these uncertainties, there have been few efforts to model how the dynamic activity of neural populations can generate the continuity illusion. Previous work points toward several possible neural mechanisms that can give rise to the illusion: feedforward intra-cortical connections (Husain et al., 2005), nonlinear dynamic self-excitation (Noto et al., 2016), short-term synaptic plasticity (Vinnik et al., 2010), and a hierarchy of neural subpopulations that integrate information on multiple time scales and are modulated by top-down feedback (Balaguer-Ballester et al., 2009). These few and disparate studies motivate further model-based investigations of the continuity illusion.
Our goal is to identify dynamical mechanisms that implement fundamental principles of the illusion. We are informed by the work of Petkov and colleagues, who obtained evidence that two types of neural responses participate in the continuity illusion: sustained responses that signal ongoing sounds and transient responses that signal acoustic edges in time (onsets and offsets of sounds) (Petkov et al., 2007). Although they identified these response types specifically in auditory cortex, we view them as common response motifs found throughout the auditory pathway (see Kopp-Scheinpflug et al., 2018, for a review of offset auditory neurons). In addition, we seek to connect Bregman's principles, gained over decades of careful observation, to fundamental features of neural dynamics. Specifically, we present dynamical explanations for two of Bregman's "rules" that define the circumstances under which the illusion will occur (Bregman, 1990). We paraphrase these rules as follows.
Sufficiency of Evidence Rule: There must be some neural activity during the interruption that is indistinguishable from what would have occurred if the tone had continued through the noise-filled interruption.
No Discontinuity Rule: There must be no evidence that the tone shuts off during the noise-filled interruption.
We use a nonlinear, dynamical firing rate model (Miller, 2018) with sustained inputs to implement the Sufficiency of Evidence rule. The dynamical principle at work is hysteresis: the interrupting noise (a broadband sound) provides a partial amount of excitatory input to an idealized neural population. Recurrent excitation enables the noise to maintain firing activity if the population is already active, even though the noise alone cannot activate an inactive population. We then adjust excitability in the model so that it has two stable states that coexist in the absence of sustained inputs (bistability). We use this configuration to show that transient inputs at tone onsets and offsets implement the No Discontinuity rule. Continuity occurs when noise suppresses the offset response at the end of the first tone. Without a sufficiently strong offset response, the population remains active despite the absence of any ongoing input during the interruption between tones. Finally, we configure the model to receive both input types. In this setting, the model uses both hysteresis and bistability to create dynamics consistent with the continuity illusion. This formulation, in which persistent neural activity (instantiated as an attractor of a dynamical system) indicates perceptual state, resembles ideas common to studies of the neural basis of working memory (Brody et al., 2003).
We illustrate our dynamical explanation for the continuity illusion with simulation results and we emphasize, throughout, analytical and geometric insights gained by computing equilibrium solutions to the nonlinear differential equation that governs the firing rate dynamics. By identifying dynamical principles that are consistent with the continuity illusion, we illuminate how a long-standing perceptual theory (Bregman's "rules") can be embodied with an idealized neural population. Furthermore, the postulated roles of bistability and hysteresis in the continuity illusion can guide how future, more biophysically-detailed modeling studies of this phenomenon should be constructed.

Firing Rate Model of a Neural Population
We model sound-evoked activity of a neural population using a firing rate equation with recurrent excitation (Miller, 2018). The differential equation that describes the dynamics of the neural activity is

$$\tau \frac{dx}{dt} = -x + f\big(a_E\, x + I(t)\big), \tag{1}$$

where $x(t)$ is the firing rate variable that takes unitless values between 0 (population inactive) and 1 (population active). The value of $x(t)$ can be interpreted in a mean-field sense as the proportion of active neurons in a population or as the instantaneous probability of firing for neurons in the population. Tonotopy, the orderly arrangement of frequency-tuned neurons by preferred frequency, is a central organizing feature of the auditory pathway (Pickles, 2012). Since we are studying responses to pure tone inputs (or a pure tone interrupted by broadband noise), we define $x(t)$ as the activity of a population tuned to the tone frequency. We associate values of $x(t)$ near 1 (active population) with perception of the tone at time $t$. The parameter $\tau$ is the time constant of the firing rate dynamics and $a_E$ is the strength of recurrent excitation within the population. The nonlinear input-output function $f$ has the sigmoidal form

$$f(u) = \frac{1}{1 + \exp\big(-(u - m)/k\big)}. \tag{2}$$

Since we are working with unitless quantities, we use $k = 1$ throughout without loss of generality. We explain our choices for the half-maximum parameter $m$ in more detail below. The input term $I(t)$ represents sound-driven inputs to the population. It consists of four components: some are sustained inputs (piecewise constant for the duration of the sound) and some are transient inputs (decaying exponentially after tone onsets or offsets). Our classification of inputs as either sustained or transient, and our description of their dynamics as either constant or exponentially decaying, are caricatures of actual inputs. Nevertheless, these response types are observed in auditory cortex (Wang et al., 2005) and in other nuclei (e.g., Pfeiffer, 1966). Moreover, we are motivated by observations of Petkov et al. (2007), who proposed that sustained and transient response types in primary auditory cortex are possible neural correlates of the continuity illusion.

FIGURE 1 | … If a weak noise burst is inserted into the gap between tones, the tone is still perceived as discontinuous. (C) A sufficiently loud noise that shares frequency content with the tone can induce the illusion that the tone is uninterrupted and persists through the noise-filled interruption.
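A minimal Python sketch of Equations (1) and (2) follows (the authors worked in MATLAB; this is not their code). The function names are illustrative, and time is measured in units of $\tau$ by default.

```python
import math


def f(u, m, k=1.0):
    """Sigmoidal input-output function, Equation (2): half-maximum m, slope parameter k."""
    return 1.0 / (1.0 + math.exp(-(u - m) / k))


def dxdt(x, I, a_E, m, tau=1.0):
    """Firing rate dynamics, Equation (1): tau * dx/dt = -x + f(a_E * x + I)."""
    return (-x + f(a_E * x + I, m)) / tau


# Model 1 parameters from the text (a_E = 5.9, m = 3.6).
a_E, m = 5.9, 3.6
print(f(m, m))                   # the sigmoid equals 0.5 at its half-maximum input
print(dxdt(0.0, 0.0, a_E, m))    # small positive drift: x = 0 is not exactly an equilibrium
```

With no input, the drift at $x = 0$ is small but positive, so the resting state of the population sits slightly above zero rather than exactly at it.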
The general form of $I(t)$ is

$$I(t) = I_{\text{sustain}}(t) + I_{\text{onset}}(t) - I_{\text{offset}}(t) - I_{\text{inhib}}(t). \tag{3}$$

We consider three input scenarios: tone-only (tone on for 1 s and no noise), masking (tone and noise presented together for 1 s), and continuity (a pair of 1-s tones separated by a half-second noise-filled gap). Figure 2 illustrates the form of each input component across these scenarios. To be clear, tones and noise are represented by idealized waveforms (piecewise constant for sustained inputs, exponential decay for transient inputs). We do not use sinusoidal waveforms for tones or stochastic processes for noise. We next describe the form and physiological rationale for each of these components. As we will make clear in section 3.1, some of these terms are omitted depending on the model configuration.
The sustained input $I_{\text{sustain}}(t)$ is a piecewise-constant function that is positive when a tone is on and zero otherwise. This term represents the frequency-tuned excitatory input to the tone population. We use a unitless parameter $I_T$ to represent tone strength and set $I_{\text{sustain}}(t)$ equal to $I_T$ in the absence of noise (tone-only input, Figure 2A1). As discussed in section 3.1, we construct all models so that $I_T = 1$ is the threshold for activation of the firing rate variable (in the absence of noise). Although $I_T$ is unitless, sound levels vary only over a finite range in behaviorally relevant situations, so we restrict $I_T$ to not exceed a maximum tone level that we set (arbitrarily) at $I_{T,\max} = 5$. As a broadband signal, noise also has energy in the frequency channel to which the tone population is tuned. We denote noise strength by $I_N$ (which takes values between 0 and $I_{N,\max} = 10$) and allow the sustained excitatory term $I_{\text{sustain}}(t)$ to increase with $I_N$; see Figures 2B1,C1 for illustrations of $I_{\text{sustain}}(t)$ in the masking and continuity scenarios. This input is

$$I_{\text{sustain}}(t) = I_T\, \mathbb{1}_{\text{tone}}(t) + \alpha\, I_N\, \mathbb{1}_{\text{noise}}(t), \tag{4}$$

where $\mathbb{1}_{\text{tone}}$ is an indicator function that takes the value one if a tone is on at time $t$ and zero otherwise, $\mathbb{1}_{\text{noise}}$ is defined similarly for the noise input, and $\alpha$ scales the contribution of noise to this excitatory, frequency-tuned input to the tone population. The onset input $I_{\text{onset}}(t)$ and offset input $I_{\text{offset}}(t)$ are illustrated in the second and third rows of Figure 2. These are exponentially decaying terms triggered by tone onsets and offsets, respectively. The minus sign before $I_{\text{offset}}$ in Equation (3) indicates that offset responses are inhibitory. Inhibitory offset neurons have been found in the auditory system, including neurons in the superior paraolivary nucleus (Oliver, 2005; Kopp-Scheinpflug et al., 2018) and among parvalbumin-expressing interneurons in auditory cortex (Keller et al., 2018).
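The stimulus scenarios and Equation (4) can be sketched as follows. The helper names are hypothetical, and the absolute timing (first tone starting at $t = 0$ s) is an assumption; the text fixes only the durations.

```python
def indicators(t, scenario):
    """Return (tone_on, noise_on) as 0/1 for the three input scenarios.

    Assumed timing (hypothetical onset at t = 0 s):
      'tone'       : tone on for [0, 1) s, no noise
      'masking'    : tone and noise on together for [0, 1) s
      'continuity' : tones on [0, 1) and [1.5, 2.5) s, noise fills the gap [1, 1.5) s
    """
    if scenario == "tone":
        return int(0 <= t < 1), 0
    if scenario == "masking":
        on = int(0 <= t < 1)
        return on, on
    if scenario == "continuity":
        tone = int(0 <= t < 1 or 1.5 <= t < 2.5)
        noise = int(1 <= t < 1.5)
        return tone, noise
    raise ValueError(f"unknown scenario: {scenario}")


def I_sustain(t, I_T, I_N, alpha, scenario):
    """Equation (4): I_T * 1_tone(t) + alpha * I_N * 1_noise(t)."""
    tone, noise = indicators(t, scenario)
    return I_T * tone + alpha * I_N * noise


# Values from the Figure 2 caption: I_T = 2, I_N = 2, alpha = 0.168.
for t in (0.5, 1.25, 2.0):
    print(t, I_sustain(t, 2.0, 2.0, 0.168, "continuity"))
```

During the noise-filled gap the tone population still receives the partial input $\alpha I_N$, which is the quantity that hysteresis exploits in Model 1.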
FIGURE 2 | Inputs to the firing rate model. Sustained tone-driven excitatory input (row 1), tone onset excitatory input (row 2), tone offset inhibitory input (row 3), and sustained noise-driven inhibitory input (row 4). The three input scenarios considered are tone alone ($I_N = 0$, column 1), masking (tone and noise presented simultaneously, column 2), and continuity (tones separated by a noise-filled gap, column 3). The effects of noise are to increase $I_{\text{sustain}}$ (see B1 and C1), to suppress the transient responses $I_{\text{onset}}$ and $I_{\text{offset}}$ if noise is on at the relevant onset and/or offset (see B2, B3, C2, C3), and to increase $I_{\text{inhib}}$ (see B4, C4). The plot of $I_{\text{inhib}}$ uses $x = 0$; the amplitude of this input decreases for larger values of $x$ (as indicated by the arrows in B4 and C4; see Equation 8). Parameter values used are $I_T = 2$ and $I_N = 2$ (for masking and continuity), $\alpha = 0.168$, and $\beta = 2/3$. These are the $\alpha$ and $\beta$ values used in Models 1 and 2, respectively. See text for details.

We use these transient inputs to represent the salience of the onsets and offsets of a tone or, in other words, the sharpness (in time) of acoustic edges. We therefore use tone strength ($I_T$) to set the initial amplitudes of these transient inputs. We suppose that noise obscures the sharpness of these acoustic edges, and thus that noise strength ($I_N$) reduces $I_{\text{onset}}(t)$ and $I_{\text{offset}}(t)$ if the noise is on at the time of tone onset and/or offset. The form of the onset input for a tone that starts at time $t_{\text{on}}$ is

$$I_{\text{onset}}(t) = A_{\text{on}}\, e^{-(t - t_{\text{on}})/\tau}\, H(t - t_{\text{on}}). \tag{5}$$

The offset input is defined analogously. We choose the decay time constant $\tau$ for transient responses to match the time scale of the firing rate dynamics. The function $H$ is the Heaviside (step) function, which is 0 if its argument is negative and 1 if its argument is positive. We use it here to indicate that the transient onset response begins at tone onset ($t_{\text{on}}$). The constant $A_{\text{on}}$ gives the initial amplitude of the onset input.
As described above, $A_{\text{on}}$ increases with $I_T$, decreases with $I_N$, and takes the form

$$A_{\text{on}} = \gamma_{\text{on}} \big[I_T - \beta I_N\big]_+, \tag{6}$$

where $[\cdot]_+$ is the rectifier operation defined as $[u]_+ = \max(0, u)$, $\beta$ scales how strongly noise suppresses the onset response, and $I_N$ is taken to be zero if no noise is present at $t_{\text{on}}$. The parameter $\gamma_{\text{on}}$ scales the onset response so that the tone threshold is always $I_T = 1$. We detail the calculation of $\gamma_{\text{on}}$ in section 3.1. In our analysis of this model we will make use of the fact that this exponentially decaying input can also be described by $I_{\text{onset}}(t) = \gamma_{\text{on}}\, s(t)$, where $s$ is a dynamical variable that follows a first-order, linear, ordinary differential equation:

$$s(t) = 0 \ \text{ for } t < t_{\text{on}}, \qquad s(t_{\text{on}}) = \big[I_T - \beta I_N\big]_+, \qquad \tau \frac{ds}{dt} = -s \ \text{ for } t > t_{\text{on}}. \tag{7}$$

The final input term in Equation (3) is the sustained inhibitory term $I_{\text{inhib}}(t)$. This term represents a noise-driven subpopulation ($I_{\text{inhib}}$ increases with $I_N$) that directly inhibits the tone population (note the minus sign in Equation 3). This term can reflect processes as early as the auditory nerve, where noise is known to suppress responses to tones (Costalupes et al., 1984). It can also be thought to represent lateral inhibition ("lateral" in the sense that neural responses can be suppressed by sounds with spectral content away from their best frequency). Lateral inhibition is evident in early auditory stages (Ehret and Merzenich, 1988; Rhode and Greenberg, 1994), has been observed in auditory cortex (Kato et al., 2017), and has been suggested to exist there by modeling work (de la Rocha et al., 2008). We suppose, further, that there is mutual inhibition between the tone-driven population and the noise-driven inhibitory population, which we implement with a multiplicative factor $(1 - x)$.
For analytical convenience, we make the approximation that the dynamics of the noise-driven inhibitory population are much faster than those of $x(t)$, so that $I_{\text{inhib}}(t)$ can be described as evolving instantaneously to an $x$-dependent steady state of the form

$$I_{\text{inhib}}(t) = a_I\, I_N\, \big(1 - x(t)\big)\, \mathbb{1}_{\text{noise}}(t), \tag{8}$$

where the parameter $a_I$ determines the strength of this inhibitory input to the tone population. Time courses of $I_{\text{inhib}}(t)$ for the three input scenarios are shown in the fourth row of Figure 2.
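The sketch below collects Equations (3) and (5) through (8) and checks numerically that the closed-form onset transient agrees with integrating the linear equation for $s$. The values $\gamma_{\text{on}} = 1$ and $a_I = 0.4$ are hypothetical placeholders, and forward Euler is used only because it is the easiest integrator to read.

```python
import math


def onset_amplitude(I_T, I_N, beta, gamma_on):
    """Equation (6): A_on = gamma_on * [I_T - beta*I_N]_+ (pass I_N = 0 if no noise at onset)."""
    return gamma_on * max(0.0, I_T - beta * I_N)


def onset_closed_form(t, t_on, A_on, tau):
    """Equation (5): exponential decay from A_on starting at t_on (Heaviside gating)."""
    return A_on * math.exp(-(t - t_on) / tau) if t > t_on else 0.0


def onset_via_s(t, t_on, s0, gamma_on, tau, dt=1e-3):
    """Equation (7): forward-Euler integration of tau*ds/dt = -s from s(t_on) = s0; returns gamma_on*s(t)."""
    if t <= t_on:
        return 0.0
    n = max(1, int(round((t - t_on) / dt)))
    h = (t - t_on) / n
    s = s0
    for _ in range(n):
        s += h * (-s / tau)
    return gamma_on * s


def I_inhib(x, I_N, a_I, noise_on):
    """Equation (8): fast noise-driven inhibition, gated by (1 - x) (mutual inhibition)."""
    return a_I * I_N * (1.0 - x) * noise_on


def total_input(I_sus, I_on, I_off, I_inh):
    """Equation (3): I = I_sustain + I_onset - I_offset - I_inhib."""
    return I_sus + I_on - I_off - I_inh


# beta = 2/3 and I_T = I_N = 2 follow the Figure 2 caption;
# gamma_on = 1 and a_I = 0.4 are hypothetical placeholders.
beta, gamma_on, a_I, tau = 2.0 / 3.0, 1.0, 0.4, 1.0
A_on = onset_amplitude(2.0, 2.0, beta, gamma_on)
print(onset_closed_form(1.0, 0.0, A_on, tau), onset_via_s(1.0, 0.0, A_on / gamma_on, gamma_on, tau))
print(I_inhib(0.0, 2.0, a_I, 1), I_inhib(0.9, 2.0, a_I, 1))   # inhibition weakens as x -> 1
```

The last line shows the effect of the $(1 - x)$ factor: once the tone population is active, the same noise level produces much weaker inhibition.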

Numerical Simulations
All calculations were carried out using the scientific computing software MATLAB (The MathWorks, Inc.). The firing rate differential equation (Equation 1) was solved numerically using ode15s. Simulation code is available for download and use at https://github.com/jhgoldwyn/ContinuityIllusion. Nonlinear equations (to compute equilibrium states, for example) were solved using MATLAB root-finding functions such as fzero and fsolve.
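The sketch below is a dependency-free stand-in for the MATLAB workflow: it swaps in a fixed-step fourth-order Runge-Kutta integrator for ode15s and plain bisection for fzero (these are substitutes, not the methods in the authors' repository). It checks that a long simulation and direct root finding agree on the equilibrium of Model 1 driven by a suprathreshold tone, with $\tau = 1$.

```python
import math


def f(u, m):
    return 1.0 / (1.0 + math.exp(-(u - m)))


def rk4(rhs, x0, t0, t1, dt):
    """Fixed-step classical Runge-Kutta for a scalar ODE dx/dt = rhs(t, x); returns x(t1)."""
    n = max(1, int(round((t1 - t0) / dt)))
    h = (t1 - t0) / n
    t, x = t0, x0
    for _ in range(n):
        k1 = rhs(t, x)
        k2 = rhs(t + h / 2, x + h * k1 / 2)
        k3 = rhs(t + h / 2, x + h * k2 / 2)
        k4 = rhs(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x


def bisect(g, lo, hi, tol=1e-12, max_iter=200):
    """Bracketing root finder (a simple stand-in for fzero); g(lo) and g(hi) must differ in sign."""
    glo = g(lo)
    if glo * g(hi) > 0:
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        gmid = g(mid)
        if glo * gmid <= 0:
            hi = mid
        else:
            lo, glo = mid, gmid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


a_E, m, I_T = 5.9, 3.6, 2.0            # Model 1 with a suprathreshold tone (tau = 1)
rhs = lambda t, x: -x + f(a_E * x + I_T, m)
x_sim = rk4(rhs, 0.0, 0.0, 50.0, 0.01)  # long simulation starting from rest
x_eq = bisect(lambda x: rhs(0.0, x), 0.0, 1.0)
print(x_sim, x_eq)                      # both approximate the unique active equilibrium
```

A stiff, adaptive solver such as ode15s becomes more attractive when time constants differ widely or inputs switch often; for this smooth scalar equation a fixed small step is adequate.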

RESULTS
We begin, in section 3.1, by characterizing different dynamical regimes of the model and determining appropriate parameter choices ($a_E$ and $m$ in Equations 1 and 2). We identify three regimes that can exhibit the continuity illusion and analyze models drawn from each of these regimes. In section 3.2, we study a model with firing rate dynamics that exhibit hysteresis and can be activated by sustained inputs alone. In section 3.3, we study a model with bistable firing rate dynamics that can be activated by transient inputs alone. In section 3.4, we study a model in which the hysteresis and bistable features are both operative and the firing rate can only be activated by combinations of sustained and transient inputs.
In each subsection, we describe how these models respond to a tone alone (without noise), a tone and noise presented together (masking), and two tones separated by a noise-filled gap (continuity). We derive activation threshold criteria for each model and stimulus type and describe how parameter values affect activation thresholds. Our primary observation is that all models can support continuity dynamics (firing activity persists through a noise-filled gap between tones), but that this is accomplished differently using the mechanisms of hysteresis and bistability. As described above, and following Petkov et al. (2007), we propose that sustained inputs to the model implement Bregman's Sufficiency of Evidence rule and transient inputs implement the No Discontinuity rule. While two of the models can implement these rules in isolation (sustained inputs only in section 3.2, transient inputs only in section 3.3), the significance of the final model (section 3.4) is that it implements both rules together. Firing activity in that model can persist during a noise-filled gap between tones because noise has two effects: it contributes excitatory input during the interruption (sufficient evidence), and it prevents the offset response from inactivating the population (no discontinuity).

Model Classification by Firing Rate Equilibria
A general analysis of the equilibrium states of the model for the case of sustained tone input only ($I_{\text{onset}}(t) = I_{\text{offset}}(t) = 0$ for all $t$ and $I_N = 0$) informs our parameter choices and is a starting point for subsequent analyses. Setting $x' = 0$ and $I(t) = I_T$ in Equation (1) and solving for $I_T$ (with $k = 1$), we find the equilibrium relation

$$I_T(x) = m - a_E\, x + \ln\!\left(\frac{x}{1 - x}\right). \tag{9}$$

The parameters $a_E$ and $m$ determine the shape of the firing rate equilibrium curve. We show three representative examples in Figure 3A, with equilibrium firing rates plotted on the vertical axis and $I_T$ plotted on the horizontal axis. The key features of these curves are whether they are S-shaped with left and right "knees" and, if so, the values of $I_T$ at these knees. Recall that the firing rate variable $x$ takes values between zero and one. We interpret equilibrium solutions $x \approx 0$ as inactive states (no perception of tone) and equilibrium solutions $x \approx 1$ as active states (perception of tone). We summarize the possible scenarios in Table 1 and show how they partition the $a_E$-$m$ plane in Figure 3B. The scenarios that interest us must satisfy three criteria. First, to clearly distinguish between active and inactive states, the firing rate equilibrium curve must be S-shaped. This rules out Region I. Second, the left knee must be located at an $I_T$ value less than the maximum tone strength ($I_{T,\max} = 5$) so that activation is possible. This rules out Region II. Lastly, the right knee must be located at a positive $I_T$ value so that inactivation is possible. This rules out Region V, in which the population could remain active at all times, even without inputs.
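Equation (9) can be checked directly: for any $x$ in $(0, 1)$, feeding $I_T(x)$ back into Equation (1) should make $x$ a fixed point. The Python sketch below uses the Model 1 parameters with $k = 1$ and also samples the curve to expose its S-shape (as $x$ increases, $I_T$ rises, falls, then rises again).

```python
import math


def f(u, m):
    return 1.0 / (1.0 + math.exp(-(u - m)))


def I_T_equilibrium(x, a_E, m):
    """Equation (9) with k = 1: tone level at which x is an equilibrium of Equation (1)."""
    return m - a_E * x + math.log(x / (1.0 - x))


a_E, m = 5.9, 3.6   # Model 1
xs = (0.01, 0.2, 0.5, 0.8, 0.99)
vals = [I_T_equilibrium(x, a_E, m) for x in xs]
for x, I_T in zip(xs, vals):
    # Consistency check: with input I_T, x is a fixed point, i.e. f(a_E*x + I_T) equals x.
    print(f"x = {x:.2f}  I_T = {I_T:+.3f}  f(a_E*x + I_T) = {f(a_E * x + I_T, m):.4f}")
```

The non-monotonic $I_T$ values are what create the two knees: tone levels between them correspond to three equilibria.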
As detailed in Table 1, the number and locations of the knees of the firing rate equilibrium curve delineate these regions. The knees are saddle-node bifurcation points of the firing rate dynamics (points at which stable and unstable equilibria appear or disappear). They are located at critical points of $I_T(x)$, so we identify them by differentiating Equation (9) with respect to $x$ and setting the resulting expression to zero, which gives $x(1 - x) = 1/a_E$. The $x$ values at the left and right knees are

$$x_L = \frac{1}{2}\left(1 + \sqrt{1 - \frac{4}{a_E}}\right), \qquad x_R = \frac{1}{2}\left(1 - \sqrt{1 - \frac{4}{a_E}}\right). \tag{10}$$

We parameterize $a_E$ and $m$ by selecting values for $I_T(x_L)$ and $I_T(x_R)$ and then solving the resulting pair of nonlinear equations, obtained from Equation (9), for $a_E$ and $m$. Equation (10) has no real solutions for $a_E < 4$; thus $a_E = 4$ marks the boundary of models without S-shaped firing rate equilibrium curves (Region I in Figure 3B). The regions of interest for this study (III and IV) are determined by the positions of the knees in the firing rate equilibrium curve.

FIGURE 3 | … Equations (1) and (2). Colored dots correspond to the curves in (A). Parameter space is partitioned according to the positions of the left and right knees of the firing rate equilibrium curves (see Table 1 for details).
In particular, Region IV consists of $(a_E, m)$ parameter pairs for which the model is bistable in the absence of any inputs (the left knee is at a negative $I_T$ value and the right knee is at a positive $I_T$ value). Region III consists of models that are monostable with no inputs (the left knee is at a positive $I_T$ value). We distinguish between two subregions of Region III. In Region IIIa, $I_T$ at the right knee is less than the maximum tone level, and thus sustained inputs alone can activate the neural population. In contrast, in Region IIIb the $I_T$ value at the right knee is larger than the maximum tone strength, and thus models in this region can only be activated by a combination of sustained and transient inputs.
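The knee formula in Equation (10) and the region criteria above can be combined into a small classifier. This is a sketch based only on the criteria stated in the text (it does not encode Table 1 itself, so boundary cases may be treated differently there). With the rounded parameter values quoted for the three models, the computed knees fall within about 0.07 of the stated targets, for example about 0.26 and 1.04 for Model 1.

```python
import math


def knees(a_E, m):
    """Equation (10) inserted into Equation (9): (I_T at left knee, I_T at right knee), or None if a_E <= 4."""
    disc = 1.0 - 4.0 / a_E
    if disc <= 0:
        return None
    r = math.sqrt(disc)
    x_L, x_R = 0.5 * (1 + r), 0.5 * (1 - r)
    I_T = lambda x: m - a_E * x + math.log(x / (1 - x))
    return I_T(x_L), I_T(x_R)


def classify(a_E, m, I_T_max=5.0):
    """Region label following the criteria stated in the text (a sketch, not Table 1 verbatim)."""
    k = knees(a_E, m)
    if k is None:
        return "I"      # no S-shaped equilibrium curve
    left, right = k
    if left > I_T_max:
        return "II"     # active branch unreachable with admissible tone levels
    if right < 0:
        return "V"      # active even with no input; inactivation impossible
    if left < 0:
        return "IV"     # bistable with no input
    return "IIIa" if right < I_T_max else "IIIb"


for name, (a_E, m) in {"Model 1": (5.9, 3.6), "Model 2": (10.5, 5.2), "Model 3": (12.7, 9.5)}.items():
    left, right = knees(a_E, m)
    print(f"{name}: left knee {left:.2f}, right knee {right:.2f}, region {classify(a_E, m)}")
```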
From these considerations, we choose three models (one from each region of interest) and use these in all further analysis and simulations.
Model 1 ($a_E = 5.9$, $m = 3.6$, in Region IIIa). The left knee is at $I_T = 0.2$ and the right knee is at $I_T = 1$. As we describe below, the essential dynamical feature of this model is hysteresis: in response to sustained inputs, the system requires a higher tone level to activate than to deactivate (the S-shape of the firing rate equilibrium curve creates a mismatch between the two knees). We use sustained inputs only for this model ($I_{\text{onset}}(t) = I_{\text{offset}}(t) = 0$ in Equation 3 for all $t$).
Model 2 ($a_E = 10.5$, $m = 5.2$, in Region IV). The left knee is at $I_T = -2$ and the right knee is at $I_T = 2$. As we describe below, the essential dynamical feature of this model is bistability. We use transient inputs to move the system between active and inactive states and do not include sustained inputs ($I_{\text{sustain}}(t) = I_{\text{inhib}}(t) = 0$ in Equation 3 for all $t$).
Model 3 ($a_E = 12.7$, $m = 9.5$, in Region IIIb). The left knee is at $I_T = 0.2$ and the right knee is at $I_T = 6$. Neither sustained inputs alone nor transient inputs alone can activate models in Region IIIb, so we use all input types in Equation 3 for this model.

Recall that Model 1 includes sustained inputs only ($I_{\text{onset}}(t) = I_{\text{offset}}(t) = 0$ for all $t$). The equilibrium solutions for this model for tone inputs only ($I_N = 0$) are shown in Figure 4A1 with $I_T$ as the bifurcation parameter (horizontal axis). When there is no tone input ($I_T = 0$), this system has a single stable equilibrium in the inactive state. For stronger tone inputs, the system passes through a saddle-node bifurcation (the left knee, at $I_T = 0.2$) at which a second stable equilibrium is created in the active state. Activation of the population from rest requires a larger tone, namely $I_T > I_R$, where $I_R$ is the tone level at the right knee of the firing rate equilibrium curve. At this second saddle-node bifurcation point, the inactive state is abolished and the active state remains as the unique stable and globally attracting fixed point of the system.
Responses to subthreshold and suprathreshold tone inputs are shown in Figure 4A2.
The feature of this model that is essential in our study of the continuity illusion (discussed below) is that it exhibits hysteresis. By this we mean that the tone level that activates an inactive population is larger than the tone level that maintains an already active population in the active state. Hysteresis is seen geometrically in the S-shaped firing rate equilibrium curve $I_T(x)$ (Figure 4A1). The activation threshold is the tone strength at the right knee. The deactivation threshold (the minimum tone level that maintains activity) is the tone strength at the left knee. For Model 1, we positioned these knees at $I_T = 1$ (activation) and $I_T = 0.2$ (deactivation). As an example of the hysteresis effect, a tone input with $I_T = 0.5$ does not activate $x(t)$ from rest (blue curve in Figure 4A2), but it would maintain $x(t)$ at a level near 1 if $x(t)$ were active prior to this input (not shown, but notice that the upper branch of equilibria in Figure 4A1 extends to $I_T$ values below 0.5).
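Hysteresis can be demonstrated by integrating Model 1 with the same intermediate tone level ($I_T = 0.5$, between the knees) from two different initial states. The sketch uses fixed-step RK4 with $\tau = 1$; the step size and duration are arbitrary choices, and the function name is illustrative.

```python
import math


def f(u, m):
    return 1.0 / (1.0 + math.exp(-(u - m)))


def settle(x0, I_T, a_E=5.9, m=3.6, T=50.0, dt=0.01):
    """RK4 integration of tau*dx/dt = -x + f(a_E*x + I_T) (tau = 1, Model 1 defaults); returns x(T)."""
    g = lambda x: -x + f(a_E * x + I_T, m)
    x = x0
    for _ in range(int(round(T / dt))):
        k1 = g(x)
        k2 = g(x + dt * k1 / 2)
        k3 = g(x + dt * k2 / 2)
        k4 = g(x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x


# Same tone level (I_T = 0.5, between the knees), two different histories.
x_from_rest = settle(0.0, 0.5)
x_from_active = settle(1.0, 0.5)
print(x_from_rest, x_from_active)   # low vs. high: the outcome depends on the initial state
```

A tone above the right knee (for example $I_T = 1.5$) activates the population even from rest, because the inactive equilibrium no longer exists there.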

Response Dynamics to Tone Masked by Noise
The two effects of noise are that it provides a "partial" input that enters as the additive term $\alpha I_N$ in $I_{\text{sustain}}(t)$ and that it drives inhibition through the term $I_{\text{inhib}} = a_I I_N (1 - x)$ (recall Equations 4 and 8). The effect of the inhibitory term is to shift the right knee of the firing rate equilibrium curve to larger $I_T$ levels (Figure 4B1; compare the black curve with noise to the gray curve without noise). That is, the threshold for activation increases with $I_N$, as desired for noise to have a masking effect. As a demonstration, a tone level of $I_T = 1.5$ would activate the population from rest in the absence of noise, but with $I_N = 1$ this input does not activate the population (red curve in Figure 4B2).
The equilibrium solutions for this model, using $I_T$ as the bifurcation parameter and now including the effect of noise, are

$$I_T(x, I_N) = m - a_E\, x + \ln\!\left(\frac{x}{1 - x}\right) - \alpha I_N + a_I I_N (1 - x). \tag{11}$$

The critical points at which stable equilibria are created and abolished are the knees of the S-shaped curve. Setting $\partial I_T / \partial x = 0$ gives $x(1 - x) = 1/(a_E + a_I I_N)$, a result similar to Equation (10) but now with noise included:

$$x_L(I_N) = \frac{1}{2}\left(1 + \sqrt{1 - \frac{4}{a_E + a_I I_N}}\right), \qquad x_R(I_N) = \frac{1}{2}\left(1 - \sqrt{1 - \frac{4}{a_E + a_I I_N}}\right). \tag{12}$$

The point $x_R(I_N)$ locates the threshold at which a tone activates a population from rest in the presence of noise; the corresponding tone level is $I_R(I_N) = I_T\big(x_R(I_N), I_N\big)$. The threshold for masking, then, is the noise level that solves $I_T = I_R(I_N)$. Any value of $I_T$ smaller than $I_R(I_N)$ would fail to activate the population because of the noise-driven inhibition. We denote this masking threshold noise level as $M(I_T)$. It depends nonlinearly on $I_T$ because $x_R$ depends on $I_N$. For Model 1, however, $x_R$ is relatively constant with respect to $I_N$, and we find that the masking threshold as a function of $I_T$ is approximately linear (gray curve in Figure 4D). The slope of this nearly linear relation can be approximated by $\big[a_I\big(1 - x_R(0)\big) - \alpha\big]^{-1}$ (found by differentiating $I_T$ with respect to $I_N$, neglecting any change in $x_R$ with respect to $I_N$, and inverting). This approximation shows the opposing effects of noise on the masking threshold: $M(I_T)$ increases with increasing $\alpha$ (the amount of excitatory noise input) and decreases with increasing $a_I$ (the amount of inhibitory noise-driven input; see Figure 4E). This relation also imposes a constraint on model parameters: we must have $\alpha < a_I\big(1 - x_R(0)\big)$. If this condition is not satisfied, the excitatory effect of noise (the $\alpha I_N$ term in $I_{\text{sustain}}(t)$) would dominate the inhibitory effect of noise ($I_{\text{inhib}}$), and masking would not be possible.
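The masking threshold can be computed by bisection on the right-knee tone level $I_R(I_N)$ and compared with the approximate slope $[a_I(1 - x_R(0)) - \alpha]^{-1}$. This Python sketch uses Model 1 with $\alpha = 0.168$ and a hypothetical $a_I = 0.4$ that satisfies $\alpha < a_I(1 - x_R(0))$; it is not claimed to reproduce Figure 4 or to satisfy the Figure 4H constraints.

```python
import math

a_E, m = 5.9, 3.6      # Model 1
alpha = 0.168          # from the Figure 2 caption
a_I = 0.4              # hypothetical inhibition strength, for illustration only


def I_T_eq(x, I_N):
    """Equilibrium relation with noise: I_T = m - a_E*x + ln(x/(1-x)) - alpha*I_N + a_I*I_N*(1-x)."""
    return m - a_E * x + math.log(x / (1 - x)) - alpha * I_N + a_I * I_N * (1 - x)


def knee_x(I_N):
    """Equation (12): (x_L, x_R) at noise level I_N."""
    r = math.sqrt(1 - 4 / (a_E + a_I * I_N))
    return 0.5 * (1 + r), 0.5 * (1 - r)


def I_R(I_N):
    """Activation threshold: tone level at the right knee for noise level I_N."""
    return I_T_eq(knee_x(I_N)[1], I_N)


def masking_threshold(I_T, lo=0.0, hi=10.0, tol=1e-10):
    """Noise level M(I_T) solving I_T = I_R(I_N) by bisection; None if no root in [lo, hi]."""
    g = lambda I_N: I_R(I_N) - I_T
    if g(lo) > 0 or g(hi) < 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


assert alpha < a_I * (1 - knee_x(0.0)[1]), "excitatory noise would prevent masking"
approx_slope = 1.0 / (a_I * (1 - knee_x(0.0)[1]) - alpha)
for I_T in (1.2, 1.6, 2.0):
    print(I_T, masking_threshold(I_T))
print("approximate slope dM/dI_T:", approx_slope)
```

Because $\partial I_T / \partial x = 0$ at the knee, the derivative of $I_R$ with respect to $I_N$ is exactly $a_I(1 - x_R(I_N)) - \alpha$; the approximation in the text only freezes $x_R$ at its noise-free value.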

Response Dynamics to Tones Interrupted by Noise
The point $x_L(I_N)$ locates the threshold for inactivation in the presence of noise. A population in the active state will remain active during the gap between tones if the noise strength causes $I_T(x_L)$ to cross over to negative values (see Figure 4C1, and also the time courses of $x(t)$ in Figure 4C2). Thus, we calculate the continuity threshold by solving (numerically) the root-finding problem $I_T\big(x_L(I_N), I_N\big) = 0$, where $x_L$ is also a function of noise level (Equation 12). Observe that, since $I_T = 0$ during the gap between tones, the continuity threshold is constant with respect to tone level, as shown by the horizontal black line in Figure 4D.
The masking and continuity thresholds for this model are distinct. This imposes additional constraints on our parameter choices. Specifically, in accordance with the hypothesis that the continuity illusion is a compensation for masking (Warren et al., 1972; Warren, 2008), we require that continuity can only occur at noise levels at least as high as the masking threshold. Additionally, we are only interested in parameter sets for which masking and continuity can both be achieved (continuity and masking thresholds must not exceed $I_{N,\max} = 10$, even for the maximum tone level $I_{T,\max} = 5$). A view of the $(a_I, \alpha)$ parameter region that satisfies these requirements is shown in Figure 4H, with labeled contour lines indicating the corresponding masking and continuity thresholds at the maximum tone level.
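The continuity threshold (the noise level at which the left knee reaches $I_T = 0$) can be found by bisection, and the prediction checked by simulating the continuity scenario directly. As before, $a_I = 0.4$ and $\tau = 10$ ms are hypothetical values, the stimulus timing (first tone starting at $t = 0$) is assumed, and the helper names are illustrative.

```python
import math

a_E, m, alpha = 5.9, 3.6, 0.168   # Model 1; alpha from the Figure 2 caption
a_I = 0.4                         # hypothetical inhibition strength (illustration only)
tau = 0.01                        # hypothetical time constant in seconds


def f(u):
    return 1.0 / (1.0 + math.exp(-(u - m)))


def I_T_eq(x, I_N):
    """Equilibrium relation with noise (tone level at which x is an equilibrium)."""
    return m - a_E * x + math.log(x / (1 - x)) - alpha * I_N + a_I * I_N * (1 - x)


def x_left(I_N):
    """Left knee x_L(I_N) from Equation (12)."""
    return 0.5 * (1 + math.sqrt(1 - 4 / (a_E + a_I * I_N)))


def continuity_threshold(lo=0.0, hi=10.0, tol=1e-10):
    """Noise level solving I_T(x_L(I_N), I_N) = 0 by bisection; None if no root in [lo, hi]."""
    g = lambda I_N: I_T_eq(x_left(I_N), I_N)
    if g(lo) < 0 or g(hi) > 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def min_rate_in_gap(I_T, I_N, dt=1e-3):
    """Tones on [0, 1) and [1.5, 2.5) s, noise I_N in the gap; RK4; returns the minimum x in the gap."""
    def rhs(t, x):
        tone = 1.0 if (t < 1.0 or 1.5 <= t < 2.5) else 0.0
        noise = 1.0 if 1.0 <= t < 1.5 else 0.0
        I = I_T * tone + alpha * I_N * noise - a_I * I_N * (1 - x) * noise   # Equations (3), (4), (8)
        return (-x + f(a_E * x + I)) / tau

    x, t, x_min = 0.0, 0.0, 1.0
    for _ in range(int(round(2.5 / dt))):
        k1 = rhs(t, x)
        k2 = rhs(t + dt / 2, x + dt * k1 / 2)
        k3 = rhs(t + dt / 2, x + dt * k2 / 2)
        k4 = rhs(t + dt, x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
        if 1.0 <= t < 1.5:
            x_min = min(x_min, x)
    return x_min


print("continuity threshold:", continuity_threshold())
for I_N in (0.0, 8.0):     # silent gap vs. noise above the continuity threshold
    print(I_N, min_rate_in_gap(2.0, I_N))
```

With these placeholder values the threshold lies near $I_N \approx 2.9$: a silent gap lets the population fall back to rest, whereas noise above threshold holds it near the active state for the whole interruption.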

Response Dynamics to Tone-Only Inputs
Model 2 receives transient inputs only ($I_{\text{sustain}}(t) = I_{\text{inhib}}(t) = 0$ for all $t$). These transient inputs occur at tone onsets and offsets and signal discontinuities at the "edges" of a tone. To analyze the dynamics of this model at tone onset, we find it useful to formulate it as a system of two ordinary differential equations:

$$\tau \frac{dx}{dt} = -x + f\big(a_E\, x + \gamma_{\text{on}}\, s\big), \qquad \tau \frac{ds}{dt} = -s. \tag{13}$$

The additional variable $s$ describes the exponentially decaying transient input, as introduced in Equation (7). At tone onset, this variable is instantaneously displaced to $s(t_{\text{on}}) = [I_T - \beta I_N]_+$. Similar dynamics occur at tone offset, with $\gamma_{\text{on}}$ replaced by $\gamma_{\text{off}}$ and $s(t_{\text{off}}) = -[I_T - \beta I_N]_+$. Notice that $s(t_{\text{off}})$ is non-positive because offset responses are inhibitory inputs to $x$. We configured Model 2 so that it is bistable in the absence of any inputs. In the $x$-$s$ phase plane, bistability comprises stable equilibria at inactive and active firing rates ($x_I$ and $x_A$, respectively) separated by an unstable saddle point ($x_S$). All equilibria are located along the $x$-axis ($s = 0$). Activation of the population from rest requires that the transient onset response be sufficiently large to move $x(t)$ from the basin of attraction of $x_I$ into the basin of attraction of $x_A$. This condition can be visualized in the phase plane by considering the separatrix curve $S(x)$ that divides these two basins of attraction. The firing rate will activate from rest if $s(t_{\text{on}}) > S(x_I)$, that is, if the response variable at tone onset exceeds the height of the separatrix curve evaluated at the inactive firing rate equilibrium. We observed that the separatrix curve can be adequately approximated by a line connecting the saddle point $(x_S, 0)$ to the point $(x_I, 1)$. The choice of $s = 1$ at threshold enforces our convention that activation for tone-only inputs occurs at $I_T = 1$.
From this geometric argument, the linear approximation to the separatrix is

$$S(x) = \frac{x_S - x}{x_S - x_I}. \tag{14}$$

Trajectories in the phase plane at tone onset are shown in Figure 5A1, and their full time courses are shown in Figure 5A2.
If the tone level is sufficiently high ($I_T = 1.2$ in this simulation, red curve), the system crosses the separatrix and transitions to the stable equilibrium in the active state. This approximation to the separatrix can also be found by linearizing the dynamical system in Equation (13) about the saddle point and determining the eigenvector associated with its stable manifold. The Jacobian matrix for the system is

$$J = \frac{1}{\tau}\begin{pmatrix} -1 + a_E f'(a_E x + \gamma_{\text{on}} s) & \gamma_{\text{on}} f'(a_E x + \gamma_{\text{on}} s) \\ 0 & -1 \end{pmatrix}. \tag{15}$$

Because $J$ is upper triangular, its eigenvalues are its diagonal entries, and the negative eigenvalue at the saddle point is the lower right entry, $-1/\tau$. The associated eigenvector satisfies $(J_{11} - J_{22})x + J_{12}s = 0$, where $J_{ij}$ is the $(i, j)$ entry of the Jacobian matrix, so the linear approximation to the separatrix (the line through the saddle point in this direction) is

$$S(x) = \frac{J_{11} - J_{22}}{J_{12}}\,(x_S - x). \tag{16}$$

A useful consequence of our assumption that $x$ and $s$ have the same time constant $\tau$ is that $J_{11} - J_{22} = a_E f'/\tau$, so this expression simplifies substantially to

$$S(x) = \frac{a_E}{\gamma_{\text{on}}}\,(x_S - x). \tag{17}$$

Comparing this equation to the result obtained by geometric considerations (Equation 14), we observe that $\gamma_{\text{on}}$ is determined by the parameters $a_E$ and $m$ and by our convention that the tone threshold is $I_T = 1$. In particular, we set $\gamma_{\text{on}} = a_E(x_S - x_I)$.
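The geometric construction and the choice of $\gamma_{\text{on}}$ can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' Matlab code: it assumes a unit-gain logistic firing rate function $f(u) = 1/(1 + e^{-(u - m)})$, onset dynamics of the form $\tau\dot{x} = -x + f(a_E x + \gamma_{\text{on}} s)$, $\tau\dot{s} = -s$, and hypothetical values $a_E = 6$, $m = 3$, $\tau = 1$ (chosen so the saddle sits at $x = 0.5$). It locates the three equilibria, sets $\gamma_{\text{on}} = a_E(x_S - x_I)$, and integrates from rest with forward Euler.

```python
import math

# Hypothetical parameters for illustration (not the paper's values).
# Assumed unit-gain logistic f(u) = 1 / (1 + exp(-(u - m))); m = a_E / 2
# places the saddle at x = 0.5 with x_I and x_A symmetric about it.
A_E = 6.0
M = 3.0
TAU = 1.0


def f(u):
    """Assumed logistic firing-rate function."""
    return 1.0 / (1.0 + math.exp(-(u - M)))


def equilibria(n=1000):
    """Roots of f(A_E * x) - x on (0, 1) via a grid scan followed by bisection."""
    def g(x):
        return f(A_E * x) - x

    grid = [(k + 0.5) / n for k in range(n)]  # offset grid avoids landing exactly on x = 0.5
    roots = []
    for lo, hi in zip(grid, grid[1:]):
        if g(lo) * g(hi) < 0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots  # [x_I, x_S, x_A] for a bistable configuration


def simulate_onset(I_T, gamma_on, x0, t_end=20.0, dt=1e-3):
    """Forward-Euler integration of tau x' = -x + f(A_E x + gamma_on s), tau s' = -s,
    from x(t_on) = x0 and s(t_on) = I_T (tone only, no noise). Returns x(t_end)."""
    x, s = x0, I_T
    for _ in range(int(t_end / dt)):
        dx = (-x + f(A_E * x + gamma_on * s)) / TAU
        ds = -s / TAU
        x, s = x + dt * dx, s + dt * ds
    return x


x_I, x_S, x_A = equilibria()
gamma_on = A_E * (x_S - x_I)  # enforces the I_T = 1 activation convention


def separatrix(x):
    """Linear separatrix through (x_S, 0) with slope -A_E / gamma_on."""
    return A_E * (x_S - x) / gamma_on


print(x_I, x_S, x_A, gamma_on)
print(separatrix(x_I))                     # equals 1 by construction
print(simulate_onset(2.0, gamma_on, x_I))  # settles near x_A (activated)
print(simulate_onset(0.5, gamma_on, x_I))  # relaxes back to x_I
```

The grid scan is deliberately simple; any bracketing root finder would do. The two test tone levels sit well on either side of the unit threshold so that the outcome does not hinge on how closely the straight line tracks the true, curved separatrix.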
Deactivation is the mirror image of activation and occurs if the "downward" perturbation of $s$ at tone offset is sufficiently strong. For the values of $a_E$ and $m$ that we use, the stable equilibria are symmetric about the saddle point at $x = 0.5$, and it is convenient to set $\gamma_{\text{on}} = \gamma_{\text{off}}$ so that the thresholds for activation and deactivation are the same. More generally, the offset parameter should always be set so that activation thresholds are not less than deactivation thresholds. This avoids scenarios in which a tone onset can activate the firing rate variable but the tone offset response is too weak to return the population to rest. In this unrealistic case, $x(t)$ could remain in the active state indefinitely.
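To make the symmetry requirement concrete, the sketch below evaluates the two thresholds implied by the linear separatrix $S(x) = a_E(x_S - x)/\gamma$. Onset activation needs $s(t_{\text{on}}) \geq a_E(x_S - x_I)/\gamma_{\text{on}}$, and offset deactivation needs $|s(t_{\text{off}})| \geq a_E(x_A - x_S)/\gamma_{\text{off}}$. The equilibrium values are hypothetical round numbers for a symmetric configuration, and the function names are our own.

```python
def edge_thresholds(a_e, x_i, x_s, x_a, gamma_on, gamma_off):
    """Tone-level thresholds from the linear separatrix of Model 2.

    theta_on: smallest onset amplitude s(t_on) that lifts x from x_I across the separatrix.
    theta_off: smallest offset magnitude |s(t_off)| that pushes x from x_A back across it.
    """
    theta_on = a_e * (x_s - x_i) / gamma_on
    theta_off = a_e * (x_a - x_s) / gamma_off
    return theta_on, theta_off


def risks_persistent_activation(theta_on, theta_off, tol=1e-9):
    """True if tone levels theta_on <= I_T < theta_off would switch the population on at
    onset but be too weak to switch it off at offset (tol absorbs round-off)."""
    return theta_off - theta_on > tol


# Hypothetical symmetric configuration (illustrative values only).
a_e, x_i, x_s, x_a = 6.0, 0.07, 0.5, 0.93
gamma_on = a_e * (x_s - x_i)

for gamma_off in (gamma_on, 0.8 * gamma_on, 1.2 * gamma_on):
    on, off = edge_thresholds(a_e, x_i, x_s, x_a, gamma_on, gamma_off)
    print(round(gamma_off / gamma_on, 2), round(on, 3), round(off, 3),
          risks_persistent_activation(on, off))
```

Lowering $\gamma_{\text{off}}$ below $\gamma_{\text{on}}$ raises the deactivation threshold above the activation threshold. That is the persistent-activation scenario described above.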

Response Dynamics to Tone Masked by Noise
If a sufficiently strong noise is presented at the same time as a tone, the noise can prevent activation of the tone population by reducing the transient response at the start of the tone. This is the masking condition. Simulations exhibiting masking dynamics are shown in Figure 5B. Recall that the effect of noise is to reduce the onset response to $s(t_{\text{on}}) = [I_T - \beta I_N]_+$, where $\beta$ is a parameter that controls how strongly noise suppresses the tone onset (given in Equation 6). As explained above in our discussion of tone activation, the model is parameterized so that $I_T = 1$ is the threshold for activation in the absence of noise. Thus, a noise will mask a tone if $I_T - \beta I_N < 1$. We denote the masking threshold by $M(I_T)$ and conclude that it is related to tone strength via a linear equation:

$$M(I_T) = \frac{I_T - 1}{\beta}. \tag{18}$$

This equation is valid for values of $I_T$ above the noise-free threshold ($I_T = 1$) and below the maximum tone strength in the model ($I_{T,\max} = 5$). It provides a direct relationship between the masking threshold and the degree to which noise masks tone onsets (represented by the parameter $\beta$). In the simulations shown in Figure 5, we use $\beta = 2/3$. The masking threshold curve is shown in Figure 5D.
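The masking condition and Equation (18) translate directly into code. The sketch below uses $\beta = 2/3$, as in Figure 5; the function names are our own.

```python
def masking_threshold(I_T, beta=2.0 / 3.0, I_T_max=5.0):
    """Model 2 masking threshold M(I_T) = (I_T - 1) / beta (Equation 18)."""
    if not 1.0 <= I_T <= I_T_max:
        raise ValueError("the linear relation holds only for 1 <= I_T <= I_T_max")
    return (I_T - 1.0) / beta


def is_masked(I_T, I_N, beta=2.0 / 3.0):
    """Noise masks the tone when the rectified onset response falls below the unit threshold."""
    onset = max(I_T - beta * I_N, 0.0)  # s(t_on) = [I_T - beta I_N]_+
    return onset < 1.0


for I_T in (1.0, 2.0, 5.0):
    print(I_T, masking_threshold(I_T))
```

For example, a tone at $I_T = 4$ has $M = 4.5$. It is masked by $I_N = 4.6$ (onset about 0.93) but not by $I_N = 4.4$ (onset about 1.07).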

Response Dynamics to Tones Interrupted by Noise
The feature of this model that is essential for our study of the continuity illusion is that the inactive and active states coexist and are stable in the absence of any inputs. The tone population can, therefore, remain in the active state even after a tone is turned off if the offset signal is weak and does not send $x(t)$ across the separatrix. Simulations exhibiting continuity dynamics are shown in Figure 5C. Whereas masking depends on suppression of the onset response (as described above), the continuity illusion depends on suppression of the offset response. Computation of the continuity threshold is analogous to our derivation of the masking threshold, but with onset terms replaced by offset terms. The criteria for continuity are that $I_T > 1$ (so that the first tone activates the population) and that $I_N$ is large enough to weaken the tone offset response so that $x(t)$ remains near its upper equilibrium. To satisfy this second condition, the offset response must not cross the separatrix curve. The key difference from the masking analysis is that we now analyze the system at the start of the noise-filled gap. This means $x(t_{\text{off}}) = x_A$ (the first tone has activated the population) and $s(t_{\text{off}}) = -[I_T - \beta I_N]_+$ (noise reduces the offset amplitude). Adapting the separatrix equation in Equation (17) for the offset response, continuity requires that the offset response remain above the separatrix at $x_A$, that is, $-[I_T - \beta I_N]_+ > a_E(x_S - x_A)/\gamma_{\text{off}}$, or equivalently $[I_T - \beta I_N]_+ < a_E(x_A - x_S)/\gamma_{\text{off}}$. The continuity threshold equation is, therefore,

$$C(I_T) = \frac{1}{\beta}\left(I_T - \frac{a_E(x_A - x_S)}{\gamma_{\text{off}}}\right). \tag{19}$$

In the particular case of Model 2, we have chosen parameters that make it symmetric ($x_I$ and $x_A$ are equidistant from the saddle point). We set $\gamma_{\text{off}} = \gamma_{\text{on}} = a_E(x_S - x_I)$, so that the continuity and masking thresholds are identical (compare $C(I_T)$ to Equation 18, and see the threshold lines in Figure 5D). More generally, we would require $\gamma_{\text{off}} \geq \gamma_{\text{on}}$ to avoid persistent activation (discussed above).
If $\gamma_{\text{off}} > \gamma_{\text{on}}$, the continuity threshold shifts upward in Figure 5D ($\gamma_{\text{off}}$ affects the intercept of $C(I_T)$ but not its slope). For intermediate $I_N$ values between $M(I_T)$ and $C(I_T)$, we observe responses in which the first tone activates $x(t)$, then $x(t)$ deactivates during the noise-filled gap (no continuity; offset response too strong), and the second tone does not reactivate $x(t)$ (second onset response too weak). In other words, the model without this symmetry has a region in stimulus space in which the noise burst between the two tones is too weak to induce the continuity illusion but strong enough to prevent perception of the second tone by forward masking.
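The sketch below evaluates the continuity threshold obtained by solving the no-crossing condition $[I_T - \beta I_N]_+ < a_E(x_A - x_S)/\gamma_{\text{off}}$ for $I_N$, that is, $C(I_T) = \bigl(I_T - a_E(x_A - x_S)/\gamma_{\text{off}}\bigr)/\beta$. It computes this alongside $M(I_T) = (I_T - 1)/\beta$ and reports the forward-masking band $M(I_T) < I_N < C(I_T)$. The equilibrium values are hypothetical (symmetric, $a_E = 6$), with $\beta = 2/3$.

```python
BETA = 2.0 / 3.0
A_E, X_I, X_S, X_A = 6.0, 0.07, 0.5, 0.93   # hypothetical symmetric equilibria
GAMMA_ON = A_E * (X_S - X_I)                 # activation threshold at I_T = 1


def masking_threshold(I_T, beta=BETA):
    """M(I_T) = (I_T - 1) / beta."""
    return (I_T - 1.0) / beta


def continuity_threshold(I_T, gamma_off, beta=BETA):
    """Smallest I_N for which the offset response stays above the separatrix at x_A."""
    depth = A_E * (X_A - X_S) / gamma_off   # separatrix depth below x_A, in tone-level units
    return (I_T - depth) / beta


def forward_masking_band(I_T, gamma_off):
    """(M, C) if noise levels M < I_N < C both end the response during the gap and
    mask the second tone's onset; None if the band is empty."""
    m, c = masking_threshold(I_T), continuity_threshold(I_T, gamma_off)
    return (m, c) if c > m + 1e-12 else None


for ratio in (1.0, 1.5):
    g_off = ratio * GAMMA_ON
    print(ratio, [round(continuity_threshold(I_T, g_off), 3) for I_T in (1, 3, 5)],
          forward_masking_band(3.0, g_off))
```

With $\gamma_{\text{off}} = 1.5\,\gamma_{\text{on}}$, the gap $C - M$ is the same (0.5 here) at every tone level. This illustrates that $\gamma_{\text{off}}$ moves the intercept of $C(I_T)$ and leaves its slope $1/\beta$ unchanged.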

Model 3: Combined Inputs Implement Both Rules for the Continuity Illusion
The last model configuration we consider is one that cannot be activated by transient inputs alone or by sustained inputs alone. These requirements are met if the value of $I_T$ at the left knee of the firing rate equilibrium curve is positive (no bistability at rest) and the value of $I_T$ at the right knee exceeds the maximum allowable tone level $I_{T,\max}$. Activation from rest can then occur only through a combination of sustained and transient inputs. There must be a sufficiently strong sustained input to move the system past the saddle-node bifurcation point at the left knee of the firing rate equilibrium curve. This creates a stable equilibrium in the active state, which can be reached if the transient portion of the input is strong enough to move the system into the basin of attraction of this upper equilibrium (see Figure 6).
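For illustration, we assume a unit-gain logistic $f(u) = 1/(1 + e^{-(u - m)})$. Under that assumption, the equilibrium curve $x = f(a_E x + I)$ can be inverted to $I(x) = m + \ln\frac{x}{1 - x} - a_E x$. Its knees lie where $dI/dx = 0$, i.e., where $x(1 - x) = 1/a_E$, and such points exist only if $a_E > 4$. The sketch below computes both knee inputs and checks the two Model 3 requirements; $a_E = 14$, $m = 11$ and $(a_E, m) = (6, 3)$ are hypothetical values.

```python
import math


def knee_inputs(a_e, m):
    """Sustained input levels at the left and right knees of x = f(a_e x + I).

    Assumes f(u) = 1 / (1 + exp(-(u - m))). Returns (I_left, I_right), or None
    when a_e <= 4 and the equilibrium curve is monotone (no knees).
    """
    if a_e <= 4.0:
        return None
    r = math.sqrt(1.0 - 4.0 / a_e)
    x_low, x_high = 0.5 * (1.0 - r), 0.5 * (1.0 + r)   # roots of x(1 - x) = 1 / a_e

    def input_for(x):  # inverse of the equilibrium curve
        return m + math.log(x / (1.0 - x)) - a_e * x

    # The active branch appears at the local minimum of I(x) (x_high);
    # the rest branch disappears at the local maximum (x_low).
    return input_for(x_high), input_for(x_low)


def needs_combined_inputs(a_e, m, I_T_max=5.0):
    """Model 3 requirements: no bistability at rest, and no activation by sustained input alone."""
    knees = knee_inputs(a_e, m)
    if knees is None:
        return False
    I_left, I_right = knees
    return I_left > 0.0 and I_right > I_T_max


print(knee_inputs(14.0, 11.0), needs_combined_inputs(14.0, 11.0))
print(knee_inputs(6.0, 3.0), needs_combined_inputs(6.0, 3.0))  # bistable at rest (Model 2-like)
```

The same calculation shows why strong recurrent excitation is needed: for $a_E \leq 4$ the curve has no knees, so neither hysteresis nor bistability is available.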
To understand firing rate responses for this model, we again formulate the dynamics in the $x$-$s$ state space:

$$\tau \frac{dx}{dt} = -x + f(u), \qquad \tau \frac{ds}{dt} = -s, \tag{20}$$

where $u$ is the total input to the population (written out in full below). These equations govern the dynamics of the system in the time following a tone onset or offset. The parameter $\gamma$ should be thought of as representing $\gamma_{\text{on}}$ when describing onset responses and $\gamma_{\text{off}}$ when describing offset responses. The initial values for these equations are given by the state of the system immediately prior to sound onset or offset. For the case of tone onset for a system starting from rest (no input), for instance, the initial values would be $x(t_{\text{on}}) = x_I(0, 0)$ and $s(t_{\text{on}}) = I_T$, where $x_I(0, 0)$ is the inactive state for $I_T = I_N = 0$. We use similar notation throughout this section to indicate that equilibrium points are functions of the input levels.
The Jacobian matrix for these equations is

$$J = \frac{1}{\tau}\begin{pmatrix} -1 + (a_E + a_I I_N) f'(u) & \gamma f'(u) \\ 0 & -1 \end{pmatrix}, \tag{21}$$

where we have abbreviated the argument of $f'$ as $u = a_E x + [I_T - \beta I_N]_+ - a_I(1 - x)I_N + \gamma s$. As before, we evaluate the Jacobian at the saddle point (when it exists, for sufficiently large $I_T$) and use the eigenvector associated with the stable manifold of the saddle point to construct a linear approximation to the separatrix curve that defines the threshold for activation. The notable difference between the analysis in this section and the preceding section (for Model 2) is that the sustained input terms can affect the positions of the equilibrium solutions and the shape of the separatrix in the current model setting. Following our earlier calculation (recall Equation 16), we find the eigenvector by solving $(J_{11} - J_{22})x + J_{12}s = 0$. After simplification, we find the linear approximation to the separatrix to be

$$S(x; I_T, I_N) = \frac{a_E + a_I I_N}{\gamma}\,\bigl(x_S(I_T, I_N) - x\bigr), \tag{22}$$

where we assume that $I_T$ is large enough that the saddle point $x_S(I_T, I_N)$ exists. In the remaining sections, we apply this result in the three cases we have been considering (tone only, simultaneous tone and noise, and tones with a noise-filled gap) to characterize activation by a tone, masking, and continuity dynamics.
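The eigenvector step can be checked by finite differences, without any algebra. The sketch below assumes the vector field $\tau\dot{x} = -x + f(u)$, $\tau\dot{s} = -s$, with $u$ as written above and a logistic $f$ with midpoint $m$. All parameter values are hypothetical. It tests whether $v = (\gamma, -(a_E + a_I I_N))$ is an eigenvector with eigenvalue $-1/\tau$ at an arbitrary state. Because the sustained tone term does not depend on $x$, only $a_I I_N$ should alter the direction.

```python
import math

# Hypothetical parameter values, for illustration only.
A_E, A_I, M, TAU, BETA, GAMMA = 14.0, 2.0, 11.0, 1.0, 2.0 / 3.0, 12.0


def f(u):
    """Assumed logistic firing-rate function with midpoint M."""
    return 1.0 / (1.0 + math.exp(-(u - M)))


def vector_field(x, s, I_T, I_N):
    """Right-hand side (dx/dt, ds/dt) of the assumed Model 3 equations."""
    u = A_E * x + max(I_T - BETA * I_N, 0.0) - A_I * (1.0 - x) * I_N + GAMMA * s
    return (-x + f(u)) / TAU, -s / TAU


def jacobian(x, s, I_T, I_N, h=1e-6):
    """Central-difference approximation of the 2x2 Jacobian (rows: dx/dt, ds/dt)."""
    dxp = vector_field(x + h, s, I_T, I_N)
    dxm = vector_field(x - h, s, I_T, I_N)
    dsp = vector_field(x, s + h, I_T, I_N)
    dsm = vector_field(x, s - h, I_T, I_N)
    return [[(dxp[i] - dxm[i]) / (2 * h), (dsp[i] - dsm[i]) / (2 * h)] for i in range(2)]


def eigen_residual(v, x, s, I_T, I_N):
    """Norm of J v + v / tau; zero when v is an eigenvector with eigenvalue -1/tau."""
    J = jacobian(x, s, I_T, I_N)
    Jv = [J[i][0] * v[0] + J[i][1] * v[1] for i in range(2)]
    return math.hypot(Jv[0] + v[0] / TAU, Jv[1] + v[1] / TAU)


state = (0.5, 0.3, 2.0, 1.0)  # (x, s, I_T, I_N): an arbitrary point where f' is not small
predicted = (GAMMA, -(A_E + A_I * state[3]))  # direction implied by the separatrix slope
noise_free = (GAMMA, -A_E)                    # ignores the a_I I_N contribution
print(eigen_residual(predicted, *state))   # close to zero
print(eigen_residual(noise_free, *state))  # clearly nonzero when I_N > 0
```

Because the identity holds at every state, the check does not require locating the saddle point first.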

Response Dynamics to Tone-Only Inputs
Activation by a tone-only input ($I_N = 0$) occurs if the onset response causes the system to cross the separatrix defined in Equation (22). In particular, we consider the system starting from rest, with $x(t_{\text{on}}) = x_I(0, 0)$ and input variable $s$ instantaneously perturbed to $s(t_{\text{on}}) = I_T$. We then ask whether this onset perturbation exceeds $S(x(t_{\text{on}}))$. We must also keep in mind that the position of the saddle point is determined by the inputs, so in this case we use $x_S(I_T, 0)$ in Equation (22). From these considerations we conclude that, to satisfy our convention that $I_T = 1$ is the tone threshold, we must set the onset parameter to $\gamma_{\text{on}} = a_E\bigl(x_S(1, 0) - x_I(0, 0)\bigr)$. Simulations of activation by a tone are shown in Figure 7A. The phase portrait in Figure 7A1 illustrates the dynamics at tone onset. We remark that $I_{\text{offset}}(t)$ is not necessary to return the system to the inactive state. In this model, the return to the inactive state at the end of the tone is guaranteed in the tone-only case because the saddle point and upper equilibrium do not exist for $I_T = 0$. The firing rate variable must return to $x_I(0, 0)$ because it is the unique remaining stable equilibrium. This differs from Model 2, which required an offset response to deactivate $x(t)$. We will see shortly, however, that the offset response does affect the continuity illusion dynamics, and we will explore $\gamma_{\text{off}}$ further in that setting.
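The following is a minimal sketch of the tone-only case. It assumes a logistic $f$ with hypothetical $a_E = 14$, $m = 11$, $\tau = 1$; with these values the left knee lies near $I_T \approx 0.56$, so there is no bistability at rest. It also assumes that the sustained tone input enters $u$ as $I_T$ when $I_N = 0$. The sketch finds $x_I(0,0)$ and $x_S(1,0)$, sets $\gamma_{\text{on}} = a_E(x_S(1,0) - x_I(0,0))$, and integrates with $s(t_{\text{on}}) = I_T$. The tone is then switched off with no offset transient, which shows that the return to rest does not require $I_{\text{offset}}(t)$.

```python
import math

# Hypothetical parameters (not the paper's): left knee ~0.56, right knee ~7.44.
A_E, M, TAU = 14.0, 11.0, 1.0


def f(u):
    """Assumed logistic firing-rate function with midpoint M."""
    return 1.0 / (1.0 + math.exp(-(u - M)))


def equilibria(I_T, n=2000):
    """Equilibria x = f(A_E x + I_T) on [0, 1] for a sustained tone input (no noise)."""
    def g(x):
        return f(A_E * x + I_T) - x

    xs = [k / n for k in range(n + 1)]
    roots = []
    for lo, hi in zip(xs, xs[1:]):
        if g(lo) == 0.0:
            roots.append(lo)
        elif g(lo) * g(hi) < 0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


x_rest = equilibria(0.0)[0]      # x_I(0, 0): unique equilibrium at rest
x_saddle_1 = equilibria(1.0)[1]  # x_S(1, 0): middle root at the threshold tone
gamma_on = A_E * (x_saddle_1 - x_rest)


def tone_response(I_T, t_tone=20.0, t_after=20.0, dt=1e-3):
    """Return x at the end of the tone and x after the tone has been off for t_after."""
    x, s = x_rest, I_T  # start at rest; onset transient s(t_on) = I_T
    for _ in range(int(t_tone / dt)):
        dx = (-x + f(A_E * x + I_T + gamma_on * s)) / TAU
        x, s = x + dt * dx, s - dt * s / TAU
    x_end_of_tone = x
    for _ in range(int(t_after / dt)):  # tone off: no sustained input, no offset transient
        x += dt * (-x + f(A_E * x)) / TAU
    return x_end_of_tone, x


print(len(equilibria(0.0)), len(equilibria(1.0)), round(gamma_on, 3))
print(tone_response(2.0))  # activates during the tone, then returns to rest
print(tone_response(0.5))  # below the left knee: never activates
```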

Response Dynamics to Tone Masked by Noise
In the masking condition, tone and noise inputs are both present simultaneously, and thus the saddle point about which we linearize the system depends on $I_T$ and $I_N$. To determine whether a noise prevents activation by a tone, we are still interested in onset dynamics from rest, so the initial firing rate is $x(t_{\text{on}}) = x_I(0, 0)$. We use the separatrix equation (Equation 22) again to determine the masking threshold. We determined $\gamma_{\text{on}}$ from the tone-only response, and we use it here to update the separatrix at tone onset:

$$S(x; I_T, I_N) = \left(1 + \frac{a_I I_N}{a_E}\right)\frac{x_S(I_T, I_N) - x}{x_S(1, 0) - x_I(0, 0)}. \tag{23}$$

This equation reveals the differing effects of sustained tone and noise inputs. The sustained tone input does not alter the slope of the linear approximation to the separatrix (it is the same as the slope found for the model with transient inputs only, Equation 16), but it can change the intercept through the dependence of $x_S$ on $I_T$ in the numerator. The sustained noise input, in contrast, changes both the intercept and the slope of the separatrix (through the term $a_I I_N/a_E$). Trajectories in the phase plane and firing rate time courses showing masking of a tone by noise are shown in Figure 7B. In this example, the onset response exceeds the threshold for tone-only inputs ($s(t_{\text{on}})$ is above one in Figure 7B1). Nevertheless, the population returns to the inactive state because the noise input has raised the threshold for activation (in this case, primarily by steepening the slope of the separatrix).
With Equation (23) in hand, we compute the masking threshold as the minimum $I_N$ level (for a given $I_T$ level) at which the onset response no longer crosses the separatrix. To do this, we set the amplitude of the transient onset response equal to the height of the separatrix at the initial value. Thus, the masking threshold $M(I_T)$ is the noise level that solves the nonlinear equation $I_T - \beta I_N = S(x_I(0, 0); I_T, I_N)$. We solve this equation numerically and display the resulting masking threshold curve in Figure 7D, for selected parameter values.
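Because $x_S(I_T, I_N)$ generally has no closed form, the masking threshold must be found numerically. Below is a minimal bisection sketch. The separatrix height is passed in as a hypothetical callable, since computing $x_S(I_T, I_N)$ depends on model details not reproduced here. As a consistency check, a constant unit height (the Model 2 limit) recovers $M(I_T) = (I_T - 1)/\beta$. The second example is a hypothetical toy that holds the saddle position fixed, so that noise acts only through the slope factor $1 + a_I I_N/a_E$.

```python
def bisect_root(F, lo, hi, tol=1e-12, max_iter=200):
    """Root of F on [lo, hi] by bisection; F(lo) and F(hi) must differ in sign."""
    f_lo = F(lo)
    if f_lo * F(hi) > 0:
        raise ValueError("F does not change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = F(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def masking_threshold(I_T, beta, separatrix_height, I_N_max=10.0):
    """Noise level solving I_T - beta * I_N = S(x_I(0,0); I_T, I_N).

    separatrix_height(I_T, I_N) is a user-supplied (hypothetical) callable returning
    the separatrix height above the rest state at tone onset.
    """
    def F(I_N):
        return (I_T - beta * I_N) - separatrix_height(I_T, I_N)

    return bisect_root(F, 0.0, I_N_max)


def model2_limit(I_T, I_N):
    """Height fixed at 1, as in Model 2."""
    return 1.0


def toy(I_T, I_N):
    """Hypothetical: a_I = 2, a_E = 14, saddle position held fixed."""
    return 1.0 + 2.0 * I_N / 14.0


beta = 2.0 / 3.0
print(masking_threshold(3.0, beta, model2_limit))  # approximately 3 = (I_T - 1) / beta
print(masking_threshold(3.0, beta, toy))           # lower: noise also steepens the separatrix
```

The continuity threshold can be computed the same way, by solving $-(I_T - \beta I_N) = S(x_A; 0, I_N)$ for $I_N$ with a corresponding callable.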

Response Dynamics to Tones Interrupted by Noise
In contrast to the two cases just considered (activation by a tone alone, masking by noise), understanding the continuity illusion requires analysis of the offset response. We approximate the threshold, as before, with the linear approximation to the separatrix given in Equation (22). We interpret $\gamma$ in that equation as $\gamma_{\text{off}}$, since we are concerned with the dynamics at the offset of the first portion of the tone. There is no tone ($I_T = 0$) during the noise-filled interruption between the tones, so the saddle point position is a function of $I_N$ only. This highlights the fact that there are two necessary conditions for continuity dynamics in this model. First, the noise level must be strong enough to preserve the saddle point and the stable active equilibrium. Second, the offset response must be weak enough (perhaps because it is masked by the noise) to prevent the system from crossing the separatrix. The second of these conditions is satisfied if the offset response, which has initial amplitude $-(I_T - \beta I_N)$, does not cross $S(x_A; 0, I_N)$. A firing rate response that remains active when the tone is removed (consistent with the continuity illusion) is shown in the $x$-$s$ phase plane in Figure 7C1 and as a time course in Figure 7C2. The continuity threshold curve $C(I_T)$, which we calculate by solving the nonlinear equation $-(I_T - \beta I_N) = S(x_A; 0, I_N)$ for $I_N$, is shown in Figure 7D.
The offset parameter $\gamma_{\text{off}}$ shifts the system between the two extreme cases represented by Models 1 and 2. If the offset parameter is small, then the occurrence of the continuity illusion relies on the hysteresis effect (the fact that $I_N$ can preserve the saddle point and active equilibrium). In this case, the continuity threshold $C(I_T)$ changes slowly with $I_T$ (and, in fact, can be constant over a range of $I_T$ values). In the extreme case of no offset response, the continuity threshold would be constant with $I_T$, as was the case for Model 1 (see Figure 4D). If the offset parameter is large, then the offset response makes a larger contribution to whether firing persists during the gap. As $\gamma_{\text{off}}$ increases, the continuity threshold varies more with $I_T$. These effects of $\gamma_{\text{off}}$ on the continuity threshold are shown in Figure 7E.

DISCUSSION
The continuity illusion is an intriguing example of the capacity of the brain to "fill in" missing information. In this case, the "filling in" process creates an illusion of a continuous tone that is, in fact, discontinuous. In the context of hearing in a complex listening environment, a bias toward linking related sounds across time to create longer-lasting auditory objects can be useful when contending with multiple interrupting or distracting sounds that could momentarily obscure a sound of interest. Our contribution has been to use principles of neural dynamics and to draw on previous conceptual frameworks and experimental studies of the continuity illusion to identify dynamical mechanisms that can account for the continuity illusion.
We used a firing rate model of population-level neural activity to explore possible routes toward continuity-illusion-like dynamics. A first requirement was that recurrent excitation be strong enough to create a stable equilibrium at a high firing rate (Figure 3). We stereotyped the inputs to the population as sustained and transient. This setup was based on physiological evidence that neurons in auditory cortex with these response types are possible neural correlates of the continuity illusion (Petkov et al., 2007). We showed, using Model 3, that a firing rate model can be constructed that requires both input types to implement the continuity illusion.
Although the model could also be configured so that sustained inputs alone (Model 1) or transient inputs alone (Model 2) produce dynamics consistent with the continuity illusion, both cases have shortcomings that were remedied in Model 3. In the case of sustained inputs alone, continuity dynamics require a hysteresis effect: input levels that are too weak to activate the population from rest can nevertheless sustain activity in an already active population. The tone input makes no contribution to the response dynamics during the noise-filled interruption between the tones. As a result, the continuity threshold is constant with tone level (Figure 4D). This is inconsistent with evidence that the probability of perceiving the continuity illusion increases as the noise level becomes louder relative to the tone (Riecke et al., 2008). In the case of transient inputs alone, continuity dynamics require a population that is bistable in the absence of any inputs. If additional mechanisms were not at work (synaptic adaptation in the recurrent excitatory connections, for instance), a bistable population could remain in a high firing rate state for a long period of time even when no stimulus is present. An additional limitation of this model is the requirement for symmetric onset and offset response dynamics. If deactivation requires a stronger input than activation (i.e., if the saddle point is at some $x < 1/2$), then there will exist a range of input strengths for which a tone that activates the population will not return the population to rest at the end of the tone. For the same reason, if activation requires a stronger input than deactivation (i.e., if the saddle point is at some $x > 1/2$), then in the interrupted-tone case a tone that activates the population may be too weak to reactivate the population following the noise-filled gap.
While this type of response dynamic could be consistent with forward masking (Moore and Glasberg, 1983), it is inconsistent with the stimuli used to test for the continuity illusion (both tones, before and after the noise-filled gap, are typically audible). Interestingly, the model configurations revealed that continuity thresholds can depend on tone level ($I_T$) in distinct ways. As $I_T$ increased, continuity thresholds increased less for models in which sustained inputs dominated (Model 1) and more for models in which transient inputs dominated (Model 2). Figure 7E shows similar results for different offset strengths $\gamma_{\text{off}}$ in Model 3. This outcome of the model could be examined with experiments that assess listeners' perception of the continuity illusion across a range of tone levels using inputs that vary the salience of acoustic edges (e.g., tones that ramp down before the noise-filled interruption; Bregman and Dannenbring, 1977).
Following Petkov et al. (2007), we view the different input populations (sustained, onset, and offset) as possible neural correlates of two aspects of the descriptive theory of the continuity illusion proposed by Bregman (Bregman, 1990). First, sustained responses convey "evidence" that a tone is ongoing. If some sustained neural activity persists during the interruption between two tones, then the tone may be perceived as continuous. This is Bregman's "Sufficiency of Evidence rule." In our model, this is implemented by assuming that noise (as a broadband signal) drives a partial excitatory input to the population whose activity signals the perception of the tone. Second, transient responses at tone onsets and offsets mark "discontinuities" in the tone (acoustic edges). If the noise during the gap between tones obscures this discontinuity, then the tone may be perceived as continuous. This is Bregman's "No Discontinuity rule." In our model, this is implemented by assuming that noise can reduce the amplitude of the transient response. An additional effect of noise (in Model 3) is to increase the threshold for activation by a transient input, which it does by altering the separatrix (see the $I_N$ term in Equation 23).
By analyzing how these input types drive firing rate activity, we identified two dynamical mechanisms and circuit properties that can support the continuity illusion: 1) a hysteresis effect enabled the noise to convey "sufficient evidence" to sustain firing activity in the gap between tones, and 2) an offset response that signaled a "discontinuity" in an acoustic signal could be suppressed by noise to confine the system to the basin of attraction of the high firing rate stable equilibrium. These dynamical mechanisms (hysteresis and bistability) required sufficiently strong recurrent excitation, specifically $a_E > 4$ for S-shaped firing rate equilibrium curves (Figure 3).
In addition to providing a dynamical framework to accompany Bregman's theory, our approach offers a new perspective on previous models of the continuity illusion. The work of Noto et al. (2016) relied on nonlinear self-excitation of neural populations, which is consistent with our approach for creating hysteresis and bistable dynamics. The work of Vinnik et al. (2010) utilized dynamic synapses to create continuous responses to interrupted tones. Since dynamic synapses are an additional mechanism for creating hysteresis and bistable dynamics (Barak and Tsodyks, 2007), there is also consistency between their work and ours. An extra feature of our study, not considered in these previous models of the continuity illusion, is that we also required that our simulated dynamics exhibit masking for noise and tones presented simultaneously. Incorporating this is essential, in our view, to avoid "trivial" solutions in which noise acts simply (and solely) as an excitatory input that facilitates neural activity during the noise-filled gap between tones. Balancing these excitatory and inhibitory effects of noise required fine-tuning in Model 1 (Figure 4E).
Although the firing rate framework is a caricatured description of neural activity, insights into a number of sensory illusions have been made using similar approaches, including for visual bistability (Laing and Chow, 2002; Shpiro et al., 2007, 2009) and auditory streaming (Rankin et al., 2015, 2017; Paredes-Gallardo et al., 2019). In our study, it has been useful as a minimal model that identifies mechanisms that can account for the continuity illusion. We focused on a classic version of the continuity illusion (tone interrupted by noise), but the illusion can be elicited by a variety of sound types (Bregman, 1990; Warren, 2008; McWalter and McDermott, 2019). The facts that the continuity illusion is widespread and that its dynamics can be explained with a relatively simple model may indicate that perceptual continuity of interrupted sounds is constructed at more than one stage of auditory processing. Indeed, while we were motivated by observations in auditory cortex (Petkov et al., 2007), the input motifs we used in our model (sustained, onset, and offset responders) are present in other auditory nuclei, and there is evidence for involvement of the brainstem in the continuity illusion in human listeners (Bidelman and Patro, 2016). We suggest future work could focus on the inferior colliculus (IC) as a possible origin of the continuity illusion. The IC is a midbrain structure that receives inputs from numerous brainstem regions. These include excitatory projections from the ventral cochlear nucleus (a site of sustained and onset neurons; Pfeiffer, 1966; Rhode and Smith, 1986) and inhibitory projections from the superior paraolivary nucleus (a site of offset neurons; Oliver, 2005; Cant and Oliver, 2018; Kopp-Scheinpflug et al., 2018). A critical component of our model was recurrent excitation that creates persistent activity.
The IC, accordingly, has numerous local connections (Ito and Malmierca, 2018) that may prolong the duration of post-synaptic potentials and increase firing rate responses to high sound intensities. As work continues to identify neural correlates of the continuity illusion, idealized dynamical systems descriptions can inform what features should be included in future, more biophysically detailed modeling approaches.

DATA AVAILABILITY STATEMENT
Matlab code for the firing rate model and to generate all figures in this study can be found in the GitHub repository at https://github.com/jhgoldwyn/ContinuityIllusion.