^{1}Center for Behavioral Brain Sciences, Magdeburg, Germany
^{2}Department of Cognitive Biology, Otto-von-Guericke Universität, Magdeburg, Germany
^{3}Centre de Recerca Matemàtica, UAB Science Faculty, Barcelona, Spain
^{4}Department de Matemàtica Aplicada I, Universitat Politècnica de Catalunya, Barcelona, Spain
^{5}Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain

We report that multi-stable perception operates in a consistent, dynamical regime, balancing the conflicting goals of stability and sensitivity. When a multi-stable visual display is viewed continuously, its phenomenal appearance reverses spontaneously at irregular intervals. We characterized the perceptual dynamics of individual observers in terms of four statistical measures: the distribution of dominance times (mean and variance) and the novel, subtle dependence on prior history (correlation and time-constant). The dynamics of multi-stable perception is known to reflect several stabilizing and destabilizing factors. Phenomenologically, its main aspects are captured by a simplistic computational model with competition, adaptation, and noise. We identified small parameter volumes (~3% of the possible volume) in which the model reproduced both dominance distribution and history-dependence of each observer. For 21 of 24 data sets, the identified volumes clustered tightly (~15% of the possible volume), revealing a consistent “operating regime” of multi-stable perception. The “operating regime” turned out to be marginally stable or, equivalently, near the brink of an oscillatory instability. The chance probability of the observed clustering was <0.02. To understand the functional significance of this empirical “operating regime,” we compared it to the theoretical “sweet spot” of the model. We computed this “sweet spot” as the intersection of the parameter volumes in which the model produced stable perceptual outcomes and in which it was sensitive to input modulations. Remarkably, the empirical “operating regime” proved to be largely coextensive with the theoretical “sweet spot.” This demonstrated that perceptual dynamics was not merely consistent but also functionally optimized (in that it balances stability with sensitivity). 
Our results imply that multi-stable perception is not a laboratory curiosity, but reflects a functional optimization of perceptual dynamics for visual inference.

## Introduction

The visual system extrapolates beyond the retinal evidence on the basis of prior experience of the visual world (Kersten et al., 2004; Hohwy et al., 2008; Friston et al., 2012). The inferential nature of vision becomes evident when prior experience shapes visual appearance (Weiss et al., 2002; Yang and Purves, 2003; Gerardin et al., 2010), in visual illusions (von Helmholtz, 1866; Bach and Poloschek, 2006; Gregory, 2009), and in visual hallucinations of certain patient populations (Ffytche et al., 2009).

The temporal dynamics of visual inferences is revealed in the phenomenon of multi-stable visual perception (von Helmholtz, 1866; Leopold and Logothetis, 1999; Blake and Logothetis, 2002; Sterzer et al., 2009). When certain ambiguous visual displays are viewed continuously, their appearance changes spontaneously from time to time. For example, some planar motion flows induce an illusory appearance of a volume moving in depth, which occasionally reverses its direction (“kinetic depth”) (Wallach and O'Connell, 1953; Sperling and Dosher, 1994). Implausible visual patterns not encountered in the natural environment induce particularly striking, multi-stable illusions. To reconcile such patterns with prior experience, even strong retinal inputs are intermittently removed from awareness, resulting in “monocular” or “binocular rivalry” (Campbell and Howell, 1972; Leopold and Logothetis, 1999; Bonneh et al., 2001; Blake and Logothetis, 2002).

Multi-stable visual perception engages a distributed network of occipital, parietal, and frontal cortical areas (Tong et al., 2006; Sterzer et al., 2009). The collective dynamics of this network reflects several stabilizing and destabilizing factors (Kohler and Wallach, 1944; Lehky, 1988; Blake et al., 2003; Lee et al., 2007). Firstly, competition between alternative appearances stabilizes whichever appearance dominates at the time (Blake et al., 1990; Alais et al., 2010). This competition seems to be mediated by inhibitory interactions operating locally within visual representations (Lee et al., 2007; Donner et al., 2008; Maier et al., 2008). Secondly, neural adaptation of visual representations progressively weakens the dominant appearance, limiting its temporal persistence (Wolfe, 1984; Nawrot and Blake, 1989; Petersik, 2002; Blake et al., 2003; Kang and Blake, 2010). Thirdly, neural noise initiates transitions between alternative appearances at irregular intervals (Hollins, 1980; Brascamp et al., 2006; Kim et al., 2006; Hesselmann et al., 2008; Sterzer and Rees, 2008; Sadaghiani et al., 2010; Pastukhov and Braun, 2011). Finally, volitional processes, such as attention shifts and eye movements, may also destabilize multi-stable appearance (Leopold et al., 2002; Mitchell et al., 2004; van Dam and van Ee, 2006; Zhang et al., 2011).

The interplay of stabilizing and destabilizing factors in multi-stable perception can be captured by simplistic computational models (Laing and Chow, 2002; Moldakarimov et al., 2005; Moreno-Bote et al., 2007; Noest et al., 2007; Shpiro et al., 2007; Curtu et al., 2008; Shpiro et al., 2009), at least under certain stimulus conditions (*viz*. symmetric inputs). More elaborate models are needed to reproduce multi-stable dynamics under more general conditions (Moreno-Bote et al., 2007; Wilson, 2007; Gigante et al., 2009; Seely and Chow, 2011). Here, we show that experimental observations by individual observers in particular displays tightly constrain the dynamical balance of stabilizing and destabilizing factors in multi-stable perception. Because perceptual dynamics is notoriously diverse across observers and displays (Fox and Herrmann, 1967; Borsellino et al., 1972; Walker, 1975), we expected to obtain widely disparate results. Astonishingly, we found that almost all observers operated in a narrow dynamical regime (i.e., with a particular balance of stabilizing and destabilizing factors). In addition, this “operating regime” turned out to be functionally optimal in that it balances perceptual stability and sensitivity. Our observations imply that the temporal dynamics of visual inference is functionally optimized.

## Materials and Methods

### Observers

Fifteen observers (nine female, six male, including author Alexander Pastukhov) with normal or corrected-to-normal vision participated in three experiments [kinetic-depth (KD), binocular rivalry (BR) and Necker cube (NC)]. Because some observers performed multiple experiments, we obtained 24 data sets in total. The data sets from KD and BR displays were used previously to introduce the “cumulative history” measure (Pastukhov and Braun, 2011). Apart from Alexander Pastukhov, all observers were naïve to the purpose of the experiment and were paid to participate. Procedures were approved by the medical ethics board of the Otto-von-Guericke Universität, Magdeburg: “Ethikkomission der Otto-von-Guericke-Universität an der Medizinischen Fakultät.”

### Apparatus

Stimuli were generated online and displayed on a 19” CRT screen (Vision Master Pro 454, Iiyama, Nagano, Japan), with a spatial resolution of 1600 × 1200 pixels and a refresh rate of 100 Hz. The viewing distance was 95 cm, so that each pixel subtended approximately 0.011°. Background luminance was 26 cd/m^{2}. Anaglyph glasses (red/cyan) were used for the dichoptic presentation.

### Multi-Stable Displays

The KD effect stimulus (Figure 1A) consisted of an orthographic projection of 300 dots distributed on a sphere surface (radius 3°). Each dot was a circular patch with a Gaussian luminance profile (σ = 0.057°) and a maximal luminance of 63 cd/m^{2}. The sphere was centered at fixation and rotated around the vertical axis with a period of 4 s. As front and rear surface are not distinguished, the orthographic projection was perfectly ambiguous and consistent with either a clockwise or a counter-clockwise rotation around the axis. Observers perceive a three-dimensional sphere, which reverses its direction of rotation from time to time.

**Figure 1. Experimental displays and statistical measures of multi-stable dynamics. (A)** Kinetic depth (KD) display—viewing planar motion, observers perceive a volumetric rotation in either of two directions. **(B)** Binocular rivalry (BR) display—viewing different patterns with each eye (through red-green glasses), observers typically perceive either pattern. **(C)** Necker Cube (NC) display—viewing a line drawing, observers perceive one of two solid cubes. **(D)** Spontaneous perceptual dynamics varies widely between observers. Four statistical measures (mean and standard errors)—dominance duration *T*_{dom}, coefficient of variation *c*_{V} of dominance duration, coefficient of correlation *c*_{H} with dominance history, time-constant τ_{H} of dominance history (green: 8 observers KD; red: 11 observers BR; blue: 5 observers NC). Different symbols are used for the three exceptional observers jn, lf, and np (pale symbols, see text).

The BR stimulus (Figure 1B) consisted of two gratings presented dichoptically at fixation (radius, 0.9°; spatial frequency 2 cycles/degree). One grating was tilted leftward by 45° and the other rightward by 45°. The right-eye grating (green, visible only through the green filter) was kept at 50% contrast, while the contrast of the left-eye grating (red, visible only through the red filter) was adjusted for each subject to balance perceptual strengths. BR gives rise to several alternative perceptual states: two uniform percepts of either the left- or right-eye grating as well as different kinds of transitional percepts. Transitional percepts may be “fused” (i.e., both gratings are perceived) and/or “fragmented” (i.e., parts of both gratings are perceived in different image regions).

The NC stimulus (Figure 1C) consisted of a line drawing of a 3D cube (size 3°). Observers perceive a 3D cube, which reverses its depth from time to time.

### Experimental Procedure

Observers viewed the display continuously and reported the presence and identity of a clear and uniform percept. Observers pressed either the (←) key [for left rotation, left-eye (red) grating, or up-and-left looking cube], the (→) key [for right rotation, right-eye (green) grating, or down-and-right looking cube], or the (↓) key (for mixed or patchy percepts). Each presentation lasted for 5 min, separated by a compulsory break of (at least) 1 min. Consistent with previous reports (Lehky, 1995; Mamassian and Goutcher, 2005), reversal rates slowed during the initial part of the block, so that only the last 4 min (minus the final, incomplete dominance period) of each presentation were analyzed. Total observation time was 60 min (12 blocks) per observer for KD, 90 min (18 blocks) per observer for BR, and 50 min (10 blocks) per observer for NC. The average number of clear percepts per block was 36 for KD, 110 for BR, and 45 for NC.

### Observables

The perceptual dynamics was characterized in terms of four statistical measures (see Figure 1D and Table 1), each of which varied widely between observers and displays. In addition, the distribution of dominance times was established in the form of a histogram.

#### Dominance distribution

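From a sequence of dominance periods *T*_{i} (*i* = 1,…, *N*), we computed the mean dominance time *T*_{dom} and the coefficient of variation *C*_{v} as

$$T_{dom} = \frac{1}{N}\sum_{i=1}^{N} T_{i}, \qquad C_{v} = \frac{\sigma_{T}}{T_{dom}},$$

where σ_{T} denotes the standard deviation of the dominance periods *T*_{i}.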

As is typical for multi-stable percepts (Fox and Herrmann, 1967; Borsellino et al., 1972; Walker, 1975), average dominance periods varied greatly between observers and stimuli (*T*_{dom} in Table 1). In addition, dominance periods were highly variable (*C*_{v} in Table 1). However, the two alternative percepts dominated for comparable amounts of time (see Table 1). Patchy appearances of the BR display lasted for 1.05 ± 0.42 s.

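To characterize the shape of the observed distributions of dominance times (from human observers or from model simulations), we fitted the empirical distribution with a Gamma distribution with free parameters α (shape) and λ (rate)

$$f(T) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, T^{\alpha - 1} e^{-\lambda T},$$

an exponential distribution with free parameter λ (rate)

$$f(T) = \lambda e^{-\lambda T},$$

and a Gaussian distribution with free parameters μ (mean) and σ^{2} (variance)

$$f(T) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left[-\frac{(T - \mu)^{2}}{2\sigma^{2}}\right].$$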

Goodness of fit was assessed by means of Kolmogorov–Smirnov (KS) tests. Human dominance distributions were fitted well by Gamma distributions (shape parameter α = 3.7 ± 0.6), but not by either exponential or normal distributions (Table 1), as expected from previous work (Levelt, 1967; Blake et al., 1971; Walker, 1975; Murata et al., 2003).
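As a rough illustration of these summary statistics, the sketch below computes *T*_{dom} and *C*_{v} from a list of dominance periods and derives moment-matched Gamma parameters (α = 1/*C*_{v}^{2}, λ = α/*T*_{dom}). Moment matching is a stand-in chosen here for brevity and is not the fitting procedure of the study; the function names, the 1/*N* normalization, and the sample durations are hypothetical.

```python
import math

def dominance_stats(periods):
    """Return (T_dom, C_v): mean and coefficient of variation of dominance periods."""
    n = len(periods)
    t_dom = sum(periods) / n
    # Population standard deviation (1/N normalization) is an assumption of this sketch.
    sd = math.sqrt(sum((t - t_dom) ** 2 for t in periods) / n)
    return t_dom, sd / t_dom

def gamma_from_moments(t_dom, c_v):
    """Moment-matched Gamma parameters: shape alpha = 1/C_v^2, rate lam = alpha/T_dom."""
    alpha = 1.0 / c_v ** 2
    lam = alpha / t_dom
    return alpha, lam

periods = [2.0, 4.0, 6.0, 4.0]  # hypothetical durations in seconds
t_dom, c_v = dominance_stats(periods)
alpha, lam = gamma_from_moments(t_dom, c_v)
```

Under this approximation, a coefficient of variation near 0.52 corresponds to a shape parameter of about 3.7.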

#### History-dependence

It is well known that successive dominance periods of the same percept tend to exhibit a marginally significant, negative correlation (van Ee, 2009; Kang and Blake, 2010), which is presumably due to neural adaptation. Recently, we have introduced a novel and more sensitive measure for this history-dependence, termed “cumulative history” (Pastukhov and Braun, 2011), which involves both a correlation coefficient, *c*_{H}, and a characteristic time-constant, τ_{H} (Table 1).

The analysis of “cumulative history” in reversal sequences is described in detail by Pastukhov and Braun (2011). Briefly, the observed record of dominance reports *S*_{x}(*t*) is convolved with a leaky integrator (Tuckwell, 2006) to compute hypothetical states *H*_{x}(*t*) of selective neural adaptation of percept *x*:

$$H_{x}(t) = \frac{1}{\tau_{H}} \int_{0}^{t} S_{x}(t')\, \exp\left[-\frac{t - t'}{\tau_{H}}\right] dt',$$

where *x* denotes a uniform percept, τ_{H} is a time-constant, and *H*_{x}(0) = 0. *S*_{x}(*t*) takes values of 1 for dominance, 0.5 for patchy dominance (BR only), and 0 for non-dominance. The cumulative history *H*_{x}(*t*) reflects both how long and how recently a given percept has dominated in the past. In the absence of “patchy” appearances, the cumulative histories of two competing percepts *x* and *y* sum to unity (*H*_{x} + *H*_{y} = 1).

For suitable values of τ_{H}, the cumulative history *H*(*t*) at a reversal time *t* correlates significantly with the subsequent dominance period *T*_{i}. Specifically, if *t*_{i} marks the beginning of dominance period *T*^{i}_{x}, we computed linear correlations between *H*_{x}(*t*_{i}) and ln(*T*^{i}_{x}) for all four possible combinations of history and percept (*H*_{x} × *T*_{x}, *H*_{x} × *T*_{y}, *H*_{y} × *T*_{y}, and *H*_{y} × *T*_{x}). The average absolute correlation was obtained for values of τ_{H} ranging from 0.01 to 60 s, in order to determine the maximal correlation coefficient *c*_{H} and its associated value of τ_{H} (Figure 2A).
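The procedure can be sketched as follows, assuming a leaky integrator of the form τ_{H} d*H*_{x}/d*t* = −*H*_{x} + *S*_{x} and a record without patchy percepts. Because *S*_{x} is piecewise constant, the integrator can be advanced exactly over each dominance period. The function names and the example sequence are hypothetical, and only one of the four history/percept combinations is shown.

```python
import math

def cumulative_history(durations, percepts, tau_h):
    """H_x and H_y at the onset of each dominance period.

    durations: dominance periods in seconds; percepts: 'x' or 'y' for each period.
    Uses the exact leaky-integrator update for a piecewise-constant input S(t):
    H(t + d) = S + (H(t) - S) * exp(-d / tau_h), starting from H(0) = 0.
    """
    h = {"x": 0.0, "y": 0.0}
    onsets = []
    for d, p in zip(durations, percepts):
        onsets.append(dict(h))            # history at the start of this period
        decay = math.exp(-d / tau_h)
        for q in h:
            s = 1.0 if q == p else 0.0    # S_q(t): 1 while dominant, else 0
            h[q] = s + (h[q] - s) * decay
    return onsets

def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

durations = [3.0, 5.0, 2.0, 6.0, 4.0, 3.5]   # hypothetical, seconds
percepts = ["x", "y", "x", "y", "x", "y"]
onsets = cumulative_history(durations, percepts, tau_h=4.0)
# Correlate H_x at onset with ln(T) for the periods in which x dominates.
hx = [o["x"] for o, p in zip(onsets, percepts) if p == "x"]
ln_t = [math.log(d) for d, p in zip(durations, percepts) if p == "x"]
r_xx = pearson(hx, ln_t)
```

Repeating this for a range of `tau_h` values and for all four combinations, then taking the maximum of the average absolute correlation, would yield estimates of *c*_{H} and τ_{H} in the sense described above.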

**Figure 2. Analysis of cumulative history in terms of *c*_{H} and τ_{H}.** As described in “Materials and Methods,” correlations between cumulative history values *H*(*t*_{i}) at reversal times *t*_{i} and subsequent dominance periods *T*_{i} were computed for different values of τ_{H}, in order to determine the maximal value of *c*_{H} and its associated value of τ_{H}. **(A)** Correlation results for all displays and observers, *c*_{H} as a function of τ_{H}, where τ_{H} is normalized to the average dominance period *T*_{dom} of each observer (γ_{H} = τ_{H}/*T*_{dom}). All data sets exhibit a significant maximum, which quantifies the subtle but significant history-dependence of dominance periods in terms of *c*_{H} and τ_{H}. **(B)** Analysis of shuffled reversal sequences: all dominance periods were drawn randomly and with replacement from the observed distribution of dominance periods. No significant correlations (indications of history-dependence) remain after shuffling. Panel **(A)** is modified from Figure 3 of Pastukhov and Braun (2011).

To verify that the values of *c*_{H} and τ_{H} represented a true history-dependence (and not just the spectral characteristics of the data), we repeated the analysis with shuffled reversal sequences (dominance times drawn randomly with replacement from the observed distribution). No significant correlations *c*_{H} were observed in the shuffled data sets (Figure 2B).
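A minimal sketch of such a surrogate sequence, assuming that each surrogate has the same length as the observed sequence (the function name, seed, and durations are hypothetical):

```python
import random

def shuffled_sequence(periods, rng):
    """Surrogate sequence: draw len(periods) durations with replacement."""
    return [rng.choice(periods) for _ in periods]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
observed = [3.0, 5.0, 2.0, 6.0, 4.0, 3.5]  # hypothetical durations in seconds
surrogate = shuffled_sequence(observed, rng)
```

The surrogate preserves the distribution of dominance times in expectation but removes any dependence of a period on its predecessors.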

### Computational Modeling

To generate a wide variety of dynamical regimes, we simplified the rate model of Laing and Chow (see Laing and Chow, 2002), which has been analyzed and extended by several other groups (Moreno-Bote et al., 2007; Shpiro et al., 2007; Curtu et al., 2008; Shpiro et al., 2009). Two neural populations represent competing percepts. Each population excites itself and inhibits the other population. In addition, each population is subject to adaptation in the form of a threshold elevation and to stochastic effects in the form of additive noise:

where *r*_{1, 2} is population activity, *a*_{1, 2} is adaptive state, *I*_{1, 2} = *I*_{0} is the strength of the (common) input to both populations, and *n*_{1, 2} is colored noise. The sigmoidal function *F*(*x*) is defined as

The parameters α and β control, respectively, the self-excitation and mutual inhibition of the two populations. In a sense, they represent the influence of prior experience. We set α = 0 because we were not interested in the regime of self-sustaining activity. The parameter ϕ_{a} sets the strength of neural adaptation and *I*_{1, 2} represents current retinal input. We typically set *I*_{1} = *I*_{2} = *I*_{0}. The parameters τ_{r} and τ_{a} are the characteristic time-constants of activity and adaptive state, respectively. Finally, additive noise *n*_{1, 2} is provided by two independent Ornstein–Uhlenbeck processes with variance σ_{n} and time-constant τ_{n}:

from two independent sources of Gaussian noise ξ_{1, 2} with

Thus, the signal-to-noise ratio of the retinal input is given by *I*_{1, 2}/σ_{n}. To predict perceptual dominance *S*_{x}(*t*), we assume a reversal to percept *x* whenever the associated activity *r*_{x} is 25% larger than the activity associated with the other percept.
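For orientation, the following sketch integrates a two-population rate model of this general type with the Euler–Maruyama method. The specific equations are assumptions made for illustration: τ_{r} d*r*_{i}/d*t* = −*r*_{i} + *F*(−β*r*_{j} − *a*_{i} + *I*_{0} + *n*_{i}) with α = 0, τ_{a} d*a*_{i}/d*t* = −*a*_{i} + ϕ_{a}*r*_{i}, a logistic *F*(*x*) = 1/(1 + exp(−*x*/*k*)), and Ornstein–Uhlenbeck noise scaled to a stationary standard deviation σ_{n}. The function names, default values, and integration step are likewise hypothetical.

```python
import math
import random

def F(x, k=0.1):
    """Assumed logistic gain function with slope parameter k."""
    z = -x / k
    if z > 700:  # avoid overflow in exp
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def simulate(I0=0.5, beta=1.75, phi_a=0.25, sigma_n=0.15, tau_a=1.0,
             tau_r=0.01, tau_n=0.1, T=20.0, dt=0.001, seed=0):
    """Euler-Maruyama integration; returns a list of (onset_time, percept)."""
    rng = random.Random(seed)
    r = [0.0, 1.0]          # asymmetric initial activities
    a = [0.0, 0.0]
    n = [0.0, 0.0]
    dominant, reversals = None, []
    for step in range(int(T / dt)):
        new_r = []
        for i in (0, 1):
            j = 1 - i
            drive = -beta * r[j] - a[i] + I0 + n[i]   # alpha = 0: no self-excitation
            new_r.append(r[i] + dt / tau_r * (-r[i] + F(drive)))
        for i in (0, 1):
            a[i] += dt / tau_a * (-a[i] + phi_a * r[i])
            # Ornstein-Uhlenbeck noise; this normalization is an assumption
            n[i] += -dt / tau_n * n[i] + sigma_n * math.sqrt(2 * dt / tau_n) * rng.gauss(0, 1)
        r = new_r
        # A percept takes over once its activity exceeds the other by 25%.
        for i in (0, 1):
            if r[i] > 1.25 * r[1 - i] and dominant != i:
                dominant = i
                reversals.append((step * dt, i))
    return reversals

reversals = simulate(T=20.0)
```

Differences between successive onset times in `reversals` give the simulated dominance periods, from which the observables defined above can be computed.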

### Model Parameters

The parameters τ_{r}, τ_{n}, and *k* remained fixed at τ_{r} = 10 ms, τ_{n} = 100 ms, and *k* = 0.1. The dynamical regime (stationary, oscillatory, or bistable) depends largely on three parameters, with *I*_{0} setting the general activity and overall stability of percepts, β the strength of mutual inhibition, and ϕ_{a} the strength of adaptation. This three-dimensional parameter space was explored in the limits of *I*_{0} ∈ [0, 2], β ∈ [0, 2], and ϕ_{a} ∈ [0, 1]. For every given triplet of *I*_{0}, β, and ϕ_{a} values, we additionally simulated all combinations of τ_{a} ∈ [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 3.0, 4.0, 5.0, 6.5, 8.0] s and σ_{n} ∈ [0.01, 0.03, 0.05, …, 0.35]. The latter two parameters influence *T*_{dom} and *C*_{v}, but are inconsequential for the dynamical regime.
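The grid of the two additional parameters can be enumerated as in the sketch below. The step of 0.02 for σ_{n} is inferred from the listed values, and the variable names are hypothetical.

```python
from itertools import product

tau_a_values = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 3.0, 4.0, 5.0, 6.5, 8.0]  # seconds
# sigma_n from 0.01 to 0.35 in steps of 0.02 (step inferred from the listed values)
sigma_n_values = [round(0.01 + 0.02 * k, 2) for k in range(18)]

# Every (I0, beta, phi_a) triplet is paired with each of these combinations.
hidden_grid = list(product(tau_a_values, sigma_n_values))
```

Under this reading, each triplet of *I*_{0}, β, and ϕ_{a} is simulated with 11 × 18 = 198 combinations of τ_{a} and σ_{n}.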

For convenience, all model parameters and associated value ranges are listed here: α = 0, β ∈ [0, 2], ϕ_{a} ∈ [0, 1], *I*_{1, 2} = *I*_{0} ∈ [0, 2], σ_{n} ∈ [0.01, 0.35], τ_{a} ∈ [1, 8] s, τ_{r} = 10 ms, τ_{n} = 100 ms, *k* = 0.1.

### Simulations

To generate multi-stable dynamics and to predict psychophysical observables, three simulations of 500 s each were performed for every combination of model parameters. If the value of any predicted observable varied too much (*C*_{v} > 0.5), five simulations of 3000 s were performed. The values of predicted observables were then compared with the empirical values of *T*_{dom}, *C*_{v}, τ_{H}, and *c*_{H} for each observer and display. If all four predictions fell within 25% of the empirical values, the corresponding combination of model parameters *I*_{0}, β, and ϕ_{a} was marked as a “match.” Typically, a match was obtained for σ_{n} ≈ 0.15.
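The matching criterion amounts to a relative-tolerance check on all four observables, as in this sketch (the dictionary keys and the example values are hypothetical):

```python
def is_match(predicted, empirical, tolerance=0.25):
    """True if every predicted observable lies within `tolerance` (relative) of the empirical value."""
    return all(
        abs(predicted[key] - empirical[key]) <= tolerance * abs(empirical[key])
        for key in ("T_dom", "C_v", "tau_H", "c_H")
    )

empirical = {"T_dom": 4.0, "C_v": 0.55, "tau_H": 2.0, "c_H": 0.25}   # hypothetical observer
predicted = {"T_dom": 4.6, "C_v": 0.50, "tau_H": 2.3, "c_H": 0.21}   # hypothetical simulation
matched = is_match(predicted, empirical)
```

A single observable outside the 25% band is enough to reject a parameter combination.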

### Frequency Resonance Simulations

To investigate frequency resonance, the two inputs were modulated in anti-phase with different periods *T*_{s}

and the distribution of dominance periods *P*_{res}(*T*) was determined for different values of *T*_{s} (Δ*I* = 0.2*I*_{0}). As shown in Figure 12A, this distribution exhibits resonance peaks at odd multiples of the half-period of modulation *T*_{s}/2. The most pronounced resonance typically occurs for *HP* = *T*_{s}/2 = *T*_{dom}.

To compare frequency resonance at different points in the three-dimensional parameter space *I*_{0} ∈ [0, 2], β ∈ [0, 2], and ϕ_{a} ∈ [0, 1], two simulations of 4000 s were performed at each point with medium noise σ_{n} = 0.15 and τ_{a} = 1 s. One simulation established the unperturbed distribution of dominance periods *P*_{ref}(*T*) and the mean dominance time 〈*T*_{dom}〉. In the other simulation, inputs *I*_{1, 2} were modulated in anti-phase at the best resonance period *T*_{s} = 2〈*T*_{dom}〉 and the distorted distribution of dominance periods *P*_{res}(*T*) was established.

The resonance coefficient *P*_{1} was then computed as

where *HP* = *T*_{s}/2.

Finally, to localize the bifurcation surfaces, simulations of 600 s were performed throughout the three-dimensional parameter space in the absence of noise (σ_{n} = 0, τ_{a} = 1 s). Starting from an asymmetric initial condition (*r*_{1,2} = *a*_{1,2} = [0, 1]), we determined whether activities migrated to identical steady-state values *r*_{1} = *r*_{2} = *a* (stationary regime), periodically reversed in rank order to exhibit values with *r*_{1} < *r*_{2} (oscillatory regime), or migrated to steady-state values with the same rank order *r*_{1} > *r*_{2} (bistable regime).
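The three-way classification can be sketched as a test on the late part of the noise-free activity traces. The tolerance `tol`, the settling fraction `settle`, and the synthetic traces are hypothetical choices made for illustration.

```python
import math

def classify_regime(r1, r2, tol=1e-3, settle=0.5):
    """Classify noise-free trajectories as 'stationary', 'oscillatory', or 'bistable'.

    r1, r2: equally long activity traces; only the last `1 - settle` fraction is inspected.
    """
    start = int(len(r1) * settle)
    diff = [x - y for x, y in zip(r1[start:], r2[start:])]
    if all(abs(d) < tol for d in diff):
        return "stationary"          # activities converge to the same value
    signs = {d > 0 for d in diff if abs(d) >= tol}
    if len(signs) == 2:
        return "oscillatory"         # rank order keeps reversing
    return "bistable"                # rank order is preserved

steps = range(200)
stationary = classify_regime([0.5] * 200, [0.5] * 200)
oscillatory = classify_regime([0.5 + 0.4 * math.sin(0.3 * i) for i in steps],
                              [0.5 - 0.4 * math.sin(0.3 * i) for i in steps])
bistable = classify_regime([0.9] * 200, [0.1] * 200)
```

Applying such a test on a grid of (*I*_{0}, β, ϕ_{a}) values would delineate the regime boundaries approximately, with a resolution set by the grid spacing.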

### Simulation Equipment

Simulations were performed on a Linux cluster (Suse Linux Enterprise Server 10, Matlab R2007a, C++ compiler gcc 20070115) with five nodes (each with four processors Intel(R) Xeon(R) CPU E5430 @ 2.66 GHz and 8 GB RAM).

## Results

We studied three canonical multi-stable displays (Figures 1A–C and Video S1): KD in a two-dimensional projection of a rotating cloud of dots (Wallach and O'Connell, 1953), BR between two gratings of different color and orientation (Wheatstone, 1838; Meng and Tong, 2004), and the NC (Necker, 1832). Observers viewed each display continuously for 5 min and reported its appearance either as rotating in depth “front left” or “front right” (KD), or as “uniformly red,” “uniformly green,” or “patchy” (BR), or as the marked corner pointing to “front” or “back” (NC display).

### Dominance Distribution and History-Dependence

For each observer and display, we characterized perceptual dynamics in terms of several statistical measures (Figure 1D and Table 1). The distribution of dominance times was binned into a histogram and summarized in terms of mean dominance duration, *T*_{dom}, and coefficients of variation, *C*_{v}. Both dominance durations (1–22 s) and coefficients of variation (0.2–1.1) varied widely between observers and displays, as is typical for multi-stable percepts (Fox and Herrmann, 1967; Borsellino et al., 1972; Walker, 1975). Also as expected (Levelt, 1967; Blake et al., 1971; Walker, 1975; Murata et al., 2003), the distributions of dominance times resembled Gamma functions with a comparatively narrow range of shape parameters α (3.7 ± 0.6). Specifically, the empirical distributions were consistently fit better by a Gamma distribution (KS-test *p* = 0.7 ± 0.06), than by either an exponential distribution (*p* = 0.03 ± 0.02) or a Gaussian distribution (*p* = 0.09 ± 0.03).

In addition, we captured the subtle history-dependence of dominance times in terms of a correlation coefficient, *c*_{H}, and a characteristic time-constant, τ_{H} (Figures 1D, 2). Due to the destabilizing effect of neural adaptation, successive periods dominated by the same appearance often exhibit a marginally significant, negative correlation (van Ee, 2009; Kang and Blake, 2010; Pastukhov and Braun, 2011). Recently, we have introduced a more sensitive, integral measure, dubbed “cumulative history,” of how long and how recently a given percept has dominated in the past (Hudak et al., 2011; Pastukhov and Braun, 2011). This measure reveals that individual dominance periods are consistently and significantly influenced by prior perceptual history (see “Materials and Methods” and Figure 2). For different observers and displays, the values of *c*_{H} ranged from 0.1 to 0.4 and the values of τ_{H} from 0.6 to 10 s, quantifying the history-dependence in each case (Table 1). Our use of this “cumulative history” measure constitutes an important difference to earlier work (Shpiro et al., 2009).

### Dynamical Regimes of LC-Model

Next, we compared our perceptual observations to a class of generative models for multi-stable dynamics. We chose the model formulated by Laing and Chow (2002) and investigated by several other groups (Moldakarimov et al., 2005; Moreno-Bote et al., 2007; Noest et al., 2007; Shpiro et al., 2007; Curtu et al., 2008; Shpiro et al., 2009), which strikes a dynamical balance between competition β, adaptation ϕ_{a}, and input strength *I*_{0} (Figure 3). Depending on this balance, the “LC-model” is able to generate sequences of perceptual reversals with a wide range of dominance distributions and history-dependencies. Note that all models incorporating adaptation, such as (Laing and Chow, 2002; Moldakarimov et al., 2005; Moreno-Bote et al., 2007; Noest et al., 2007; Shpiro et al., 2007; Curtu et al., 2008; Shpiro et al., 2009), necessarily predict a degree of history-dependence.

**Figure 3. Bifurcation analysis of a class of generative models. (A)** Generative models (schematic) for multi-stable dynamics with two neural populations (after Laing and Chow, 2002). Population activities *r*_{1,2}, strength of cross-inhibition β, visual input *I*_{1,2} = *I*_{0}, strength of neural adaptation ϕ_{a}, time-constant τ_{a} of neural adaptation, independent neural noise ξ_{n}. Dynamical regimes depend largely on only three parameters: β, ϕ_{a}, and *I*_{0}. **(B)** Bistable region (red volume and red lines on bifurcation diagrams EFG), see also Figure 4A. Without neural noise, activities *r*_{1,2} approach one of two steady-states with disparate activity levels (one high, one low). With noise, transitions between the two steady-states occur at irregular intervals. **(C)** Oscillatory regime (blue volume and blue lines on bifurcation diagrams EFG), see also Figure 4B. Without noise, activities *r*_{1,2} oscillate in counter-phase between low and high levels. Neural noise renders the alternation more irregular. **(D)** Stationary regime (green volume and green lines on bifurcation diagrams EFG). Activities *r*_{1,2} approach a single steady-state, with or without noise. **(E–G)** Bifurcation analysis of parameters ϕ_{a}, *I*_{0}, and β. **(E)** Dependence on ϕ_{a}, revealing bistable, oscillatory, and stationary regimes (β = 1.75, *I*_{0} = 0.5). Hopf bifurcations are marked ϕ_{hb} and ϕ_{HB}. **(F)** Dependence on *I*_{0}, showing a central bistable regime flanked by oscillatory and stationary regimes on either side (β = 1.75, ϕ_{a} = 0.25). **(G)** Dependence on β, showing bistable, oscillatory, and stationary regimes (ϕ_{a} = 0.25, *I*_{0} = 0.5).

Whereas the LC-model generates a continuum of possible dynamics, one may technically distinguish two regimes: a *bistable* or fluctuation-driven regime in which adaptation ϕ_{a} is weak [ϕ_{a} < ϕ^{hb}_{a}(β, *I*_{0})] and dominance periods are terminated by noise (Figure 3B), and an *oscillatory* or limit-cycle regime in which adaptation ϕ_{a} is strong enough [ϕ_{a} > ϕ^{hb}_{a}(β, *I*_{0})] to terminate each dominance period on its own (Figure 3C). The *stationary* regime of the model does not generate reversals and is not relevant here (Figure 3D).

Both the *bistable* and the *oscillatory* regimes of this model generate multi-stable dynamics, but with important differences in detail (Figure 4). A typical *bistable* dynamics is dominated by noise, resulting in irregular trajectories through state space, aperiodic dominance reversals, and an approximately exponential distribution of dominance times (Figure 4A). In marked contrast, a typical *oscillatory* dynamics is dominated by adaptation, with state-space trajectories describing a stereotypical limit-cycle, periodic dominance reversals, and an approximately Gaussian distribution of dominance times (Figure 4B).

**Figure 4. Bistable, oscillatory, and intermediate dynamics. (A)** Bistable dynamics obtained deeply within the bistable regime (far left, cf. Figure 3B). Driven largely by noise, it is characterized by irregular trajectories in state space (middle left), aperiodic dominance reversals (middle right), and an approximately exponential distribution of dominance times (far right). **(B)** Oscillatory dynamics obtained deeply within the oscillatory regime (far left, cf. Figure 3C). Driven largely by adaptation, it is characterized by regular trajectories in state space (middle left), periodic dominance reversals (middle right), and an approximately Gaussian distribution of dominance times (far right). **(C)** The multi-stable dynamics of human observers falls between these two extremes: it exhibits irregular trajectories (middle left), aperiodic reversals (middle right), and a Gamma-like distribution of dominance times (far right). With suitable levels of noise, a large parameter volume (far left) can result in realistic (human-like) distributions of dominance times (see text for details).

The perceptual dynamics of human observers tends to fall between these two extremes. Typically, human dominance periods exhibit a Gamma distribution with shape factor α between 3 and 4 (Murata et al., 2003), a distribution shape that is intermediate between exponential and Gaussian distributions (Figure 4C). On this basis, it has been suggested that the operating regime of human multi-stable perception may lie near the boundary between *bistable* and *oscillatory* regimes (Shpiro et al., 2009).

### Realistic Dominance Distribution

We will now show that the distribution shape of dominance periods does not usefully constrain the dynamical regime of multi-stable perception. In essence, this is because the LC-model is highly redundant in the sense that many combinations of parameters generate equally realistic (Gamma-like) distribution shapes. To establish this point, we carried out extensive simulations, independently varying competition β ∈ [0, 2], adaptation ϕ_{a} ∈ [0, 1], input strength *I*_{0} ∈ [0, 2], noise amplitude σ_{n} ∈ [0.01, 0.35], and adaptation time-scale τ_{a} ∈ [1 s, 8 s]. For each parameter combination (β, ϕ_{a}, *I*_{0}, σ_{n}, τ_{a}), we generated reversal sequences and established the best-fitting Gamma, exponential, and Gaussian functions for the resulting distribution of dominance times.

The dominance distribution generated by a parameter combination (β, ϕ_{a}, *I*_{0}, σ_{n}, τ_{a}) was classified as realistic or human-like if it was well fit by a Gamma distribution with shape parameter α ∈ [3.1, 4.3] (KS-test *p* > 0.7) and less well by either exponential or Gaussian distributions. The parameter volume in which the LC-model generated human-like distributions of dominance times is shown in Figure 4C (far left). Note that the illustration shows only three of the five parameters. Only some, not all, choices of the two hidden parameters σ_{n}, τ_{a} resulted in realistic distributions. The depicted volume encompassed approximately 57% of the possible volume and was not restricted to the boundary between *bistable* and *oscillatory* regimes.

Accordingly, the distribution shape of dominance periods, taken by itself, does not usefully constrain the dynamical regime of multi-stable perception, contrary to an earlier suggestion (Shpiro et al., 2009). The reason for this discrepancy is that we explored a larger range of hidden parameters σ_{n}, τ_{a} than Shpiro et al. (2009). Essentially, a realistic distribution shape can almost always be obtained if a suitable noise level σ_{n} and adaptation time-constant τ_{a} are chosen.

### Realistic Dominance Distribution and History-Dependence

Fortunately, a far more informative set of constraints becomes available when both the dominance distribution and the history-dependence of human observers are taken into account. Comparing simulated and human perceptual dynamics, parameter combinations (β, ϕ_{a}, *I*_{0}, σ_{n}, τ_{a}) were considered a “match” if their statistics (*T*_{dom}, *C*_{v}, *c*_{H}, τ_{H}) fell within 25% of the statistics of a particular observer/display combination. In this case, we refrained from comparing distribution shapes explicitly, as this would have complicated the interpretation of the results, but would not have further constrained the parameter volumes.
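The matching criterion can be illustrated with a minimal Python sketch. It assumes that the four statistics of each simulated parameter combination are already available as a dictionary; the names `is_match`, `matching_fraction`, and the dictionary keys are hypothetical and chosen for illustration. The fraction it returns is taken over all grid points supplied, which simplifies the "possible volume" used in the text.

```python
from typing import Dict, Iterable

# Hypothetical keys for the four statistics (T_dom, C_v, c_H, tau_H).
STATS = ("T_dom", "C_v", "c_H", "tau_H")


def is_match(simulated: Dict[str, float],
             observed: Dict[str, float],
             tol: float = 0.25) -> bool:
    """Return True if every simulated statistic lies within a relative
    tolerance `tol` (25% by default) of the observed statistic."""
    return all(
        abs(simulated[key] - observed[key]) <= tol * abs(observed[key])
        for key in STATS
    )


def matching_fraction(grid_stats: Iterable[Dict[str, float]],
                      observed: Dict[str, float],
                      tol: float = 0.25) -> float:
    """Fraction of the supplied grid points whose statistics match."""
    points = list(grid_stats)
    if not points:
        return 0.0
    hits = sum(is_match(stats, observed, tol) for stats in points)
    return hits / len(points)
```

For example, with an observer whose statistics are `{"T_dom": 4.0, "C_v": 0.5, "c_H": 0.2, "tau_H": 3.0}` (illustrative values, not measured data), a simulated *T*_{dom} of 4.9 s passes the criterion, whereas 5.2 s does not.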

Astonishingly, the parameter combinations that matched almost all observers/displays clustered in a consistent “operating regime” of approximately 15% of the possible volume (Figure 5B): 8/8 observers of the KD display were matched by 10%, 8/11 observers of the BR display by 13%, and 5/5 observers of the NC display by 7% of the possible parameter volume. The individual results for all observers are presented in Figures 6–8. In most cases, a comparatively small and well-defined parameter volume reproduced all four statistical measures (*T*_{dom}, *C*_{v}, *c*_{H}, τ_{H}) (see Figure 5A for representative examples). On average, the matching volumes comprised 2.4 ± 1.1% (KD display), 4.5 ± 0.7% (BR display), and 2.9 ± 1.0% (NC display), of the possible parameter spaces (bistable and oscillatory regimes).

**Figure 5. Operating regime of multi-stable perception.** KD display (left), BR display (middle), and NC display (right). **(A)** Parameter volumes (green, red, blue) matching the perceptual dynamics of three representative human observers (*lp*, *kt*, and *ia*, respectively) in terms of both the distribution (*T*_{dom}, *C*_{v}) and the subtle history-dependence (*c*_{H}, τ_{H}) of dominance times. The depicted volumes fill approximately 6% of the possible volume and are here compared to the union of observers (transparent gray volumes). **(B)** Union of the matching volumes (green, red, blue) from 8, 8, and 5 observers, respectively. The matching volumes lie entirely within the bistable regime (transparent gray volumes) and fill approximately 15% of the possible volume.

**Figure 6. Parameter volumes matching the perceptual dynamics of individual observers for KD displays.** For each parameter triplet *I*_{0}, ϕ_{a}, and β, different combinations of noise level and adaptation time-constant were explored in the ranges σ_{n} ∈ [0.01, 0.35] and τ_{a} ∈ [1 s, 8 s]. A “match” was declared when the statistics of synthetic reversal sequences fell within 25% of the mean values of each of the four observables 〈*T*_{dom}〉, *C*_{v}, *c*_{H}, and τ_{H}. The color coding indicates the value of τ_{a} at which each parameter triplet *I*_{0}, ϕ_{a}, and β best matched observer dynamics. For each matching volume, three orthogonal projections on different planes are shown in gray. The green volume shown on the left of Figure 5B represents the union of the volumes illustrated here.

At this juncture, the reader may well wonder how these results depend on the 25% criterion used to define a “match” between simulated and human reversal statistics. In fact, the “envelope” of the matching volumes described above is largely independent of this criterion choice. If the parameter space (β, ϕ_{a}, *I*_{0}, σ_{n}, τ_{a}) is sampled at sufficiently densely spaced points, any set of observed statistical measures (*T*_{dom}, *C*_{v}, *c*_{H}, τ_{H}) can be reproduced with arbitrary precision. In other words, the density of parameter sampling determines the precision with which observed statistical measures can be reproduced. The 25% criterion was chosen to obtain cohesive “matching” volumes, given the sampling grid of our simulations. For this criterion value, an observed statistic was typically reproduced by several adjacent grid locations. When a stricter criterion was used, an observed statistic tended to be reproduced only by isolated grid locations, resulting in non-cohesive or “patchy” matching volumes. In sum, the criterion choice merely affected the internal cohesiveness, but not the “envelope,” of the parameter volumes reproducing human reversal statistics.

Why should the four statistical measures (*T*_{dom}, *C*_{v}, *c*_{H}, τ_{H}) offer a more informative set of constraints than the shape of the dominance distribution alone? In the LC-model, distribution shape (*T*_{dom}, *C*_{v}, and higher moments) is determined by the relative strength of adaptation and noise. Accordingly, many parameter combinations produce realistic distribution shapes, provided a suitable level of noise is chosen in each case. History-dependence (*c*_{H}, τ_{H}), on the other hand, is less sensitive to the level of noise and therefore more informative about the absolute strength of adaptation. Thus, distribution shape and history-dependence provide largely independent constraints. That this is indeed the case was evident from the disparate parameter volumes which reproduce different sets of constraints: whereas comparatively small volumes (3.3 ± 1.6% of the possible volume) reproduced both dominance distribution (*T*_{dom}, *C*_{v}) and history-dependence (*c*_{H}, τ_{H}) of individual observers/displays, far larger volumes reproduced either one of these constraints (29 ± 15% for *T*_{dom}, *C*_{v} and 44 ± 7% for *c*_{H}, τ_{H}).

### A Consistent Human “Operating Regime”

Overall, the multi-stable dynamics of 21/24 data sets was matched by a consistent “operating regime,” lying entirely within the *bistable* domain of the model and comprising approximately 15% of the possible volume (Figure 5B). The results from individual observers are detailed in Figure 6 (KD displays), Figure 7 (BR displays), and Figure 8 (NC displays). Only three observers of the BR display (*jn*, *lf*, *np*) exhibited an exceptional dynamics in that their brief dominance times *T*_{dom} and strong history-dependence *c*_{H} were matched not only in the *bistable* but also in the *oscillatory* regime of the LC-model (Figure 7).

**Figure 7. Parameter volumes matching the perceptual dynamics of individual observers for BR displays (see Figure 6 for details).** The color coding indicates the value of τ_{a} at which each parameter triplet *I*_{0}, ϕ_{a}, and β best matched observer dynamics. For exceptional observers (jn, lf, and np) parameter volumes lie partially outside the stable and sensitive volume. For each matching volume, three orthogonal projections on different planes are shown in gray. The red volume shown in the middle of Figure 5B represents the union of the volumes illustrated here.

**Figure 8. Parameter volumes matching the perceptual dynamics of individual observers for NC displays (see Figure 6 for details).** The color coding indicates the value of τ_{a} at which each parameter triplet *I*_{0}, ϕ_{a}, and β best matched observer dynamics. For each matching volume, three orthogonal projections on different planes are shown in gray. The blue volume shown on the right of Figure 5B represents the union of the volumes illustrated here.

We were astonished by this clustering, especially in view of the superficial diversity in the perceptual dynamics exhibited by different observers/displays (Figure 1D). To assess the likelihood of an accidental clustering, we shuffled the pairs of statistical measures (*T*_{dom}, *C*_{v}) and (*c*_{H}, τ_{H}), drawing observables randomly from the value pairs produced by real observers and recombining them to form “virtual” observers. In general, the matching volumes of these “virtual” observers were far more widely scattered (51% of the possible volume) than those of “real” observers. To quantify this further, we computed the centers of all matching volumes (mean parameter vectors) and the norms of the distances between all volume pairs. Whereas the average pair-distance was comparable for real and for “virtual” observers (2.0 ± 1.2 and 3.4 ± 3.8, respectively, Figure 9A), the group-mean for real observers was much smaller than the group-mean for equal numbers of “virtual observers” (Figure 9B), demonstrating that real observers clustered tightly in a consistent “operating regime.” The likelihood of obtaining by chance the clustering exhibited by real observers was small (*p* < 0.02).
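The shuffling test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: it assumes that the centers of the matching volumes (points in (*I*_{0}, ϕ_{a}, β)-space) have already been computed for real and for “virtual” observers, and all function names are hypothetical.

```python
import itertools
import math
import random


def virtual_observers(dist_pairs, hist_pairs, rng):
    """Recombine each (T_dom, C_v) pair with a randomly drawn
    (c_H, tau_H) pair to form 'virtual' observers."""
    shuffled = list(hist_pairs)
    rng.shuffle(shuffled)
    return [tuple(d) + tuple(h) for d, h in zip(dist_pairs, shuffled)]


def mean_pair_distance(centers):
    """Mean Euclidean distance over all pairs of volume centers."""
    pairs = list(itertools.combinations(centers, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def chance_probability(real_centers, virtual_centers, n_sets=10000, seed=0):
    """Fraction of randomly chosen groups of virtual observers (same
    group size as the real group) that cluster at least as tightly
    as the real observers."""
    rng = random.Random(seed)
    group_size = len(real_centers)
    real_mean = mean_pair_distance(real_centers)
    hits = 0
    for _ in range(n_sets):
        group = rng.sample(virtual_centers, group_size)
        if mean_pair_distance(group) <= real_mean:
            hits += 1
    return hits / n_sets
```

In this sketch, `virtual_centers` must be a list containing at least as many entries as `real_centers`. Obtaining the center of each virtual observer requires repeating the matching procedure for the recombined statistics, which is not shown here.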

**Figure 9. Clustering of matching regions in (*I*_{0}, ϕ_{a}, β)-space. (A)** Distribution of center-to-center distances between the matching volumes of observer pairs (real and virtual). Vertical lines mark the distribution means. **(B)** Distribution of the *mean* of all center-to-center distances among groups of 21 virtual observers (computed over 10,000 randomly chosen sets). The vertical line (red) marks the value obtained for the 21 real observers/data sets. The likelihood that equal numbers of virtual observers cluster as tightly as real observers was <0.02.

### Shape and Location of “Operating Regime”

To examine the “operating regime” of human observers in more detail, we carried out additional simulations in several two-dimensional subspaces, three of which are shown in Figure 10 (ϕ_{a} = 0.25, *I*_{0} = 0.5, and β = 1.75). These detailed simulations revealed that, depending on the assumed level of noise, human observers operate in different shell-like volumes of the bistable regime, each of which follows the bifurcation surface at some distance. As the assumed noise level increased from low (σ_{n} ∈ [0.01, 0.11]) to middle (σ_{n} ∈ [0.13, 0.19]) to high (σ_{n} ∈ [0.21, 0.35]), the distance to the bifurcation surface increased. Thus, the perceptual dynamics of most observers was matched by a shell-shaped volume at the margins of the *bistable* regime or, equivalently, near but not at the brink of the *oscillatory* regime (see also Figure 11).

**Figure 10. Operating regimes of multi-stable perception for different levels of noise (planar subspaces).** The left inset relates the selected subspaces to the three-dimensional volumes of Figure 5. Several regions matching human observer dynamics with different displays and under different noise assumptions are illustrated. Specifically, the union of the matching regions of individual observers is outlined in a different color for each display (KD, BR, NC, see inset). Also marked are the bifurcation surface (black contour) and the functional “sweet spot” for medium noise (dotted black outline, see Figure 12C). Matching regions occupy different shell-like volumes, depending on the assumed level of noise (low, medium, or high). Distance to the bifurcation increases with noise. **(A)** Planar subspace ϕ_{a} = 0.25. **(B)** Planar subspace *I*_{0} = 0.55. **(C)** Planar subspace β = 1.71.

**Figure 11. Matching volumes depend on the assumed level of noise.** Union of matching volumes for all data sets from KD displays (top row), BR displays (middle row), and NC displays (bottom row). Assuming low noise (σ_{n} ∈ [0.01, 0.11]) displaced matching volumes to the margins of the bistable regime (left column), whereas an assumption of high noise (σ_{n} ∈ [0.21, 0.35]) shifted matching volumes to the center of that regime (right column). Medium levels of noise (σ_{n} ∈ [0.13, 0.19]) produced the matching volumes shown in the middle column. The dependence of matching volumes on the assumed level of noise is also shown by the dashed contours in Figure 10.

### Shape and Location of Functional “Sweet Spot”

Is there a functional reason as to why multi-stable perception should operate in this particular regime? On the one hand, deep inside the *bistable* regime (strong β and weak ϕ_{a}), perception is particularly stable (dominance times are particularly long). On the other hand, at the bifurcation boundary between the *oscillatory* and *bistable* regimes (β and ϕ_{a} proportional), perception is particularly sensitive to differential input (small imbalances between *I*_{1} and *I*_{2}). Accordingly, any regime combining perceptual stability with perceptual sensitivity would constitute a functional “sweet spot.”

To locate this “sweet spot” in terms of the LC-model, we computed the parameter volume providing exceptional stability (dominance periods >1 s, Figure 12B) and intersected it with the volume providing exceptional sensitivity (Figure 12C). To quantify sensitivity, we established frequency resonance *under the assumption of medium noise* (σ_{n} = 0.15). Frequency resonance is a sensitive method for probing the “operating point” of a dynamical system and is well established for the multi-stable perception of human observers (Kim et al., 2006).

**Figure 12. Functional “sweet spot” combining perceptual stability and sensitivity. (A)** Frequency resonance driven by input modulation. Distribution of dominance times without modulation (far left) and for different modulations (red lines mark half-periods, from 0.25 to 2 Hz). A resonance peak is evident when the modulation half-period coincides with the peak of the unmodulated distribution. **(B)** Volume of maximal stability (orange, *T*_{dom} ≥ 1 s), compared to bistable regime (transparent gray). **(C)** Functional “sweet spot” combining maximal stability with maximal sensitivity to input fluctuations (cyan, frequency resonance measure *P*_{1} ≥ 1.2), compared to bistable regime (transparent gray). **(D–F)** Comparison of functional “sweet spot” (cyan) with regions matching perceptual dynamics of human observers for KD, BR, and NC displays (**D–F**, respectively).

Specifically, a periodic, anti-phase modulation of input strengths *I*_{1, 2} induces frequency resonance in the form of periodic reversals of dominance (Figure 12A). The input modulation moves the bifurcation boundary back and forth (with the movement range depending on modulation amplitude). Periodic reversals are triggered as soon as the boundary displacement reaches the “operating point” (i.e., the operative parameter combination) of the system under investigation. The system's sensitivity to input modulation may therefore be measured either in terms of modulation amplitude or, equivalently, in terms of the multiplicative increase of reversal probabilities around the resonance frequency (*P*_{1} measure, see “Materials and Methods”). The larger the *P*_{1}-measure, the less modulation amplitude is needed to trigger a perceptual reversal.

The functional “sweet spot” of the LC-model, which combines maximal stability and sensitivity (*T*_{dom} > 1 s and *P*_{1} > 1.2), is illustrated in Figure 12C. It formed a shell-shaped volume which followed the bifurcation surface at a distance and was restricted to small values of adaptation. Remarkably, the volumes matching observer dynamics were largely coextensive with this “sweet spot” (Figures 12D–F). A more detailed comparison was possible in the planar subspaces of Figure 10, which juxtaposed the regions matching observer dynamics for low, medium and high noise (colored contours) and the functional “sweet spot” for medium noise (dotted contours). Note that it was the perceptual operating regime for *medium noise* (not for low or high noise) which best matched the functional “sweet spot” for medium noise.
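The construction of the “sweet spot” as an intersection of two volumes can be sketched in Python. The sketch assumes that the simulations have been summarized as a mapping from each grid point (*I*_{0}, ϕ_{a}, β) to its mean dominance time and *P*_{1} measure; the data layout and the function names are hypothetical, and the thresholds follow the values quoted in the text.

```python
def sweet_spot(grid, t_dom_min=1.0, p1_min=1.2):
    """Intersect the stable and the sensitive parameter volumes.

    `grid` maps a parameter triplet (I0, phi_a, beta) to a tuple
    (T_dom in seconds, P1 measure) obtained from simulation.
    """
    stable = {point for point, (t_dom, _) in grid.items() if t_dom > t_dom_min}
    sensitive = {point for point, (_, p1) in grid.items() if p1 > p1_min}
    return stable & sensitive


def overlap_fraction(matching, spot):
    """Fraction of the observer-matching grid points that fall
    inside the sweet spot."""
    if not matching:
        return 0.0
    return len(matching & spot) / len(matching)
```

Here `matching` stands for the set of grid points that reproduce the statistics of an observer. An overlap fraction near 1 would correspond to the coextensive volumes described above.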

## Discussion

We have compared the dynamics of multi-stable perception with a class of generative models in order to assess the effective contributions of competition, neural adaptation, and neural noise. Astonishingly, we find that highly heterogeneous measurements from different observers and displays consistently constrain these models to the same narrow operating regime (21 of 24 data sets). Moreover, this operating regime falls in a particularly interesting region from the point of view of perceptual performance. Specifically, it falls in a shell-shaped volume at some distance from the bifurcation boundary, which uniquely combines stability of perceptual outcome with sensitivity to input modulations. This constitutes compelling evidence that the temporal dynamics of perceptual inference is functionally optimized.

### A Simplistic Hypothesis

We have tested the hypothesis that different multi-stable phenomena reflect a common mechanism, namely, tectonic shifts of neural activity arising spontaneously within an attractor neural network that may well be distributed across distant cortical areas (Braun and Mattia, 2010). Presumably, a multi-stable display stimulates recurrent neural networks with several distinct steady states of neural activity (“attractor states”), which embody the cumulative residue of prior visual experience. These steady states are not absolutely stable, but are continually destabilized by neural adaptation and by neural noise. The result is an irregular, saltatory dynamics in which stable episodes are punctuated by rapid transitions.

The essential part of this hypothesis is the existence of a balance between competition, neural adaptation, and neural noise. Its precise mathematical formulation [here, the Laing and Chow model (Laing and Chow, 2002)] is only of secondary importance. Accordingly, we would expect that quantitatively different formulations of the same stabilizing and destabilizing factors should lead to qualitatively similar results. Consistent with this expectation, Shpiro et al. (2009) have shown that the broad “operating regimes” defined by the dominance distribution generalize over different models. It remains to be seen whether the same is true for the narrower “operating regimes” reported here (defined by both dominance distribution and history-dependence of multi-stable perception).

The hypothesis advanced here is admittedly simplistic in that it neglects many important aspects of multi-stable perception, such as its dependence on input strength (Moreno-Bote et al., 2007; Wilson, 2007; Seely and Chow, 2011) or its persistence across gaps in stimulation (Leopold et al., 2002; Maier et al., 2003; Brascamp et al., 2008; Pastukhov and Braun, 2008). Moreover, in treating multi-stable perception as a stochastic dynamical system, it ignores volitional processes such as attention shifts or eye movements.

There are two ways to justify this omission. Firstly, there is compelling evidence that reversals in the appearance of multi-stable displays do occur spontaneously, requiring neither attention nor eye movements (Lee et al., 2007; Pastukhov and Braun, 2007), except perhaps in some special situations (Zhang et al., 2011). Secondly, it seems likely that attention shifts and eye movements are part and parcel of the spontaneous dynamics we are postulating here. Recent evidence that reversals engage attentional mechanisms in a feedforward manner (Knapen et al., 2011) is consistent with the latter possibility.

In the end, we feel that the astonishing success of this simplistic hypothesis speaks for itself, especially as it extends to multi-stable displays (NC) known to be particularly susceptible to voluntary control (Meng and Tong, 2004).

### A Hidden Consistency

Our main finding is that the seemingly heterogeneous perceptual dynamics, which different observers exhibit with different multi-stable displays, conceals a hidden consistency. It has often been noted that the variability of dominance times is stereotypical, whereas mean dominance times are not (Murata et al., 2003; Brascamp et al., 2005; van Ee, 2005). On this basis, previous studies have concluded that human observers exhibit a bistable dynamics (Moreno-Bote et al., 2007), or that they operate in the vicinity (on either side) of the bifurcation separating bistable and oscillatory regimes (Shpiro et al., 2009). In contrast to these earlier studies, we also took into consideration the weak (but significant) dependence of dominance times on prior perceptual history (Pastukhov and Braun, 2011). These additional constraints revealed a consistent and narrow operating regime of human observers.

If multi-stable dynamics is so consistent, why do mean dominance times vary so widely between displays and observers? Our findings suggest at least a partial answer: when a dynamical system operates near a bifurcation, its evolution over time is not dominated by a single mechanism and parameter, but by a mixture of mechanisms and a combination of parameters. Indeed, for any given value of the time-constant τ_{a} of adaptation, small perturbations in the other parameters of the Laing and Chow model (Laing and Chow, 2002) generate considerable variance in the dominance time *T*_{dom} and, independently, in the time-constant τ_{H} of cumulative history. As a consequence, the pair-wise correlations between τ_{a}, *T*_{dom} and τ_{H} are quite poor (Pastukhov and Braun, 2011).

### Near, Not at, the Brink

If our mechanistic hypothesis captures the essence of the situation, then visual perception operates in a marginally stable regime, near the brink of an oscillatory instability. According to the theory of dynamical systems, the Hopf bifurcation at the brink of an oscillatory instability constitutes a state of criticality (Camalet et al., 2000), in which signal processing is often found to be optimal in terms of sensitivity, dynamic range, or response latency. Several recent studies have shown that the dynamic range of the system response is enlarged (Kinouchi and Copelli, 2006), and the amount of information transferred increases (Beggs and Plenz, 2003; Plenz and Thiagarajan, 2007; Shew et al., 2009), at the point of criticality. Indeed, operating at or near criticality may be a general principle of brain function (Bak, 1996).

The operating regime we have identified lies at some distance from the bifurcation boundary: it falls near, but not directly at, the brink of the oscillatory instability and is restricted to moderate strengths of adaptation. The functional advantage of such a *marginally stable* regime—in terms of relative stability of perceptual outcome and high sensitivity to input modulations (Figure 10)—may be understood as follows: Both dominance and response times are short at the bifurcation, but grow longer as the system enters more deeply into the bistable regime. A compromise—relatively long dominance and short response times—is reached at some distance to the bifurcation. When the input changes from being balanced (*I*_{1} = *I*_{2}) to being biased (*I*_{1} < *I*_{2}), the bifurcation border moves toward the bistable region. Accordingly, a system previously situated *near* the border may now find itself *at* the border and hence able to respond with a rapid reversal. In short, being *near*, but not directly *at*, the bifurcation affords both stability when the input is constant and sensitivity when the input changes.

### Stability vs. Sensitivity

If visual inference is based on attractor dynamics (Braun and Mattia, 2010; Rolls and Deco, 2010), a goal conflict between stability and sensitivity seems unavoidable. Presumably, a stable and compelling appearance of a visual scene recruits numerous associations at all levels of visual processing—edges, surfaces, objects, generic context, episodic context. In terms of attractor dynamics, reciprocal excitation between visual and memory activity would be expected to stabilize a particular pattern of activity (and, thus, a particular appearance). The downside to this stabilization would be reduced sensitivity to incremental changes in the visual input, for attractor dynamics would tend to counteract any change and to restore the activity pattern that conforms to the activated memories. Accordingly, if the system is to remain sensitive to incremental input changes, associative stabilization by memory traces must not go too far. A combination of neural noise and neural adaptation would seem to offer an appropriate strategy for balancing stability and sensitivity, as this would also ensure that alternative interpretations are exhaustively explored.

### Exploitation-Exploration Dilemma

The present findings have important implications for theories of perceptual inference (Kersten et al., 2004). Given an exhaustive store of prior information, the outcome of Bayesian inference is deterministic. However, if the store of prior knowledge must be acquired by reinforcement learning (i.e., by trial and error), an inferential system faces the “exploitation-exploration dilemma” (Sutton and Barto, 1998). On the one hand, it must *exploit* what it knows already by following successful precedents from the past. On the other hand, if it is to expand its knowledge, it must *explore* alternative possibilities that may prove more successful in the future. The dilemma is that neither strategy can be pursued to the exclusion of the other. At the mechanistic level, such an inferential system must balance prior experience against current input. Favoring the former forgoes *exploring* novel inferences and compromises the *sensitivity* of inference (as input details are ignored). Favoring the latter forgoes the *exploitation* of prior knowledge and impairs the *stability* of inference (as input details are unduly amplified). Several authors have formulated similar thoughts in connection with perceptual inference (Hoyer and Hyvärinen, 2003; Hohwy et al., 2008; Sundareswara and Schrater, 2008; Moreno-Bote et al., 2010, 2011).

### Exception or Rule?

Does marginal stability characterize only perfectly ambiguous, laboratory situations—such as the multi-stable displays investigated here—or does it apply also to real-world visual scenes? The answer hinges on whether the phenomenal appearance of real-world scenes is entirely stable, or whether it fluctuates in some way. Indeed, real-world objects evoke “contextual associations” such as, for example, episodic memories of prior personal experience, or generic knowledge about prototypical uses and locations (Bar, 2004, 2009b). The activation of such contextual associations is temporary and new associative possibilities are continuously being explored (Bar, 2009a). Contextual associations strongly color phenomenal appearance, presumably by activating perceptual representations in the manner of mental imagery (Moulton and Kosslyn, 2009). In certain impoverished visual displays—such as two-tone faces or Rorschach ink blots (Mooney et al., 1957)—this influence is particularly evident. Accordingly, we speculate that multi-stable phenomena form a continuum, ranging from perfectly ambiguous situations (such as the canonical multi-stable displays studied here), to partially ambiguous images with multiple readings of different plausibility (such as two-tone faces), to real-world images with a large number of subtly different associations.

### Final Thoughts

We propose a functional hypothesis as to why visual perception is marginally stable in general, and marginally multi-stable in ambiguous situations. Specifically, we propose that vision operates in a dynamical regime that uniquely combines stability and sensitivity, thus optimizing performance. At the mechanistic level, we speculate that this balance may be struck by attractor dynamics encompassing both visual and memory representations.

## Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## Acknowledgments

Alexander Pastukhov, Joachim Haenicke, and Jochen Braun: BMBF Bernstein Network, EU FP7-269459. Gustavo Deco: BFU2007-61710, Consolider Ingenio 2010, FP7 Brainsync, ITN Codde. Antoni Guillamon: MICINN/FEDER MTM2009-06973 and CUR-DIUE 2009SGR-859. Pedro E. García-Rodríguez: BFU2007-61710.

## Supplementary Material

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/Computational_Neuroscience/10.3389/fncom.2013.00017/abstract

## References

Alais, D., Cass, J., O'Shea, R. P., and Blake, R. (2010). Visual sensitivity underlying changes in visual consciousness. *Curr. Biol*. 20, 1362–1367.

Bach, M., and Poloschek, C. (2006). Visual neuroscience: optical illusions. *Adv. Clin. Neurosci. Rehabil*. 6, 20–21.

Bak, P. (1996). *How Nature Works: The Science of Self-Organized Criticality*. New York, NY: Copernicus Press.

Bar, M. (2009a). Predictions: a universal principle in the operation of the human brain. Introduction. *Philos. Trans. R. Soc. Lond. B Biol. Sci*. 364, 1181–1182.

Bar, M. (2009b). The proactive brain: memory for predictions. *Philos. Trans. R. Soc. Lond. B Biol. Sci*. 364, 1235–1243.

Beggs, J. M., and Plenz, D. (2003). Neuronal avalanches in neocortical circuits. *J. Neurosci*. 23, 11167–11177.

Blake, R., Fox, R., and McIntyre, C. (1971). Stochastic properties of stabilized-image binocular rivalry alternations. *J. Exp. Psychol*. 88, 327–332.

Blake, R., Sobel, K. V., and Gilroy, L. A. (2003). Visual motion retards alternations between conflicting perceptual interpretations. *Neuron* 39, 869–878.

Blake, R., Westendorf, D., and Fox, R. (1990). Temporal perturbations of binocular rivalry. *Percept. Psychophys*. 48, 593–602.

Bonneh, Y., Sagi, D., and Karni, A. (2001). A transition between eye and object rivalry determined by stimulus coherence. *Vis. Res*. 41, 981–989.

Borsellino, A., De Marco, A., Allazetta, A., Rinesi, S., and Bartolini, B. (1972). Reversal time distribution in the perception of visual ambiguous stimuli. *Kybernetik* 10, 139–144.

Brascamp, J. W., Knapen, T. H. J., Kanai, R., Noest, A. J., van Ee, R., and van den Berg, A. V. (2008). Multi-timescale perceptual history resolves visual ambiguity. *PLoS ONE* 3:e1497. doi: 10.1371/journal.pone.0001497

Brascamp, J. W., van Ee, R., Noest, A. J., Jacobs, R. H., and van Den Berg, A. V. (2006). The time course of binocular rivalry reveals a fundamental role of noise. *J. Vis*. 6, 1244–1256.

Brascamp, J. W., van Ee, R., Pestman, W. R., and van Den Berg, A. V. (2005). Distributions of alternation rates in various forms of bistable perception. *J. Vis*. 5, 287–298.

Braun, J., and Mattia, M. (2010). Attractors and noise: twin drivers of decisions and multistability. *Neuroimage* 52, 740–751.

Camalet, S., Duke, T., Jülicher, F., and Prost, J. (2000). Auditory sensitivity provided by self-tuned critical oscillations of hair cells. *Proc. Natl. Acad. Sci. U.S.A*. 97, 3183–3188.

Campbell, F. W., and Howell, E. R. (1972). Monocular alternation: a method for the investigation of pattern vision. *J. Physiol*. 225, 19P–21P.

Curtu, R., Shpiro, A., Rubin, N., and Rinzel, J. (2008). Mechanisms for frequency control in neuronal competition models. *SIAM J. Appl. Dyn. Sys*. 7, 609–649.

Donner, T. H., Sagi, D., Bonneh, Y. S., and Heeger, D. J. (2008). Opposite neural signatures of motion-induced blindness in human dorsal and ventral visual cortex. *J. Neurosci*. 28, 10298–10310.

Fox, R., and Herrmann, J. (1967). Stochastic properties of binocular rivalry alternations. *Percept. Psychophys*. 2, 432–446.

Friston, K., Breakspear, M., and Deco, G. (2012). Perception and self-organized instability. *Front. Comput. Neurosci*. 6:44. doi: 10.3389/fncom.2012.00044

Gerardin, P., Kourtzi, Z., and Mamassian, P. (2010). Prior knowledge of illumination for 3D perception in the human brain. *Proc. Natl. Acad. Sci. U.S.A*. 107, 16309–16314.

Gigante, G., Mattia, M., Braun, J., and Del Giudice, P. (2009). Bistable perception modeled as competing stochastic integrations at two levels. *PLoS Comput. Biol*. 5:e1000430. doi: 10.1371/journal.pcbi.1000430

Hesselmann, G., Kell, C. A., Eger, E., and Kleinschmidt, A. (2008). Spontaneous local variations in ongoing neural activity bias perceptual decisions. *Proc. Natl. Acad. Sci. U.S.A*. 105, 10984–10989.

Hohwy, J., Roepstorff, A., and Friston, K. (2008). Predictive coding explains binocular rivalry: an epistemological review. *Cognition* 108, 687–701.

Hollins, M. (1980). The effect of contrast on the completeness of binocular rivalry suppression. *Percept. Psychophys*. 27, 550–556.

Hoyer, P. O., and Hyvärinen, A. (2003). “Interpreting neural response variability as monte carlo sampling of the posterior,” in *Advances in Neural Information Processing Systems*, eds S. Becker, S. Thrun, and K. Obermayer (Cambridge, MA: MIT Press), 293–300.

Hudak, M., Gervan, P., Friedrich, B., Pastukhov, A., Braun, J., and Kovacs, I. (2011). Increased readiness for adaptation and faster alternation rates under binocular rivalry in children. *Front. Hum. Neurosci*. 5:128. doi: 10.3389/fnhum.2011.00128

Kang, M.-S., and Blake, R. (2010). What causes alternations in dominance during binocular rivalry? *Atten. Percept. Psychophys*. 72, 179–186.

Kersten, D., Mamassian, P., and Yuille, A. (2004). Object perception as Bayesian inference. *Annu. Rev. Psychol*. 55, 271–304.

Kim, Y.-J., Grabowecky, M., and Suzuki, S. (2006). Stochastic resonance in binocular rivalry. *Vis. Res*. 46, 392–406.

Kinouchi, O., and Copelli, M. (2006). Optimal dynamical range of excitable networks at criticality. *Nat. Phys*. 2, 348–351.

Knapen, T. H. J., Brascamp, J. W., Pearson, J., van Ee, R., and Blake, R. (2011). The role of frontal and parietal brain areas in bistable perception. *J. Neurosci*. 31, 10293–10301.

Kohler, W., and Wallach, H. (1944). Figural after-effects. An investigation of visual processes. *Proc. Am. Philos. Soc*. 88, 269–357.

Laing, C. R., and Chow, C. C. (2002). A spiking neuron model for binocular rivalry. *J. Comput. Neurosci*. 12, 39–53.

Lee, S.-H., Blake, R., and Heeger, D. J. (2007). Hierarchy of cortical responses underlying binocular rivalry. *Nat. Neurosci*. 10, 1048–1054.

Lehky, S. R. (1995). Binocular rivalry is not chaotic. *Proc. R. Soc. Lond. B Biol. Sci*. 259, 71–76.

Leopold, D. A., and Logothetis, N. K. (1999). Multistable phenomena: changing views in perception. *Trends Cogn. Sci*. 3, 254–264.

Leopold, D. A., Wilke, M., Maier, A., and Logothetis, N. K. (2002). Stable perception of visually ambiguous patterns. *Nat. Neurosci*. 5, 605–609.

Levelt, W. J. (1967). Note on the distribution of dominance times in binocular rivalry. *Br. J. Psychol*. 58, 143–145.

Maier, A., Wilke, M., Aura, C., Zhu, C., Ye, F. Q., and Leopold, D. A. (2008). Divergence of fMRI and neural signals in V1 during perceptual suppression in the awake monkey. *Nat. Neurosci*. 11, 1193–1200.

Maier, A., Wilke, M., Logothetis, N. K., and Leopold, D. A. (2003). Perception of temporally interleaved ambiguous patterns. *Curr. Biol*. 13, 1076–1085.

Mamassian, P., and Goutcher, R. (2005). Temporal dynamics in bistable perception. *J. Vis*. 5, 361–375.

Meng, M., and Tong, F. (2004). Can attention selectively bias bistable perception? Differences between binocular rivalry and ambiguous figures. *J. Vis*. 4, 539–551.

Mitchell, J. F., Stoner, G. R., and Reynolds, J. H. (2004). Object-based attention determines dominance in binocular rivalry. *Nature* 429, 410–413.

Moldakarimov, S., Rollenhagen, J. E., Olson, C. R., and Chow, C. C. (2005). Competitive dynamics in cortical responses to visual stimuli. *J. Neurophysiol*. 94, 3388–3396.

Mooney, C. M. (1957). Age in the development of closure ability in children. *Can. J. Psychol*. 11, 219–226.

Moreno-Bote, R., Knill, D. C., and Pouget, A. (2011). Bayesian sampling in visual perception. *Proc. Natl. Acad. Sci. U.S.A*. 108, 12491–12496.

Moreno-Bote, R., Rinzel, J., and Rubin, N. (2007). Noise-induced alternations in an attractor network model of perceptual bistability. *J. Neurophysiol*. 98, 1125–1139.

Moreno-Bote, R., Shpiro, A., Rinzel, J., and Rubin, N. (2010). Alternation rate in perceptual bistability is maximal at and symmetric around equi-dominance. *J. Vis*. 10:1. doi: 10.1167/10.11.1

Moulton, S. T., and Kosslyn, S. M. (2009). Imagining predictions: mental imagery as mental emulation. *Philos. Trans. R. Soc. Lond. B Biol. Sci*. 364, 1273–1280.

Murata, T., Matsui, N., Miyauchi, S., Kakita, Y., and Yanagida, T. (2003). Discrete stochastic process underlying perceptual rivalry. *Neuroreport* 14, 1347–1352.

Nawrot, M., and Blake, R. (1989). Neural integration of information specifying structure from stereopsis and motion. *Science* 244, 716–718.

Necker, L. A. (1832). Observations on some remarkable phenomena seen in Switzerland; and an optical phenomenon which occurs on viewing of a crystal or geometrical solid. *Philos. Mag*. 1, 329–337.

Noest, A. J., van Ee, R., Nijs, M. M., and van Wezel, R. J. A. (2007). Percept-choice sequences driven by interrupted ambiguous stimuli: a low-level neural model. *J. Vis*. 7, 10.

Pastukhov, A., and Braun, J. (2007). Perceptual reversals need no prompting by attention. *J. Vis*. 7, 5.1–5.17.

Pastukhov, A., and Braun, J. (2008). A short-term memory of multi-stable perception. *J. Vis*. 8, 7.1–7.14.

Pastukhov, A., and Braun, J. (2011). Cumulative history quantifies the role of neural adaptation in multistable perception. *J. Vis*. 11:12. doi: 10.1167/11.10.12

Petersik, T. J. (2002). Buildup and decay of a three-dimensional rotational aftereffect obtained with a three-dimensional figure. *Perception* 31, 825–836.

Plenz, D., and Thiagarajan, T. C. (2007). The organizing principles of neuronal avalanches: cell assemblies in the cortex? *Trends Neurosci*. 30, 101–110.

Rolls, E. T., and Deco, G. (2010). *The Noisy Brain: Stochastic Dynamics as a Principle of Brain Function*. New York, NY: Oxford University Press.

Sadaghiani, S., Hesselmann, G., Friston, K. J., and Kleinschmidt, A. (2010). The relation of ongoing brain activity, evoked neural responses, and cognition. *Front. Syst. Neurosci*. 4:20. doi: 10.3389/fnsys.2010.00020

Seely, J., and Chow, C. C. (2011). Role of mutual inhibition in binocular rivalry. *J. Neurophysiol*. 106, 2136–2150.

Shew, W. L., Yang, H., Petermann, T., Roy, R., and Plenz, D. (2009). Neuronal avalanches imply maximum dynamic range in cortical networks at criticality. *J. Neurosci*. 29, 15595–15600.

Shpiro, A., Curtu, R., Rinzel, J., and Rubin, N. (2007). Dynamical characteristics common to neuronal competition models. *J. Neurophysiol*. 97, 462–473.

Shpiro, A., Moreno-Bote, R., Rubin, N., and Rinzel, J. (2009). Balance between noise and adaptation in competition models of perceptual bistability. *J. Comput. Neurosci*. 27, 37–54.

Sperling, G., and Dosher, B. A. (1994). “Depth from motion,” in *Early Vision and Beyond*, eds T. V. Papathomas, C. Chubb, A. Gorea, and E. Kowler (Cambridge, MA: MIT Press), 133–142.

Sterzer, P., Kleinschmidt, A., and Rees, G. (2009). The neural bases of multistable perception. *Trends Cogn. Sci*. 13, 310–318.

Sterzer, P., and Rees, G. (2008). A neural basis for percept stabilization in binocular rivalry. *J. Cogn. Neurosci*. 20, 389–399.

Sundareswara, R., and Schrater, P. R. (2008). Perceptual multistability predicted by search model for Bayesian decisions. *J. Vis*. 8, 12.1–12.19.

Sutton, R. S., and Barto, A. G. (1998). *Reinforcement Learning: An Introduction*. Cambridge, MA: MIT Press.

Tong, F., Meng, M., and Blake, R. (2006). Neural bases of binocular rivalry. *Trends Cogn. Sci*. 10, 502–511.

Tuckwell, H. C. (2006). *Introduction to Theoretical Neurobiology: Volume 1, Linear Cable Theory and Dendritic Structure*. Cambridge: Cambridge University Press.

van Dam, L. C. J., and van Ee, R. (2006). Retinal image shifts, but not eye movements *per se*, cause alternations in awareness during binocular rivalry. *J. Vis*. 6, 1172–1179.

van Ee, R. (2005). Dynamics of perceptual bi-stability for stereoscopic slant rivalry and a comparison with grating, house-face, and Necker cube rivalry. *Vis. Res*. 45, 29–40.

van Ee, R. (2009). Stochastic variations in sensory awareness are driven by noisy neuronal adaptation: evidence from serial correlations in perceptual bistability. *J. Opt. Soc. Am. A Opt. Image Sci. Vis*. 26, 2612–2622.

von Helmholtz, H. (1866). *Treatise on Physiological Optics*. Vol 3. Birmingham, AL: The Optical Society of America.

Walker, P. (1975). Stochastic properties of binocular-rivalry alternations. *Percept. Psychophys*. 18, 467–473.

Weiss, Y., Simoncelli, E. P., and Adelson, E. H. (2002). Motion illusions as optimal percepts. *Nat. Neurosci*. 5, 598–604.

Wheatstone, C. (1838). Contributions to the physiology of vision – Part the first. On some remarkable, and hitherto unobserved, phenomena of binocular vision. *Philos. Trans. R. Soc. Lond*. 128, 371–394.

Wilson, H. R. (2007). Minimal physiological conditions for binocular rivalry and rivalry memory. *Vis. Res*. 47, 2741–2750.

Wolfe, J. M. (1984). Reversing ocular dominance and suppression in a single flash. *Vis. Res*. 24, 471–478.

Yang, Z., and Purves, D. (2003). A statistical explanation of visual space. *Nat. Neurosci*. 6, 632–640.

Keywords: multi-stability, binocular rivalry, adaptation, model, exploitation-exploration dilemma

Citation: Pastukhov A, García-Rodríguez PE, Haenicke J, Guillamon A, Deco G and Braun J (2013) Multi-stable perception balances stability and sensitivity. *Front. Comput. Neurosci.* **7**:17. doi: 10.3389/fncom.2013.00017

Received: 12 December 2012; Accepted: 04 March 2013;

Published online: 20 March 2013.

Edited by:

Klaus R. Pawelzik, University Bremen, Germany

Reviewed by:

Udo Ernst, University of Bremen, Germany

Ruben Moreno-Bote, Foundation Sant Joan de Deu, Spain

Copyright © 2013 Pastukhov, García-Rodríguez, Haenicke, Guillamon, Deco and Braun. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.

*Correspondence: Alexander Pastukhov and Jochen Braun, Department of Cognitive Biology, Otto-von-Guericke Universität, Leipziger Straße 44/Haus 91, Magdeburg 39120, Germany. e-mail: pastukhov.alexander@gmail.com; jochen.braun@ovgu.de