<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Comput. Sci.</journal-id>
<journal-title>Frontiers in Computer Science</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Comput. Sci.</abbrev-journal-title>
<issn pub-type="epub">2624-9898</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fcomp.2022.866029</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Computer Science</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>The Spatial Leaky Competing Accumulator Model</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Zemliak</surname> <given-names>Viktoria</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1654922/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>MacInnes</surname> <given-names>W. Joseph</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/158790/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Neuroinformatics Lab, Institute of Cognitive Science, University of Osnabr&#x000FC;ck</institution>, <addr-line>Osnabr&#x000FC;ck</addr-line>, <country>Germany</country></aff>
<aff id="aff2"><sup>2</sup><institution>Vision Modelling Laboratory, Department of Psychology, National Research University Higher School of Economics</institution>, <addr-line>Moscow</addr-line>, <country>Russia</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Ramprasaath R. Selvaraju, Salesforce, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Silvia Cascianelli, University of Modena and Reggio Emilia, Italy; Samuele Poppi, University of Modena and Reggio Emilia, Italy</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Viktoria Zemliak <email>vzemlyak&#x00040;gmail.com</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Computer Vision, a section of the journal Frontiers in Computer Science</p></fn></author-notes>
<pub-date pub-type="epub">
<day>09</day>
<month>05</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>4</volume>
<elocation-id>866029</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>01</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>29</day>
<month>03</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Zemliak and MacInnes.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Zemliak and MacInnes</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract>
<p>The Leaky Competing Accumulator model (LCA) of Usher and McClelland is able to simulate the time course of perceptual decision making between an arbitrary number of stimuli. Reaction times, such as saccadic latencies, produce a typical distribution that is skewed toward longer latencies, and accumulator models have shown an excellent fit to these distributions. We propose a new implementation called the Spatial Leaky Competing Accumulator (SLCA), which can be used to predict the timing of subsequent fixation durations during a visual task. The SLCA uses a pre-existing saliency map as input and represents accumulation neurons as a two-dimensional grid in order to generate predictions in visual space. The SLCA builds on several biologically motivated parameters: leakage, recurrent self-excitation, randomness and non-linearity, and we also test two implementations of lateral inhibition. A <italic>global</italic> lateral inhibition, as implemented in the original model of Usher and McClelland, is applied to all competing neurons, while a <italic>local</italic> implementation allows only inhibition of immediate neighbors. We trained versions of the SLCA with both global and local lateral inhibition using a genetic algorithm, and compared their performance in simulating the human fixation latency distribution in a foraging task. Although both implementations were able to produce a positively skewed latency distribution, only the local SLCA was able to match the human data distribution from the foraging task. Our model is discussed for its potential in models of salience and priority, and for its benefits compared to other models like the leaky integrate-and-fire network.</p></abstract>
<kwd-group>
<kwd>information accumulation</kwd>
<kwd>fixation latency distribution</kwd>
<kwd>visual search</kwd>
<kwd>reaction time</kwd>
<kwd>saliency map</kwd>
<kwd>lateral inhibition</kwd>
</kwd-group>
<counts>
<fig-count count="3"/>
<table-count count="2"/>
<equation-count count="6"/>
<ref-count count="57"/>
<page-count count="11"/>
<word-count count="8806"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>We are able to process incoming sensory information rather quickly and efficiently despite limited processing resources available in the brain and the high energy costs of neuronal computations (Lennie, <xref ref-type="bibr" rid="B27">2003</xref>). Given the need to allocate energy for various task demands, attention is commonly described as a system that can select a subset of available sensory information for further processing. In visual attention, selection is often likened to a moving spotlight across the visual field in order to highlight the regions that are most distinguishable and relevant for the task (Carrasco, <xref ref-type="bibr" rid="B11">2011</xref>). In this model, a relatively small region of the entire visual field can be selected and hence attended at any moment, and would result in a boost of perceptual processing in the selected area. Shifts of attention can be driven by bottom-up or top-down factors. The latter follow the task goal and volitional control, while bottom-up factors are task-independent and are determined by objective physical characteristics of the stimulus (Posner, <xref ref-type="bibr" rid="B35">1980</xref>).</p>
<p>Models of bottom-up attention can be used to predict where humans will look in tasks like free examination of a scene or visual search (with some limitations, as the scope of top-down influence may vary a lot depending on the nature of the stimuli). The algorithms at the heart of these models are often based on the idea that attention can be captured by the physical characteristics of the retinal input. In general, objects most different from their surroundings are considered the most salient. These models are consistent with the feature integration theory of attention in that saliency can be computed pre-attentively for different features (Treisman and Gelade, <xref ref-type="bibr" rid="B49">1980</xref>). At the early pre-attentive stage, elementary perceptual properties (shape, color, brightness, etc.) of a stimulus are perceived in parallel and encoded into separate feature maps. The attentive stage normalizes and integrates the feature maps into a higher-level representation&#x02014;a saliency map. This map corresponds to the spatial dimensions of the initial visual field and encodes the overall saliency of each region. The saliency map thus provides a representation of the visual field in which the most conspicuous locations are emphasized.</p>
<p>There are multiple candidate areas for the locus of a saliency map in the brain. Zhaoping et al. argued that neurons of the primary visual area (V1) respond to basic low-level features of the image and constitute a saliency map, basing this assumption on psychophysical tests (Zhaoping and May, <xref ref-type="bibr" rid="B57">2007</xref>) and neuroimaging recordings (Zhang et al., <xref ref-type="bibr" rid="B55">2012</xref>). For Gottlieb (<xref ref-type="bibr" rid="B16">2007</xref>), a salience representation of the incoming image in monkeys is best matched by the lateral intraparietal area (LIP), with the most analogous brain area in humans being the intraparietal sulcus (IPS) (Van Essen et al., <xref ref-type="bibr" rid="B51">2001</xref>). Some authors argue that the concept of a saliency map might be biologically invalid (Fecteau and Munoz, <xref ref-type="bibr" rid="B15">2006</xref>), since top-down modulations interfere with bottom-up visual processing at all intermediate and higher levels of the visual system (such as LIP, FEF, and SC). They instead suggested that the brain represents the attentional map as a priority map rather than a saliency map. A priority map emphasizes that the allocation of spatial attention is based on both bottom-up saliency and top-down goal-related prioritization of the information (Fecteau and Munoz, <xref ref-type="bibr" rid="B15">2006</xref>; Bisley and Mirpour, <xref ref-type="bibr" rid="B6">2019</xref>).</p>
<p>Although originally focused on bottom-up processing, more recent saliency models have extended the idea to include top-down attention. Feature biasing, for example, has been used to assign weights to feature maps when building a saliency map. This can be implemented through supervised learning (Borji et al., <xref ref-type="bibr" rid="B7">2012</xref>) or with use of eye movements recordings (Zhao and Koch, <xref ref-type="bibr" rid="B56">2011</xref>; also see Itti and Borji, <xref ref-type="bibr" rid="B18">2013</xref> for a review). Alternatively, spatial biasing could favor certain locations that are important for scene context (Torralba et al., <xref ref-type="bibr" rid="B47">2006</xref>; Peters and Itti, <xref ref-type="bibr" rid="B34">2007</xref>). Another class of models operates on objects rather than feature salience, and requires an object recognition component (see Krasovskaya and MacInnes, <xref ref-type="bibr" rid="B25">2019</xref> for review). An example is the object-based visual attention model of Sun and Fisher (<xref ref-type="bibr" rid="B45">2003</xref>) which includes competition between objects, their grouping and consequent hierarchical attention shifts with use of top-down modulations.</p>
<p>The most interesting examples of saliency models, from a cognitive neuroscience perspective, include a strong theoretical basis along with neurally plausible computational approaches. For example, Gaussian pyramids are used to reflect center-surround receptive fields in the primary visual cortex, and also show a good fit to human data for spatial localization of salient stimuli (Merzon et al., <xref ref-type="bibr" rid="B31">2020</xref>). However, according to the MIT/Tuebingen Saliency Benchmark (Bylinskii et al., <xref ref-type="bibr" rid="B9">2018</xref>), classical implementations of saliency models are inferior to novel approaches, such as deep neural networks (K&#x000FC;mmerer et al., <xref ref-type="bibr" rid="B26">2017</xref>; Jia and Bruce, <xref ref-type="bibr" rid="B21">2020</xref>) that turn the problem into one of spatial classification. Nevertheless, the biological plausibility of saliency models provides good interpretability and theoretical value.</p>
<p>Another advantage of saliency models is that some implementations (e.g., Walther and Koch, <xref ref-type="bibr" rid="B52">2006</xref>) have the capability to predict the temporal dynamics of gaze responses. In general, models of attentional shifts at the computational level can be tested in a variety of ways, including their spatial and temporal components. Spatial models predict where we allocate attention; their performance is often measured against overt attention, i.e., locations of gaze fixations, and tested using established metrics like Area Under Curve (AUC-Judd) (Judd et al., <xref ref-type="bibr" rid="B23">2009</xref>). Models that include temporal predictions are less common and may predict the order of these eye movements (a scanpath) and/or their latency distribution. Although early salience models were able to predict fixation latencies (Walther and Koch, <xref ref-type="bibr" rid="B52">2006</xref>), the classical saliency model was shown to have serious limitations in simulating the temporal dynamics of human data (Merzon et al., <xref ref-type="bibr" rid="B31">2020</xref>). At the same time, alternative deep learning-based approaches usually focus on the spatial component alone, and very few of these models address the temporal aspect at all.</p>
<p>Other alternatives have been introduced recently that adapt Bayesian or diffusion techniques to generate fixations in both space and time. For example, Ratcliff (<xref ref-type="bibr" rid="B36">2018</xref>) implemented a spatial version of the drift diffusion model. The Spatially Continuous Diffusion Model (SCDM) allowed input from a 2-dimensional plane and predicted decision responses when a location on a planar threshold was reached, allowing spatio-temporal prediction from touch or eye movement responses. This model was not tested specifically on salience map input, though the planar input used would likely allow this use case.</p>
<p>Additionally, the LATEST model (Tatler et al., <xref ref-type="bibr" rid="B46">2017</xref>) implements a Bayesian decision process that models each fixation as a stay-vs.-go competition to predict fixation latencies and locations. Temporally, the Bayesian process is shown to be an excellent fit to human fixation latencies. Spatially, the model calculated pixel-wise decisions based in part on maps derived from image salience, but also included a map of semantic importance as defined by human rating. Fixations were planned in parallel over the full image using decision maps (as opposed to salience or priority maps) and tended to land within the high salience areas (Judd et al., <xref ref-type="bibr" rid="B22">2012</xref>).</p>
<p>Computational saliency models can be conceptualized as sequential modules consisting of processing units likened to neuronal populations. At the first stage basic physical properties of visual stimuli are encoded in feature maps, which are further normalized and aggregated into a single saliency or priority map. While many saliency models stop at these spatial predictions, an additional temporal level of the model might generate shifts of attention in a winner-take-all (WTA) fashion: first, the most salient location is attended, with subsequent fixations steered toward novel locations with an inhibitory mechanism like inhibition of return (Posner, <xref ref-type="bibr" rid="B35">1980</xref>). This allocation of attention to each fixational point at the temporal layer could be implemented via a spiking neuron model (Trappenberg et al., <xref ref-type="bibr" rid="B48">2001</xref>; Adeli et al., <xref ref-type="bibr" rid="B2">2017</xref>). Processing units imitate neuronal populations which build up electrical potential and fire when exceeding a certain threshold.</p>
<p>However, many current saliency models have focused on predicting spatial fixations and have not retained the ability to imitate spiking processes and predict temporal information. Although the classical saliency model is able to generate a fixation latency distribution, it does not show a good fit to human data (Merzon et al., <xref ref-type="bibr" rid="B31">2020</xref>). We believe there is a gap in the current literature for an alternative mechanism of fixation selection that works with existing saliency map spatial localization.</p>
<p>One candidate to implement a spatial saliency map is the Leaky Competing Accumulator (LCA; Usher and McClelland, <xref ref-type="bibr" rid="B50">2001</xref>). The Leaky Competing Accumulator has a two-layered structure: the first layer consists of multiple (usually two) visual input stimuli, and the second, computational layer includes a range of neuron-like processing units. Each processing unit corresponds to a single input element. Over time, the processing units accumulate information from the input layer, i.e., gradually increase their values. When the value of some unit exceeds a threshold, a decision is made and the corresponding input is considered selected. Thus, human fixation latency is simulated as the amount of time the model takes to decide on the next fixation. In comparison with related accumulator models (Ratcliff et al., <xref ref-type="bibr" rid="B39">2007</xref>; Brown and Heathcote, <xref ref-type="bibr" rid="B8">2008</xref>; Ratcliff and McKoon, <xref ref-type="bibr" rid="B40">2008</xref>), the LCA includes a range of additional parameters: information leakage, recurrent self-excitation, randomness, and lateral inhibition. Each of these parameters is well-justified from a biological point of view, and we briefly describe below the psychophysiological phenomena imitated by the model parameters.</p>
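<p>The accumulate-to-threshold dynamic described above can be sketched in a few lines of Python (an illustrative toy, not the implementation used in this study; the function name and parameter values are ours):</p>

```python
import numpy as np

def lca_race(inputs, threshold=1.0, leak=0.2, noise=0.1, dt=0.01, seed=0):
    """Minimal accumulate-to-threshold race: each unit integrates its
    input, leaks part of its own activation, and receives Gaussian
    noise; the first unit to cross the threshold fixes both the choice
    and the latency (in time steps)."""
    rng = np.random.default_rng(seed)
    rho = np.asarray(inputs, dtype=float)
    x = np.zeros_like(rho)
    steps = 0
    while x.max() < threshold:
        steps += 1
        x += (rho - leak * x) * dt + noise * rng.normal(size=x.size) * np.sqrt(dt)
        x = np.maximum(x, 0.0)  # negative activations are truncated to zero
    return int(np.argmax(x)), steps

choice, latency = lca_race([0.8, 0.3])  # the stronger input tends to win
```

<p>Latencies collected over many such trials form the positively skewed distribution that accumulator models are known to reproduce well.</p>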
<p>Neural currents can be characterized in terms of their passive decay over time. This decay has exponential properties and results in a partial loss, or leakage of information from visual input (Abbott, <xref ref-type="bibr" rid="B1">1991</xref>). The LCA model implements this decay, which leads to a slower increase in the unit values and also filters out weak stimulations that produce insufficient excitation and vanish with decay over time. A second important mechanism, which counteracts and balances such decay, is recurrent self-excitation. This allows neural units to maintain their activity over time and decrease the rate of information leakage (Amit, <xref ref-type="bibr" rid="B3">1989</xref>). Self-excitation is implemented in the model as bottom-up excitatory input to all accumulator units.</p>
<p>Thirdly, the LCA incorporates lateral inhibition as a mechanism for neural competition. Although axonal projections from one brain region to others are overwhelmingly excitatory, within a single brain area there are both excitatory and inhibitory interactions (Chelazzi et al., <xref ref-type="bibr" rid="B12">1993</xref>). Lateral inhibition means that each active neuron inhibits the neurons laterally adjacent to it. In the original LCA, the value of each processing unit is decreased by the sum of all others&#x00027; values at every time step. Thus, self-excitation and lateral inhibition balance each other, with units multiplied by their own scaled values from the previous time step and simultaneously decreased by the values of the others.</p></sec>
<sec id="s2">
<title>Proposal</title>
<p>We propose a model of allocating attention as a series of spatio-temporal decisions about where to make the next saccade. The suggested model belongs to the family of information accumulators that represent perceptual decision making as a stochastic process that is gradually evolving over time (Smith, <xref ref-type="bibr" rid="B43">1995</xref>; Usher and McClelland, <xref ref-type="bibr" rid="B50">2001</xref>; Brown and Heathcote, <xref ref-type="bibr" rid="B8">2008</xref>; Ratcliff and McKoon, <xref ref-type="bibr" rid="B40">2008</xref>). These models are extremely accurate in reproducing temporal response distributions (MacInnes, <xref ref-type="bibr" rid="B29">2017</xref>) and can also model neural accumulation in areas like the superior colliculus (Ratcliff et al., <xref ref-type="bibr" rid="B39">2007</xref>).</p>
<p>Specifically, our model is based on the Leaky Competing Accumulator (the LCA; Usher and McClelland, <xref ref-type="bibr" rid="B50">2001</xref>) for calculating information accumulation.</p>
<p>Lateral inhibition is a key mechanism allowing multiple inputs to the LCA model (Usher and McClelland, <xref ref-type="bibr" rid="B50">2001</xref>), but one challenge for adopting an accumulator model to simulate salience map construction is that traditional algorithms most often select between only two abstract alternatives. Practically, neural competition between multiple alternatives can be imitated by feed-forward inhibition, with each input unit sending a positive signal to a corresponding accumulator unit and similar negative values to all others (Heuer, <xref ref-type="bibr" rid="B17">1987</xref>). However, with an increasing number of alternatives, all neurons except the most active would receive excess inhibition, drop below zero quickly, and hence fail to compete. Thus, accurate modeling becomes challenging. While race models can also be extended to multiple competing alternatives, each random accumulator added biases the response distribution toward earlier responses (Wolfe and Gray, <xref ref-type="bibr" rid="B54">2007</xref>). Lateral inhibition, however, may allow for accurate modeling of this competitive neuronal interplay. All inputs to processing units would be excitatory, and the value of inhibition would not need to be as drastic as that of feed-forward inhibition. The most active accumulator unit could inhibit the others significantly but gradually, and would not drive them to negative activation after the first iteration. However, in the original LCA model, each neuronal unit sends inhibitory signals to all others, which is not entirely biologically plausible, especially as we consider neurons on a spatial salience map.</p>
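<p>The scaling problem with feed-forward inhibition can be seen numerically in a toy sketch (the weights below are arbitrary illustrative values of ours, not fitted parameters):</p>

```python
import numpy as np

def feedforward_step(inputs, w_excite=1.0, w_inhibit=0.1):
    """One feed-forward-inhibition step: each accumulator receives its
    own input positively and the summed inputs of all competitors
    negatively."""
    rho = np.asarray(inputs, dtype=float)
    others = rho.sum() - rho  # summed input of the competing units
    return w_excite * rho - w_inhibit * others

# With two alternatives, both units stay above zero and can compete:
two = feedforward_step([1.0, 0.4])
# With ten alternatives, every unit except the strongest is pushed
# below zero after a single step, so no gradual competition unfolds:
many = feedforward_step([1.0] + [0.4] * 9)
```

<p>Lateral inhibition avoids this because each unit subtracts the competitors&#x00027; accumulated activations, which start at zero and grow gradually, rather than their full inputs at once.</p>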
<p>We suggest an implementation of the LCA that uses a saliency map as input and operates in both temporal and spatial domains, generating fixation coordinates over time. Thus, the leaky competing accumulator becomes a spatial leaky competing accumulator (SLCA). The number of internal processing units corresponds to the size of the input saliency map, and each of these units represents a corresponding neuronal population. Although these salience maps often describe their size in terms of &#x0201C;pixels,&#x0201D; we will only use the term in the abstract sense of a population&#x00027;s receptive field.</p>
<p>We further propose an alternative implementation of lateral inhibition in the LCA, so that each neuron-like unit influences only its immediate neighbors. In this light, a key advantage of our SLCA would be its mechanism of lateral inhibition, allowing the model to simulate the neuronal competition in visual pathways. Each processing unit would accumulate information over time, and only the first unit to reach threshold fires at that moment. Neuron-like elements thus compete for limited attentional resources in the brain, and their competition is driven by a range of physiologically accurate mechanisms.</p>
<sec>
<title>SLCA Model Architecture</title>
<p>Our model shares the two-layer network structure and the base algorithm for the update of unit values with the original LCA model. Please see the full model architecture in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>SLCA model architecture. We use EML-Net to produce the salience maps, which are then used as input to our model.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-866029-g0001.tif"/>
</fig>
<p>The first layer includes the external input to the model. Our SLCA model does not work directly with image or retinal input, but uses a salience/priority map as input. As such, we used an off-the-shelf implementation of EML-Net to produce the salience maps from the images for our training and testing procedures. The original LCA model operates on several input choice alternatives but without consideration for the spatial proximity of those choices. In our case, a fundamental consideration was that the input would be a two-dimensional salience or priority map, so spatial proximity was added.</p>
<p>The second layer is the same for the SLCA and LCA: it consists of accumulator units which are roughly analogous to neural activation clusters processing information about different alternatives. Finally, a winner-take-all selection mechanism was implemented to act iteratively on this layer. Many models use inhibition of return (IOR) to reduce the likelihood of refixating salient locations, but we chose not to implement this mechanism at this time, since the simple mechanisms implemented previously may not match the two forms of IOR that are proposed to exist (Redden et al., <xref ref-type="bibr" rid="B41">2021</xref>). In terms of processing mechanisms, the LCA and our SLCA are quite similar. Much of the description below applies to both models, but we highlight where key differences occur.</p>
<p>Accumulator units can be characterized in terms of their input and output values. Input values correspond to the neural population current, i.e., neural activation. The output values stand for the population firing rate, which is calculated using a linear threshold function. This function approximates the relation between firing rate and input current well (Mason and Larkman, <xref ref-type="bibr" rid="B30">1990</xref>; Jagadeesh et al., <xref ref-type="bibr" rid="B20">1992</xref>). The response is triggered by the unit whose activation first reaches a threshold. Thus, the time required to reach this criterion value simulates human RT before the next saccade.</p></sec>
<sec>
<title>Algorithm</title>
<p>The mechanism of information processing in both the LCA and our SLCA model is implemented in the dynamic behavior of the units&#x00027; activations and their continuous interplay. The algorithm describes how the values of the accumulator units increase over time until one of them reaches the threshold. The original LCA model used a constant threshold value; however, our temporal predictions largely depend on the input saliency map values and are sensitive to its changes. Models like the original salience map (Itti and Koch, <xref ref-type="bibr" rid="B19">2000</xref>) normalized the conspicuity maps prior to the leaky integrate-and-fire layer. We instead achieved this result with a dynamic threshold parameter that depends on the input saliency map values, as given in Equation (1).</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>m</mml:mi><mml:mo>*</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>S</mml:mi><mml:mo>&#x0003E;</mml:mo><mml:mn>0</mml:mn><mml:mo>.</mml:mo><mml:mn>6</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Here, <italic>T</italic> stands for the unit activation threshold, which is the sum of two terms. <italic>T</italic><sub>0</sub> stands for the default threshold value, which is independent of the saliency map values. <italic>S</italic> stands for the saliency map values, and <italic>m</italic> is the saliency multiplication factor. The larger <italic>m</italic> is, the more the resulting threshold value depends on the saliency map.</p>
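<p>Reading (<italic>S</italic> &#x0003E; 0.6) as an element-wise indicator function, Equation (1) can be sketched as follows (the values of <italic>T</italic><sub>0</sub> and <italic>m</italic> below are placeholders of ours, not the fitted parameters):</p>

```python
import numpy as np

def dynamic_threshold(saliency, t0=1.0, m=0.5):
    """Equation (1): per-unit activation threshold T = T0 + m * (S > 0.6),
    reading (S > 0.6) as an indicator that is 1 where the saliency value
    exceeds 0.6 and 0 elsewhere.  t0 and m are placeholder values."""
    s = np.asarray(saliency, dtype=float)
    return t0 + m * (s > 0.6)

thresholds = dynamic_threshold([0.2, 0.9, 0.7])
```

<p>Units sitting on highly salient locations thus receive a raised threshold, which compensates for their proportionally larger input and keeps latencies in a plausible range across maps of different overall salience.</p>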
<p>In general, the model behaves like a charging capacitor, approaching its equilibrium exponentially. The formula for updating unit values in the original LCA model is presented in Equation (2).</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mrow><mml:mi>d</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy='false'>[</mml:mo><mml:msub><mml:mi>&#x003C1;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mi>k</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mstyle><mml:mo stretchy='false'>]</mml:mo></mml:mrow><mml:mo>*</mml:mo><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:mfrac><mml:mo>+</mml:mo><mml:mi>f</mml:mi><mml:mo>+</mml:mo><mml:mo>&#x000A0;</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msqrt><mml:mrow><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:mfrac></mml:mrow></mml:msqrt></mml:mrow></mml:math></disp-formula>
<disp-formula id="E3"><mml:math id="M3"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02192;</mml:mo><mml:mo>max</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>i</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>n</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x000A0;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Here, <italic>dx</italic><sub><italic>i</italic></sub> describes the change of the <italic>i</italic>-th accumulator unit&#x00027;s activation value over the time interval <inline-formula><mml:math id="M4"><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula>. This change is driven by the external input &#x003C1;<sub><italic>i</italic></sub>, the excitatory input <italic>x</italic><sub><italic>i</italic></sub>, and the overall inhibition <inline-formula><mml:math id="M5"><mml:mi>&#x003B2;</mml:mi><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. <italic>k</italic> stands for the overall net leakage, <italic>E</italic><sub><italic>i</italic></sub> stands for Gaussian random noise, and <italic>f</italic> for an offset. <italic>n</italic> is the total number of accumulator units. In order to achieve biological plausibility, an additional restriction is introduced in the model: if the activation value of any accumulator unit falls below 0, it is immediately truncated to 0.</p>
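<p>One update of Equation (2), including the truncation rule, can be written directly (a sketch under the definitions above; the parameter values are arbitrary, not the fitted ones):</p>

```python
import numpy as np

def lca_step(x, rho, k=0.2, beta=0.1, f=0.0, noise=0.1, dt_over_t=0.01, rng=None):
    """One step of Equation (2): external input rho, net leakage k * x_i,
    global lateral inhibition beta times the sum over j != i of x_j,
    offset f, and Gaussian noise E_i scaled by sqrt(dt/t).  Negative
    activations are truncated to zero."""
    if rng is None:
        rng = np.random.default_rng(0)
    inhibition = beta * (x.sum() - x)  # sum of all other units' activations
    dx = (rho - k * x - inhibition) * dt_over_t \
        + f + noise * rng.normal(size=x.size) * np.sqrt(dt_over_t)
    return np.maximum(x + dx, 0.0)

rng = np.random.default_rng(0)
x = np.zeros(3)
for _ in range(100):
    x = lca_step(x, rho=np.array([1.0, 0.5, 0.2]), rng=rng)
# the unit with the strongest input accumulates the most activation
```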
<p>The external feed-forward input &#x003C1;<sub><italic>i</italic></sub> is a weighted sum of all inputs of the first layer to the <italic>i</italic>-th accumulator unit. A greater weight is assigned to the <italic>i</italic>-th input than to all others. Note that we make the simplifying assumption that all input units have a value of zero before stimulus presentation, and that afterwards these values change in accordance with the stimulus saliency values.</p>
<p>The <italic>k</italic> term stands for the overall net leakage, i.e., the difference between the information decay and the recurrent self-excitation. The corresponding formula is presented in Equation (3).</p>
<disp-formula id="E4"><label>(3)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mi>&#x003BB;</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mi>a</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Here, <italic>a</italic> is a scaling factor for the recurrent self-excitation, while &#x003BB; represents information decay, or leakage of activation. Together they act as factors on the excitatory input, and the resulting <italic>k</italic> represents the balance between leakage and self-excitation. With <italic>k</italic> &#x0003E; 0 the system is stable and tends toward zero activation over time, whereas <italic>k</italic> &#x0003C; 0 allows for self-amplification and hence instability of the system.</p>
<p>The overall inhibition of a unit in the original LCA model depends on the input from the other units. It is represented by a sum of other units&#x00027; activations multiplied by a scaling term &#x003B2;. Thus, during the accumulation process, each alternative sends inhibiting signals to all others.</p>
<p>The original LCA model uses inputs abstracted in space, such as several choice alternatives. Regardless of the number of inputs, there is no sense of spatial proximity between units: the model operates only in the temporal domain, predicting each unit&#x00027;s activation over time. In contrast, our proposed Spatial LCA model accounts for data in two dimensions and predicts activation on this map over time. We use a saliency map as the input to the model, thus extending the number of choice alternatives up to the number of input units in the saliency map. The second layer of the SLCA model consists of a two-dimensional array of information accumulation units. These units are likened to neural activation clusters representing a spatial location.</p>
<p>One may consider a simple visual search task, with each location on the saliency map represented by a single node of the input level. The second level of the network includes an equivalent number of units, each corresponding to a certain location. Thus, the external feed-forward input becomes a weighted sum of all inputs of the saliency map to the <italic>ij</italic>-th accumulator unit. The entire model simulates fixation selection as perceptual decision making over a stimulus picture at each time step.</p>
<p>A spatial implementation, however, raises an important question: to what degree can neighboring neurons influence the rest of the grid? To this end, we implemented two versions of the SLCA value-update algorithm, with the crucial difference being whether the lateral inhibition parameter had global or local scope. The global version is analogous to the original LCA algorithm, where each accumulator unit is potentially inhibited by all others. We propose an alternative local implementation of lateral inhibition in which each unit inhibits only its immediate neighbors. Thus, every non-border unit has eight inhibited neighbors, while for units on the physical border of the saliency map the number of neighbors to inhibit varies from three to five. Equation (4) contains the resulting update formula.</p>
<disp-formula id="E5"><label>(4)</label><mml:math id="M7"><mml:mrow><mml:mi>d</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy='false'>[</mml:mo><mml:msub><mml:mi>&#x003C1;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mi>k</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x02212;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mstyle><mml:mo stretchy='false'>]</mml:mo></mml:mrow><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:mfrac><mml:mo>+</mml:mo><mml:mi>f</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msqrt><mml:mrow><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:mfrac></mml:mrow></mml:msqrt></mml:mrow></mml:math></disp-formula>
<disp-formula id="E6"><mml:math id="M8"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mtext>&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;</mml:mtext><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02192;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>0</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>j</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mo>&#x000B1;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>i</mml:mi><mml:mo>&#x000B1;</mml:mo><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>i</mml:mi><mml:mo>&#x000B1;</mml:mo><mml:mi>w</mml:mi><mml:mo>&#x000B1;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
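<p>A hedged sketch of this local-inhibition update on the two-dimensional accumulator grid follows, using a 3 &#x000D7; 3 convolution to sum each unit&#x00027;s immediate neighbors. The use of scipy.ndimage is our illustrative choice, not necessarily the implementation used in the actual model code.</p>

```python
import numpy as np
from scipy.ndimage import convolve

# 3x3 kernel that sums the eight immediate neighbors (center excluded).
# Zero-padding at the borders leaves border units with their 3-5 real neighbors.
NEIGHBOR_KERNEL = np.array([[1.0, 1.0, 1.0],
                            [1.0, 0.0, 1.0],
                            [1.0, 1.0, 1.0]])

def slca_local_step(x, rho, k=0.2, beta=0.2, f=0.0, noise_sd=1.0, dt_over_t=0.01):
    """One SLCA update with local lateral inhibition.

    x, rho -- 2-D arrays with the shape of the saliency map, e.g., (68, 120).
    """
    inhibition = beta * convolve(x, NEIGHBOR_KERNEL, mode="constant", cval=0.0)
    dx = (rho - k * x - inhibition) * dt_over_t
    dx += f + np.random.normal(0.0, noise_sd, size=x.shape) * np.sqrt(dt_over_t)
    return np.maximum(x + dx, 0.0)                 # truncate negative activations
```

<p>On a flattened array of width <italic>w</italic>, this is equivalent to inhibiting the indices <italic>i</italic> &#x000B1; 1, <italic>i</italic> &#x000B1; <italic>w</italic>, and <italic>i</italic> &#x000B1; <italic>w</italic> &#x000B1; 1, as in the index set above.</p>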
<p>Equation (4) shares the parameters <italic>dx</italic><sub><italic>i</italic></sub>, &#x003C1;<sub><italic>i</italic></sub>, <italic>k</italic>, <italic>x</italic><sub><italic>i</italic></sub>, &#x003B2;, <italic>f, E</italic><sub><italic>i</italic></sub>, and <inline-formula><mml:math id="M9"><mml:mfrac><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:mfrac></mml:math></inline-formula> with Equation (2). It also introduces a new parameter, <italic>w</italic>, which stands for the width of the original stimulus image and is used to identify the coordinates of the neighboring units involved in calculating the local inhibition.</p></sec></sec>
<sec sec-type="materials" id="s3">
<title>Materials</title>
<p>Model performance was evaluated by comparison with human data. We used data from a visual foraging task with natural indoor scenes as stimuli. Forty-six participants searched real photographs of scenes for multiple instances of either cups or pictures. The images were taken from the LabelMe dataset (Russell et al., <xref ref-type="bibr" rid="B42">2008</xref>), which provides images of indoor and outdoor scenes. Data were collected using an EyeLink 1000&#x0002B; eye tracker with a sampling rate of 1,000 Hz. Fixation detection used a velocity threshold of 35 degrees per second. Fixations with latencies &#x0003C;100 or &#x0003E;750 ms were dropped as outliers. After outlier exclusion, the dataset comprised 55,400 fixations. A detailed description of the data collection process is provided in Merzon et al. (<xref ref-type="bibr" rid="B31">2020</xref>). The data were collected with ethical approval from the HSE ethics committee and conform to the protocols of the Declaration of Helsinki.</p>
<p>All data was divided into a test and training set. The data from 36 randomly chosen participants were used for the optimization procedure, and data from the remaining 10 participants were used for testing. Each participant viewed 23 pictures, so the total number of training samples was 828, and the number of test samples was 230.</p>
<sec>
<title>Saliency Map Input</title>
<p>The input to the SLCA was saliency maps generated from the raw images used in the experiments described above. The development of an algorithm for generating saliency maps is outside the scope of this paper: a wide variety of published solutions exist (Bylinskii et al., <xref ref-type="bibr" rid="B9">2018</xref>), and our model is capable of working with the output of any of them. Predicting the spatial locations of fixations with these solutions is well-tested (Bylinskii et al., <xref ref-type="bibr" rid="B9">2018</xref>), and the spatial accuracy of our SLCA would mostly be determined by the approach used to generate the salience map itself. For example, the top-rated model for spatial predictions at the time of writing was DeepGaze IIE (Linardos et al., <xref ref-type="bibr" rid="B28">2021</xref>), with an AUC-Judd score of 0.8829 (as of Sept, 2021; Bylinskii et al., <xref ref-type="bibr" rid="B9">2018</xref>). For this reason, we focused on temporal predictions in this paper.</p>
<p>To generate the saliency maps, we used the EML-NET model (Jia and Bruce, <xref ref-type="bibr" rid="B21">2020</xref>), pre-trained on the ImageNet dataset (Deng et al., <xref ref-type="bibr" rid="B14">2009</xref>), which consists of 3.2 million images and is commonly used for various computer vision tasks. We chose EML-NET for saliency map generation due to its excellent performance: it ranked third in the MIT/T&#x000FC;bingen Saliency Benchmark by the AUC-Judd metric, with a score of 0.876 (as of Sept, 2021; Bylinskii et al., <xref ref-type="bibr" rid="B9">2018</xref>).</p>
<p>The images from our dataset were fed into the pre-trained EML-NET, which generated the saliency maps. These saliency maps were then used as input to our SLCA model to generate the final fixation latencies. See <xref ref-type="fig" rid="F2">Figure 2</xref> for an example of a generated saliency map and the corresponding heatmap of human fixations. Each saliency map had a size of 120 &#x000D7; 68 pixels, or 8,160 pixels in total. Hence, the accumulator layer of the model consisted of 8,160 neuron-like units, each of which processed information from the corresponding pixel of the input saliency map.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Example image with a heatmap for the reference, and resulting salience map. Our proposed SLCA accepts the salience map as input and is agnostic of the algorithm used to produce that map.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-866029-g0002.tif"/>
</fig></sec></sec>
<sec sec-type="methods" id="s4">
<title>Methods</title>
<p>We tested two versions of the SLCA model&#x02014;with a global and a local inhibition parameter&#x02014;on their ability to simulate human fixation latency in a visual search task. Both variants of the model were able to produce a sequence of fixations with predictions for both latency and location coordinates. Nonetheless, in this work we focus on the temporal aspect only, since the accuracy of the spatial predictions is largely determined by the choice of model used to generate the salience map. The models were implemented in Python 3 in an object-oriented style, using the <italic>numpy</italic> library for efficient mathematical calculations.</p>
<p>We also implemented a genetic algorithm (GA) to find the set of SLCA parameters that best reproduced the human data. Genetic algorithms belong to the family of evolutionary algorithms and are inspired by the principles of evolution and natural selection (Mitchell, <xref ref-type="bibr" rid="B32">1996</xref>). They are based on three biologically inspired computational operators: mutation, crossover, and selection. Each iteration of the algorithm slightly modifies the model parameters, runs the model with these parameters, and evaluates the fitness of its output against the human data. For evaluating fitness, we used the Kolmogorov-Smirnov (KS) statistic as a loss function, chosen because it has already proved effective for evolutionary algorithms (see Weber et al., <xref ref-type="bibr" rid="B53">2006</xref>; MacInnes, <xref ref-type="bibr" rid="B29">2017</xref>).</p>
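<p>A minimal sketch of such a GA loop follows. The operator details (truncation selection, single-point crossover, Gaussian mutation) and the function names are our illustrative assumptions; in the actual procedure, the fitness call runs the SLCA with a candidate parameter set and compares the simulated latencies with the human data.</p>

```python
import numpy as np
from scipy.stats import ks_2samp

def fitness(params, human_latencies, simulate):
    """Lower is better: KS distance between simulated and human latencies."""
    simulated = simulate(params)          # run the model with this parameter set
    return ks_2samp(simulated, human_latencies).statistic

def evolve(population, human, simulate, generations=100, sigma=0.05, rng=None):
    """Evolve a list of parameter vectors via selection, crossover, mutation."""
    rng = rng if rng is not None else np.random.default_rng()
    for _ in range(generations):
        scores = [fitness(p, human, simulate) for p in population]
        order = np.argsort(scores)
        parents = [population[i] for i in order[:len(population) // 2]]  # selection
        children = []
        while len(children) < len(population) - len(parents):
            a, b = rng.choice(len(parents), size=2, replace=False)
            cut = rng.integers(1, len(parents[a]))                # crossover point
            child = np.concatenate([parents[a][:cut], parents[b][cut:]])
            child += rng.normal(0.0, sigma, size=child.shape)     # mutation
            children.append(child)
        population = parents + children
    return min(population, key=lambda p: fitness(p, human, simulate))
```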
<p>First, both variants of the SLCA model were tested with the default parameters; then they were tested with the best parameter sets found during the GA optimization procedure. As described above, data from 36 participants were used for training and data from the remaining 10 for testing.</p>
<p>For the optimization process, 40 fixation latencies were simulated for each of the 23 images for each of the 36 training participants. These were gathered into a single set, as were the human fixation latencies for each image. Then 500 values were randomly sampled 30 times from both the human and simulated datasets, and the 30 resulting KS-statistic values were averaged to compare the distributions. The same procedure was applied to the 10 test participants.</p>
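<p>The subsampling procedure can be sketched as follows (sample sizes as stated above; the helper name is ours):</p>

```python
import numpy as np
from scipy.stats import ks_2samp

def averaged_ks(human, simulated, n_samples=500, n_repeats=30, rng=None):
    """Average two-sample KS statistic over repeated random subsamples."""
    rng = rng if rng is not None else np.random.default_rng()
    stats = []
    for _ in range(n_repeats):
        h = rng.choice(human, size=n_samples, replace=False)
        s = rng.choice(simulated, size=n_samples, replace=False)
        stats.append(ks_2samp(h, s).statistic)
    return float(np.mean(stats))
```

<p>Averaging over repeated subsamples reduces the dependence of the KS statistic on any single draw from the pooled latency sets.</p>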
<p>We used the two-sample KS implementation from the scipy library in Python 3, which outputs two values: the KS statistic and a <italic>p</italic>-value. The optimization procedure attempted to minimize the KS statistic.</p>
<p>We ran the GA for 100 iterations (epochs). We initialized the SLCA model with different parameter sets, evaluated the results of each iteration using the KS statistic, and subjected them to the mutation, crossover, and selection operations of the GA. Throughout these iterations we attempted to optimize up to eight parameters: (1) the leakage term &#x003BB;; (2) self-excitation <italic>a</italic>; (3) input strength of the feedforward weights &#x003C1;<sub><italic>ij</italic></sub>; (4) standard deviation of random noise <italic>E</italic><sub><italic>i</italic></sub>; (5) lateral inhibition <inline-formula><mml:math id="M10"><mml:mi>&#x003B2;</mml:mi><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>; (6) cross talk of the feedforward weights; (7) the offset <italic>f</italic>; (8) the salience multiplier term for threshold change <italic>m</italic>. To prevent overparameterization of the model, two parameters of the SLCA model were fixed: (1) the time-step size <italic>t</italic>; (2) the default activation threshold <italic>T</italic><sub>0</sub>.</p>
<p>Other fixed parameters included: (1) 40 trials; (2) a maximum of 750 time steps per trial; (3) 8,160 accumulator units, matching the size of the input map.</p>
<p>When the 100 training epochs were completed, the two variants of the SLCA model with the best parameter sets found were evaluated on the test set.</p>
<sec sec-type="results" id="s5">
<title>Results</title>
<p>We compared the performance of the two SLCA model variants with different implementations of the lateral inhibition parameter: (1) the original global lateral inhibition by Usher and McClelland (<xref ref-type="bibr" rid="B50">2001</xref>); (2) the proposed local lateral inhibition, where each unit inhibits only its immediate neighbors.</p>
<p>The performance of both models was evaluated with the two-sided Kolmogorov-Smirnov test: the data generated with a given set of parameters were compared with real human data from the visual search task. Each algorithm was run for each of the 23 images and 10 test participants, i.e., 23 &#x000D7; 10 = 230 runs in total. Both human and simulated data were gathered into two large sets. Then 500 values were randomly sampled 30 times from both the human and simulated datasets, and the 30 resulting KS-statistic values were averaged to compare the distributions. See <xref ref-type="fig" rid="F3">Figure 3</xref> for a visualization of the distributions of the best data simulated by the local SLCA (<xref ref-type="fig" rid="F3">Figure 3A</xref>) and global SLCA (<xref ref-type="fig" rid="F3">Figure 3B</xref>) models in comparison with the human samples.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>(A)</bold> The KS-statistic is equal to 0.08 (<italic>p</italic> &#x0003C;0.05) for local SLCA; <bold>(B)</bold> the KS-statistic is equal to 0.32 (<italic>p</italic> &#x0003C;0.05) for global SLCA.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fcomp-04-866029-g0003.tif"/>
</fig>
<p>The typical human saccadic latency distribution is characterized by a slight skew toward longer latencies. The SLCA with local inhibition was able to simulate the basic pattern of this time-course, although it was not always able to capture the slower responses of the distribution. The final parameter sets for the local and global versions were tested against the human data by running 30 iterations of model results and comparing them against sampled human data, using the KS test with alpha set to 0.05. Over the 30 iterations, the model simulated data that rejected the null hypothesis (human and model were different) 23 times. Thus, in 46% of cases the SLCA with local inhibition was able to simulate the data reliably. As for the SLCA with the original global inhibition, even with the best parameter set it was not able to reject the null hypothesis that the model and human data were from different distributions. The best KS value for data generated by the SLCA with local inhibition was 0.08, whereas the best KS value of the original version was 0.32. See <xref ref-type="table" rid="T1">Table 1</xref> for the mean and minimum KS statistics, for which <italic>p</italic> &#x0003C; 0.05.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>KS statistics for SLCA with global and local inhibition.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center"><bold>SLCA with local inhibition</bold></th>
<th valign="top" align="center"><bold>SLCA with global inhibition</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Mean KS statistic</td>
<td valign="top" align="center">0.16</td>
<td valign="top" align="center">0.437</td>
</tr>
<tr>
<td valign="top" align="left">Min KS statistic</td>
<td valign="top" align="center">0.08</td>
<td valign="top" align="center">0.32</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec>
<title>Parameters</title>
<p>During the training/optimization procedure, two sets of parameters were found: for the SLCA with local and global inhibition. The optimization procedures were run for 100 epochs each. Please see <xref ref-type="table" rid="T2">Table 2</xref> for optimal parameter sets found.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Best parameters found <italic>via</italic> optimization.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Model parameter</bold></th>
<th valign="top" align="center"><bold>SLCA with local inhibition</bold></th>
<th valign="top" align="center"><bold>SLCA with global inhibition</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Time step size<xref ref-type="table-fn" rid="TN1"><sup>&#x0002A;</sup></xref></td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">0.01</td>
</tr>
<tr>
<td valign="top" align="left">Default threshold<xref ref-type="table-fn" rid="TN1"><sup>&#x0002A;</sup></xref></td>
<td valign="top" align="center">5.0</td>
<td valign="top" align="center">5.0</td>
</tr>
<tr>
<td valign="top" align="left">Leakage</td>
<td valign="top" align="center">0.256</td>
<td valign="top" align="center">0.4</td>
</tr>
<tr>
<td valign="top" align="left">Competition (lateral inhibition)</td>
<td valign="top" align="center">1.379</td>
<td valign="top" align="center">0.024</td>
</tr>
<tr>
<td valign="top" align="left">Recurrent self-excitation</td>
<td valign="top" align="center">0.372</td>
<td valign="top" align="center">0.41</td>
</tr>
<tr>
<td valign="top" align="left">Input strength of feedforward weights</td>
<td valign="top" align="center">0.64</td>
<td valign="top" align="center">0.1</td>
</tr>
<tr>
<td valign="top" align="left">Cross-talk of feedforward weights</td>
<td valign="top" align="center">0.097</td>
<td valign="top" align="center">1.001</td>
</tr>
<tr>
<td valign="top" align="left">Offset</td>
<td valign="top" align="center">0.312</td>
<td valign="top" align="center">0.1</td>
</tr>
<tr>
<td valign="top" align="left">Standard deviation of noise</td>
<td valign="top" align="center">1.043</td>
<td valign="top" align="center">1.0</td>
</tr>
<tr>
<td valign="top" align="left">Saliency multiplication factor</td>
<td valign="top" align="center">4.654</td>
<td valign="top" align="center">0.178</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN1"><label>&#x0002A;</label><p><italic>Marks the fixed parameters that were not subject to optimization</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>The parameters marked with <sup>&#x0002A;</sup> were fixed. The most drastic differences can be observed in the following parameters: <italic>competition, cross-talk of feedforward weights</italic>, and <italic>saliency multiplication factor</italic>. The <italic>saliency multiplication factor</italic> defines the influence of the overall saliency of the image on the activation threshold: the more salient regions an image has, the greater the threshold will be. Interestingly, the local SLCA parameters tend toward higher thresholds.</p>
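<p>The exact functional form of the threshold change is not spelled out here, but one plausible reading, offered purely for intuition, is a default threshold raised in proportion to the mean salience of the input image:</p>

```python
import numpy as np

def dynamic_threshold(saliency_map, t0=5.0, m=1.0):
    """Illustrative assumption, not the paper's stated formula: the default
    threshold T0 is raised in proportion to the mean salience of the input
    image, so images with more salient regions yield a higher threshold."""
    return t0 + m * float(saliency_map.mean())
```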
<p>At the same time, the value of <italic>competition</italic>, which defines the inhibition strength, was several times greater in the local SLCA than in the global SLCA. We suggest the following explanation: although each neuron was inhibited only by its immediate neighbors, the total inhibition had to remain comparable with that of the global SLCA model, where each neuron is inhibited by all others. The inhibitory power of each neighbor in the local SLCA therefore had to be much greater to compensate.</p>
<p>The model could alternatively have evolved toward smaller excitation values, and we can partially observe this in the <italic>cross-talk of feed-forward weights</italic>: this parameter contributes to excitation, and it was greater in the global SLCA than in the local one. It should be noted that genetic algorithms are not guaranteed to converge to a global minimum, so the parameters could have evolved in other ways.</p></sec></sec>
<sec sec-type="discussion" id="s6">
<title>Discussion</title>
<p>We proposed and implemented a two-dimensional version of the leaky competing accumulator (LCA) model that allows for calculations based on neural proximity in an accumulation grid. This Spatial LCA allowed us to modify the global lateral inhibition parameter of the LCA so that only proximal neurons in the network were inhibited. We also introduced a dynamic threshold to the model, so that it could flexibly adjust to input images with different average saliency. This allowed us to train and test the SLCA model on a variety of images with no need to adjust the parameters to each of them separately.</p>
<p>Finally, we tested the SLCA as a potential replacement for other spiking layers (like LIF) that are frequently used to generate shifts of attention based on a salience map generated from input images. We optimized two versions of the SLCA with global and local lateral inhibition against human fixation data from a visual foraging task.</p>
<sec>
<title>Performance</title>
<p>The SLCA model with local lateral inhibition was able to generate a plausible distribution of fixation latencies across the various images and participants. In contrast, the model using global lateral inhibition was not able to match human latencies on the full dataset. Lateral inhibition, as a mechanism, encourages sensitivity to variability over uniformity in the visual field: any stimulation from a uniform visual field would equally suppress neighboring regions and thus inhibit responses. Our SLCA model with local lateral inhibition limited the interaction of spatial neurons to only those most adjacent on the two-dimensional grid. Although we use the term lateral in a literal, spatial sense, it is interesting to note that lateral inhibition is also believed to work in a more abstract sense to inhibit alternatives in non-spatial modalities (Carpenter, <xref ref-type="bibr" rid="B10">1997</xref>), and this may be closer to the non-spatial implementation of the original LCA.</p>
<p>The model performance can be compared with other existing solutions. For instance, Merzon et al. (<xref ref-type="bibr" rid="B31">2020</xref>) tested the LIF algorithm used in Walther and Koch (<xref ref-type="bibr" rid="B52">2006</xref>) and showed that it was only able to generate latency distributions by using the inherent salience differences between images and could not produce any variation in responses given a single image. This comparison holds extra validity, since Merzon and colleagues applied the LIF model to data from the same task as the current model&#x02014;reconstructing the latency distribution for visual search. Although there may be room for improvement in our results, we would emphasize that by implementing neurons with spatial proximity, we were able to use lateral inhibition with limited, local scope. Although both versions of lateral inhibition&#x02014;with local and global scope&#x02014;were able to learn the skewed distributions typical of human latencies, the local scope was consistently a better fit to the human data.</p>
<p>Lateral inhibition plays a role in saccadic responses in the intermediate layers of the superior colliculus (Munoz and Istvan, <xref ref-type="bibr" rid="B33">1998</xref>), but may not need to be a component of models of saccadic behavior. For example, Ratcliff et al. (<xref ref-type="bibr" rid="B38">2011</xref>) did not find evidence of inhibition in the SC, contrary to expectations, and recent spatial models have been shown to model human temporal data without lateral inhibition (Ratcliff, <xref ref-type="bibr" rid="B36">2018</xref>). Ratcliff&#x00027;s Spatially Continuous Diffusion Model (SCDM) uses a noise parameter added during accumulation over a spatial continuum, with decisions occurring when the signal reaches a planar threshold. This model was shown to correctly simulate many aspects of saccadic responses, including reaction time distributions and response angles to salient locations on a generated annulus. Although SCDM was not tested with salience maps generated from real images, many of the stimuli used contained salient patches amidst noise and could be comparable to the current SLCA results. Similar to our model, SCDM was not tested on sequences of saccades, as this would require a suppression mechanism, like IOR, to prevent refixations on previously fixated locations. A direct comparison of results between SCDM and SLCA is not possible with the current data, but should be possible in future work and could provide a theoretical test of the utility of lateral inhibition. Another strong point of comparison is the recent LATEST model using Bayesian stay/go decision processes (Tatler et al., <xref ref-type="bibr" rid="B46">2017</xref>). The LATEST model goes beyond simple saccadic decisions, however, and includes the creation of a full &#x0201C;decision map,&#x0201D; built using bottom-up image salience in addition to top-down semantic information produced by human judgements on those images.
A full comparison of all recent approaches is certainly warranted, but would require a common dataset with semantic information for the LATEST model.</p></sec>
<sec>
<title>Spatial Predictions</title>
<p>Although the SLCA model generates both spatial and temporal predictions, we focused on the temporal aspect&#x02014;in particular, on the latency of fixations. Our model could nevertheless serve as the basis for more accurate spatial predictions. First of all, if fixations are allocated according to attention on the saliency map, we might implement a mechanism like inhibition of return (IOR). IOR is believed to be a foraging facilitator and a low-level mechanism that could help reduce the likelihood of revisiting previously attended locations (Klein and MacInnes, 1999; Bays and Husain, <xref ref-type="bibr" rid="B4">2012</xref>; MacInnes et al., 2014; but see Smith and Henderson, <xref ref-type="bibr" rid="B44">2011</xref>). This could provide insights into distributions of fixation sequences, but also improve our understanding of inhibition of return itself (Redden et al., <xref ref-type="bibr" rid="B41">2021</xref>).</p></sec>
<sec>
<title>Bottom-Up and Top-Down Mechanisms</title>
<p>Our model used salience maps as the input layer to our SLCA implementations, but it is truly agnostic to the algorithm used to generate these maps. For example, we did not include any top-down attentional modulations in the current implementation, but our SLCA could easily fit as a layer on a priority map or decision map instead. Although models of bottom-up salience have produced valuable insights into visual processing, the idea of a priority map with top-down influence is closer to what we observe in human and primate biology (Fecteau and Munoz, <xref ref-type="bibr" rid="B15">2006</xref>; Bisley and Goldberg, <xref ref-type="bibr" rid="B5">2010</xref>). Bottom-up saliency might be enough for predicting distributions of fixation latency, but locations and even order would certainly need varying degrees of top-down and contextual information depending on the task. When presented with real-life scenes, the human visual system clearly makes use of top-down information, perhaps to an even greater degree than bottom-up salience (Chen and Zelinsky, <xref ref-type="bibr" rid="B13">2006</xref>).</p>
<p>Introducing top-down processes into the SLCA model could be understood as simply operating on a priority map rather than a saliency map, which means moving upward in the visual processing hierarchy and modeling feedback connections from higher structures, such as LIP or FEF. This would be similar to the approach taken by LATEST (Tatler et al., <xref ref-type="bibr" rid="B46">2017</xref>), which used maps created from human judgements as a proxy for areas of semantic importance. The difficulty is that incorporating top-down attention modulators would be task and scene specific and would require pre-training to adapt the model to a particular situation, e.g., to train it to recognize specific objects, patterns, or locations. In contrast, bottom-up saliency models require no specific training and can operate with any kind of input, and hence are more versatile and need no tuning for a particular task.</p></sec>
<sec>
<title>Behavioral and Neuronal Data</title>
<p>Another promising area for further research would be testing how well the SLCA fits not only behavioral but also neuronal data. Numerous accumulator models are based on biological principles and have been shown to match neural processing of perceptual decision making in various brain areas, e.g., the superior colliculus (Ratcliff et al., <xref ref-type="bibr" rid="B37">2003</xref>, <xref ref-type="bibr" rid="B39">2007</xref>). Once the SLCA can simulate human behavioral data accurately, it could also be tested on predicting neural responses in brain areas involved in visual processing at the level of saliency or priority mapping, such as V1, SC, LIP, and FEF (Fecteau and Munoz, <xref ref-type="bibr" rid="B15">2006</xref>; Gottlieb, <xref ref-type="bibr" rid="B16">2007</xref>; Zhang et al., <xref ref-type="bibr" rid="B55">2012</xref>).</p></sec></sec>
<sec sec-type="data-availability" id="s7">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found at: <ext-link ext-link-type="uri" xlink:href="https://osf.io/9f7xe/">https://osf.io/9f7xe/</ext-link> and <ext-link ext-link-type="uri" xlink:href="https://github.com/rainsummer613/slca">https://github.com/rainsummer613/slca</ext-link>.</p></sec>
<sec id="s8">
<title>Ethics Statement</title>
<p>The studies involving human participants were reviewed and approved by HSE University Ethics Review Committee. The patients/participants provided their written informed consent to participate in this study.</p></sec>
<sec id="s9">
<title>Author Contributions</title>
<p>VZ and WM contributed to conception and design of the study. VZ built the model, performed the experiments, and wrote the first draft of the manuscript. WM wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.</p></sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
<sec sec-type="disclaimer" id="s10">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p></sec> </body>
<back>
<ack>
<p>This research was supported in part through computational resources of HPC facilities at HSE University (Kostenetskiy et al., <xref ref-type="bibr" rid="B24">2021</xref>).</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Abbott</surname> <given-names>L.</given-names></name></person-group> (<year>1991</year>). <article-title>&#x0201C;Firing-rate models for neural populations,&#x0201D;</article-title> in <source>Neural Networks: From Biology to High-Energy Physics</source>, eds O. Benhar, C. Bosio, P. Del Giudice, and E. Tablet (<publisher-loc>Pisa</publisher-loc>: <publisher-name>ETS Editrice</publisher-name>), <fpage>179</fpage>&#x02013;<lpage>196</lpage>.</citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Adeli</surname> <given-names>H.</given-names></name> <name><surname>Vitu</surname> <given-names>F.</given-names></name> <name><surname>Zelinsky</surname> <given-names>G. J.</given-names></name></person-group> (<year>2017</year>). <article-title>A model of the superior colliculus predicts fixation locations during scene viewing and visual search</article-title>. <source>J. Neurosci.</source> <volume>37</volume>, <fpage>1453</fpage>&#x02013;<lpage>1467</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.0825-16.2016</pub-id><pub-id pub-id-type="pmid">28039373</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Amit</surname> <given-names>D. J.</given-names></name></person-group> (<year>1989</year>). <source>Modeling Brain Function, the World of Attractor Dynamics</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>. <pub-id pub-id-type="doi">10.1017/CBO9780511623257</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bays</surname> <given-names>P. M.</given-names></name> <name><surname>Husain</surname> <given-names>M.</given-names></name></person-group> (<year>2012</year>). <article-title>Active inhibition and memory promote exploration and search of natural scenes</article-title>. <source>J. Vis.</source> <volume>12</volume>, <fpage>8</fpage>. <pub-id pub-id-type="doi">10.1167/12.8.8</pub-id><pub-id pub-id-type="pmid">22895881</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bisley</surname> <given-names>J. W.</given-names></name> <name><surname>Goldberg</surname> <given-names>M. E.</given-names></name></person-group> (<year>2010</year>). <article-title>Attention, intention, and priority in the parietal lobe</article-title>. <source>Annu. Rev. Neurosci.</source> <volume>33</volume>, <fpage>1</fpage>&#x02013;<lpage>21</lpage>. <pub-id pub-id-type="doi">10.1146/annurev-neuro-060909-152823</pub-id><pub-id pub-id-type="pmid">20192813</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bisley</surname> <given-names>J. W.</given-names></name> <name><surname>Mirpour</surname> <given-names>K.</given-names></name></person-group> (<year>2019</year>). <article-title>The neural instantiation of a priority map</article-title>. <source>Curr. Opin. Psychol.</source> <volume>29</volume>, <fpage>108</fpage>&#x02013;<lpage>112</lpage>. <pub-id pub-id-type="doi">10.1016/j.copsyc.2019.01.002</pub-id><pub-id pub-id-type="pmid">30731260</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Borji</surname> <given-names>A.</given-names></name> <name><surname>Sihite</surname> <given-names>D. N.</given-names></name> <name><surname>Itti</surname> <given-names>L.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;Probabilistic learning of task-specific visual attention,&#x0201D;</article-title> in <source>2012 IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Piscataway, NJ</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>470</fpage>&#x02013;<lpage>477</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2012.6247710</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brown</surname> <given-names>S. D.</given-names></name> <name><surname>Heathcote</surname> <given-names>A.</given-names></name></person-group> (<year>2008</year>). <article-title>The simplest complete model of choice response time: Linear ballistic accumulation</article-title>. <source>Cogn. Psychol.</source> <volume>57</volume>, <fpage>153</fpage>&#x02013;<lpage>178</lpage>. <pub-id pub-id-type="doi">10.1016/j.cogpsych.2007.12.002</pub-id><pub-id pub-id-type="pmid">18243170</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bylinskii</surname> <given-names>Z.</given-names></name> <name><surname>Judd</surname> <given-names>T.</given-names></name> <name><surname>Oliva</surname> <given-names>A.</given-names></name> <name><surname>Torralba</surname> <given-names>A.</given-names></name> <name><surname>Durand</surname> <given-names>F.</given-names></name></person-group> (<year>2018</year>). <article-title>What do different evaluation metrics tell us about saliency models?</article-title> <source>IEEE Trans. Pattern Anal. Mach. Intell.</source> <volume>41</volume>, <fpage>740</fpage>&#x02013;<lpage>757</lpage>. <pub-id pub-id-type="doi">10.1109/TPAMI.2018.2815601</pub-id><pub-id pub-id-type="pmid">29993800</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carpenter</surname> <given-names>R. H. S.</given-names></name></person-group> (<year>1997</year>). <article-title>Sensorimotor processing: charting the frontier</article-title>. <source>Curr. Biol.</source> <volume>7</volume>, <fpage>R348</fpage>&#x02013;<lpage>R351</lpage>. <pub-id pub-id-type="pmid">9197226</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carrasco</surname> <given-names>M.</given-names></name></person-group> (<year>2011</year>). <article-title>Visual attention: the past 25 years</article-title>. <source>Vis. Res.</source> <volume>51</volume>, <fpage>1484</fpage>&#x02013;<lpage>1525</lpage>. <pub-id pub-id-type="doi">10.1016/j.visres.2011.04.012</pub-id><pub-id pub-id-type="pmid">21549742</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chelazzi</surname> <given-names>L.</given-names></name> <name><surname>Miller</surname> <given-names>E. K.</given-names></name> <name><surname>Duncan</surname> <given-names>J.</given-names></name> <name><surname>Desimone</surname> <given-names>R.</given-names></name></person-group> (<year>1993</year>). <article-title>A neural basis for visual search in inferior temporal cortex</article-title>. <source>Nature</source> <volume>363</volume>, <fpage>345</fpage>&#x02013;<lpage>347</lpage>. <pub-id pub-id-type="doi">10.1038/363345a0</pub-id><pub-id pub-id-type="pmid">8497317</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Zelinsky</surname> <given-names>G. J.</given-names></name></person-group> (<year>2006</year>). <article-title>Real-world visual search is dominated by top-down guidance</article-title>. <source>Vis. Res.</source> <volume>46</volume>, <fpage>4118</fpage>&#x02013;<lpage>4133</lpage>. <pub-id pub-id-type="doi">10.1016/j.visres.2006.08.008</pub-id><pub-id pub-id-type="pmid">17005231</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Deng</surname> <given-names>J.</given-names></name> <name><surname>Dong</surname> <given-names>W.</given-names></name> <name><surname>Socher</surname> <given-names>R.</given-names></name> <name><surname>Li</surname> <given-names>L. J.</given-names></name> <name><surname>Li</surname> <given-names>K.</given-names></name> <name><surname>Fei-Fei</surname> <given-names>L.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;Imagenet: a large-scale hierarchical image database,&#x0201D;</article-title> in <source>IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Piscataway, NJ</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>248</fpage>&#x02013;<lpage>255</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2009.5206848</pub-id></citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fecteau</surname> <given-names>J. H.</given-names></name> <name><surname>Munoz</surname> <given-names>D. P.</given-names></name></person-group> (<year>2006</year>). <article-title>Salience, relevance, and firing: a priority map for target selection</article-title>. <source>Trends Cogn. Sci.</source> <volume>10</volume>, <fpage>382</fpage>&#x02013;<lpage>390</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2006.06.011</pub-id><pub-id pub-id-type="pmid">16843702</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gottlieb</surname> <given-names>J.</given-names></name></person-group> (<year>2007</year>). <article-title>From thought to action: the parietal cortex as a bridge between perception, action, and cognition</article-title>. <source>Neuron</source> <volume>53</volume>, <fpage>9</fpage>&#x02013;<lpage>16</lpage>. <pub-id pub-id-type="pmid">17196526</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heuer</surname> <given-names>H.</given-names></name></person-group> (<year>1987</year>). <article-title>Visual discrimination and response programming</article-title>. <source>Psychol. Res.</source> <volume>49</volume>, <fpage>91</fpage>&#x02013;<lpage>98</lpage>. <pub-id pub-id-type="doi">10.1007/BF00308673</pub-id><pub-id pub-id-type="pmid">3671632</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Itti</surname> <given-names>L.</given-names></name> <name><surname>Borji</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>&#x0201C;Computational models: bottom-up and top-down aspects,&#x0201D;</article-title> in <source>The Oxford Handbook of Attention</source>, eds A. C. Nobre and S. Kastner (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>20</lpage>. <pub-id pub-id-type="doi">10.1093/oxfordhb/9780199675111.013.026</pub-id></citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Itti</surname> <given-names>L.</given-names></name> <name><surname>Koch</surname> <given-names>C.</given-names></name></person-group> (<year>2000</year>). <article-title>A saliency-based search mechanism for overt and covert shifts of visual attention</article-title>. <source>Vis. Res.</source> <volume>40</volume>, <fpage>1489</fpage>&#x02013;<lpage>1506</lpage>. <pub-id pub-id-type="doi">10.1016/S0042-6989(99)00163-7</pub-id><pub-id pub-id-type="pmid">10788654</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jagadeesh</surname> <given-names>B.</given-names></name> <name><surname>Gray</surname> <given-names>C. M.</given-names></name> <name><surname>Ferster</surname> <given-names>D.</given-names></name></person-group> (<year>1992</year>). <article-title>Visually evoked oscillations of membrane potential in cells of cat visual cortex</article-title>. <source>Science</source> <volume>257</volume>, <fpage>552</fpage>&#x02013;<lpage>554</lpage>. <pub-id pub-id-type="doi">10.1126/science.1636094</pub-id><pub-id pub-id-type="pmid">1636094</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jia</surname> <given-names>S.</given-names></name> <name><surname>Bruce</surname> <given-names>N. D.</given-names></name></person-group> (<year>2020</year>). <article-title>EML-NET: an expandable multi-layer network for saliency prediction</article-title>. <source>Image Vis. Comput.</source> <volume>95</volume>, <fpage>103887</fpage>. <pub-id pub-id-type="doi">10.1016/j.imavis.2020.103887</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Judd</surname> <given-names>T.</given-names></name> <name><surname>Durand</surname> <given-names>F.</given-names></name> <name><surname>Torralba</surname> <given-names>A.</given-names></name></person-group> (<year>2012</year>). <source>A benchmark of computational models of saliency to predict human fixations</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://hdl.handle.net/1721.1/68590">http://hdl.handle.net/1721.1/68590</ext-link></citation>
</ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Judd</surname> <given-names>T.</given-names></name> <name><surname>Ehinger</surname> <given-names>K.</given-names></name> <name><surname>Durand</surname> <given-names>F.</given-names></name> <name><surname>Torralba</surname> <given-names>A.</given-names></name></person-group> (<year>2009</year>). <article-title>&#x0201C;Learning to predict where humans look,&#x0201D;</article-title> in <source>2009 IEEE 12th International Conference on Computer Vision (IEEE)</source>, <fpage>2106</fpage>&#x02013;<lpage>2113</lpage>.</citation>
</ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kostenetskiy</surname> <given-names>P. S.</given-names></name> <name><surname>Chulkevich</surname> <given-names>R. A.</given-names></name> <name><surname>Kozyrev</surname> <given-names>V. I.</given-names></name></person-group> (<year>2021</year>). <article-title>HPC resources of the higher school of economics</article-title>. <source>J. Phys. Conf. Ser.</source> <volume>1740</volume>, <fpage>012050</fpage>. <pub-id pub-id-type="doi">10.1088/1742-6596/1740/1/012050</pub-id></citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krasovskaya</surname> <given-names>S.</given-names></name> <name><surname>MacInnes</surname> <given-names>W. J.</given-names></name></person-group> (<year>2019</year>). <article-title>Salience models: a computational cognitive neuroscience review</article-title>. <source>Vision</source> <volume>3</volume>, <fpage>56</fpage>. <pub-id pub-id-type="doi">10.3390/vision3040056</pub-id><pub-id pub-id-type="pmid">31735857</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>K&#x000FC;mmerer</surname> <given-names>M.</given-names></name> <name><surname>Wallis</surname> <given-names>T. S.</given-names></name> <name><surname>Bethge</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>DeepGaze II: predicting fixations from deep features over time and tasks</article-title>. <source>J. Vis.</source> <volume>17</volume>, <fpage>1147</fpage>. <pub-id pub-id-type="doi">10.1167/17.10.1147</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lennie</surname> <given-names>P.</given-names></name></person-group> (<year>2003</year>). <article-title>The cost of cortical computation</article-title>. <source>Curr. Biol.</source> <volume>13</volume>, <fpage>493</fpage>&#x02013;<lpage>497</lpage>. <pub-id pub-id-type="doi">10.1016/S0960-9822(03)00135-0</pub-id><pub-id pub-id-type="pmid">12646132</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Linardos</surname> <given-names>A.</given-names></name> <name><surname>K&#x000FC;mmerer</surname> <given-names>M.</given-names></name> <name><surname>Press</surname> <given-names>O.</given-names></name> <name><surname>Bethge</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <article-title>Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling</article-title>. <source>arXiv [Preprint]. arXiv:2105.12441.</source> <pub-id pub-id-type="doi">10.1109/ICCV48922.2021.01268</pub-id></citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>MacInnes</surname> <given-names>W. J.</given-names></name></person-group> (<year>2017</year>). <article-title>Multiple diffusion models to compare saccadic and manual responses for inhibition of return</article-title>. <source>Neural Comput.</source> <volume>29</volume>, <fpage>804</fpage>&#x02013;<lpage>824</lpage>. <pub-id pub-id-type="doi">10.1162/NECO_a_00904</pub-id><pub-id pub-id-type="pmid">27764599</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mason</surname> <given-names>A.</given-names></name> <name><surname>Larkman</surname> <given-names>A.</given-names></name></person-group> (<year>1990</year>). <article-title>Correlations between morphology and electrophysiology of pyramidal neurons in slices of rat visual cortex. II. Electrophysiology</article-title>. <source>J. Neurosci.</source> <volume>10</volume>, <fpage>1415</fpage>&#x02013;<lpage>1428</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.10-05-01415.1990</pub-id><pub-id pub-id-type="pmid">2332788</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Merzon</surname> <given-names>L.</given-names></name> <name><surname>Malevich</surname> <given-names>T.</given-names></name> <name><surname>Zhulikov</surname> <given-names>G.</given-names></name> <name><surname>Krasovskaya</surname> <given-names>S.</given-names></name> <name><surname>MacInnes</surname> <given-names>W. J.</given-names></name></person-group> (<year>2020</year>). <article-title>Temporal limitations of the standard leaky integrate and fire model</article-title>. <source>Brain Sci.</source> <volume>10</volume>, <fpage>16</fpage>. <pub-id pub-id-type="doi">10.3390/brainsci10010016</pub-id><pub-id pub-id-type="pmid">31892197</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mitchell</surname> <given-names>M.</given-names></name></person-group> (<year>1996</year>). <source>An Introduction to Genetic Algorithms</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Munoz</surname> <given-names>D. P.</given-names></name> <name><surname>Istvan</surname> <given-names>P. J.</given-names></name></person-group> (<year>1998</year>). <article-title>Lateral inhibitory interactions in the intermediate layers of the monkey superior colliculus</article-title>. <source>J. Neurophysiol.</source> <volume>79</volume>, <fpage>1193</fpage>&#x02013;<lpage>1209</lpage>. <pub-id pub-id-type="doi">10.1152/jn.1998.79.3.1193</pub-id><pub-id pub-id-type="pmid">9497401</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Peters</surname> <given-names>R. J.</given-names></name> <name><surname>Itti</surname> <given-names>L.</given-names></name></person-group> (<year>2007</year>). <article-title>&#x0201C;Beyond bottom-up: incorporating task dependent influences into a computational model of spatial attention,&#x0201D;</article-title> in <source>2007 IEEE Conference on Computer Vision and Pattern Recognition</source> (<publisher-loc>Piscataway, NJ</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1109/CVPR.2007.383337</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Posner</surname> <given-names>M. I.</given-names></name></person-group> (<year>1980</year>). <article-title>Orienting of attention</article-title>. <source>Q. J. Exp. Psychol.</source> <volume>32</volume>, <fpage>3</fpage>&#x02013;<lpage>25</lpage>. <pub-id pub-id-type="doi">10.1080/00335558008248231</pub-id><pub-id pub-id-type="pmid">7367577</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ratcliff</surname> <given-names>R.</given-names></name></person-group> (<year>2018</year>). <article-title>Decision making on spatially continuous scales</article-title>. <source>Psychol. Rev.</source> <volume>125</volume>, <fpage>888</fpage>&#x02013;<lpage>935</lpage>. <pub-id pub-id-type="doi">10.1037/rev0000117</pub-id><pub-id pub-id-type="pmid">31838271</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ratcliff</surname> <given-names>R.</given-names></name> <name><surname>Cherian</surname> <given-names>A.</given-names></name> <name><surname>Segraves</surname> <given-names>M.</given-names></name></person-group> (<year>2003</year>). <article-title>A comparison of macaque behavior and superior colliculus neuronal activity to predictions from models of two-choice decisions</article-title>. <source>J. Neurophysiol.</source> <volume>90</volume>, <fpage>1392</fpage>&#x02013;<lpage>1407</lpage>. <pub-id pub-id-type="doi">10.1152/jn.01049.2002</pub-id><pub-id pub-id-type="pmid">12761282</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ratcliff</surname> <given-names>R.</given-names></name> <name><surname>Hasegawa</surname> <given-names>Y. T.</given-names></name> <name><surname>Hasegawa</surname> <given-names>R. P.</given-names></name> <name><surname>Childers</surname> <given-names>R.</given-names></name> <name><surname>Smith</surname> <given-names>P. L.</given-names></name> <name><surname>Segraves</surname> <given-names>M. A.</given-names></name></person-group> (<year>2011</year>). <article-title>Inhibition in superior colliculus neurons in a brightness discrimination task?</article-title>. <source>Neural Comput.</source> <volume>23</volume>, <fpage>1790</fpage>&#x02013;<lpage>1820</lpage>. <pub-id pub-id-type="doi">10.1162/NECO_a_00135</pub-id><pub-id pub-id-type="pmid">21492006</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ratcliff</surname> <given-names>R.</given-names></name> <name><surname>Hasegawa</surname> <given-names>Y. T.</given-names></name> <name><surname>Hasegawa</surname> <given-names>R. P.</given-names></name> <name><surname>Smith</surname> <given-names>P. L.</given-names></name> <name><surname>Segraves</surname> <given-names>M. A.</given-names></name></person-group> (<year>2007</year>). <article-title>Dual diffusion model for single-cell recording data from the superior colliculus in a brightness-discrimination task</article-title>. <source>J. Neurophysiol.</source> <volume>97</volume>, <fpage>1756</fpage>&#x02013;<lpage>1774</lpage>. <pub-id pub-id-type="doi">10.1152/jn.00393.2006</pub-id><pub-id pub-id-type="pmid">17122324</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ratcliff</surname> <given-names>R.</given-names></name> <name><surname>McKoon</surname> <given-names>G.</given-names></name></person-group> (<year>2008</year>). <article-title>The diffusion decision model: theory and data for two-choice decision tasks</article-title>. <source>Neural Comput.</source> <volume>20</volume>, <fpage>873</fpage>&#x02013;<lpage>922</lpage>. <pub-id pub-id-type="doi">10.1162/neco.2008.12-06-420</pub-id><pub-id pub-id-type="pmid">18085991</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Redden</surname> <given-names>R. S.</given-names></name> <name><surname>MacInnes</surname> <given-names>W. J.</given-names></name> <name><surname>Klein</surname> <given-names>R. M.</given-names></name></person-group> (<year>2021</year>). <article-title>Inhibition of return: an information processing theory of its natures and significance</article-title>. <source>Cortex</source> <volume>135</volume>, <fpage>30</fpage>&#x02013;<lpage>48</lpage>. <pub-id pub-id-type="doi">10.31234/osf.io/s29f5</pub-id><pub-id pub-id-type="pmid">33360759</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Russell</surname> <given-names>B. C.</given-names></name> <name><surname>Torralba</surname> <given-names>A.</given-names></name> <name><surname>Murphy</surname> <given-names>K. P.</given-names></name> <name><surname>Freeman</surname> <given-names>W. T.</given-names></name></person-group> (<year>2008</year>). <article-title>LabelMe: a database and web-based tool for image annotation</article-title>. <source>Int. J. Comput. Vis.</source> <volume>77</volume>, <fpage>157</fpage>&#x02013;<lpage>173</lpage>. <pub-id pub-id-type="doi">10.1007/s11263-007-0090-8</pub-id></citation>
</ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>P. L.</given-names></name></person-group> (<year>1995</year>). <article-title>Psychophysically principled models of visual simple reaction time</article-title>. <source>Psychol. Rev.</source> <volume>102</volume>, <fpage>567</fpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.102.3.567</pub-id></citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>T. J.</given-names></name> <name><surname>Henderson</surname> <given-names>J. M.</given-names></name></person-group> (<year>2011</year>). <article-title>Looking back at Waldo: oculomotor inhibition of return does not prevent return fixations</article-title>. <source>J. Vis.</source> <volume>11</volume>, <fpage>3</fpage>. <pub-id pub-id-type="doi">10.1167/11.1.3</pub-id><pub-id pub-id-type="pmid">21205873</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>Y.</given-names></name> <name><surname>Fisher</surname> <given-names>R.</given-names></name></person-group> (<year>2003</year>). <article-title>Object-based visual attention for computer vision</article-title>. <source>Art. Intell.</source> <volume>146</volume>, <fpage>77</fpage>&#x02013;<lpage>123</lpage>. <pub-id pub-id-type="doi">10.1016/S0004-3702(02)00399-5</pub-id></citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tatler</surname> <given-names>B. W.</given-names></name> <name><surname>Brockmole</surname> <given-names>J. R.</given-names></name> <name><surname>Carpenter</surname> <given-names>R. H.</given-names></name></person-group> (<year>2017</year>). <article-title>LATEST: a model of saccadic decisions in space and time</article-title>. <source>Psychol. Rev.</source> <volume>124</volume>, <fpage>267</fpage>. <pub-id pub-id-type="doi">10.1037/rev0000054</pub-id><pub-id pub-id-type="pmid">28358564</pub-id></citation></ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Torralba</surname> <given-names>A.</given-names></name> <name><surname>Oliva</surname> <given-names>A.</given-names></name> <name><surname>Castelhano</surname> <given-names>M. S.</given-names></name> <name><surname>Henderson</surname> <given-names>J. M.</given-names></name></person-group> (<year>2006</year>). <article-title>Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search</article-title>. <source>Psychol. Rev.</source> <volume>113</volume>, <fpage>766</fpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.113.4.766</pub-id><pub-id pub-id-type="pmid">17014302</pub-id></citation></ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trappenberg</surname> <given-names>T. P.</given-names></name> <name><surname>Dorris</surname> <given-names>M. C.</given-names></name> <name><surname>Munoz</surname> <given-names>D. P.</given-names></name> <name><surname>Klein</surname> <given-names>R. M.</given-names></name></person-group> (<year>2001</year>). <article-title>A model of saccade initiation based on the competitive integration of exogenous and endogenous signals in the superior colliculus</article-title>. <source>J. Cogn. Neurosci.</source> <volume>13</volume>, <fpage>256</fpage>&#x02013;<lpage>271</lpage>. <pub-id pub-id-type="doi">10.1162/089892901564306</pub-id><pub-id pub-id-type="pmid">11244550</pub-id></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Treisman</surname> <given-names>A. M.</given-names></name> <name><surname>Gelade</surname> <given-names>G.</given-names></name></person-group> (<year>1980</year>). <article-title>A feature-integration theory of attention</article-title>. <source>Cogn. Psychol.</source> <volume>12</volume>, <fpage>97</fpage>&#x02013;<lpage>136</lpage>. <pub-id pub-id-type="doi">10.1016/0010-0285(80)90005-5</pub-id><pub-id pub-id-type="pmid">7351125</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Usher</surname> <given-names>M.</given-names></name> <name><surname>McClelland</surname> <given-names>J. L.</given-names></name></person-group> (<year>2001</year>). <article-title>The time course of perceptual choice: the leaky, competing accumulator model</article-title>. <source>Psychol. Rev.</source> <volume>108</volume>, <fpage>550</fpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.108.3.550</pub-id><pub-id pub-id-type="pmid">11488378</pub-id></citation></ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Van Essen</surname> <given-names>D. C.</given-names></name> <name><surname>Lewis</surname> <given-names>J. W.</given-names></name> <name><surname>Drury</surname> <given-names>H. A.</given-names></name> <name><surname>Hadjikhani</surname> <given-names>N.</given-names></name> <name><surname>Tootell</surname> <given-names>R. B.</given-names></name> <name><surname>Bakircioglu</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2001</year>). <article-title>Mapping visual cortex in monkeys and humans using surface-based atlases</article-title>. <source>Vision Res.</source> <volume>41</volume>, <fpage>1359</fpage>&#x02013;<lpage>1378</lpage>. <pub-id pub-id-type="doi">10.1016/s0042-6989(01)00045-1</pub-id><pub-id pub-id-type="pmid">11322980</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Walther</surname> <given-names>D.</given-names></name> <name><surname>Koch</surname> <given-names>C.</given-names></name></person-group> (<year>2006</year>). <article-title>Modeling attention to salient proto-objects</article-title>. <source>Neural Netw.</source> <volume>19</volume>, <fpage>1395</fpage>&#x02013;<lpage>1407</lpage>. <pub-id pub-id-type="doi">10.1016/j.neunet.2006.10.001</pub-id><pub-id pub-id-type="pmid">17098563</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Weber</surname> <given-names>M. D.</given-names></name> <name><surname>Leemis</surname> <given-names>L. M.</given-names></name> <name><surname>Kincaid</surname> <given-names>R. K.</given-names></name></person-group> (<year>2006</year>). <article-title>Minimum Kolmogorov&#x02013;Smirnov test statistic parameter estimates</article-title>. <source>J. Stat. Comput. Simul.</source> <volume>76</volume>, <fpage>195</fpage>&#x02013;<lpage>206</lpage>. <pub-id pub-id-type="doi">10.1080/00949650412331321098</pub-id></citation>
</ref>
<ref id="B54">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wolfe</surname> <given-names>J. M.</given-names></name></person-group> (<year>2007</year>). <article-title>&#x0201C;Guided search 4.0,&#x0201D;</article-title> in <source>Integrated Models of Cognitive Systems</source>, ed W. D. Gray (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>), <fpage>99</fpage>&#x02013;<lpage>119</lpage>. <pub-id pub-id-type="doi">10.1093/acprof:oso/9780195189193.003.0008</pub-id></citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>X.</given-names></name> <name><surname>Zhaoping</surname> <given-names>L.</given-names></name> <name><surname>Zhou</surname> <given-names>T.</given-names></name> <name><surname>Fang</surname> <given-names>F.</given-names></name></person-group> (<year>2012</year>). <article-title>Neural activities in V1 create a bottom-up saliency map</article-title>. <source>Neuron</source> <volume>73</volume>, <fpage>183</fpage>&#x02013;<lpage>192</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuron.2011.10.035</pub-id><pub-id pub-id-type="pmid">22243754</pub-id></citation></ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhao</surname> <given-names>Q.</given-names></name> <name><surname>Koch</surname> <given-names>C.</given-names></name></person-group> (<year>2011</year>). <article-title>Learning a saliency map using fixated locations in natural scenes</article-title>. <source>J. Vis.</source> <volume>11</volume>, <fpage>9</fpage>. <pub-id pub-id-type="doi">10.1167/11.3.9</pub-id><pub-id pub-id-type="pmid">21393388</pub-id></citation></ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhaoping</surname> <given-names>L.</given-names></name> <name><surname>May</surname> <given-names>K. A.</given-names></name></person-group> (<year>2007</year>). <article-title>Psychophysical tests of the hypothesis of a bottom-up saliency map in primary visual cortex</article-title>. <source>PLoS Comput. Biol.</source> <volume>3</volume>, <fpage>e62</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.0030062</pub-id><pub-id pub-id-type="pmid">17411335</pub-id></citation></ref>
</ref-list> 
</back>
</article> 