<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Robot. AI</journal-id>
<journal-title>Frontiers in Robotics and AI</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Robot. AI</abbrev-journal-title>
<issn pub-type="epub">2296-9144</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frobt.2016.00005</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Robotics and AI</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Just Imagine! Learning to Emulate and Infer Actions with a Stochastic Generative Architecture</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Schrodt</surname> <given-names>Fabian</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="cor1">&#x0002A;</xref>
<uri xlink:href="http://frontiersin.org/people/u/177694"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Butz</surname> <given-names>Martin V.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://frontiersin.org/people/u/58395"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Cognitive Modeling, Department of Computer Science, University of T&#x000FC;bingen</institution>, <addr-line>T&#x000FC;bingen</addr-line>, <country>Germany</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Guido Schillaci, Humboldt University of Berlin, Germany</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Lorenzo Jamone, Instituto Superior Tecnico, Portugal; Ugo Pattacini, Istituto Italiano di Tecnologia, Italy; Felix Reinhart, Bielefeld University, Germany</p></fn>
<corresp content-type="corresp" id="cor1">&#x0002A;Correspondence: Fabian Schrodt, <email>tobias-fabian.schrodt&#x00040;uni-tuebingen.de</email></corresp>
<fn fn-type="other" id="fn001"><p>Specialty section: This article was submitted to Humanoid Robotics, a section of the journal Frontiers in Robotics and AI</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>04</day>
<month>03</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection">
<year>2016</year>
</pub-date><volume>3</volume>
<elocation-id>5</elocation-id>
<history>
<date date-type="received">
<day>08</day>
<month>10</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>09</day>
<month>02</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2016 Schrodt and Butz.</copyright-statement>
<copyright-year>2016</copyright-year>
<copyright-holder>Schrodt and Butz</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Theories on embodied cognition emphasize that our mind develops by processing and inferring structures given the encountered bodily experiences. Here, we propose a distributed neural network architecture that learns a stochastic generative model from experiencing bodily actions. Our modular system learns from various manifolds of action perceptions in the form of (i) relative positional motion of the individual body parts, (ii) angular motion of joints, and (iii) relatively stable top-down action identities. By Hebbian learning, this information is spatially segmented into separate neural modules that provide embodied state codes and temporal predictions of the state progression inside and across the modules. The network is generative in space and time, thus being able to predict both missing and upcoming sensory information. We link the developing encodings to visuomotor and multimodal representations that appear to be involved in action observation. Our results show that the system learns to infer action types and motor codes from partial sensory information by emulating observed actions with its own developing body model. We further evaluate the generative capabilities by showing that the system is able to generate internal imaginations of the learned types of actions without sensory stimulation, including visual images of the actions. The model highlights the important roles of motor cognition and embodied simulation for bootstrapping action understanding capabilities. We conclude that stochastic generative models appear well suited both for generating goal-directed actions and for predicting observed visuomotor trajectories and action goals.</p>
</abstract>
<kwd-group>
<kwd>artificial neural networks</kwd>
<kwd>mental imagery</kwd>
<kwd>embodied simulation</kwd>
<kwd>sensorimotor learning</kwd>
<kwd>generative model</kwd>
<kwd>action understanding</kwd>
<kwd>action emulation</kwd>
<kwd>Bayesian inference</kwd>
</kwd-group>
<contract-num rid="cn02">Open Access Publishing Fund</contract-num>
<contract-num rid="cn04">EIA-0196217</contract-num>
<contract-sponsor id="cn01">Deutsche Forschungsgemeinschaft<named-content content-type="fundref-id">10.13039/501100001659</named-content></contract-sponsor>
<contract-sponsor id="cn02">Eberhard Karls Universit&#x000E4;t T&#x000FC;bingen<named-content content-type="fundref-id">10.13039/501100002345</named-content></contract-sponsor>
<contract-sponsor id="cn03">Landesgraduiertenf&#x000F6;rderung Baden-W&#x000FC;rttemberg</contract-sponsor>
<contract-sponsor id="cn04">National Science Foundation<named-content content-type="fundref-id">10.13039/100000001</named-content></contract-sponsor>
<counts>
<fig-count count="11"/>
<table-count count="1"/>
<equation-count count="11"/>
<ref-count count="70"/>
<page-count count="15"/>
<word-count count="11692"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="introduction">
<label>1</label> <title>Introduction</title>
<p>It appears that humans are particularly good at learning by imitation, gaze following, social referencing, and gestural communication from very early on (Tomasello, <xref ref-type="bibr" rid="B67">1999</xref>). Inherently, the observation of others is involved in all of these forms of social learning. Learning by imitation, for instance, is assumed to develop from pure mimicking of bodily movements toward the inference and emulation of the intended goals of others from about 1&#x02009;year of age onward (Carpenter et al., <xref ref-type="bibr" rid="B8">1998</xref>; Want and Harris, <xref ref-type="bibr" rid="B70">2002</xref>; Elsner, <xref ref-type="bibr" rid="B14">2007</xref>). Yet <italic>how</italic> are goals and intentions inferred from visual observations, and how does this facilitate the activation of the respective motor commands for imitation? The intercommunication between specific brain regions, which are often referred to as mirror neuron system or action observation network, has been suggested to enable this inference of others&#x02019; intentions and imitation of their behavior (Buccino et al., <xref ref-type="bibr" rid="B6">2004</xref>; Rizzolatti and Craighero, <xref ref-type="bibr" rid="B57">2004</xref>, <xref ref-type="bibr" rid="B58">2005</xref>; Iacoboni, <xref ref-type="bibr" rid="B36">2005</xref>, <xref ref-type="bibr" rid="B37">2009</xref>; Iacoboni and Dapretto, <xref ref-type="bibr" rid="B38">2006</xref>; Kilner et al., <xref ref-type="bibr" rid="B42">2007</xref>). 
While a genetic predisposition may supply the foundation to develop such a system (Rizzolatti and Craighero, <xref ref-type="bibr" rid="B57">2004</xref>; Ferrari et al., <xref ref-type="bibr" rid="B16">2006</xref>; Lepage and Th&#x000E9;oret, <xref ref-type="bibr" rid="B47">2007</xref>; Bonini and Ferrari, <xref ref-type="bibr" rid="B4">2011</xref>; Casile et al., <xref ref-type="bibr" rid="B9">2011</xref>), its development &#x02013; <italic>per se</italic> &#x02013; seems to be strongly determined by social interaction (Meltzoff, <xref ref-type="bibr" rid="B48">2007</xref>; Heyes, <xref ref-type="bibr" rid="B35">2010</xref>; Nagai et al., <xref ref-type="bibr" rid="B49">2011</xref>; Froese et al., <xref ref-type="bibr" rid="B19">2012</xref>; Saby et al., <xref ref-type="bibr" rid="B59">2012</xref>), sensorimotor experience, motor cognition, and embodiment (Gallese and Goldman, <xref ref-type="bibr" rid="B23">1998</xref>; Catmur et al., <xref ref-type="bibr" rid="B10">2007</xref>; Gallese, <xref ref-type="bibr" rid="B21">2007a</xref>; Gallese et al., <xref ref-type="bibr" rid="B24">2009</xref>). Due to observations such as the foregoing, cognitive science has recently undergone a pragmatic turn, focusing on the enactive roots of cognition (Engel et al., <xref ref-type="bibr" rid="B15">2013</xref>).</p>
<p>Embodied cognitive states, according to Barsalou&#x02019;s simulation hypothesis (Barsalou, <xref ref-type="bibr" rid="B2">1999</xref>, <xref ref-type="bibr" rid="B3">2008</xref>), are situated simulations that temporarily activate &#x02013; or re-enact &#x02013; particular events by means of a set of embodied modal codes. However, if mental states are grounded in own-bodily experiences and self-observations, how does the brain establish the correspondence to the observation of others in the first place? We have recently shown that this so-called correspondence problem [cf. Heyes (<xref ref-type="bibr" rid="B34">2001</xref>) and Dautenhahn and Nehaniv (<xref ref-type="bibr" rid="B13">2002</xref>)] can be solved by an embodied neural network model that adapts to the individual perspectives of others (Schrodt et al., <xref ref-type="bibr" rid="B63">2015</xref>). This model clustered sensorimotor contingencies and learned about their progress in a single competitive layer composed of cells with multimodal tuning, enabling it to infer proprioceptive equivalents to visual observations while taking an actor&#x02019;s perspective.</p>
<p>In this paper, we propose a stochastic variant of the clustering algorithm, which we introduced in our previous work, that is generative in multiple, distributed domains. The system can be considered to develop several hidden Markov models from scratch and couples them by statistically integrating conditional state transition probabilities. It thereby learns an embodied action model that is able to simulate consistent visual&#x02013;proprioceptive self-perceptions forward in time. This bodily grounded simulation is primed when observing biological motion patterns, leading to the ability to re-enact the observed behavior using the own embodied codes. Hence, our model supports the view that mental states are embodied simulations [cf. Gallese (<xref ref-type="bibr" rid="B22">2007b</xref>)] and provides an explanation of how the perception of others&#x02019; actions can be consistently incorporated with the own action experiences when encoded at distributed neural sites.</p>
<p>Our model can be compared to an action observation network, in that it models the processing of (i) visual motion signals, believed to be processed in the superior temporal sulcus; (ii) spatiotemporal motor codes, which can be related to neural activities in the posterior parietal lobule and the premotor cortex; and (iii) compressed, intentional action codes, which have been associated with neural activities in the inferior frontal gyrus [see, e.g., Iacoboni (<xref ref-type="bibr" rid="B36">2005</xref>), Kilner (<xref ref-type="bibr" rid="B41">2011</xref>), and Turella et al. (<xref ref-type="bibr" rid="B68">2013</xref>)]. Accordingly, we train and evaluate a tripartite network structure, interpreting and referring to (i) relative positional body motion as <italic>visual</italic> biological motion stimuli, (ii) joint angular motion as <italic>motor</italic> codes, and (iii) action identities as <italic>intentions</italic> or <italic>goals</italic> in our experiments. In doing so, we focus on bodily movements, including walking, running, and playing basketball, where the stimuli originate from motion captures of human subjects. Despite the simplicity of these stimuli, our results show that it is possible to identify compressed intention codes from observing biological motion patterns and to concurrently infer consistent motor emulations of observed actions using distributed, bodily grounded encodings. Analogously, actions can be simulated in visual and motor modalities when only an intention prior is provided, offering a possible explanation of how simulation processes may bring forth goal-directed and imitative behavior, and linking it to social learning.</p>
<p>In the following, we review related work in Section <xref ref-type="sec" rid="S2">2</xref> and specify the model architecture, including its modularized structure as well as the probabilistic learning and information processing mechanisms, in Section <xref ref-type="sec" rid="S3">3</xref>. We then describe the motion capture stimuli and the bottom-up processing, and clarify the connection of the resulting perceptions to encodings involved in action understanding in Section <xref ref-type="sec" rid="S4">4</xref>. The model is evaluated on motion tracking data, showing action inference, completion, and imagination capabilities, in Section <xref ref-type="sec" rid="S5">5</xref>. Finally, we discuss current challenges and future application options in Section <xref ref-type="sec" rid="S6">6</xref>.</p>
</sec>
<sec id="S2">
<label>2</label> <title>Related Work</title>
<p>Lallee and Dominey (<xref ref-type="bibr" rid="B45">2013</xref>) implemented a model that integrates low-level sensory data of an iCub robot, encoding multimodal contingencies in a single, 3D, and self-organizing competitive map. When driven by a single modal stimulus, this multimodal integration enables mental imagery of corresponding perceptions in other modalities. In accordance with findings from neuroscience, the modeled self-organizing map is topographic with respect to its discrete multimodal cell tunings. The states generated by our model can also be embedded in metric spaces. In contrast, however, our model encodes modal prototype vectors separately and activates them stochastically. This makes it possible to encode multimodal perceptions without redundancy. Moreover, it enables the resolution of ambiguities over time by predictive interactions between the encoded modalities. Our results show that cells can be activated by multimodal perceptions without necessarily encoding multimodal stimuli locally, while specific actions can nevertheless be encoded by means of distributed temporal statistics.</p>
<p>Taylor et al. (<xref ref-type="bibr" rid="B65">2006</xref>) implemented a stochastic generative neural network model based on conditional restricted Boltzmann machines (RBMs). When trained on motion captures similar to those used in our evaluations, the model is able to reproduce walking and running movements as well as transitions between them in terms of sequences of angular postures. Although the encoding capacity of RBMs is theoretically superior to that of Markov state-based models because they encode multidimensional state variables, the experiments show the typical tradeoff: considerably more training trials and randomized sampling are required. Our model expands its encoding capacity on demand and thus avoids both a sampling and a frequency bias. It nevertheless accounts for scalability and encoding capacity, since states are distributed over several Markov models. This makes it possible to learn modal state transition densities locally and to reconcile them with sensory signals and cross-modal predictions as required.</p>
<p>Comparable to the realization by Baker et al. (<xref ref-type="bibr" rid="B1">2009</xref>) of a qualitative cognitive model suggested by Gergely et al. (<xref ref-type="bibr" rid="B26">1995</xref>), intention inferences in our model are based on Bayesian statistics given visually observed action sequences. In contrast, our model learns the sensorimotor contingencies that facilitate this inference without relying on specific behavioral rationality assumptions. Instead, the intention priors in our model are determined statistically by assessing the system&#x02019;s own behavioral biases during an embodied training phase. Our experiments are thereby based on the assumption that an observer expects an actor to behave in the same way the observer would behave &#x02013; that is, by inferring cross-modal observation equivalences based on the own-bodily experiences &#x02013; and thus essentially model the development of social cognition [cf. Meltzoff (<xref ref-type="bibr" rid="B48">2007</xref>)].</p>
<p>Similar to Friston et al. (<xref ref-type="bibr" rid="B17">2011</xref>), our neural network models action understanding by inferring higher level, compact action codes, given lower level sensory motion signals. However, in contrast to Friston et al. (<xref ref-type="bibr" rid="B17">2011</xref>), no motion primitives are provided, but they are learned in the form of intention clusters, which integrate sensory&#x02013;motor information over space and time.</p>
</sec>
<sec id="S3">
<label>3</label> <title>Neural Network Architecture</title>
<p>The stochastic generative neural model consists of several stochastic neural layers or modules, which process information in identical fashion. The layers can be arranged hierarchically and connected selectively. Each layer calculates a normalized, discrete probability density estimate for the determination of a state in a specific state space. Each neuron corresponds to a possible state, and the binary activation of a single cell corresponds to the determination of that state. The neurons are activated by developing and incorporating prototype tunings and temporal state predictions. Each neuron sends intramodular state transition predictions to the other neurons in the layer and cross-modular predictions to associated layers, such that the distributed states are able to develop self-preserving, generative temporal dynamics. The development of these predictions can be compared to predictive coding (Rao and Ballard, <xref ref-type="bibr" rid="B56">1999</xref>) and results in a Hebbian learning rule similar to Oja&#x02019;s rule (Oja, <xref ref-type="bibr" rid="B50">1989</xref>) as described in Section <xref ref-type="sec" rid="S3-3">3.3</xref>.</p>
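<p>For reference, a minimal sketch of Oja&#x02019;s rule, to which the paper&#x02019;s Hebbian learning rule is compared, is given below. The paper&#x02019;s actual rule is only described as <italic>similar</italic> to it; the learning rate, data, and function names here are illustrative assumptions.</p>

```python
import numpy as np

def oja_update(w, x, eta=0.01):
    """One step of Oja's rule: Hebbian growth (eta * y * x) plus an implicit
    decay term (-eta * y^2 * w) that keeps the weight vector bounded."""
    y = float(np.dot(w, x))            # output of a single linear neuron
    return w + eta * y * (x - y * w)   # Delta w = eta * y * (x - y * w)
```

<p>Repeated application of this update lets the weight vector converge, in expectation, to the unit-norm principal eigenvector of the input covariance, i.e., normalized Hebbian feature extraction.</p>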
<p>Figure <xref ref-type="fig" rid="F1">1</xref> shows the particular network architecture developed here. Referring to the human action observation network, three layers of this kind interact with each other in a hierarchy of two levels: at the bottom level, a <italic>vision layer</italic> processes bottom-up visual motion cues and predicts the continuation of this visual motion over time as well as corresponding action intentions and motor codes. Further, a <italic>motor layer</italic> processes bottom-up proprioceptions of joint angular motion and predicts the continuation of these signals over time as well as corresponding action intentions and visual motion. Finally, at the top level, an <italic>intention layer</italic> encodes the individual actions on which the system is trained, predicts possible action transitions over time, and predicts, top-down, the corresponding vision and motor layer states that may be active during a particular action. Hence, at the bottom level, top-down and generative activities are fused with bottom-up sensory signals, along with the intramodular and cross-modular predictions generated by the bottom layers themselves. In a context where each bottom module represents a specific modality, the intramodular predictions can be considered to represent the expected state progression in the respective modality, while cross-modular predictions implement cross-modal inferences. The cross-modular predictions enable the inference of motor and intention codes from visual observations during action observation, where only visual motion cues are available.</p>
<fig position="float" id="F1">
<label>Figure 1</label>
<caption><p><bold>Architecture overview in the context of action inference and simulation</bold>. The model consists of three stochastic layers: a <italic>vision layer</italic>, a <italic>motor layer</italic>, and an <italic>intention layer</italic>. All layers predict the next state in other layers (red arrows) and the next state in the same layer (blue arrows). The vision and motor layers can be driven by sensory, bottom-up signals (green arrows 1 and 2), while the top layer can be driven by top-down signal input (green arrow 3). Normalization of a layer input is indicated by the circled &#x003A3;.</p></caption>
<graphic xlink:href="frobt-03-00005-g001.tif"/>
</fig>
<p>The streams of sensory information are assumed to be provided by populations of locally receptive cells with tuning to specific stimuli, which is in accordance with findings in neuroscience (Pouget et al., <xref ref-type="bibr" rid="B54">2000</xref>). These populations essentially forward the information by means of a full connection to the bottom stochastic layer that reflects the corresponding modality. Section <xref ref-type="sec" rid="S4">4</xref> elaborates further on how the respective perceptions and stimuli are encoded and how they can be related to an action observation network. This encoding has been published recently as part of a perspective-inference model given dynamic motion patterns (Schrodt et al., <xref ref-type="bibr" rid="B63">2015</xref>). The following sections thus focus on the stochastic neural layers on top of the populations.</p>
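<p>A population of locally receptive cells of the kind assumed above can be sketched with Gaussian tuning curves; the tuning widths, preferred values, and the normalization to a discrete density are our own illustrative assumptions, not the paper&#x02019;s exact encoding.</p>

```python
import numpy as np

def population_code(stimulus, centers, sigma=0.5):
    """Encode a scalar stimulus as the activities of locally receptive cells.

    Each cell responds with a Gaussian tuning curve around its preferred
    value; the activity vector is normalized so it can be read as a
    discrete probability density over the population.
    """
    act = np.exp(-((stimulus - centers) ** 2) / (2.0 * sigma ** 2))
    return act / act.sum()
```

<p>For example, with preferred values spread over the stimulus range, a stimulus near the center of the range activates the middle cell most strongly while neighboring cells respond in a graded fashion.</p>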
<sec id="S3-1">
<label>3.1</label> <title>Stochastic Neural Layers</title>
<p>Each stochastic neural layer learns a discrete, prototypic representation of the provided sensory input information. To do so, the layer grows a set of cells on demand with distinct sensory tunings. The recruitment of cells and adaptation of prototypes is accomplished by unsupervised mechanisms as explained in Section <xref ref-type="sec" rid="S3-2">3.2</xref>. Each cell in a layer learns predictions of the temporal progress of these prototypic state estimates in the layer. Furthermore, each cell learns to predict the cell activations that may be observed in other, associated layers, which is explained in Section <xref ref-type="sec" rid="S3-3">3.3</xref>. An exemplary stochastic neural layer connected to another layer in this way, together with the neural populations that forward sensory signals, is shown in Figure <xref ref-type="fig" rid="F2">2</xref>. In the following, the determination of states and the incorporation of predictions are formalized.</p>
<fig position="float" id="F2">
<label>Figure 2</label>
<caption><p><bold>Two stochastic state layers in hierarchical compound</bold>. The bottom-up sensory state recognition signal is provided by multiple populations of tuned cells. The signals <italic>P<sub>k</sub></italic><sub>&#x0007C;</sub><italic><sub>S</sub></italic> provided by the match <italic>m<sub>k</sub></italic> between cell prototypes <inline-formula><mml:math id="M18"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>w</mml:mi><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and the current stimulus <inline-formula><mml:math id="M19"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>a</mml:mi><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mi>M</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are indicated by the green connection diagram. Analogously, top-down recognition signals can be defined. Lateral recurrences in blue represent state transition probabilities, including the self-recurrence to preserve a state. Red lines denote cross-modular predictions of states. Arrow head lines indicate signals that are summed up by a cell, while bullet head lines indicate modulations of cell inputs [cf. equation <xref ref-type="disp-formula" rid="E2">(2)</xref>].</p></caption>
<graphic xlink:href="frobt-03-00005-g002.tif"/>
</fig>
<p>The layers in our model simplify competitive neural processes such that only a single cell in each layer is activated at the same time. Cell activations are binary and represent the event that a specific state in the corresponding state space is determined. This is comparable to a winner-takes-all approach [cf. Grossberg (<xref ref-type="bibr" rid="B29">1973</xref>), for evaluations]. However, the determination of the state in each layer depends on a fusion of predictive intramodular and cross-modular probabilities and sensory state recognition probabilities. By stochastic sampling, a single cell is selected as <italic>competition winner</italic> in each time step, where the winning probability is determined by the fused inputs to each cell. In the process, the input vector to a layer constitutes a discrete probability density for the stochastic event of observing a particular state. For this reason, each layer uses a specific normalization of incoming signals that ensures that all signals sum to 1.</p>
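<p>The stochastic winner selection described above can be sketched as follows; the function name and the use of NumPy are our own assumptions, but the logic &#x02013; normalize the net inputs to a discrete density and draw one binary winner &#x02013; follows the text.</p>

```python
import numpy as np

def sample_winner(net, rng):
    """Sample a single winner cell given non-negative net inputs.

    Normalizes the inputs to a discrete probability density (the layer's
    normalization, the circled sigma in Figure 1) and draws one index;
    returns a binary, one-hot activation vector.
    """
    p = np.asarray(net, dtype=float)
    p = p / p.sum()                  # ensure the inputs sum to 1
    k = rng.choice(len(p), p=p)      # stochastic sampling of the winner
    x = np.zeros(len(p))
    x[k] = 1.0                       # binary activation: state k determined
    return x
```

<p>With net inputs of, e.g., [1.0, 3.0], the second cell wins with probability 0.75, so over many time steps the relative winning frequencies reproduce the fused input density.</p>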
<p>We denote cells inside a layer by an index set <italic>M</italic> and cells outside by an index set <italic>N</italic>. The binary output <italic>x<sub>k</sub></italic>(<italic>t</italic>) of a state cell indexed <italic>k</italic>&#x02009;&#x02208;&#x02009;<italic>M</italic> is determined by the normalized probability term
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable columnalign="left" class="align"><mml:mtr><mml:mtd columnalign="right" class="align-odd"><mml:mtable columnalign="right left" class="split"><mml:mtr class="split-mtr"><mml:mtd class="split-mtd"><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mtd><mml:mtd class="split-mtd"><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003E;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mtext>&#x02009;</mml:mtext><mml:mo class="MathClass-rel">&#x02200;</mml:mo><mml:mi>j</mml:mi><mml:mo class="MathClass-rel">&#x02260;</mml:mo><mml:mi>k</mml:mi><mml:mo class="MathClass-punc">,</mml:mo><mml:mi>j</mml:mi><mml:mo class="MathClass-rel">&#x02208;</mml:mo><mml:mi>M</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr class="split-mtr"><mml:mtd class="split-mtd"></mml:mtd><mml:mtd class="split-mtd"><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:mtext>ne</mml:mtext><mml:msub><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mo 
class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo class="MathClass-rel">&#x02208;</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mtext>ne</mml:mtext><mml:msub><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where <italic>X<sub>k</sub></italic>(<italic>t</italic>) denotes the winning event probability, and <italic>x<sub>k</sub></italic>(<italic>t</italic>)&#x02009;&#x02208;&#x02009;{0,1} denotes the realization of this probability, that is, the abstract, binary cell activation obtained by stochastic sampling at time step <italic>t</italic>. The input net<italic><sub>k</sub></italic>(<italic>t</italic>) to cell <italic>k</italic> is provided by the probability fusion
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:msub><mml:mrow><mml:mtext>net</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mi>C</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfenced><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mi>I</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></disp-formula>
where <italic>P<sub>k</sub></italic><sub>&#x0007C;</sub><italic><sub>S</sub></italic>(<italic>t</italic>) is a <italic>sensory</italic> (S) recognition signal depicting the probability that the state <italic>k</italic> is considered the current observation given sensory inputs, <italic>P<sub>k</sub></italic><sub>&#x0007C;</sub><italic><sub>I</sub></italic>(<italic>t</italic>) is the <italic>intramodular</italic> (I) prediction of the successor state, and <italic>P<sub>k</sub></italic><sub>&#x0007C;</sub><italic><sub>C</sub></italic>(<italic>t</italic>) is the <italic>cross-modular</italic> (C) prediction of the succession, defined by
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mi>I</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x0220F;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:munder><mml:mn>1</mml:mn></mml:mstyle></mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mn>1</mml:mn><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>1</mml:mn><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo 
class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfenced></mml:math></disp-formula>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mi>C</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x02211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>1</mml:mn><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Taken together, equation <xref ref-type="disp-formula" rid="E2">(2)</xref> first fuses probabilistic sensory recognition signals with probabilistic cross-modular predictions coming in from the last winner cells of other layers. It then restricts the activation of cells to probabilistic intramodular predictions propagated from the last winner cell in the layer to all potential successors (including the last winner itself), as indicated in Figure <xref ref-type="fig" rid="F2">2</xref>.</p>
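For concreteness, the interplay of the intramodular and cross-modular predictions in equations (3) and (4) can be sketched in a few lines of NumPy. This is an illustrative sketch under our own naming (`T`, `C`, `intramodular_prediction`, `cross_modular_prediction`), not the authors' implementation; the transition matrices are assumed to store the conditional probabilities P(x_k(t) = 1 | x_i(t − 1) = 1).

```python
import numpy as np

def intramodular_prediction(T, x_prev):
    """Eq. (3): P_{k|I}(t) = prod_i [1 - x_i(t-1) * (1 - T[i, k])].
    With a one-hot binary winner vector x_prev, this reduces to the
    winner's row of the intramodular transition matrix T."""
    return np.prod(1.0 - x_prev[:, None] * (1.0 - T), axis=0)

def cross_modular_prediction(C, y_prev):
    """Eq. (4): P_{k|C}(t) = sum_j y_j(t-1) * C[j, k], i.e., the
    cross-modular prediction propagated from another layer's winner."""
    return y_prev @ C
```

With one-hot winner vectors, both functions reduce to selecting the winner's transition row, which is exactly the propagation from the last winner cell described above.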
<p>The sensory recognition probability <italic>P<sub>k</sub></italic><sub>&#x0007C;</sub><italic><sub>S</sub></italic>(<italic>t</italic>) is also responsible for clustering the sensory streams into discrete, prototypic states. In the following, we explain the segmentation by unsupervised Hebbian learning.</p>
</sec>
<sec id="S3-2">
<label>3.2</label> <title>Segmentation and Recognition of Population-Encoded Activations</title>
<p>For generating the above binary stochastic cells, we use an instar algorithm that is capable of unsupervised segmentation of normalized vector spaces, similar to Grossberg&#x02019;s Adaptive Resonance Theory (Grossberg, <xref ref-type="bibr" rid="B30">1976a</xref>,<xref ref-type="bibr" rid="B31">b</xref>,<xref ref-type="bibr" rid="B32">c</xref>). In contrast to ART, however, our approach provides state recognition probabilities and can thus implement non-deterministic learning and recognition. Another difference from common implementations is that cell prototypes are created on demand and initialized as zero vectors.</p>
<p>We define the sensory recognition probability <italic>P<sub>k</sub></italic><sub>&#x0007C;</sub><italic><sub>S</sub></italic>(<italic>t</italic>) of a state <italic>k</italic>&#x02009;&#x02208;&#x02009;<italic>M</italic> as a function of the congruence or match <italic>m<sub>k</sub></italic>(<italic>t</italic>) between a state cell&#x02019;s prototype vector <inline-formula><mml:math id="M24"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>w</mml:mi><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and the current activation vector <inline-formula><mml:math id="M20"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>a</mml:mi><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mi>M</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>(t) jointly provided by all population cells. The concatenated population activation dedicated to a state layer is assumed to be normalized to length 1. Since the model is designed with separate learning and testing phases, we provide separate recognition functions, assuming full sensory confidence during training and some sensory uncertainty during testing, where previously unseen data are generally observed. During training, this assumption inevitably results in the sensory recognition of the best-matching state via
<disp-formula id="E5"><label>(5)</label><mml:math id="M5"><mml:mtable columnalign="left" class="align"><mml:mtr><mml:mtd columnalign="left" class="align-odd"><mml:msubsup><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mtext>training</mml:mtext></mml:mrow></mml:msubsup><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfenced separators="" open="{" close=""><mml:mrow><mml:mtable equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd columnalign="left" class="align-odd"><mml:mn>1</mml:mn></mml:mtd><mml:mtd columnalign="left" class="align-odd"><mml:mtext>if</mml:mtext><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x02265;</mml:mo><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mtext>&#x02009;</mml:mtext><mml:mo class="MathClass-rel">&#x02200;</mml:mo><mml:mi>l</mml:mi><mml:mo class="MathClass-rel">&#x02208;</mml:mo><mml:mi>M</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="split-mtd"><mml:mn>0</mml:mn><mml:mspace width="1em" class="quad"/></mml:mtd><mml:mtd columnalign="left" class="align-odd"><mml:mtext>else</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mfenced></mml:mtd><mml:mtd class="align-even"></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
as well as a sensory recognition that is distributed over all states during testing, which we define by
<disp-formula id="E6"><label>(6)</label><mml:math id="M6"><mml:msubsup><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mtext>testing</mml:mtext></mml:mrow></mml:msubsup><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mi>&#x003B2;</mml:mi><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo class="MathClass-bin">&#x0002B;</mml:mo><mml:mi mathvariant="normal">exp</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>&#x003BA;</mml:mn><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula>
where <italic>&#x003BA;</italic> denotes an uncertainty measure for sensory data, and <italic>&#x003B2;</italic> denotes the maximum sensor confidence. The prototype match to the current stimulus is described by
<disp-formula id="E7"><label>(7)</label><mml:math id="M7"><mml:msub><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfenced separators="" open="{" close=""><mml:mrow><mml:mtable equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd class="array" columnalign="left"><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mo class="MathClass-op">&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x02299;</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mo class="MathClass-op">&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mo class="MathClass-op">&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:mo class="MathClass-rel">&#x0007C;</mml:mo></mml:mrow></mml:mfrac><mml:mspace width="1em" class="quad"/></mml:mtd><mml:mtd class="array" columnalign="left"><mml:mtext>if&#x02009;</mml:mtext><mml:mi>k</mml:mi><mml:mtext>&#x02009;is&#x02009;</mml:mtext><mml:mi mathvariant="italic">recruited</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd class="array" columnalign="left"><mml:mi>&#x003B8;</mml:mi><mml:mspace width="1em" class="quad"/></mml:mtd><mml:mtd class="array" 
columnalign="left"><mml:mtext>if&#x02009;</mml:mtext><mml:mi>k</mml:mi><mml:mtext>&#x02009;is&#x02009;</mml:mtext><mml:mi mathvariant="italic">free</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mfenced><mml:mo class="MathClass-rel">&#x02208;</mml:mo><mml:mrow><mml:mo class="MathClass-open">[</mml:mo><mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn><mml:mo class="MathClass-punc">,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">]</mml:mo></mml:mrow></mml:math></disp-formula>
where &#x02299; denotes the scalar product, such that the match function is based on the angular match between the normalized prototype vector <inline-formula><mml:math id="M21"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>w</mml:mi><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> encoded in cell <italic>k</italic> and the current normalized stimulus <inline-formula><mml:math id="M22"><mml:mrow><mml:msub><mml:mover accent='true'><mml:mi>a</mml:mi><mml:mo>&#x02192;</mml:mo></mml:mover><mml:mi>M</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>(t). Each layer expands its capacity on demand, comparable to Growing Neural Gas (Fritzke, <xref ref-type="bibr" rid="B18">1995</xref>). Once a cell has fired a sensory recognition signal [<italic>P<sub>k</sub></italic><sub>&#x0007C;</sub><italic><sub>S</sub></italic>(<italic>t</italic>)&#x02009;&#x0003D;&#x02009;1] during training, it is converted from a <italic>free cell</italic> into a <italic>recruited cell</italic>, in the sense that its prototype vector is adapted from zero toward the current stimulus [following the learning rule in equation <xref ref-type="disp-formula" rid="E8">(8)</xref>]. The match of a free cell is fixed to <italic>&#x003B8;</italic>, such that whenever no recruited cell matches better than <italic>&#x003B8;</italic>, the free cell wins and is recruited, and another free cell with a zero prototype vector is created. We therefore call <italic>&#x003B8;</italic> the <italic>recruitment threshold</italic> in the following. Assuming a small learning rate, this ensures that each training input is encoded in the network with a mismatch tolerance of <italic>&#x003B8;</italic>, irrespective of the amount of data, its presentation order, or its frequency. Furthermore, it has been suggested that adding noise to the match function during training lends this segmentation algorithm a certain degree of noise robustness (Schrodt et al., <xref ref-type="bibr" rid="B63">2015</xref>).</p>
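The recruitment scheme can be sketched as follows. This is a minimal illustration under our own naming (`match`, `recognize_training`, `prototypes`); it assumes normalized stimuli and represents each free cell by an all-zero prototype, following equations (5) and (7).

```python
import numpy as np

def match(a, prototypes, theta):
    """Eq. (7): angular match of the normalized stimulus a against each
    prototype; a free (all-zero) prototype receives the fixed match theta."""
    m = np.empty(len(prototypes))
    for k, w in enumerate(prototypes):
        norm = np.linalg.norm(w)
        m[k] = theta if norm == 0.0 else (a @ w) / norm
    return m

def recognize_training(a, prototypes, theta):
    """Eq. (5): winner-take-all recognition during training. The free cell
    wins whenever no recruited prototype matches better than theta, which
    triggers its recruitment."""
    return int(np.argmax(match(a, prototypes, theta)))
```

A layer would start with a single free cell; whenever the free cell wins, its prototype is adapted toward the stimulus and a fresh free cell with a zero prototype is appended.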
<p>Prototype vectors of cells are trained to represent the current population activation using the Hebbian-inspired instar learning rule:
<disp-formula id="E8"><label>(8)</label><mml:math id="M8"><mml:mo class="MathClass-rel">&#x02207;</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mo class="MathClass-op">&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mo class="MathClass-op">&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mo class="MathClass-op">&#x02192;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:math></disp-formula>
where <italic>&#x003B7;<sub>s</sub></italic> denotes the spatial learning rate. Since learning is gated by the binary cell realization <italic>x<sub>k</sub></italic>(<italic>t</italic>), only the prototype of the winner cell is adapted.</p>
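As a sketch, the instar update of equation (8) amounts to a single gated, vectorized step (the names are ours; `W` holds one prototype row per cell):

```python
import numpy as np

def instar_update(W, x, a, eta_s):
    """Eq. (8): move prototypes toward the current population activation a.
    Because the update is gated by the binary winner vector x, only the
    winning cell's prototype row actually changes."""
    W += eta_s * x[:, None] * (a - W)
    return W
```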
<p>During testing, the sensory recognition function [equation <xref ref-type="disp-formula" rid="E6">(6)</xref>] distributes the sensory state recognition probabilities over all stochastic cells rather than concentrating them on a single one, in order to account for sensory uncertainty. Perfectly matching cells are recognized with probability <italic>&#x003B2;</italic> (before normalization), whereas the probability of recognizing states that do not lie exactly at the center of the stimulus decreases depending on <italic>&#x003BA;</italic> and the degree of mismatch. This also means that when no learned prototype matches sufficiently well during testing, the sensory recognition distribution becomes nearly uniform, such that intramodular and cross-modular predictions gain a relatively strong influence on the determination of the current state [cf. equation <xref ref-type="disp-formula" rid="E2">(2)</xref>]. The network is thus able to switch dynamically from bottom-up state recognition to a forward simulation of the state progression whenever sensory information is unknown or uncertain. In the following, we detail how intramodular and cross-modular predictions can be learned by a Hebbian learning rule that is equivalent to Bayesian inference.</p>
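The graded recognition of equation (6) can be sketched directly (an illustrative implementation with our own function name):

```python
import numpy as np

def recognize_testing(m, beta, kappa):
    """Eq. (6): graded sensory recognition during testing. A perfect match
    (m = 1) yields probability beta; the recognition probability decays
    with the mismatch at a rate controlled by kappa."""
    return beta * 2.0 / (1.0 + np.exp(-kappa * (m - 1.0)))
```

For small kappa, the resulting distribution over cells flattens, which is precisely the regime in which intramodular and cross-modular predictions dominate the state determination.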
</sec>
<sec id="S3-3">
<label>3.3</label> <title>Learning Intramodular and Cross-Modular Predictions</title>
<p>Upon winning, a cell learns to predict which observations will be made next in the same and in other layers. This is realized by asymmetric bidirectional recurrences between cells in a layer, representing the intramodular predictions <italic>P<sub>k</sub></italic><sub>&#x0007C;</sub><italic><sub>I</sub></italic>(<italic>t</italic>), and between cells of two layers, representing the cross-modular predictions <italic>P<sub>k</sub></italic><sub>&#x0007C;</sub><italic><sub>C</sub></italic>(<italic>t</italic>). Intramodular recurrences propagate the state transition probability from the last winner to all cells in the same layer and thus implement a discrete-time Markov chain, where Markov states are learned from scratch during the training procedure. Cross-modular connections bias the state transition probability density in other layers, given the current sensory observation, by means of temporal Bayesian inference.</p>
<p>Taken together, intramodular and cross-modular state predictions are represented by full connectivity among all state cells in the network (including self-recurrences). These connections encode conditional probabilities for the subsequent observation of specific states. They can be learned by means of Bayesian statistics, which results in asymmetric weights:</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M9"><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">ij</mml:mtext></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>1</mml:mn><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo 
class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="E10"><label>(10)</label><mml:math id="M10"><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">ji</mml:mtext></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>1</mml:mn><mml:mo class="MathClass-rel">&#x0007C;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo 
class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula>
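Equations (9) and (10) are batch estimates of empirical conditional transition probabilities. A sketch of the batch form of equation (9) (our own names; `X` collects the binary winner vectors over time):

```python
import numpy as np

def transition_weights(X):
    """Eq. (9): W[i, j] = sum_t x_i(t-1) * x_j(t) / sum_t x_i(t-1),
    i.e., the empirical probability that cell j wins at time t given
    that cell i won at time t - 1."""
    counts = X[:-1].T @ X[1:]           # co-activations of i at t-1 and j at t
    visits = X[:-1].sum(axis=0)         # how often each cell i won
    # guard against division by zero for cells that never won
    return counts / np.maximum(visits, 1)[:, None]
```

The derivation that follows turns this batch estimate into an equivalent online Hebbian update.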
<p>To derive a neurally more plausible learning rule for training a weight from cell <italic>i</italic> to cell <italic>j</italic>, we take the derivative of this formula with respect to time:
<disp-formula id="E11"><label>(11)</label><mml:math id="M11"><mml:mtable columnalign="left" class="align"><mml:mtr><mml:mtd columnalign="right" class="align-odd"></mml:mtd><mml:mtd class="align-even"><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">ij</mml:mtext></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x02202;t</mml:mi></mml:mrow></mml:mfrac><mml:mspace width="2em"/></mml:mtd><mml:mtd columnalign="right" class="align-label"></mml:mtd><mml:mtd class="align-label"><mml:mspace width="2em"/></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="right" class="align-odd"></mml:mtd><mml:mtd class="align-even"><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x02202;t</mml:mi></mml:mrow></mml:mfrac><mml:msub><mml:mrow><mml:mo 
class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mfrac><mml:mrow><mml:mi>&#x02202;</mml:mi><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x02202;t</mml:mi></mml:mrow></mml:mfrac><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:msub><mml:mrow><mml:mo 
class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfenced></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:mspace width="2em"/></mml:mtd><mml:mtd columnalign="right" class="align-label"></mml:mtd><mml:mtd class="align-label"><mml:mspace width="2em"/></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="right" class="align-odd"></mml:mtd><mml:mtd class="align-even"><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo 
class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfenced></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:mspace width="2em"/></mml:mtd><mml:mtd columnalign="right" class="align-label"></mml:mtd><mml:mtd class="align-label"><mml:mspace width="2em"/></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="right" 
class="align-odd"></mml:mtd><mml:mtd class="align-even"><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">ij</mml:mtext></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mspace width="2em"/></mml:mtd><mml:mtd columnalign="right" class="align-label"></mml:mtd><mml:mtd 
class="align-label"><mml:mspace width="2em"/></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="right" class="align-odd"></mml:mtd><mml:mtd class="align-even"><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">ij</mml:mtext></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfenced></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mspace width="2em"/></mml:mtd><mml:mtd columnalign="right" class="align-label"></mml:mtd><mml:mtd class="align-label"><mml:mspace width="2em"/></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="right" class="align-odd"></mml:mtd><mml:mtd class="align-even"><mml:mo 
class="MathClass-rel">&#x0003D;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x022C5;</mml:mo><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">ij</mml:mtext></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow><mml:mo class="MathClass-punc">,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo 
class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>With the predictive learning rate <italic>&#x003B7;<sub>p</sub></italic> set constant, this is a temporal variant of Oja&#x02019;s associative learning rule (Oja, <xref ref-type="bibr" rid="B50">1989</xref>), also referred to as outstar learning rule. Thus, this form of Hebbian learning is equivalent to Bayesian inference under the assumption of a learning rate that decays inversely proportional to the number of activations of the preceding cell <italic>i</italic>. In this case, each cell calculates the average of all observed (temporally) conditional probability densities in the same and other layers. However, since the states are adapted simultaneously with the learning of state conditionals, it is advantageous to implement a form of forgetting. Hence, we define the learning rate by <inline-formula><mml:math id="M12"><mml:msub><mml:mrow><mml:mi>&#x003B7;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo class="MathClass-rel">&#x0003D;</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:msub><mml:mrow><mml:mo class="MathClass-op">&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x02009;</mml:mtext><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo class="MathClass-open">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo class="MathClass-bin">&#x02212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo class="MathClass-close">)</mml:mo></mml:mrow></mml:mrow></mml:mfenced></mml:mrow><mml:mrow><mml:mn>&#x003B1;</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:math></inline-formula>, where <italic>&#x003B1;</italic>&#x02009;&#x0003C;&#x02009;1 implements forgetting. 
All state-predicting weights <italic>w<sub>ij</sub></italic> are initialized equally to represent multiple uniform distributions and adapt in accordance with learning rule <xref ref-type="disp-formula" rid="E11">(11)</xref>.</p>
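The update derived above can be rendered directly in code. The following is a minimal NumPy sketch (the variable names and the explicit tracking of the per-cell activation sums entering the learning rate are our own; the article itself specifies only the update equations):

```python
import numpy as np

def outstar_update(w, x_prev, x_curr, activation_sums, alpha=0.9):
    """One step of the temporal outstar (Oja-style) rule with forgetting.

    w               : (n_pre, n_post) prediction weights w_ij
    x_prev          : (n_pre,) presynaptic activations x_i(t-1)
    x_curr          : (n_post,) postsynaptic activations x_j(t)
    activation_sums : (n_pre,) running sums of past presynaptic activity
    alpha           : exponent; alpha = 1 recovers exact averaging
                      (Bayesian estimation of the conditionals), while
                      alpha < 1 keeps the rate larger, i.e., forgetting
    """
    activation_sums = activation_sums + x_prev
    # per-presynaptic-cell learning rate: eta_p = 1 / (sum_t x_i(t-1))**alpha
    eta = 1.0 / np.maximum(activation_sums, 1e-12) ** alpha
    # delta w_ij = eta_p * x_i(t-1) * (x_j(t) - w_ij(t))
    w = w + (eta * x_prev)[:, None] * (x_curr[None, :] - w)
    return w, activation_sums
```

With alpha = 1 and a binary presynaptic cell, the weight converges to the running mean of the postsynaptic activations it was paired with, matching the averaging interpretation in the text.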
<p>The capability of simulating distributed state progressions, even without sensory stimulation, follows from the stochastic selection of cell activations based on the learned, conditional state predictions. As a result of the bidirectional connections, the model becomes able to infer momentarily or permanently unobservable states and to mutually synchronize, or keep consistent, the activations in the respective layers. Pre-activating a subset of cells in a layer likewise allows a subset of the learned state sequences to unfold. In the context of actions, this leads to the ability to synchronously simulate, in the vision and motor layers, the state progression that corresponds to one of multiple encoded bodily movements when biased top-down by a constant intention signal. The probability fusion in equation <xref ref-type="disp-formula" rid="E2">(2)</xref> accounts for an approximation of the respective, multi-conditional state probabilities. In the following section, we describe in further detail the application of this model to action understanding and the respective stimuli used in our evaluations.</p>
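The stochastic selection described here can be illustrated with a small sketch. The normalized product below is only a hypothetical stand-in for the multi-conditional fusion of equation (2), which is not reproduced in this section; the sampling step shows how a top-down bias (e.g., a constant intention signal) lets only a subset of learned state sequences unfold:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_sample(predictions, rng=rng):
    """Fuse several conditional state distributions and sample the next state.

    predictions : list of (n_states,) probability vectors, one per predicting
                  source (e.g., intramodular, cross-modular, top-down bias).
    NOTE: the product fusion is an assumed stand-in, not the article's
    equation (2).
    """
    fused = np.ones_like(predictions[0], dtype=float)
    for p in predictions:
        fused *= p
    fused /= fused.sum()
    # stochastic selection of the next active state cell
    return rng.choice(len(fused), p=fused), fused
```

A zero-probability bias entry rules its state out entirely, so a constant intention signal restricts the unfolding sequence to the states consistent with that intention.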
</sec>
</sec>
<sec id="S4">
<label>4</label> <title>Modeling Action Observation</title>
<p>The focus of this paper lies on the learning of an embodied, distributed, and multimodal model of action understanding, which involves bottom-up as well as top-down and generative processes. It consists of three stochastic layers, each modeling codes and processes that are believed to be involved in action observation, the inference of goals, and respective motor commands that facilitate the emulation of observed actions. The first layer comprises <italic>visual biological motion</italic> patterns. The second layer encodes the corresponding joint angular <italic>motor perceptions</italic>. Accordingly, the model includes two groups of modal input populations, which encode visual and proprioceptive stimuli. Moreover, we include an amodal or multimodal intrinsic representation of <italic>action intentions</italic>. These codes are believed to be represented at distributed neural sites. It is typically assumed that action goals and intentions are encoded inferior frontally, motor codes and plans posterior parietally, and biological, mainly visually driven motion patterns in the superior temporal sulcus [cf. Iacoboni (<xref ref-type="bibr" rid="B36">2005</xref>), Kilner (<xref ref-type="bibr" rid="B41">2011</xref>), and Turella et al. (<xref ref-type="bibr" rid="B68">2013</xref>)]. Inferences and synchronization processes between these neural sites are modeled by cross-modular state predictions between the layers in the network, while the intramodular predictions restrict the state progression to the experienced, own-bodily contingencies. Figure <xref ref-type="fig" rid="F1">1</xref> shows an overview of the implemented learning architecture in this context.</p>
<p>In the following, we describe the bottom-up processing chain of our model referring to psychological and neuroscientific evidence. We start with the simulation environment and the motion capture data format that provides the respective stimuli for our evaluations. Subsequently, we focus on important key aspects for the recognition of biological motion, their implications, and implementation in the model. Finally, we describe how the resulting perceptions are interpreted in the context of different modalities involved in action perception, inference, and emulation.</p>
<sec id="S4-4">
<label>4.1</label> <title>Motion Captures and Data Representation</title>
<p>We evaluate our model making use of the CMU Graphics Lab Motion Capture Database (<uri xlink:href="http://mocap.cs.cmu.edu/">http://mocap.cs.cmu.edu/</uri>). Recordings from subjects performing three different cyclic movements (<italic>walking</italic>, <italic>running</italic>, and <italic>basketball dribbling</italic>) in three trials each were utilized, as shown in Figure <xref ref-type="fig" rid="F3">3</xref>. For each movement, we chose a short, cyclic segment of the first trial as the training set and the other two, full trials as the testing set. In this way, the training set was rather idealized, while the testing set contained richer motion which, although within the same action classes, differed strongly from the training data. The motion tracking data were recorded with 12 high-resolution infra-red cameras at 120&#x02009;Hz using 41 tracking markers attached to the subjects. The resulting 3D positions were then matched to separate skeleton templates for learning and testing to obtain series of <italic>joint angular postures</italic> and coherent <italic>relative joint positions</italic>.</p>
<fig position="float" id="F3">
<label>Figure 3</label>
<caption><p><bold>Simulated body driven by motion capture data</bold>. The image on the left shows the limbs (blue lines between dots) and joints (green dots) that provide relative visual and joint angular input to the model. Moreover, three snapshots of the utilized motion tracking trials are shown: basketball dribbling, running, and walking.</p></caption>
<graphic xlink:href="frobt-03-00005-g003.tif"/>
</fig>
<p>In the experiments, we chose the time series of 12 of the calculated relative joint positions as input to the visual processing pathway of the model. We selected the start and end points of the left and right upper arm, forearm, upper and lower leg, shoulder, and hip joints relative to the waist, as shown in Figure <xref ref-type="fig" rid="F3">3</xref>. Each was encoded by a three-dimensional Cartesian coordinate. As input to the motor pathway, we chose the calculated joint angles of 8 joints, each encoded by a one- to three-dimensional radian vector, depending on the degrees of freedom of the respective joint. We selected the left and right hip joints, knee joints, shoulder joints, and elbow joints, resulting in 16 DOF overall. A map of the inputs at a single, exemplary time step is shown in Figure <xref ref-type="fig" rid="F3">3</xref>. The visual and motor pathways are neural substructures of the proposed model and preprocess the raw data as described in the following.</p>
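A hypothetical sketch of assembling the per-frame input vectors described above (the joint names and dictionary layout are illustrative; only the dimensionalities, 12 waist-relative 3D features and 16 motor DOF across 8 joints, are taken from the text):

```python
import numpy as np

# 12 tracked features, each a 3D position relative to the waist -> 36 values.
VISUAL_FEATURES = 12
# 8 selected joints with one to three rotational DOF each, 16 DOF overall.
# (Joint names are illustrative, not identifiers from the article.)
MOTOR_DOF = {"l_hip": 3, "r_hip": 3, "l_knee": 1, "r_knee": 1,
             "l_shoulder": 3, "r_shoulder": 3, "l_elbow": 1, "r_elbow": 1}

def assemble_inputs(rel_positions, joint_angles):
    """rel_positions : array-like of 12 waist-relative (x, y, z) points;
    joint_angles     : dict mapping joint name -> radian vector of its DOF.
    Returns the flat visual (36-dim) and motor (16-dim) input vectors."""
    visual = np.asarray(rel_positions, dtype=float).reshape(-1)
    assert visual.size == VISUAL_FEATURES * 3
    motor = np.concatenate([np.atleast_1d(np.asarray(joint_angles[j], dtype=float))
                            for j in MOTOR_DOF])
    assert motor.size == sum(MOTOR_DOF.values())  # 16 DOF overall
    return visual, motor
```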
</sec>
<sec id="S4-5">
<label>4.2</label> <title>Aspects of Biological Motion and Preprocessing</title>
<p>Giese and Poggio (<xref ref-type="bibr" rid="B27">2003</xref>) summarize critical properties of the recognition of biological motion from visual observations, such as selectivity for temporal order, generality, robustness, and view dependence. First, scrambling the temporal order in which biological motion patterns are displayed typically impairs the recognition of the respective action. This temporal selectivity is realized in our model by learning temporally directed state predictions. Second, biological motion recognition is highly robust against spatiotemporal variances (such as position, scale, and speed), body morphology and exact posture control, incomplete representations (such as point-light displays), or variances in illumination. We model these generalization capabilities by means of (i) the usage of simplified forms of representation of biological motion stimuli as described above, (ii) the extraction of invariant and valuable information in a neural preprocessing stage, and (iii) the simulation of observed motion with the model&#x02019;s own embodied encodings. Third, recognition performance decreases the further the perspective from which an action is perceived is rotated away from common viewpoints. The prototypic cells in our network also respond to specific, learned views of observed movements. However, the preprocessing of our model is also able to infer and adapt to observed perspectives to a certain degree.</p>
<p>This neurally deployed preprocessing is a part of the model that is not detailed in this paper. To summarize, the extraction of relevant information results in fundamental spatiotemporal invariances of the visual perception to scale, translation, movement speed, and body morphology. This is achieved by (i) exponential smoothing to account for noise in the data, (ii) calculation of the velocity, and (iii) normalization of the data to obtain the relative motion direction of each relative feature processed [see Schrodt and Butz (<xref ref-type="bibr" rid="B60">2014</xref>) and Schrodt et al. (<xref ref-type="bibr" rid="B61">2014a</xref>,<xref ref-type="bibr" rid="B62">b</xref>, <xref ref-type="bibr" rid="B63">2015</xref>) for details]. For reasons of consistency, both the visual and motor perceptions are preprocessed in this manner. As to visual perception, the preprocessing stage is also able to account for invariance to orientation by means of active inference of the perspective from which an observed biological motion is perceived. Compensating for the perspective upon observation solves the correspondence problem, which can be considered a premise for the ability to infer intrinsic action representations of others using one&#x02019;s own, embodied encodings, as detailed in our previous work. As a matter of focus, however, we neglect the influence of orientation in the following experiments, meaning that the orientation of the learned and observed motions was identical.</p>
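The three preprocessing steps can be sketched as follows; the smoothing coefficient and the per-feature normalization layout are assumptions, as the article defers the details to prior work:

```python
import numpy as np

def preprocess(frames, feature_dim=3, smoothing=0.9, eps=1e-12):
    """Sketch of the preprocessing chain named above; `smoothing` is an
    assumed coefficient, not a value from the article.

    frames : (T, d) raw feature trajectories (visual or joint angular),
             where d is a multiple of feature_dim.
    Returns (T-1, d) unit-length motion directions per feature, invariant
    to scale, translation, and (via normalization) movement speed.
    """
    smoothed = np.empty_like(frames, dtype=float)
    smoothed[0] = frames[0]
    for t in range(1, len(frames)):          # (i) exponential smoothing
        smoothed[t] = smoothing * smoothed[t - 1] + (1 - smoothing) * frames[t]
    velocity = np.diff(smoothed, axis=0)     # (ii) velocity
    v = velocity.reshape(len(velocity), -1, feature_dim)
    norms = np.linalg.norm(v, axis=2, keepdims=True)
    # (iii) normalize each feature's velocity to its motion direction
    return (v / np.maximum(norms, eps)).reshape(len(velocity), -1)
```

Because each feature's velocity is rescaled to unit length, two performances of the same movement at different speeds or body scales yield the same directional input stream.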
<p>Visual stimuli preprocessed in this manner are represented by a number of neural populations, each encoding the spatially relative motion direction of a specific bodily feature. Consequently, each cell in a population is tuned to a specific motion direction of a limb. Following this, the visual state layer accomplishes a segmentation of the concatenation of all visual population activations into whole-body, directional motion patterns. Analogously, the directions of changes in the joint angles are represented by populations and segmented into whole-body motor codes. In the following, we draw a comparison of this visuomotor perspective and our representation of intention codes to findings in neuroscience and psychology.</p>
</sec>
<sec id="S4-6">
<label>4.3</label> <title>Visuomotor Perspective and Intentions</title>
<p>The superior temporal sulcus is particularly well known for encoding (also whole-body) biological motion patterns (Bruce et al., <xref ref-type="bibr" rid="B5">1981</xref>; Perrett et al., <xref ref-type="bibr" rid="B53">1985</xref>; Oram and Perrett, <xref ref-type="bibr" rid="B51">1994</xref>) and has been considered to provide important visual input for the development of attributes linked with the mirror neuron system (Grossman et al., <xref ref-type="bibr" rid="B33">2000</xref>; Gallese, <xref ref-type="bibr" rid="B20">2001</xref>; Puce and Perrett, <xref ref-type="bibr" rid="B55">2003</xref>; Ulloa and Pineda, <xref ref-type="bibr" rid="B69">2007</xref>; Pavlova, <xref ref-type="bibr" rid="B52">2012</xref>; Cook et al., <xref ref-type="bibr" rid="B12">2014</xref>). Visual motion cues are necessary and most critical for the recognition of actions (Garcia and Grossman, <xref ref-type="bibr" rid="B25">2008</xref>; Thurman and Grossman, <xref ref-type="bibr" rid="B66">2008</xref>). As initially shown by Johansson (<xref ref-type="bibr" rid="B39">1973</xref>), the perception of point-like bodily landmarks in relative motion is sufficient in this process. Thus, we assume that the above relative directional motion information can be perceived visually and is sufficient for action recognition. In contrast, joint angular motion cannot be perceived directly from such minimal visual information, which particularly applies to inner rotations of limbs. Thus, we assume that the directional angular limb motion is perceived proprioceptively. In the context of actions, we consider a prototype of such whole-body joint angular motion a <italic>motor code</italic>. 
Similar motor codes are assumed to be activated during the observation of learned movements (Calvo-Merino et al., <xref ref-type="bibr" rid="B7">2005</xref>) and may be found in posterior parietal areas and related premotor areas (Iacoboni, <xref ref-type="bibr" rid="B36">2005</xref>; Friston et al., <xref ref-type="bibr" rid="B17">2011</xref>; Turella et al., <xref ref-type="bibr" rid="B68">2013</xref>).</p>
<p>Further, in the context of the mirror neuron system, intentional structures can be assumed to be encoded in the inferior frontal gyrus (Iacoboni, <xref ref-type="bibr" rid="B36">2005</xref>; Kilner, <xref ref-type="bibr" rid="B41">2011</xref>; Turella et al., <xref ref-type="bibr" rid="B68">2013</xref>). We simplify these intention codes by top-down, symbolic representations of specific actions. For the following experiments, we define three binary intentions in line with the motion tracking recordings explained before (basketball, running, and walking). Due to this symbol-like nature, the resulting intention layer cells can also be considered action classes or labels, while the derivation of intentions can be considered an online classification of observed bodily motion given visual cues. Since intentions are provided during training, the intention state cells and their predictions can be considered to develop by supervised training of action labels. However, all state variables are segmented using the unsupervised algorithm as described in Section <xref ref-type="sec" rid="S3-2">3.2</xref>.</p>
<p>During the observation of others, neither information about their proprioception nor about their intentions is directly accessible. According to the embodied simulation hypothesis, the developing embodied states can nevertheless be inferred when observing others (Barsalou, <xref ref-type="bibr" rid="B2">1999</xref>, <xref ref-type="bibr" rid="B3">2008</xref>; Calvo-Merino et al., <xref ref-type="bibr" rid="B7">2005</xref>). Hence, in the following experiments, we evaluate the inference and embodied simulation capabilities of our model.</p>
</sec>
</sec>
<sec id="S5">
<label>5</label> <title>Evaluations</title>
<p>In the following experiments, we evaluate (a) the embodied learning of modal prototypes and predictions by means of the segmentation of different streams of information into prototypic state cells, (b) the resulting ability to infer intentions and motor states upon the observation of others&#x02019; actions, and (c) the model&#x02019;s capability to simulate movements without sensory stimulation, keeping visual and motor states consistent. For all of the experiments, we chose the parameterization <italic>&#x003B7;<sub>s</sub></italic>&#x02009;&#x0003D;&#x02009;0.01, <italic>&#x003B1;</italic>&#x02009;&#x0003D;&#x02009;0.9, <italic>&#x003B2;</italic>&#x02009;&#x0003D;&#x02009;0.5, <italic>&#x003BA;</italic>&#x02009;&#x0003D;&#x02009;16, and <italic>&#x003B8;</italic>&#x02009;&#x0003D;&#x02009;0.85 unless stated otherwise.</p>
<sec id="S5-7">
<label>5.1</label> <title>Experiment 1: Learning a Sensorimotor Model Mediated by Intentions</title>
<p>In the first experiment, we show how state cells develop from scratch given streams of relative visual and motor motion input. As shown in Figure <xref ref-type="fig" rid="F4">4</xref>, all layers are driven by data, assuming maximum sensory confidence and thus disabling the influence of predictions. Training consisted of learning perfectly cyclic motion tracking snippets: first, a 115-time-step (0.96&#x02009;s) basketball snippet, in which a single dribble and two footsteps were performed, was shown 11 times in succession, resulting in 1265 time steps of training. Then, a 91-time-step (0.75&#x02009;s) running snippet containing two footsteps was shown 14 times, resulting in 1274 frames. Finally, a 260-time-step (2.17&#x02009;s) walking snippet containing two steps was shown 5 times, resulting in 1300 frames. The training data thus consisted of 3.88&#x02009;s of unique data samples. The whole cyclic repetition of these trials was streamed into the model five times, while recruiting states, learning state prototypes, and adapting the resulting intra- and cross-modular predictions.</p>
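The training schedule above can be reproduced arithmetically; a small sketch that checks the stated frame counts at the 120-Hz recording rate (the snippet data themselves are from the CMU database and are not included here):

```python
# (frames per cycle, number of repetitions) per trained action,
# as stated in the text.
SNIPPETS = {
    "basketball": (115, 11),
    "running": (91, 14),
    "walking": (260, 5),
}
FPS = 120  # recording rate of the motion capture data

def training_stream_lengths():
    """Total frames streamed per action and seconds of unique data."""
    totals = {a: frames * reps for a, (frames, reps) in SNIPPETS.items()}
    unique_seconds = sum(f for f, _ in SNIPPETS.values()) / FPS
    return totals, unique_seconds
```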
<fig position="float" id="F4">
<label>Figure 4</label>
<caption><p><bold>Experiment 1: state segmentation and learning</bold>. Dashed lines indicate the learning of prototype states and temporal predictions. During learning, all information is assumed to be available, and sensory signals are fully trusted.</p></caption>
<graphic xlink:href="frobt-03-00005-g004.tif"/>
</fig>
<p>Figure <xref ref-type="fig" rid="F5">5</xref> shows the recruitment of five visual and three motor state cells from scratch and the respective match to the driving stimuli, exemplified for a recruitment threshold of <italic>&#x003B8;</italic>&#x02009;&#x0003D;&#x02009;0.1. Because of the cyclic nature of the trained movements, the activations of those states form cyclic time series. The recruitment threshold <italic>&#x003B8;</italic> essentially defines the discretization of the state spaces: the higher the recruitment threshold <italic>&#x003B8;</italic>, the more states develop, as summarized in Table <xref ref-type="table" rid="T1">1</xref>. Note that learning was deterministic in these settings, which means that (a) adapted weights were not initialized randomly, but with a zero vector and (b) we assumed full sensor confidence such that the probability of recognizing a state is a binary function. In consequence, there was no variance in the developing states.</p>
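The role of the recruitment threshold can be illustrated with a hypothetical sketch: a new state is recruited whenever no existing prototype matches the current input better than θ, so a higher threshold yields a finer discretization. The Gaussian-like match function below is an assumption; the article's actual match measure is not reproduced here:

```python
import numpy as np

def recruit_states(inputs, theta,
                   match=lambda p, x: np.exp(-np.linalg.norm(p - x))):
    """Threshold-driven state recruitment (a sketch).

    inputs : iterable of input vectors streamed in temporal order.
    theta  : recruitment threshold; a new prototype is recruited whenever
             no existing prototype matches the input better than theta.
    """
    prototypes = []
    for x in inputs:
        scores = [match(p, x) for p in prototypes]
        if not prototypes or max(scores) < theta:
            # recruit a new state whose prototype is the current input
            prototypes.append(np.array(x, dtype=float))
    return prototypes
```

On the same input stream, raising θ can only increase the number of recruited prototypes, mirroring the trend reported in Table 1.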
<fig position="float" id="F5">
<label>Figure 5</label>
<caption><p><bold>(A)</bold> The systematic time series of prototype matches and resulting recognition signals of five visual states, developing during training with <italic>&#x003B8;</italic>&#x02009;&#x0003D;&#x02009;0.1. Three different, cyclic motion tracking trials were learned repeatedly during this procedure (the first two repetitions are shown). Light red patches tagged with <inline-formula><mml:math id="M13"><mml:msup><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">train</mml:mtext></mml:mrow></mml:msup></mml:math></inline-formula> indicate the time intervals the <italic>basketball</italic> training trial was shown, the light green patches <inline-formula><mml:math id="M14"><mml:msup><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">train</mml:mtext></mml:mrow></mml:msup></mml:math></inline-formula> indicate the intervals of the <italic>running</italic> trial, and the light blue patches tagged <inline-formula><mml:math id="M15"><mml:msup><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">train</mml:mtext></mml:mrow></mml:msup></mml:math></inline-formula> indicate training on the <italic>walking</italic> trial. It can be seen that the time series of prototype matches (blue lines) are comparable when re-enacting the presentation of a motion tracking trial, since cells learn to encode specific parts of the data. Because the movements were cyclic, also the determined visual states (red lines) formed cyclic time series. Initially, some state prototypes were recoded when another movement was shown. <bold>(B)</bold> Equivalent evaluation of the state cell development in the motor layer.</p></caption>
<graphic xlink:href="frobt-03-00005-g005.tif"/>
</fig>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p><bold>Overview of the number of states developing during learning as a function of <italic>&#x003B8;</italic>, and the resulting classification performances during the observation of movements not seen during training</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"><italic><bold><italic>&#x003B8;</italic></bold></italic></th>
<th align="left">Layer</th>
<th align="center">No. of developing states</th>
<th align="center" colspan="3">Correct classifications (%)<hr/></th>
<th align="center" colspan="3">Classifier confidence (%)<hr/></th>
</tr>
<tr>
<th align="left"/>
<th align="left"/>
<th align="center"/>
<th align="center">Basketball</th>
<th align="center">Running</th>
<th align="center">Walking</th>
<th align="center">Basketball</th>
<th align="center">Running</th>
<th align="center">Walking</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">0.1</td>
<td align="left">Visual</td>
<td align="center">6</td>
<td align="center">21.49</td>
<td align="center">92.36</td>
<td align="center">96.14</td>
<td align="center">44.71</td>
<td align="center">44.65</td>
<td align="center">47.00</td>
</tr>
<tr>
<td align="left"/>
<td align="left">Motor</td>
<td align="center">3</td>
<td align="center"/>
</tr>
<tr>
<td align="left"/>
<td align="left">Intentions</td>
<td align="center">3</td>
<td align="center"/>
</tr>
<tr>
<td align="left">0.3</td>
<td align="left">Visual</td>
<td align="center">8</td>
<td align="center">60.31</td>
<td align="center">98.28</td>
<td align="center">95.43</td>
<td align="center">50.90</td>
<td align="center">50.95</td>
<td align="center">50.75</td>
</tr>
<tr>
<td align="left"/>
<td align="left">Motor</td>
<td align="center">7</td>
<td align="center"/>
</tr>
<tr>
<td align="left"/>
<td align="left">Intentions</td>
<td align="center">3</td>
<td align="center"/>
</tr>
<tr>
<td align="left">0.5</td>
<td align="left">Visual</td>
<td align="center">16</td>
<td align="center">51.64</td>
<td align="center">99.66</td>
<td align="center">99.71</td>
<td align="center">53.97</td>
<td align="center">59.88</td>
<td align="center">65.99</td>
</tr>
<tr>
<td align="left"/>
<td align="left">Motor</td>
<td align="center">15</td>
<td align="center"/>
</tr>
<tr>
<td align="left"/>
<td align="left">Intentions</td>
<td align="center">3</td>
<td align="center"/>
</tr>
<tr>
<td align="left">0.7</td>
<td align="left">Visual</td>
<td align="center">31</td>
<td align="center">43.26</td>
<td align="center">98.51</td>
<td align="center">99.24</td>
<td align="center">60.83</td>
<td align="center">67.97</td>
<td align="center">75.79</td>
</tr>
<tr>
<td align="left"/>
<td align="left">Motor</td>
<td align="center">37</td>
<td align="center"/>
</tr>
<tr>
<td align="left"/>
<td align="left">Intentions</td>
<td align="center">3</td>
<td align="center"/>
</tr>
<tr>
<td align="left">0.85</td>
<td align="left">Visual</td>
<td align="center">72</td>
<td align="center">53.02</td>
<td align="center">99.02</td>
<td align="center">99.18</td>
<td align="center">65.10</td>
<td align="center">73.50</td>
<td align="center">80.73</td>
</tr>
<tr>
<td align="left"/>
<td align="left">Motor</td>
<td align="center">107</td>
<td align="center"/>
</tr>
<tr>
<td align="left"/>
<td align="left">Intentions</td>
<td align="center">3</td>
<td align="center"/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic><italic>Correct classification</italic> denotes the percentage of time steps the maximally likely intention output corresponded to the actually shown movement. The <italic>classifier confidence</italic> shows the average inferred probability of the maximally likely intention during testing</italic>.</p>
</table-wrap-foot>
</table-wrap>
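The two measures defined in the table footnote can be sketched as follows (a hypothetical implementation; `intention_probs` stands for the inferred intention layer output over the testing time steps of a single shown movement):

```python
import numpy as np

def classification_metrics(intention_probs, true_label):
    """Metrics as defined in the footnote of Table 1 (a sketch).

    intention_probs : (T, n_intentions) inferred intention probabilities
                      while the movement `true_label` is shown.
    Returns (correct classification in %: share of time steps where the
             maximally likely intention matches the shown movement,
             classifier confidence in %: average inferred probability of
             the maximally likely intention).
    """
    winners = intention_probs.argmax(axis=1)
    correct = 100.0 * np.mean(winners == true_label)
    confidence = 100.0 * np.mean(intention_probs.max(axis=1))
    return correct, confidence
```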
<p>Figure <xref ref-type="fig" rid="F5">5</xref> also indicates that non-disjoint state encodings develop for the three different movements: only one of the states is recognized exclusively during the perception of a specific movement. Thus, classifications of movements are barely possible using Bayesian statistics with such a low recruitment threshold. Hence, in the following section, we examine the influence of increasing the visual and motor state granularity on the model&#x02019;s ability to infer movement classes.</p>
</sec>
<sec id="S5-8">
<label>5.2</label> <title>Experiment 2a: Inference of Intentions upon Observation</title>
<p>For the classification of movements, or in this context, for the inference of intentions, the distinctness of the state structures developing during training with respect to the movements plays a major role. Since the information the state cells encode in their prototype vectors is hard to visualize, we calculated the average pixel snapshot of the simulation display for each state while it was recognized [using an averaging formula analogous to equation <xref ref-type="disp-formula" rid="E11">(11)</xref>]. Basketball movements were displayed in red, running movements in green, and walking movements in blue. Consequently, if only a single state was created to represent all of the training data, the resulting state snapshot would show a mixture of all postures included in all of the movements, while overlapping postures would be black and non-overlapping postures would be colored. In contrast, a state cell that was recognized only at a single time step during training would result in a snapshot showing only the corresponding posture in the respective color of the movement. Hence, the color of the snapshots can be considered a qualitative measure of the distinctness of states with respect to the three movements. Also, each snapshot shows the segments of the movements a state cell responds to and thus the model&#x02019;s &#x0201C;imagination&#x0201D; of the movement when modalities are inferred or simulated. Figure <xref ref-type="fig" rid="F6">6</xref> shows exemplary snapshots of cells created during the training phases using different recruitment thresholds <italic>&#x003B8;</italic>. As expected, higher thresholds lead to the creation of movement-exclusive states.</p>
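The per-state average snapshot can be computed incrementally, analogous to the averaging form of the learning rule; the following is a sketch with illustrative image shapes:

```python
import numpy as np

def update_snapshots(snapshots, counts, frame, recognition):
    """Incremental per-state average of display frames (a sketch).

    snapshots   : (n_states, H, W, 3) running average images per state
    counts      : (n_states,) accumulated recognition per state
    frame       : (H, W, 3) current simulation display frame
    recognition : (n_states,) recognition signal of each state at this step
    """
    for s, r in enumerate(recognition):
        if r > 0:
            counts[s] += r
            # running mean: move the snapshot toward the current frame
            snapshots[s] += (r / counts[s]) * (frame - snapshots[s])
    return snapshots, counts
```

After a full pass over the training stream, each snapshot equals the recognition-weighted mean of all frames during which that state was active, which is exactly the mixture image described in the text.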
<fig position="float" id="F6">
<label>Figure 6</label>
<caption><p><bold>Average simulation display while specific sensory states were observed</bold>. <bold>(A)</bold> Displays an example snapshot of a state created with learning threshold <italic>&#x003B8;</italic>&#x02009;&#x0003D;&#x02009;0.3. It shows that the state was recognized non-exclusively, that is, for each of the learned movements. The disjunction of patterns improves when increasing the threshold, such that for <italic>&#x003B8;</italic>&#x02009;&#x0003D;&#x02009;0.85, movements and specific parts of the movements are clearly identifiable by observing specific states [snapshots <bold>(B&#x02013;D)</bold>].</p></caption>
<graphic xlink:href="frobt-03-00005-g006.tif"/>
</fig>
<p>To evaluate the influence of the multimodal state segmentation on the model&#x02019;s ability to infer intentions, and to test for generalization at the same time, we measured the influence of <italic>&#x003B8;</italic> on the correctness of the inferred values and on the model&#x02019;s confidence when different movements were presented after training. As indicated above, the testing set did not contain the motion tracking trials used for training. Rather, it contained two other basketball trials of 4.39 and 3.2&#x02009;s, two other running trials of 3.56&#x02009;s each, and two other walking trials of 1.15 and 1.27&#x02009;s. The testing data thus consisted of 17.13&#x02009;s of unique data samples. Some trials included motion segments very different from the learned movements. In particular, the basketball testing trials contained segments in which the subject stood still and lifted the ball, or in which the dribbling was incongruent with the footstep cycle, whereas the model was trained only on a single, congruent basketball dribbling snippet. Also, as indicated in Figure <xref ref-type="fig" rid="F7">7</xref>, only the visual modality was fed into the network during testing trials, reflecting the fact that intentions and motor commands are not directly observable when observing actions. Note that the model did not obtain information about the time step at which a new movement was shown during testing.</p>
<fig position="float" id="F7">
<label>Figure 7</label>
<caption><p><bold>Experiment 2a/b: inference of intentions and motor commands from visually observed movements</bold>. Solid lines indicate the propagation of sensory signals (green), cross-modular predictive signals (red), and intramodular predictions (blue). During testing, only the visual sensor is available and fused with the emerging predictions, assuming some uncertainty in sensory information. Intentions are inferred visually, while corresponding motor commands are inferred from vision and the derived intention.</p></caption>
<graphic xlink:href="frobt-03-00005-g007.tif"/>
</fig>
<p>Classification results for four different values of <italic>&#x003B8;</italic>, averaged over six independent testing trials, are shown in Figure <xref ref-type="fig" rid="F8">8</xref>. Despite the missing motor modality and the deviations in the observed posture control, the model was able to identify the character of the running and walking movements throughout, as summarized in Table <xref ref-type="table" rid="T1">1</xref>. In doing so, accurately recognized visual state cells sufficed to push the visual, motor, and intention state determination into temporal attractor sequences that consisted of the cyclic emulation of the respective movement using the embodied encodings. Subsequent inputs then either maintained this emulation, when close enough to the encodings, or forced the convergence to another attractor sequence, that is, a shift in perception. This effect can be seen clearly in the basketball trials, where episodes similar enough to the training data existed. However, as explained above, the basketball training trials were short and idealized, and they did not contain incongruent dribbling. In these segments, the model partly inferred a similarity with the trained walking movement, resulting in a bistable perception as shown in the graphs. This effect shows how the model is limited to the learned, embodied encodings when inferring intentions. It could be avoided by adding further training data.</p>
<fig position="float" id="F8">
<label>Figure 8</label>
<caption><p><bold>Inference of intentions from visually observed movements shown for different <italic>&#x003B8;</italic></bold>. The red line indicates the moving average (one-second time window) of the derived <italic>basketball</italic> state probability in the intention layer, while light red background shows the interval in time the testing trials <inline-formula><mml:math id="M16"><mml:msubsup><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">test</mml:mtext></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M17"><mml:msubsup><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">test</mml:mtext></mml:mrow></mml:msubsup></mml:math></inline-formula> were actually presented. Analogously, green indicates <italic>running</italic> and blue indicates <italic>walking</italic>. The classifier confidence improves with <italic>&#x003B8;</italic> as a result of learning more disjunct sets of states per movement.</p></caption>
<graphic xlink:href="frobt-03-00005-g008.tif"/>
</fig>
<p>When the learned movements were represented by a larger number of mostly movement-exclusive states, the model&#x02019;s ability to infer intentions improved only slightly. The confidence in classification, however, improved consistently with <italic>&#x003B8;</italic>, from about 45 to 73% on average, as a result of the more disjunct patterns. As explained in the following, the classifier confidence influences the inference of motor states.</p>
</sec>
<sec id="S5-9">
<label>5.3</label> <title>Experiment 2b: Inference of Motor Commands upon Observation</title>
<p>Analogously to the preceding experiment, in which we showed that intentions can be classified purely from visually observed motion patterns, we now evaluate whether the corresponding motor commands can also be inferred using the same mechanisms. This task is potentially more difficult, since the motor layer comprises a larger number of states than the intention layer, and since motor state transitions are typically subject to faster dynamics. Given that the observed movements differed considerably from the learned movements, we evaluate whether the inferred motor state snapshots correspond to the visual state snapshots at the same time steps and whether the sequence in which they occur during observation is plausible.</p>
<p>Figure <xref ref-type="fig" rid="F9">9</xref> shows the coincidence of state snapshots of the recognized visual states and the inferred motor states when observing the testing trials. When similar state snapshots are activated in both the visual and the motor domains, the two modalities can be considered synchronized in the emulation of the observed movement. In this process, both the cross-modular prediction from the vision to the motor layer and the motor states predicted by the currently inferred intention bias the activation of motor states, as indicated in Figure <xref ref-type="fig" rid="F7">7</xref>. The classifier confidence corresponds to the probability that a cell in the intention layer is selected as winner. Increasing the classifier confidence therefore also increases the probability that movement-specific motor states are determined. Since the classifier confidence increases with <italic>&#x003B8;</italic>, the ability to imagine a sequence of motor codes corresponding to the currently observed visual motion and the interpreted intention improves with the discretization of the state spaces.</p>
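The fusion of the three biases on motor state selection described above can be illustrated with a minimal sketch. All names here (infer_motor_state, T_motor, P_vis2mot, P_int2mot) are hypothetical stand-ins for the learned transition and prediction statistics, and fusing by a renormalized product of the densities is a simplifying assumption rather than the model's exact update:

```python
import numpy as np

def infer_motor_state(prev_motor, visual_state, intention,
                      T_motor, P_vis2mot, P_int2mot, rng):
    """Sample the next motor state from the renormalized product of
    the intramodular transition prior, the cross-modular prediction
    from the visual layer, and the top-down intention bias."""
    p = T_motor[prev_motor] * P_vis2mot[visual_state] * P_int2mot[intention]
    p = p / p.sum()  # fuse into a normalized probability density
    return rng.choice(len(p), p=p)

rng = np.random.default_rng(0)
T_motor = np.ones((3, 3)) / 3     # uninformative intramodular prior
P_vis2mot = np.eye(3)             # visual state k predicts motor state k
P_int2mot = np.ones((2, 3)) / 3   # uninformative intention bias
m = infer_motor_state(prev_motor=0, visual_state=2, intention=1,
                      T_motor=T_motor, P_vis2mot=P_vis2mot,
                      P_int2mot=P_int2mot, rng=rng)
```

With the one-to-one cross-modular mapping in this toy setup, the sampled motor state follows the observed visual state; a sharper intention bias (higher classifier confidence) would correspondingly concentrate the fused density on movement-specific motor states.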
<fig position="float" id="F9">
<label>Figure 9</label>
<caption><p><bold>Example clips of the state sequences recognized in the visual layer and inferred in the motor layer when observing three different movements (<italic>basketball</italic>, <italic>running</italic>, and <italic>walking</italic> testing trials)</bold>. Each row shows the sequence of states by means of the representing snapshots over time (from left to right) for the respective modality and motion capture trial. Snapshots at the same position show the same time step in the sequence of visual and motor states and mostly show very similar parts of the movements. Because the inference is a stochastic process, and because visual and motor states are not segmented in identical fashion as a result of the different information coding in the modalities, slight misalignments can occur. However, strong incongruence is avoided because of the visuomotor coupling. Moreover, although ambiguous patterns are included in the sequence, the network maintains the activation cycle of the movement-specific states because pattern transition probabilities are biased by top-down propagated intention signals.</p></caption>
<graphic xlink:href="frobt-03-00005-g009.tif"/>
</fig>
</sec>
<sec id="S5-10">
<label>5.4</label> <title>Experiment 3: Simulation of Actions</title>
<p>Learning a tripartite model of visual motion states, corresponding motor codes, and intentions enables the inference of various types of missing information. Since information is encoded in normalized probability densities and information transfer is realized stochastically, activities in the network are self-sustaining even when sensory input is completely suppressed. When only provided with a top-down activation of a particular motion intention in the intention layer (cf. Figure <xref ref-type="fig" rid="F10">10</xref>), the model simulates likely visual and motor state sequences according to the learned temporal statistics.</p>
<fig position="float" id="F10">
<label>Figure 10</label>
<caption><p><bold>Experiment 3: movement simulation with visuomotor coupling</bold>. No sensory signal is provided. While the stochastically emerging visual and motor states are biased by the top-down predictions induced by a constant intention signal, the coupling between vision and motor codes ensures the synchronization of the simulated pattern sequences.</p></caption>
<graphic xlink:href="frobt-03-00005-g010.tif"/>
</fig>
<p>In this experiment, we recorded the coinciding visual and motor state sequences generated by the model when a top-down intention-like action code is kept active in the intention layer. The results in Figure <xref ref-type="fig" rid="F11">11</xref> show that the learned sequences can be replicated accurately both in the visual and in the motor domains. Although multiple ambiguous states were learned, as can be seen in the multi-colored visual imaginations, the simulated states remain in the correct order and movement class. This is because the transition probabilities in the respective modalities are biased by the top-down intention signal.</p>
<fig position="float" id="F11">
<label>Figure 11</label>
<caption><p><bold>Example clips of the state sequences simulated synchronously in the visual and motor modalities when three different intention priors (<italic>basketball</italic>, <italic>running</italic>, and <italic>walking</italic>) are provided</bold>. Each row shows the respective modality and intention prior. As during motor inference, snapshots at the same position mostly show very similar parts of the movements, while misalignments are mostly avoided. Again, the network maintains the activation cycle of the movement-specific states because pattern transition probabilities are biased by the provided top-down-propagated intention signals.</p></caption>
<graphic xlink:href="frobt-03-00005-g011.tif"/>
</fig>
<p>The results also show that motor and visual state estimates remain approximately synchronized, seeing that the simulated states represent similar visual and motor imaginations at similar time steps. This indicates that the sensorimotor coupling is capable of synchronizing different modalities over extended periods of time. The reason for this synchronization lies in the lateral predictive connections between the vision and motor layers: upon a transition from one visual state to another, the conditional probabilities for motor states given the new visual state change accordingly, such that the current motor state is more likely to transition to the most likely successor, which is determined not only by the top-down intention layer signal but also by the intramodular motor state transition probabilities and by the cross-modular activation predictions from the vision layer. Vice versa, the motor states bias the transitions in the visual modality, leading to the observable mutual synchronization.</p>
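This mutual biasing can be sketched as a closed simulation loop. Again, this is an illustrative assumption, not the model's actual equations: all transition and prediction matrices are hypothetical toy statistics, and the fused densities are simple renormalized products.

```python
import numpy as np

def simulate(intention, steps, T_vis, T_mot, P_mot2vis, P_vis2mot,
             P_int2vis, P_int2mot, rng, v=0, m=0):
    """Closed-loop action simulation without sensory input: each
    modality samples its next state from its own transition prior,
    biased by the other modality's cross-modular prediction and a
    constant top-down intention signal."""
    vs, ms = [], []
    for _ in range(steps):
        pv = T_vis[v] * P_mot2vis[m] * P_int2vis[intention]
        v = rng.choice(len(pv), p=pv / pv.sum())
        pm = T_mot[m] * P_vis2mot[v] * P_int2mot[intention]
        m = rng.choice(len(pm), p=pm / pm.sum())
        vs.append(v)
        ms.append(m)
    return vs, ms

# toy two-state cycle: vision alternates deterministically, and the
# motor layer follows vision via a one-to-one cross-modular mapping
rng = np.random.default_rng(1)
T_vis = np.array([[0.0, 1.0], [1.0, 0.0]])
uniform = np.full((2, 2), 0.5)
vs, ms = simulate(intention=0, steps=4, T_vis=T_vis, T_mot=uniform,
                  P_mot2vis=uniform, P_vis2mot=np.eye(2),
                  P_int2vis=uniform, P_int2mot=uniform, rng=rng)
```

Even with an uninformative motor transition prior, the cross-modular prediction keeps the motor sequence locked to the visual cycle, mirroring the synchronization effect described above.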
</sec>
</sec>
<sec id="S6">
<label>6</label> <title>Summary and Conclusion</title>
<p>Our work shows that stochastic generative neural networks can be used to model action inference, mental imagery, and action simulation capabilities. Referring to Barsalou&#x02019;s simulation hypothesis, it suggests that simulation processes in the brain may help to recognize, generalize, and maintain action perceptions and inferences using one&#x02019;s own embodied encodings. In our model, these embodied simulations enable a consistent, multimodal interpretation of observed actions in abstract domains. In particular, we have shown that action observation models may rely on encodings that represent actions in a distributed and predictive manner: although some cells encoded motion components that were active during the observation of various actions, cross-modular predictions enabled the consistent simulation of specific action sequences. Due to the predictive visuomotor coupling, temporal synchronicity of the activated states was ensured. Thus, the predictive, stochastic, and generative encodings resulted in the maintenance of overall consistent, multimodal motion imaginations. In combination with the previously published substructure of the model, which resolves spatiotemporal variances by preprocessing of stimuli and inference of the perspective (Schrodt et al., <xref ref-type="bibr" rid="B63">2015</xref>), a neural network architecture can be generated that infers the type of observed actions and possibly underlying motor commands, irrespective of the vantage point and despite variations of the movements. The model is thus able to establish the correspondence between self-perceptions and the perception of others, which can be considered an essential challenge in modeling action understanding.</p>
<p>Despite these successes, the model is currently based on several assumptions. For one, we assume that raw visual and motor perceptions and intentions can be simplified by compressed codes without losing model relevance, and that the respective motion features can be identified reliably. Although it remains particularly unclear how to incorporate realistic motor and intention codes into computational models, future model versions can be enhanced toward processing raw video streams of actions: the simulation snapshots in the experiments (see Figures <xref ref-type="fig" rid="F9">9</xref> and <xref ref-type="fig" rid="F11">11</xref>) were calculated analogously to the conditional state predictions [equation (12)]. This shows that the states developed by the system can be suitably mapped onto lower level visual modalities. Thus, further developed models may hierarchically process lower level visual information similarly to Jung et al. (<xref ref-type="bibr" rid="B40">2015</xref>), but based on top-down predicted, higher level, and bodily grounded motion estimates.</p>
<p>Further, without sensory stimuli, the system&#x02019;s simulation of action states is a discrete-time stochastic process. While the sequence of simulated states was mostly correct, the temporal duration of the activations was characterized by relatively high variance. Adding further modal state layers could diminish this variance. In particular, the current model incorporates motion signals only, and no static or postural information is processed. For example, the model implemented by Layher et al. (<xref ref-type="bibr" rid="B46">2014</xref>) triggers a reinforcement learning signal upon the encounter of low motion energy, which was used to foster the generation of posture snapshots in extreme poses. In addition to the variance of the simulated states, the mean durations of state activations were partially distorted because of the approximate fusion of predicted state probability densities during testing. Integrating the system&#x02019;s predictions to a certain extent during learning as well may improve the fusion of probabilities. It may also improve noise robustness and the establishment of disjunct modal state sequences. As shown in the experiments, disjunct states and state transitions are advantageous for the correct classification and emulation of actions. Techniques are available that can prevent the system from falling into an illusory loop when overly trusting its own predictions (Kneissler et al., <xref ref-type="bibr" rid="B44">2014</xref>, <xref ref-type="bibr" rid="B43">2015</xref>).</p>
<p>Moreover, the system currently simplifies the cell activation competition such that only one cell in each layer is adapted at each iteration. Using Mexican hat or softmax functions for the adaptation of learned states may speed up learning. Along similar lines, learning may be further improved by allowing a differential weighting of the provided input features. Currently, each input feature has the same influence in determining the creation of a new state. The recruitment of new prototypic states may be made dependent on the predictive value of all currently available states, including their specificity and accuracy, as is, for example, done in the XCSF learning classifier system architecture (Stalph et al., <xref ref-type="bibr" rid="B64">2012</xref>; Kneissler et al., <xref ref-type="bibr" rid="B44">2014</xref>). Another current challenge for the system is to infer limb identities purely from visual information. The observed limb positions are fed into dedicated neural network inputs. An adaptive confusion matrix could wire the respective limb information appropriately, possibly by back-propagating mismatch signals. Additionally, lower level Gestalt constraints may be learned and used to adapt such a matrix.</p>
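The suggested combination of threshold-based state recruitment and soft, graded adaptation could look as follows, schematically. The similarity measure, learning rate, and function name are illustrative assumptions, not the model's actual update equations:

```python
import numpy as np

def update_states(x, prototypes, theta, lr=0.1):
    """Recruit a new state cell when no prototype matches the input
    above threshold theta; otherwise adapt all prototypes with a
    softmax-weighted step instead of pure winner-take-all."""
    sims = np.array([np.exp(-np.linalg.norm(x - p)) for p in prototypes])
    if len(prototypes) == 0 or sims.max() < theta:
        prototypes.append(x.astype(float))  # recruit a new prototype
    else:
        weights = np.exp(sims) / np.exp(sims).sum()  # soft competition
        for p, w in zip(prototypes, weights):
            p += lr * w * (x - p)  # graded adaptation of every prototype
    return prototypes

protos = update_states(np.zeros(2), [], theta=0.5)          # recruit first
protos = update_states(np.zeros(2), protos, theta=0.5)      # match: adapt
protos = update_states(10 * np.ones(2), protos, theta=0.5)  # novel: recruit
```

A higher theta makes recruitment more eager, yielding more and more movement-exclusive states, consistent with the effect of the recruitment threshold reported in the experiments.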
<p>Finally, despite the remaining challenges, even in its current form the system may be evaluated as a cognitive model, and it may be used in robotics applications. The main predictions of the cognitive model concern how visual motion is segmented into individual motion clusters and how the predictive encodings of the modalities modeled in the system influence each other. Also, false or distracting information from one module is expected to impair action recognition and simulation capabilities in the connected modules. On the robotics side, related techniques were applied using virtual visual servoing for object tracking (Comport et al., <xref ref-type="bibr" rid="B11">2006</xref>) and for improving the pose estimates of a robot (Gratal et al., <xref ref-type="bibr" rid="B28">2011</xref>). Our model offers both generative, visual servoing options with temporal motion predictions and inference-based action recognition capabilities. In future work, this offers the opportunity to develop a cognitive system that is able to identify and subsequently emulate specific intention- or goal-oriented actions, striving for the same goal but adapting the motor commands to its own bodily experiences and capabilities.</p>
</sec>
<sec id="S7">
<title>Author Contributions</title>
<p>FS is the main author of the contribution and was responsible for model conception, implementation, and evaluation. MB made substantial contributions to the proposed work by supervising the work, providing intellectual content, and co-authoring the paper.</p>
</sec>
<sec id="S8">
<title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ack>
<p>We acknowledge support by Deutsche Forschungsgemeinschaft and Open Access Publishing Fund of University of T&#x000FC;bingen. FS has been supported by postgraduate funding of the state of Baden-W&#x000FC;rttemberg (Landesgraduiertenf&#x000F6;rderung Baden-W&#x000FC;rttemberg). The motion tracking data used in this project was obtained from Carnegie Mellon University (<uri xlink:href="http://mocap.cs.cmu.edu/">http://mocap.cs.cmu.edu/</uri>). The database was created with funding from NSF EIA-0196217. The simulation framework used to read and display the data (AMC-Viewer) was written by James L. McCann.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baker</surname> <given-names>C. L.</given-names></name> <name><surname>Saxe</surname> <given-names>R.</given-names></name> <name><surname>Tenenbaum</surname> <given-names>J. B.</given-names></name></person-group> (<year>2009</year>). <article-title>Action understanding as inverse planning</article-title>. <source>Cognition</source> <volume>113</volume>, <fpage>329</fpage>&#x02013;<lpage>349</lpage>.<pub-id pub-id-type="doi">10.1016/j.cognition.2009.07.005</pub-id><pub-id pub-id-type="pmid">19729154</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barsalou</surname> <given-names>L. W.</given-names></name></person-group> (<year>1999</year>). <article-title>Perceptual symbol systems</article-title>. <source>Behav. Brain Sci.</source> <volume>22</volume>, <fpage>577</fpage>&#x02013;<lpage>600</lpage>.<pub-id pub-id-type="doi">10.1017/S0140525X99532147</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barsalou</surname> <given-names>L. W.</given-names></name></person-group> (<year>2008</year>). <article-title>Grounded cognition</article-title>. <source>Annu. Rev. Psychol.</source> <volume>59</volume>, <fpage>617</fpage>&#x02013;<lpage>645</lpage>.<pub-id pub-id-type="doi">10.1146/annurev.psych.59.103006.093639</pub-id><pub-id pub-id-type="pmid">17705682</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bonini</surname> <given-names>L.</given-names></name> <name><surname>Ferrari</surname> <given-names>P. F.</given-names></name></person-group> (<year>2011</year>). <article-title>Evolution of mirror systems: a simple mechanism for complex cognitive functions</article-title>. <source>Ann. N. Y. Acad. Sci.</source> <volume>1225</volume>, <fpage>166</fpage>&#x02013;<lpage>175</lpage>.<pub-id pub-id-type="doi">10.1111/j.1749-6632.2011.06002.x</pub-id><pub-id pub-id-type="pmid">21535003</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bruce</surname> <given-names>C.</given-names></name> <name><surname>Desimone</surname> <given-names>R.</given-names></name> <name><surname>Gross</surname> <given-names>C. G.</given-names></name></person-group> (<year>1981</year>). <article-title>Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque</article-title>. <source>J. Neurophysiol.</source> <volume>46</volume>, <fpage>369</fpage>&#x02013;<lpage>384</lpage>.</citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Buccino</surname> <given-names>G.</given-names></name> <name><surname>Vogt</surname> <given-names>S.</given-names></name> <name><surname>Ritzl</surname> <given-names>A.</given-names></name> <name><surname>Fink</surname> <given-names>G. R.</given-names></name> <name><surname>Zilles</surname> <given-names>K.</given-names></name> <name><surname>Freund</surname> <given-names>H.-J.</given-names></name> <etal/></person-group> (<year>2004</year>). <article-title>Neural circuits underlying imitation learning of hand actions: an event-related fMRI study</article-title>. <source>Neuron</source> <volume>42</volume>, <fpage>323</fpage>&#x02013;<lpage>334</lpage>.<pub-id pub-id-type="doi">10.1016/S0896-6273(04)00181-3</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calvo-Merino</surname> <given-names>B.</given-names></name> <name><surname>Glaser</surname> <given-names>D. E.</given-names></name> <name><surname>Gr&#x000E8;zes</surname> <given-names>J.</given-names></name> <name><surname>Passingham</surname> <given-names>R. E.</given-names></name> <name><surname>Haggard</surname> <given-names>P.</given-names></name></person-group> (<year>2005</year>). <article-title>Action observation and acquired motor skills: an fMRI study with expert dancers</article-title>. <source>Cereb. Cortex</source> <volume>15</volume>, <fpage>1243</fpage>&#x02013;<lpage>1249</lpage>.<pub-id pub-id-type="doi">10.1093/cercor/bhi007</pub-id><pub-id pub-id-type="pmid">15616133</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carpenter</surname> <given-names>M.</given-names></name> <name><surname>Nagell</surname> <given-names>K.</given-names></name> <name><surname>Tomasello</surname> <given-names>M.</given-names></name> <name><surname>Butterworth</surname> <given-names>G.</given-names></name> <name><surname>Moore</surname> <given-names>C.</given-names></name></person-group> (<year>1998</year>). <article-title>Social cognition, joint attention, and communicative competence from 9 to 15 months of age</article-title>. <source>Monogr. Soc. Res. Child Dev.</source> <volume>63</volume>, <fpage>i</fpage>&#x02013;<lpage>vi, 1&#x02013;143</lpage>.<pub-id pub-id-type="doi">10.2307/1166214</pub-id><pub-id pub-id-type="pmid">9835078</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Casile</surname> <given-names>A.</given-names></name> <name><surname>Caggiano</surname> <given-names>V.</given-names></name> <name><surname>Ferrari</surname> <given-names>P. F.</given-names></name></person-group> (<year>2011</year>). <article-title>The mirror neuron system: a fresh view</article-title>. <source>Neuroscientist</source> <volume>17</volume>, <fpage>524</fpage>&#x02013;<lpage>538</lpage>.<pub-id pub-id-type="doi">10.1177/1073858410392239</pub-id><pub-id pub-id-type="pmid">21467305</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Catmur</surname> <given-names>C.</given-names></name> <name><surname>Walsh</surname> <given-names>V.</given-names></name> <name><surname>Heyes</surname> <given-names>C.</given-names></name></person-group> (<year>2007</year>). <article-title>Sensorimotor learning configures the human mirror system</article-title>. <source>Curr. Biol.</source> <volume>17</volume>, <fpage>1527</fpage>&#x02013;<lpage>1531</lpage>.<pub-id pub-id-type="doi">10.1016/j.cub.2007.08.006</pub-id><pub-id pub-id-type="pmid">17716898</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Comport</surname> <given-names>A.</given-names></name> <name><surname>Marchand</surname> <given-names>E.</given-names></name> <name><surname>Pressigout</surname> <given-names>M.</given-names></name> <name><surname>Chaumette</surname> <given-names>F.</given-names></name></person-group> (<year>2006</year>). <article-title>Real-time markerless tracking for augmented reality: the virtual visual servoing framework</article-title>. <source>IEEE Trans. Vis. Comput. Graph.</source> <volume>12</volume>, <fpage>615</fpage>&#x02013;<lpage>628</lpage>.<pub-id pub-id-type="doi">10.1109/TVCG.2006.78</pub-id><pub-id pub-id-type="pmid">16805268</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cook</surname> <given-names>R.</given-names></name> <name><surname>Bird</surname> <given-names>G.</given-names></name> <name><surname>Catmur</surname> <given-names>C.</given-names></name> <name><surname>Press</surname> <given-names>C.</given-names></name> <name><surname>Heyes</surname> <given-names>C.</given-names></name></person-group> (<year>2014</year>). <article-title>Mirror neurons: from origin to function</article-title>. <source>Behav. Brain Sci.</source> <volume>37</volume>, <fpage>177</fpage>&#x02013;<lpage>192</lpage>.<pub-id pub-id-type="doi">10.1017/S0140525X13000903</pub-id><pub-id pub-id-type="pmid">24775147</pub-id></citation></ref>
<ref id="B13"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Dautenhahn</surname> <given-names>K.</given-names></name> <name><surname>Nehaniv</surname> <given-names>C. L.</given-names></name></person-group> (<year>2002</year>). <source>The Correspondence Problem</source>. <publisher-loc>Massachusetts</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Elsner</surname> <given-names>B.</given-names></name></person-group> (<year>2007</year>). <article-title>Infants&#x02019; imitation of goal-directed actions: the role of movements and action effects</article-title>. <source>Acta Psychol.</source> <volume>124</volume>, <fpage>44</fpage>&#x02013;<lpage>59</lpage>.<pub-id pub-id-type="doi">10.1016/j.actpsy.2006.09.006</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Engel</surname> <given-names>A. K.</given-names></name> <name><surname>Maye</surname> <given-names>A.</given-names></name> <name><surname>Kurthen</surname> <given-names>M.</given-names></name> <name><surname>K&#x000F6;nig</surname> <given-names>P.</given-names></name></person-group> (<year>2013</year>). <article-title>Where&#x02019;s the action? The pragmatic turn in cognitive science</article-title>. <source>Trends Cogn. Sci.</source> <volume>17</volume>, <fpage>202</fpage>&#x02013;<lpage>209</lpage>.<pub-id pub-id-type="doi">10.1016/j.tics.2013.03.006</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ferrari</surname> <given-names>P. F.</given-names></name> <name><surname>Visalberghi</surname> <given-names>E.</given-names></name> <name><surname>Paukner</surname> <given-names>A.</given-names></name> <name><surname>Fogassi</surname> <given-names>L.</given-names></name> <name><surname>Ruggiero</surname> <given-names>A.</given-names></name> <name><surname>Suomi</surname> <given-names>S. J.</given-names></name></person-group> (<year>2006</year>). <article-title>Neonatal imitation in rhesus macaques</article-title>. <source>PLoS Biol.</source> <volume>4</volume>:<fpage>e302</fpage>.<pub-id pub-id-type="doi">10.1371/journal.pbio.0040302</pub-id><pub-id pub-id-type="pmid">16953662</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friston</surname> <given-names>K.</given-names></name> <name><surname>Mattout</surname> <given-names>J.</given-names></name> <name><surname>Kilner</surname> <given-names>J.</given-names></name></person-group> (<year>2011</year>). <article-title>Action understanding and active inference</article-title>. <source>Biol. Cybern.</source> <volume>104</volume>, <fpage>137</fpage>&#x02013;<lpage>160</lpage>.<pub-id pub-id-type="doi">10.1007/s00422-011-0424-z</pub-id><pub-id pub-id-type="pmid">21327826</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fritzke</surname> <given-names>B.</given-names></name></person-group> (<year>1995</year>). <article-title>A growing neural gas network learns topologies</article-title>. <source>Adv. Neural Inf. Process Syst.</source> <volume>7</volume>, <fpage>625</fpage>&#x02013;<lpage>632</lpage>.</citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Froese</surname> <given-names>T.</given-names></name> <name><surname>Lenay</surname> <given-names>C.</given-names></name> <name><surname>Ikegami</surname> <given-names>T.</given-names></name></person-group> (<year>2012</year>). <article-title>Imitation by social interaction? Analysis of a minimal agent-based model of the correspondence problem</article-title>. <source>Front. Hum. Neurosci.</source> <volume>6</volume>:<fpage>202</fpage>.<pub-id pub-id-type="doi">10.3389/fnhum.2012.00202</pub-id><pub-id pub-id-type="pmid">23060768</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gallese</surname> <given-names>V.</given-names></name></person-group> (<year>2001</year>). <article-title>The &#x02018;shared manifold&#x02019; hypothesis. From mirror neurons to empathy</article-title>. <source>J. Conscious. Stud.</source> <volume>8</volume>, <fpage>33</fpage>&#x02013;<lpage>50</lpage>.</citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gallese</surname> <given-names>V.</given-names></name></person-group> (<year>2007a</year>). <article-title>Before and below &#x02018;theory of mind&#x02019;: embodied simulation and the neural correlates of social cognition</article-title>. <source>Philos. Trans. R. Soc. Lond. B Biol. Sci.</source> <volume>362</volume>, <fpage>659</fpage>&#x02013;<lpage>669</lpage>.<pub-id pub-id-type="doi">10.1098/rstb.2006.2002</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gallese</surname> <given-names>V.</given-names></name></person-group> (<year>2007b</year>). <article-title>Embodied simulation: from mirror neuron systems to interpersonal relations</article-title>. <source>Novartis Found. Symp.</source> <volume>278</volume>, <fpage>3</fpage>&#x02013;<lpage>12</lpage>.<pub-id pub-id-type="doi">10.1002/9780470030585.ch2</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gallese</surname> <given-names>V.</given-names></name> <name><surname>Goldman</surname> <given-names>A.</given-names></name></person-group> (<year>1998</year>). <article-title>Mirror neurons and the simulation theory of mind-reading</article-title>. <source>Trends Cogn. Sci.</source> <volume>2</volume>, <fpage>493</fpage>&#x02013;<lpage>501</lpage>.<pub-id pub-id-type="doi">10.1016/S1364-6613(98)01262-5</pub-id><pub-id pub-id-type="pmid">21227300</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gallese</surname> <given-names>V.</given-names></name> <name><surname>Rochat</surname> <given-names>M.</given-names></name> <name><surname>Cossu</surname> <given-names>G.</given-names></name> <name><surname>Sinigaglia</surname> <given-names>C.</given-names></name></person-group> (<year>2009</year>). <article-title>Motor cognition and its role in the phylogeny and ontogeny of action understanding</article-title>. <source>Dev. Psychol.</source> <volume>45</volume>, <fpage>103</fpage>.<pub-id pub-id-type="doi">10.1037/a0014436</pub-id><pub-id pub-id-type="pmid">19209994</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Garcia</surname> <given-names>J. O.</given-names></name> <name><surname>Grossman</surname> <given-names>E. D.</given-names></name></person-group> (<year>2008</year>). <article-title>Necessary but not sufficient: motion perception is required for perceiving biological motion</article-title>. <source>Vision Res.</source> <volume>48</volume>, <fpage>1144</fpage>&#x02013;<lpage>1149</lpage>.<pub-id pub-id-type="doi">10.1016/j.visres.2008.01.027</pub-id><pub-id pub-id-type="pmid">18346774</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gergely</surname> <given-names>G.</given-names></name> <name><surname>N&#x000E1;dasdy</surname> <given-names>Z.</given-names></name> <name><surname>Csibra</surname> <given-names>G.</given-names></name> <name><surname>Biro</surname> <given-names>S.</given-names></name></person-group> (<year>1995</year>). <article-title>Taking the intentional stance at 12 months of age</article-title>. <source>Cognition</source> <volume>56</volume>, <fpage>165</fpage>&#x02013;<lpage>193</lpage>.<pub-id pub-id-type="doi">10.1016/0010-0277(95)00661-H</pub-id><pub-id pub-id-type="pmid">7554793</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Giese</surname> <given-names>M. A.</given-names></name> <name><surname>Poggio</surname> <given-names>T.</given-names></name></person-group> (<year>2003</year>). <article-title>Neural mechanisms for the recognition of biological movements</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>4</volume>, <fpage>179</fpage>&#x02013;<lpage>192</lpage>.<pub-id pub-id-type="doi">10.1038/nrn1057</pub-id><pub-id pub-id-type="pmid">12612631</pub-id></citation></ref>
<ref id="B28"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Gratal</surname> <given-names>X.</given-names></name> <name><surname>Romero</surname> <given-names>J.</given-names></name> <name><surname>Kragic</surname> <given-names>D.</given-names></name></person-group> (<year>2011</year>). &#x0201C;<article-title>Virtual visual servoing for real-time robot pose estimation</article-title>,&#x0201D; in <source>Proceedings of the 18th IFAC World Congress</source>, eds <person-group person-group-type="editor"><name><surname>Bittanti</surname> <given-names>S.</given-names></name> <name><surname>Cenedese</surname> <given-names>A.</given-names></name> <name><surname>Zampieri</surname> <given-names>S.</given-names></name></person-group> (<publisher-loc>Milano</publisher-loc>: <publisher-name>International Federation of Automatic Control</publisher-name>), <fpage>9017</fpage>&#x02013;<lpage>9022</lpage>.</citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grossberg</surname> <given-names>S.</given-names></name></person-group> (<year>1973</year>). <article-title>Contour enhancement, short-term memory, and constancies in reverberating neural networks</article-title>. <source>Stud. Appl. Math.</source> <volume>52</volume>, <fpage>213</fpage>&#x02013;<lpage>257</lpage>.<pub-id pub-id-type="doi">10.1002/sapm1973523213</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grossberg</surname> <given-names>S.</given-names></name></person-group> (<year>1976a</year>). <article-title>On the development of feature detectors in the visual cortex with applications to learning and reaction-diffusion systems</article-title>. <source>Biol. Cybern.</source> <volume>21</volume>, <fpage>145</fpage>&#x02013;<lpage>159</lpage>.<pub-id pub-id-type="doi">10.1007/BF00337422</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grossberg</surname> <given-names>S.</given-names></name></person-group> (<year>1976b</year>). <article-title>Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors</article-title>. <source>Biol. Cybern.</source> <volume>23</volume>, <fpage>121</fpage>&#x02013;<lpage>134</lpage>.<pub-id pub-id-type="doi">10.1007/BF00344744</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grossberg</surname> <given-names>S.</given-names></name></person-group> (<year>1976c</year>). <article-title>Adaptive pattern classification and universal recoding: II. Feedback, expectation, olfaction, illusions</article-title>. <source>Biol. Cybern.</source> <volume>23</volume>, <fpage>187</fpage>&#x02013;<lpage>202</lpage>.</citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grossman</surname> <given-names>E.</given-names></name> <name><surname>Donnelly</surname> <given-names>M.</given-names></name> <name><surname>Price</surname> <given-names>R.</given-names></name> <name><surname>Pickens</surname> <given-names>D.</given-names></name> <name><surname>Morgan</surname> <given-names>V.</given-names></name> <name><surname>Neighbor</surname> <given-names>G.</given-names></name> <etal/></person-group> (<year>2000</year>). <article-title>Brain areas involved in perception of biological motion</article-title>. <source>J. Cogn. Neurosci.</source> <volume>12</volume>, <fpage>711</fpage>&#x02013;<lpage>720</lpage>.<pub-id pub-id-type="doi">10.1162/089892900562417</pub-id><pub-id pub-id-type="pmid">11054914</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heyes</surname> <given-names>C.</given-names></name></person-group> (<year>2001</year>). <article-title>Causes and consequences of imitation</article-title>. <source>Trends Cogn. Sci.</source> <volume>5</volume>, <fpage>253</fpage>&#x02013;<lpage>261</lpage>.<pub-id pub-id-type="doi">10.1016/S1364-6613(00)01661-2</pub-id><pub-id pub-id-type="pmid">11390296</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heyes</surname> <given-names>C.</given-names></name></person-group> (<year>2010</year>). <article-title>Where do mirror neurons come from?</article-title> <source>Neurosci. Biobehav. Rev.</source> <volume>34</volume>, <fpage>575</fpage>&#x02013;<lpage>583</lpage>.<pub-id pub-id-type="doi">10.1016/j.neubiorev.2009.11.007</pub-id><pub-id pub-id-type="pmid">19914284</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Iacoboni</surname> <given-names>M.</given-names></name></person-group> (<year>2005</year>). <article-title>Neural mechanisms of imitation</article-title>. <source>Curr. Opin. Neurobiol.</source> <volume>15</volume>, <fpage>632</fpage>&#x02013;<lpage>637</lpage>.<pub-id pub-id-type="doi">10.1016/j.conb.2005.10.010</pub-id><pub-id pub-id-type="pmid">16271461</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Iacoboni</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>Imitation, empathy, and mirror neurons</article-title>. <source>Annu. Rev. Psychol.</source> <volume>60</volume>, <fpage>653</fpage>&#x02013;<lpage>670</lpage>.<pub-id pub-id-type="doi">10.1146/annurev.psych.60.110707.163604</pub-id><pub-id pub-id-type="pmid">18793090</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Iacoboni</surname> <given-names>M.</given-names></name> <name><surname>Dapretto</surname> <given-names>M.</given-names></name></person-group> (<year>2006</year>). <article-title>The mirror neuron system and the consequences of its dysfunction</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>7</volume>, <fpage>942</fpage>&#x02013;<lpage>951</lpage>.<pub-id pub-id-type="doi">10.1038/nrn2024</pub-id><pub-id pub-id-type="pmid">17115076</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johansson</surname> <given-names>G.</given-names></name></person-group> (<year>1973</year>). <article-title>Visual perception of biological motion and a model for its analysis</article-title>. <source>Percept. Psychophys.</source> <volume>14</volume>, <fpage>201</fpage>&#x02013;<lpage>211</lpage>.<pub-id pub-id-type="doi">10.3758/BF03212378</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jung</surname> <given-names>M.</given-names></name> <name><surname>Hwang</surname> <given-names>J.</given-names></name> <name><surname>Tani</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>Self-organization of spatio-temporal hierarchy via learning of dynamic visual image patterns on action sequences</article-title>. <source>PLoS ONE</source> <volume>10</volume>:<fpage>e0131214</fpage>.<pub-id pub-id-type="doi">10.1371/journal.pone.0131214</pub-id><pub-id pub-id-type="pmid">26147887</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kilner</surname> <given-names>J. M.</given-names></name></person-group> (<year>2011</year>). <article-title>More than one pathway to action understanding</article-title>. <source>Trends Cogn. Sci.</source> <volume>15</volume>, <fpage>352</fpage>&#x02013;<lpage>357</lpage>.<pub-id pub-id-type="doi">10.1016/j.tics.2011.06.005</pub-id><pub-id pub-id-type="pmid">21775191</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kilner</surname> <given-names>J. M.</given-names></name> <name><surname>Friston</surname> <given-names>K. J.</given-names></name> <name><surname>Frith</surname> <given-names>C. D.</given-names></name></person-group> (<year>2007</year>). <article-title>Predictive coding: an account of the mirror neuron system</article-title>. <source>Cogn. Process.</source> <volume>8</volume>, <fpage>159</fpage>&#x02013;<lpage>166</lpage>.<pub-id pub-id-type="doi">10.1007/s10339-007-0170-2</pub-id><pub-id pub-id-type="pmid">17429704</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kneissler</surname> <given-names>J.</given-names></name> <name><surname>Drugowitsch</surname> <given-names>J.</given-names></name> <name><surname>Friston</surname> <given-names>K.</given-names></name> <name><surname>Butz</surname> <given-names>M. V.</given-names></name></person-group> (<year>2015</year>). <article-title>Simultaneous learning and filtering without delusions: a Bayes-optimal combination of predictive inference and adaptive filtering</article-title>. <source>Front. Comput. Neurosci.</source> <volume>9</volume>:<fpage>47</fpage>.<pub-id pub-id-type="doi">10.3389/fncom.2015.00047</pub-id><pub-id pub-id-type="pmid">25983690</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kneissler</surname> <given-names>J.</given-names></name> <name><surname>Stalph</surname> <given-names>P. O.</given-names></name> <name><surname>Drugowitsch</surname> <given-names>J.</given-names></name> <name><surname>Butz</surname> <given-names>M. V.</given-names></name></person-group> (<year>2014</year>). <article-title>Filtering sensory information with XCSF: improving learning robustness and robot arm control performance</article-title>. <source>Evol. Comput.</source> <volume>22</volume>, <fpage>139</fpage>&#x02013;<lpage>158</lpage>.<pub-id pub-id-type="doi">10.1162/EVCO_a_00108</pub-id><pub-id pub-id-type="pmid">23746295</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lallee</surname> <given-names>S.</given-names></name> <name><surname>Dominey</surname> <given-names>P. F.</given-names></name></person-group> (<year>2013</year>). <article-title>Multi-modal convergence maps: from body schema and self-representation to mental imagery</article-title>. <source>Adapt. Behav.</source> <volume>21</volume>, <fpage>274</fpage>&#x02013;<lpage>285</lpage>.<pub-id pub-id-type="doi">10.1177/1059712313488423</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Layher</surname> <given-names>G.</given-names></name> <name><surname>Giese</surname> <given-names>M. A.</given-names></name> <name><surname>Neumann</surname> <given-names>H.</given-names></name></person-group> (<year>2014</year>). <article-title>Learning representations of animated motion sequences &#x02013; A neural model</article-title>. <source>Top. Cogn. Sci.</source> <volume>6</volume>, <fpage>170</fpage>&#x02013;<lpage>182</lpage>.<pub-id pub-id-type="doi">10.1111/tops.12075</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lepage</surname> <given-names>J.-F.</given-names></name> <name><surname>Th&#x000E9;oret</surname> <given-names>H.</given-names></name></person-group> (<year>2007</year>). <article-title>The mirror neuron system: grasping others&#x02019; actions from birth?</article-title> <source>Dev. Sci.</source> <volume>10</volume>, <fpage>513</fpage>&#x02013;<lpage>523</lpage>.<pub-id pub-id-type="doi">10.1111/j.1467-7687.2007.00631.x</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meltzoff</surname> <given-names>A. N.</given-names></name></person-group> (<year>2007</year>). <article-title>&#x02018;Like me&#x02019;: a foundation for social cognition</article-title>. <source>Dev. Sci.</source> <volume>10</volume>, <fpage>126</fpage>&#x02013;<lpage>134</lpage>.<pub-id pub-id-type="doi">10.1111/j.1467-7687.2007.00574.x</pub-id></citation></ref>
<ref id="B49"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Nagai</surname> <given-names>Y.</given-names></name> <name><surname>Kawai</surname> <given-names>Y.</given-names></name> <name><surname>Asada</surname> <given-names>M.</given-names></name></person-group> (<year>2011</year>). &#x0201C;<article-title>Emergence of mirror neuron system: immature vision leads to self-other correspondence</article-title>,&#x0201D; in <conf-name>2011 IEEE International Conference on Development and Learning (ICDL)</conf-name>, Vol. <volume>2</volume> (<conf-loc>Frankfurt am Main</conf-loc>: <conf-sponsor>IEEE</conf-sponsor>), <fpage>1</fpage>&#x02013;<lpage>6</lpage>.</citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Oja</surname> <given-names>E.</given-names></name></person-group> (<year>1989</year>). <article-title>Neural networks, principal components, and subspaces</article-title>. <source>Int. J. Neural Syst.</source> <volume>1</volume>, <fpage>61</fpage>&#x02013;<lpage>68</lpage>.<pub-id pub-id-type="doi">10.1142/S0129065789000475</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Oram</surname> <given-names>M.</given-names></name> <name><surname>Perrett</surname> <given-names>D. I.</given-names></name></person-group> (<year>1994</year>). <article-title>Responses of anterior superior temporal polysensory (STPa) neurons to &#x0201C;biological motion&#x0201D; stimuli</article-title>. <source>J. Cogn. Neurosci.</source> <volume>6</volume>, <fpage>99</fpage>&#x02013;<lpage>116</lpage>.<pub-id pub-id-type="doi">10.1162/jocn.1994.6.2.99</pub-id><pub-id pub-id-type="pmid">23962364</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pavlova</surname> <given-names>M. A.</given-names></name></person-group> (<year>2012</year>). <article-title>Biological motion processing as a hallmark of social cognition</article-title>. <source>Cereb. Cortex</source> <volume>22</volume>, <fpage>981</fpage>&#x02013;<lpage>995</lpage>.<pub-id pub-id-type="doi">10.1093/cercor/bhr156</pub-id><pub-id pub-id-type="pmid">21775676</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Perrett</surname> <given-names>D.</given-names></name> <name><surname>Smith</surname> <given-names>P.</given-names></name> <name><surname>Mistlin</surname> <given-names>A.</given-names></name> <name><surname>Chitty</surname> <given-names>A.</given-names></name> <name><surname>Head</surname> <given-names>A.</given-names></name> <name><surname>Potter</surname> <given-names>D.</given-names></name> <etal/></person-group> (<year>1985</year>). <article-title>Visual analysis of body movements by neurons in the temporal cortex of the macaque monkey: a preliminary report</article-title>. <source>Behav. Brain Res.</source> <volume>16</volume>, <fpage>153</fpage>&#x02013;<lpage>170</lpage>.<pub-id pub-id-type="doi">10.1016/0166-4328(85)90089-0</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pouget</surname> <given-names>A.</given-names></name> <name><surname>Dayan</surname> <given-names>P.</given-names></name> <name><surname>Zemel</surname> <given-names>R.</given-names></name></person-group> (<year>2000</year>). <article-title>Information processing with population codes</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>1</volume>, <fpage>125</fpage>&#x02013;<lpage>132</lpage>.<pub-id pub-id-type="doi">10.1038/35039062</pub-id><pub-id pub-id-type="pmid">11252775</pub-id></citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Puce</surname> <given-names>A.</given-names></name> <name><surname>Perrett</surname> <given-names>D.</given-names></name></person-group> (<year>2003</year>). <article-title>Electrophysiology and brain imaging of biological motion</article-title>. <source>Philos. Trans. R. Soc. Lond. B Biol. Sci.</source> <volume>358</volume>, <fpage>435</fpage>&#x02013;<lpage>445</lpage>.<pub-id pub-id-type="doi">10.1098/rstb.2002.1221</pub-id><pub-id pub-id-type="pmid">12689371</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rao</surname> <given-names>R. P.</given-names></name> <name><surname>Ballard</surname> <given-names>D. H.</given-names></name></person-group> (<year>1999</year>). <article-title>Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects</article-title>. <source>Nat. Neurosci.</source> <volume>2</volume>, <fpage>79</fpage>&#x02013;<lpage>87</lpage>.<pub-id pub-id-type="doi">10.1038/4580</pub-id><pub-id pub-id-type="pmid">10195184</pub-id></citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rizzolatti</surname> <given-names>G.</given-names></name> <name><surname>Craighero</surname> <given-names>L.</given-names></name></person-group> (<year>2004</year>). <article-title>The mirror-neuron system</article-title>. <source>Annu. Rev. Neurosci.</source> <volume>27</volume>, <fpage>169</fpage>&#x02013;<lpage>192</lpage>.<pub-id pub-id-type="doi">10.1146/annurev.neuro.27.070203.144230</pub-id><pub-id pub-id-type="pmid">15217330</pub-id></citation></ref>
<ref id="B58"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Rizzolatti</surname> <given-names>G.</given-names></name> <name><surname>Craighero</surname> <given-names>L.</given-names></name></person-group> (<year>2005</year>). &#x0201C;<article-title>Mirror neuron: a neurological approach to empathy</article-title>,&#x0201D; in <source>Neurobiology of Human Values</source>, eds <person-group person-group-type="editor"><name><surname>Changeux</surname> <given-names>J.-P.</given-names></name> <name><surname>Damasio</surname> <given-names>A. R.</given-names></name> <name><surname>Singer</surname> <given-names>W.</given-names></name> <name><surname>Christen</surname> <given-names>Y.</given-names></name></person-group> (<publisher-loc>Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>107</fpage>&#x02013;<lpage>123</lpage>.</citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saby</surname> <given-names>J. N.</given-names></name> <name><surname>Marshall</surname> <given-names>P. J.</given-names></name> <name><surname>Meltzoff</surname> <given-names>A. N.</given-names></name></person-group> (<year>2012</year>). <article-title>Neural correlates of being imitated: an EEG study in preverbal infants</article-title>. <source>Soc. Neurosci.</source> <volume>7</volume>, <fpage>650</fpage>&#x02013;<lpage>661</lpage>.<pub-id pub-id-type="doi">10.1080/17470919.2012.691429</pub-id><pub-id pub-id-type="pmid">22646701</pub-id></citation></ref>
<ref id="B60"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Schrodt</surname> <given-names>F.</given-names></name> <name><surname>Butz</surname> <given-names>M. V.</given-names></name></person-group> (<year>2014</year>). &#x0201C;<article-title>Modeling perspective-taking by forecasting 3D biological motion sequences</article-title>,&#x0201D; in <source>Cognitive Processing, Suppl. KogWis 2014</source>, Vol. <volume>15</volume>, eds <person-group person-group-type="editor"><name><surname>Belardinelli</surname> <given-names>M. O.</given-names></name> <name><surname>Belardinelli</surname> <given-names>A.</given-names></name> <name><surname>Butz</surname> <given-names>M. V.</given-names></name></person-group> (<publisher-loc>T&#x000FC;bingen</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>137</fpage>&#x02013;<lpage>139</lpage>.</citation></ref>
<ref id="B61"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Schrodt</surname> <given-names>F.</given-names></name> <name><surname>Layher</surname> <given-names>G.</given-names></name> <name><surname>Neumann</surname> <given-names>H.</given-names></name> <name><surname>Butz</surname> <given-names>M. V.</given-names></name></person-group> (<year>2014a</year>). &#x0201C;<article-title>Modeling perspective-taking by correlating visual and proprioceptive dynamics</article-title>,&#x0201D; in <conf-name>Proceedings of the 36th Annual Conference of the Cognitive Science Society</conf-name> (<conf-loc>Quebec City</conf-loc>), <fpage>1383</fpage>&#x02013;<lpage>1388</lpage>.</citation></ref>
<ref id="B62"><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Schrodt</surname> <given-names>F.</given-names></name> <name><surname>Layher</surname> <given-names>G.</given-names></name> <name><surname>Neumann</surname> <given-names>H.</given-names></name> <name><surname>Butz</surname> <given-names>M. V.</given-names></name></person-group> (<year>2014b</year>). &#x0201C;<article-title>Modeling perspective-taking upon observation of 3D biological motion</article-title>,&#x0201D; in <conf-name>Proceedings of the 4th International Conference on Development and Learning and on Epigenetic Robotics</conf-name> (<conf-loc>Genoa</conf-loc>), <fpage>328</fpage>&#x02013;<lpage>333</lpage>.</citation></ref>
<ref id="B63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schrodt</surname> <given-names>F.</given-names></name> <name><surname>Layher</surname> <given-names>G.</given-names></name> <name><surname>Neumann</surname> <given-names>H.</given-names></name> <name><surname>Butz</surname> <given-names>M. V.</given-names></name></person-group> (<year>2015</year>). <article-title>Embodied learning of a generative neural model for biological motion perception and inference</article-title>. <source>Front. Comput. Neurosci.</source> <volume>9</volume>:<fpage>79</fpage>.<pub-id pub-id-type="doi">10.3389/fncom.2015.00079</pub-id><pub-id pub-id-type="pmid">26217215</pub-id></citation></ref>
<ref id="B64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stalph</surname> <given-names>P. O.</given-names></name> <name><surname>Llor&#x000E1;</surname> <given-names>X.</given-names></name> <name><surname>Goldberg</surname> <given-names>D. E.</given-names></name> <name><surname>Butz</surname> <given-names>M. V.</given-names></name></person-group> (<year>2012</year>). <article-title>Resource management and scalability of the XCSF learning classifier system</article-title>. <source>Theor. Comp. Sci.</source> <volume>425</volume>, <fpage>126</fpage>&#x02013;<lpage>141</lpage>.<pub-id pub-id-type="doi">10.1016/j.tcs.2010.07.007</pub-id></citation></ref>
<ref id="B65"><citation citation-type="book"><person-group person-group-type="author"><name><surname>Taylor</surname> <given-names>G. W.</given-names></name> <name><surname>Hinton</surname> <given-names>G. E.</given-names></name> <name><surname>Roweis</surname> <given-names>S. T.</given-names></name></person-group> (<year>2006</year>). &#x0201C;<article-title>Modeling human motion using binary latent variables</article-title>,&#x0201D; in <source>Advances in Neural Information Processing Systems 19</source>, eds <person-group person-group-type="editor"><name><surname>Sch&#x000F6;lkopf</surname> <given-names>B.</given-names></name> <name><surname>Platt</surname> <given-names>J.</given-names></name> <name><surname>Hofmann</surname> <given-names>T.</given-names></name></person-group> (<publisher-name>MIT Press</publisher-name>), <fpage>1345</fpage>&#x02013;<lpage>1352</lpage>.</citation></ref>
<ref id="B66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thurman</surname> <given-names>S. M.</given-names></name> <name><surname>Grossman</surname> <given-names>E. D.</given-names></name></person-group> (<year>2008</year>). <article-title>Temporal &#x02018;bubbles&#x02019; reveal key features for point-light biological motion perception</article-title>. <source>J. Vis.</source> <volume>8</volume>, <fpage>1</fpage>&#x02013;<lpage>11</lpage>.<pub-id pub-id-type="doi">10.1167/8.3.28</pub-id></citation></ref>
<ref id="B67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tomasello</surname> <given-names>M.</given-names></name></person-group> (<year>1999</year>). <article-title>The human adaptation for culture</article-title>. <source>Annu. Rev. Anthropol.</source> <volume>28</volume>, <fpage>509</fpage>&#x02013;<lpage>529</lpage>.<pub-id pub-id-type="doi">10.1146/annurev.anthro.28.1.509</pub-id></citation></ref>
<ref id="B68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Turella</surname> <given-names>L.</given-names></name> <name><surname>Wurm</surname> <given-names>M. F.</given-names></name> <name><surname>Tucciarelli</surname> <given-names>R.</given-names></name> <name><surname>Lingnau</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Expertise in action observation: recent neuroimaging findings and future perspectives</article-title>. <source>Front. Hum. Neurosci.</source> <volume>7</volume>:<fpage>637</fpage>.<pub-id pub-id-type="doi">10.3389/fnhum.2013.00637</pub-id></citation></ref>
<ref id="B69"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ulloa</surname> <given-names>E. R.</given-names></name> <name><surname>Pineda</surname> <given-names>J. A.</given-names></name></person-group> (<year>2007</year>). <article-title>Recognition of point-light biological motion: mu rhythms and mirror neuron activity</article-title>. <source>Behav. Brain Res.</source> <volume>183</volume>, <fpage>188</fpage>&#x02013;<lpage>194</lpage>.<pub-id pub-id-type="doi">10.1016/j.bbr.2007.06.007</pub-id><pub-id pub-id-type="pmid">17658625</pub-id></citation></ref>
<ref id="B70"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Want</surname> <given-names>S. C.</given-names></name> <name><surname>Harris</surname> <given-names>P. L.</given-names></name></person-group> (<year>2002</year>). <article-title>How do children ape? Applying concepts from the study of non-human primates to the developmental study of &#x02018;imitation&#x02019; in children</article-title>. <source>Dev. Sci.</source> <volume>5</volume>, <fpage>1</fpage>&#x02013;<lpage>14</lpage>.<pub-id pub-id-type="doi">10.1111/1467-7687.00194</pub-id></citation></ref>
</ref-list>
</back>
</article>