Event Abstract

Learning visual motion and structure as latent explanations from huge motion data sets

  • 1 Goethe-Universität Frankfurt, Bernstein Focus Neurotechnology (BFNT), Germany

The term ‘motion’ denotes a simple explanation for a possibly complicated change of the illumination pattern sensed in an eye or a camera. Motion explains these changes by transformations of the brightness pattern, such as shift, rotation, and scaling. Other important components of these ‘explanations’ are segmentation (i.e. grouping into spatially connected objects) and scene depth.

In that sense, ‘motion’ is, in an information-theoretic sense, a ‘cheap’ description of the second and subsequent images in a sequence, conditioned on the first image being given. The question is on which principles, based purely on long-term observation, perception is capable of distilling concepts such as motion, depth, and segmentation from the continuous stream of visual input data, without a priori access to mathematical models of the world or to models of the observed signals that already employ these ‘explanatory entities’.

Optical flow is the representation of motion in the image plane, and ‘technical’ equations such as the brightness constancy constraint equation (BCCE) state an explicit relation between the observable entities (spatio-temporal derivatives of the image signal) and the unknown explanatory variable, i.e. motion. However: how can such a relation be learnt, rather than constructed on the basis of an already available ‘higher insight’?
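To make the relation concrete, here is a minimal numerical sketch (not code from this work): the BCCE couples the observable spatio-temporal derivatives (Ix, Iy, It) to the unknown motion (u, v) via Ix·u + Iy·v + It = 0. One pixel yields one equation for two unknowns, so a classical remedy is to assume constant motion over a small window and solve the stacked system in the least-squares sense (the Lucas-Kanade idea); the function name `window_flow` and the synthetic test pattern are illustrative choices.

```python
import numpy as np

def window_flow(Ix, Iy, It):
    """Least-squares motion estimate (u, v) from derivative samples
    of one image window, via the stacked BCCE  A @ [u, v] = -It."""
    A = np.stack([Ix, Iy], axis=1)              # N x 2 design matrix
    (u, v), *_ = np.linalg.lstsq(A, -It, rcond=None)
    return u, v

# Synthetic check: the quadratic pattern I(x, y, t) = (x-ut)^2 + (y-vt)^2
# translates with (u, v) = (0.7, -0.3); its derivatives at t = 0
# satisfy the BCCE exactly, so the estimate recovers the true motion.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
Ix, Iy = 2 * xs.ravel(), 2 * ys.ravel()
It = -(0.7 * Ix - 0.3 * Iy)
u_est, v_est = window_flow(Ix, Iy, It)
```

The least-squares step is where the aperture problem becomes visible: if all gradients in the window are parallel, the 2×2 normal-equation matrix is rank-deficient and only the normal component of the motion is recoverable.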

We suggest that the emergence of concepts such as motion in a visual perception system is largely supported by (ego)motoric information, and by discovering statistical correlations (not necessarily only linear ones) between motoric data and the instantaneous spatio-temporal characteristics of the visual motion field. Both local motion (as it appears in the optical flow equations) and global motion (e.g. parametric descriptions of the complete visual motion field) are claimed to be informative ‘latent variables’ that emerge from a statistical analysis of observable sensory information, in particular the spatio-temporal image signal and motoric signals.
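As an illustration of what a ‘parametric description of the complete visual motion field’ can look like (an assumption for exposition, not the authors' method), a global motion latent variable may be modelled as an affine flow field, u(x, y) = a0 + a1·x + a2·y and v(x, y) = b0 + b1·x + b2·y, fitted to per-pixel flow samples by least squares; the helper `fit_affine_flow` is a hypothetical name.

```python
import numpy as np

def fit_affine_flow(x, y, u, v):
    """Fit affine parameters a = (a0, a1, a2), b = (b0, b1, b2) to
    flow samples (u, v) observed at image positions (x, y)."""
    A = np.stack([np.ones_like(x), x, y], axis=1)   # [1, x, y] rows
    a, *_ = np.linalg.lstsq(A, u, rcond=None)
    b, *_ = np.linalg.lstsq(A, v, rcond=None)
    return a, b

# Synthetic check: a pure rotation about the origin with angular
# velocity w induces u = -w*y, v = w*x, i.e. the affine parameters
# a = (0, 0, -w) and b = (0, w, 0).
x, y = (g.ravel() for g in np.meshgrid(np.arange(4.0), np.arange(4.0)))
w = 0.1
a_est, b_est = fit_affine_flow(x, y, -w * y, w * x)
```

Six numbers then summarize the whole field; correlating such compact parameters with (ego)motoric signals is one concrete way the statistical analysis sketched above could proceed.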

We are currently exploring this hypothesis on the basis of a large-scale experiment in which an autonomous robot continuously ‘explores’ an indoor environment (using standard methods) and collects huge multi-channel video data streams that will be subjected to an in-depth analysis in the spirit of the approach sketched above. In contrast to [1], we specifically aim to learn (i.e. identify) the latent variables themselves, not primarily the distribution of their parameters. This approach is more in the spirit of [2], where the transformations themselves are also learnt.

References

[1] Sun, D., Roth, S., Lewis, J.P., Black, M.J.: Learning Optical Flow. Proc. European Conference on Computer Vision (ECCV), Springer LNCS Vol. 5304, 2008, pp. 83-97.

[2] Memisevic, R., Hinton, G.: Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines. Neural Computation, Vol. 22, No. 6, June 2010, pp. 1473-1492.

Keywords: computational neuroscience

Conference: Bernstein Conference on Computational Neuroscience, Berlin, Germany, 27 Sep - 1 Oct, 2010.

Presentation Type: Presentation

Topic: Bernstein Conference on Computational Neuroscience

Citation: Mester R, Guevara A, Conrad C and Friedrich H (2010). Learning visual motion and structure as latent explanations from huge motion data sets. Front. Comput. Neurosci. Conference Abstract: Bernstein Conference on Computational Neuroscience. doi: 10.3389/conf.fncom.2010.51.00061

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 16 Sep 2010; Published Online: 23 Sep 2010.

* Correspondence: Dr. Rudolf Mester, Goethe-Universität Frankfurt, Bernstein Focus Neurotechnology (BFNT), Frankfurt, Germany, mester@vsi.cs.uni-frankfurt.de