Learning 3D shape spaces from videos
Honda Research Institute Europe GmbH, Germany
We introduce an architecture for unsupervised learning of representations of the three-dimensional shape of objects from movies. During the unsupervised learning phase, the system optimizes a slowness learning rule and builds up a pose-invariant and shape-specific representation, i.e., objects of similar shape cluster independently of viewing angle, and views of distinct shapes cluster in distinct regions of the feature space. Furthermore, the system generalizes to previously unseen shapes that result from 3D morphing between the training objects. The model consists of four hierarchically converging layers with increasing receptive field sizes. Each layer implements the same optimization, Slow Feature Analysis. The representations in the top layer of the model thus extract those features that on average change slowly or rarely over time. During the training phase, views of objects are presented to the system. The objects are freely rotated in space, either rendered artificially ("rendered") or recorded in videos of physical objects presented to a camera ("video"). The "rendered" dataset consists of views of five geometric shapes. In the "video" dataset, views of geometric objects, toys, and household objects are recorded with a camera while they are freely rotated in space.
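The optimization performed in each layer can be illustrated with a minimal sketch of linear Slow Feature Analysis: find projections of a time series that vary as slowly as possible over time, subject to unit variance and decorrelation. This is not the authors' four-layer hierarchical implementation (which applies the step repeatedly over growing receptive fields, typically with a nonlinear expansion); it is a single linear stage, solved here via whitening followed by an eigendecomposition of the covariance of temporal differences.

```python
import numpy as np

def linear_sfa(X, n_features=2):
    """Linear Slow Feature Analysis (illustrative single-stage sketch).

    X: array of shape (T, D), a time series of D-dimensional inputs.
    Returns a projection matrix W of shape (D, n_features) whose output
    signals (X - mean) @ W change as slowly as possible over time,
    subject to unit variance and mutual decorrelation.
    """
    X = X - X.mean(axis=0)                 # center the data
    cov = X.T @ X / len(X)                 # input covariance
    d, U = np.linalg.eigh(cov)             # eigendecomposition (ascending)
    keep = d > 1e-10                       # drop near-singular directions
    S = U[:, keep] / np.sqrt(d[keep])      # whitening matrix
    Z = X @ S                              # whitened signals
    dZ = np.diff(Z, axis=0)                # temporal differences
    dcov = dZ.T @ dZ / len(dZ)             # covariance of differences
    dd, V = np.linalg.eigh(dcov)           # ascending: smallest = slowest
    W = S @ V[:, :n_features]              # slowest directions, back-projected
    return W
```

Applied to a linear mixture of a slow and a fast oscillation, the first extracted feature recovers the slow source up to sign and scale.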
After learning, views of the same object under different perspectives cluster in the generated feature space, which allows high classification performance. While this property has been reported before, we show here that the system generalizes to views of completely new objects in a meaningful way. After learning on the "rendered" dataset, the system is tested with views of morphed shapes generated by 3D interpolation between the training shapes. The representations of such morph views form compact volumes between the training object clusters (Figure 1 in Supplemental materials) and encode geometric properties instead of low-level view features. We argue that this representation forms a shape space, i.e., a parametrization of 3D shape from single 2D views. For the "video" dataset, clusters are less compact but still allow good classification rates.
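Because views of one object form a compact cluster in feature space, a very simple classifier suffices to read out object identity. The abstract does not state which classifier was used; the nearest-centroid scheme below is a hypothetical illustration of how clustered, pose-invariant features translate into classification performance.

```python
import numpy as np

def nearest_centroid_fit(feats, labels):
    """Compute one centroid per class from labeled feature vectors.

    feats: array (N, F) of top-layer feature vectors, labels: array (N,).
    Returns the sorted class labels and a (C, F) array of centroids.
    """
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(feats, classes, centroids):
    """Assign each feature vector to the class of the nearest centroid."""
    dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]
```

If the learned representation is pose-invariant, held-out views of a training object fall near that object's centroid, so even this minimal readout achieves high accuracy.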
A shape space representation generated from object views in a biologically plausible model is a step towards unsupervised learning of affordance-based representations. The shape of an object (not its appearance) determines many of its physical properties -- specifically, how it can be grasped. The system thus provides a basis for integrating affordances into object representations, with potential for automated object manipulation in robotic systems. Additionally, it offers a new approach for data-driven learning of Geon-like shape primitives from real image data. This model of a converging hierarchy of modules optimizing a slowness objective has previously been applied successfully in many areas, including modeling the early visual system, learning invariant object recognition, and learning hippocampal codes. Slowness learning might thus be a general principle for sensory processing in the brain.
Conference:
Bernstein Conference on Computational Neuroscience, Frankfurt am Main, Germany, 30 Sep - 2 Oct, 2009.
Presentation Type:
Poster Presentation
Topic:
Sensory processing
Citation:
Franzius M, Wersing H and Korner E (2009). Learning 3D shape spaces from videos.
Front. Comput. Neurosci.
Conference Abstract:
Bernstein Conference on Computational Neuroscience.
doi: 10.3389/conf.neuro.10.2009.14.138
Copyright:
The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers.
They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.
The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.
Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.
For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.
Received:
27 Aug 2009;
Published Online:
27 Aug 2009.
*
Correspondence:
Mathias Franzius, Honda Research Institute Europe GmbH, Offenbach, Germany, mathias.franzius@honda-ri.de