<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Robot. AI</journal-id>
<journal-title>Frontiers in Robotics and AI</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Robot. AI</abbrev-journal-title>
<issn pub-type="epub">2296-9144</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frobt.2020.600584</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Robotics and AI</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Combining Self-Organizing and Graph Neural Networks for Modeling Deformable Objects in Robotic Manipulation</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Valencia</surname> <given-names>Angel J.</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/1018170/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Payeur</surname> <given-names>Pierre</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1161333/overview"/>
</contrib>
</contrib-group>
<aff><institution>School of Electrical Engineering and Computer Science, University of Ottawa</institution>, <addr-line>Ottawa, ON</addr-line>, <country>Canada</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: David Navarro-Alarcon, Hong Kong Polytechnic University, Hong Kong</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Jihong Zhu, Delft University of Technology, Netherlands; Juan Antonio Corrales Ramon, Sigma Clermont, France</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Pierre Payeur <email>ppayeur&#x00040;uottawa.ca</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Robotic Control Systems, a section of the journal Frontiers in Robotics and AI</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>23</day>
<month>12</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="collection">
<year>2020</year>
</pub-date>
<volume>7</volume>
<elocation-id>600584</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>08</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>23</day>
<month>11</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2020 Valencia and Payeur.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Valencia and Payeur</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract><p>Modeling deformable objects is an important preliminary step for performing robotic manipulation tasks with more autonomy and dexterity. Currently, generalization capabilities in unstructured environments using analytical approaches are limited, mainly due to the lack of adaptation to changes in the object shape and properties. Therefore, this paper proposes the design and implementation of a data-driven approach, which combines machine learning techniques on graphs to estimate and predict the state and transition dynamics of deformable objects with initially undefined shape and material characteristics. The learned object model is trained using RGB-D sensor data and evaluated in terms of its ability to estimate the current state of the object shape, in addition to predicting future states with the goal to plan and support the manipulation actions of a robotic hand.</p></abstract>
<kwd-group>
<kwd>deformable objects</kwd>
<kwd>dynamic shape modeling</kwd>
<kwd>manipulation</kwd>
<kwd>robotics</kwd>
<kwd>shape</kwd>
<kwd>sensing</kwd>
</kwd-group>
<counts>
<fig-count count="7"/>
<table-count count="1"/>
<equation-count count="9"/>
<ref-count count="34"/>
<page-count count="11"/>
<word-count count="7612"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>In the context of robotic manipulation, object models are used to provide feedback signals that a robot can control when performing a specific task. For deformable objects, the object pose is not a sufficient state representation (Khalil et al., <xref ref-type="bibr" rid="B14">2010</xref>) to guarantee even low-level manipulation tasks (e.g., pick-and-place), as manipulation actions produce changes in the object shape. Likewise, high-level manipulation tasks (e.g., making a bed or cleaning surfaces) involve knowledge of future behaviors to develop hierarchical plans. Therefore, an object model that integrates shape representation and prediction is required in order to perform a variety of tasks with deformable objects.</p>
<p>Early attempts to estimate the object shape in robotic manipulation mainly adopted an analytical approach, which is commonly adjusted in simulation (Nadon et al., <xref ref-type="bibr" rid="B22">2018</xref>). This comes with some drawbacks in real robotic environments, as simulators are currently not sophisticated enough to provide realistic models of non-rigid objects (Billard and Kragic, <xref ref-type="bibr" rid="B3">2019</xref>), and the support for sensor measurements and hardware in simulators is very limited. Furthermore, certain assumptions about objects are often made (e.g., homogeneous composition or isotropic materials), yet it is rarely possible to verify these conditions in advance for every new object encountered in the environment. This lack of a general-purpose methodology to estimate the object shape makes it difficult to develop more autonomous and dexterous robotic manipulation systems capable of handling deformable objects (Sanchez et al., <xref ref-type="bibr" rid="B28">2018</xref>).</p>
<p>In this paper, we present a data-driven approach to estimate and predict the state of initially unknown deformable objects without depending on simulators or predefined material parameters. The contributions of this work can be summarized as follows: First, we develop a method for shape estimation using Self-Organizing Neural Networks (SONNs). Second, we design and implement an original method for shape prediction using Graph Neural Networks (GNNs) that leverages the initial SONN-based model. Third, we test the combination of the shape estimation and prediction methods as a learned model of deformable objects in real robotic environments. This paper is a significant extension of previous work (Valencia et al., <xref ref-type="bibr" rid="B31">2019</xref>), corroborating the learned model across different types of deformable objects with experimental evaluations.</p>
</sec>
<sec id="s2">
<title>2. Related Work</title>
<p>Various methods that explore analytical modeling approaches for non-rigid objects in robotic environments are inspired by physics-based models, extensively studied in computer graphics (Nealen et al., <xref ref-type="bibr" rid="B23">2006</xref>). These include continuous mesh models such as Euler-Bernoulli (EB) (Fugl et al., <xref ref-type="bibr" rid="B10">2012</xref>), linear Finite Element Method (FEM) (Lang et al., <xref ref-type="bibr" rid="B16">2002</xref>; Frank et al., <xref ref-type="bibr" rid="B8">2014</xref>; Jia et al., <xref ref-type="bibr" rid="B13">2014</xref>; Petit et al., <xref ref-type="bibr" rid="B26">2015</xref>; Duenser et al., <xref ref-type="bibr" rid="B7">2018</xref>) and non-linear FEM (Leizea et al., <xref ref-type="bibr" rid="B18">2017</xref>; Sengupta et al., <xref ref-type="bibr" rid="B29">2020</xref>). Also, discrete mesh models such as linear Mass-Spring Systems (MSS) (Leizea et al., <xref ref-type="bibr" rid="B17">2014</xref>) and non-linear MSS (Zaidi et al., <xref ref-type="bibr" rid="B34">2017</xref>) are considered. Additionally, discrete particle models such as Position Based Dynamics (PBD) (G&#x000FC;ler et al., <xref ref-type="bibr" rid="B12">2015</xref>) have been introduced. In these methods, a crucial step is to determine the material parameters of a deformable object (e.g., Young&#x00027;s modulus and Poisson&#x00027;s ratio). This is typically done via specific sensor measurements or by assuming prior material information. Alternatively, these parameters can be obtained by simultaneously tracking the shape while applying optimization techniques to the model.</p>
<p>Alternatively, data-driven approaches leverage sensor data to approximate the behavior of deformable objects, typically using learning-based models. These include Single-layer Perceptron (SLP) models (Cretu et al., <xref ref-type="bibr" rid="B6">2012</xref>; Tawbe and Cretu, <xref ref-type="bibr" rid="B30">2017</xref>). Other methods combine analytical and data-driven approaches in different parts of the modeling pipeline. For example, a Gaussian Process Regression (GPR) is used to estimate the deformability parameter of a PBD model (Caccamo et al., <xref ref-type="bibr" rid="B5">2016</xref>). An Evolutionary Algorithm (EA) is proposed to search the parameter space of an MSS model (Arriola-Rios and Wyatt, <xref ref-type="bibr" rid="B1">2017</xref>). In these methods, an important aspect for correct modeling is the information extracted from the sensor measurements. For RGB-D data, such features correspond to properties of the shape (e.g., surfaces or feature points) and typically provide a structured representation suitable for the type of deformation model used. As such, B-spline snakes (Arriola-Rios and Wyatt, <xref ref-type="bibr" rid="B1">2017</xref>) can be used to create a mesh-like representation. On the other hand, optical flow (G&#x000FC;ler et al., <xref ref-type="bibr" rid="B12">2015</xref>) and neural gas (Cretu et al., <xref ref-type="bibr" rid="B6">2012</xref>) are used to create a particle-like representation.</p>
<p>Recent learning-based models such as Graph Neural Networks (GNN) have demonstrated the ability to act as a physics engine (Battaglia et al., <xref ref-type="bibr" rid="B2">2016</xref>; Mrowca et al., <xref ref-type="bibr" rid="B21">2018</xref>), although training such models using only sensor measurements remains little explored. The most advanced attempt to model deformable objects beyond simulation is presented in Li et al. (<xref ref-type="bibr" rid="B19">2019a</xref>), where a real robotic gripper performs a shape control task on a deformable object. However, the models are initially trained entirely in simulation. Conversely, this paper exploits real shape measurements for the modeling and prediction stages and expands on the work of Cretu et al. (<xref ref-type="bibr" rid="B6">2012</xref>), aiming to contribute a general-purpose methodology for modeling deformable objects in real robotic environments. In this way, we extend the latter by exploring recent learning-based models with physical reasoning capabilities (Battaglia et al., <xref ref-type="bibr" rid="B2">2016</xref>) using RGB-D sensor measurements.</p>
</sec>
<sec sec-type="methods" id="s3">
<title>3. Methodology</title>
<p>In this section, the proposed data-driven approach to model deformable objects is introduced (<xref ref-type="fig" rid="F1">Figure 1</xref>). The main components of the learned object model are the shape estimation and prediction methods.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Components of the proposed data-driven approach for deformation modeling. The framework takes as input RGB-D sensor images and the poses of the robotic hand&#x00027;s fingertips. (Data Processing) The object and the robot manipulation actions are detected. (Learned Object Model) The processed data are combined for the estimation and prediction of the object shape.</p></caption>
<graphic xlink:href="frobt-07-600584-g0001.tif"/>
</fig>
<sec>
<title>3.1. Shape Estimation</title>
<p>A Self-Organizing Neural Network (SONN) based model is proposed to estimate the object state from the sensor measurements. This model is called Batch Continual Growing Neural Gas (BC-GNG) and is an extension of the continual formulation of the Growing Neural Gas (C-GNG) algorithm (Orts-Escolano et al., <xref ref-type="bibr" rid="B24">2015</xref>). C-GNG is extended by implementing a batch training procedure that enables updating the model parameters without iterating over each sample individually during the execution of the algorithm. This approach provides benefits such as computational efficiency and faster convergence. First, the core principles of GNG models are described, and then the technical details of our proposal are explained.</p>
<p><italic>Growing Neural Gas:</italic> A GNG model (Fritzke, <xref ref-type="bibr" rid="B9">1995</xref>) produces a graph representation <italic>G</italic> &#x0003D; (<italic>O, R</italic>) from a data distribution <italic>P</italic> of size <italic>N</italic>, where <inline-formula><mml:math id="M1"><mml:mi>O</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>o</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula> is the set of nodes with <inline-formula><mml:math id="M2"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> cardinality, and <inline-formula><mml:math id="M3"><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>r</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula> is the set of 
edges with <inline-formula><mml:math id="M4"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> cardinality, which connects an unordered pair of nodes <italic>u</italic><sub><italic>k</italic></sub> and <italic>v</italic><sub><italic>k</italic></sub>. Also, each node has an associated feature vector <bold>o</bold><sub><italic>i</italic></sub> &#x0003D; {<bold><italic>x</italic></bold><sub><italic>i</italic></sub>, <italic>e</italic><sub><italic>i</italic></sub>}, which contains the position and spatial error, respectively. Likewise, each edge has an associated feature vector <bold>e</bold><sub><italic>k</italic></sub> &#x0003D; {<italic>a</italic><sub><italic>k</italic></sub>}, which contains the connection age. The position is a direct measure of the spatial location of a node with respect to the sample, while the spatial error and connection age serve as measures for the addition and removal processes of nodes and edges from the graph.</p>
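<p>The graph representation above can be sketched as a small data structure (an illustrative sketch only; the names Node, Edge, and Graph are ours and not taken from any released implementation):</p>

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Node:
    x: np.ndarray        # position feature x_i (a 3D point)
    error: float = 0.0   # accumulated spatial error e_i

@dataclass
class Edge:
    u: int               # index of one endpoint node
    v: int               # index of the other endpoint node
    age: int = 0         # connection age a_k

@dataclass
class Graph:
    nodes: list = field(default_factory=list)  # node set O
    edges: list = field(default_factory=list)  # edge set R

# GNG-style initialization: two random nodes joined by an edge of age zero.
rng = np.random.default_rng(0)
g = Graph(nodes=[Node(rng.random(3)), Node(rng.random(3))],
          edges=[Edge(0, 1)])
```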
<p>The GNG model receives as input distribution, <italic>P</italic>, the current frame of the point cloud data associated with the object and produces the graph, <italic>G</italic>, as an estimation of the object shape. The model is trained following the execution of Algorithm 1. First, the graph is initialized by creating two nodes with positions set to random values and spatial errors set to zero. In addition, an edge connecting these nodes is created with age set to zero. After initialization, an individual sample, &#x003BE;, is randomly drawn from the distribution, and then the ADAPTATION and GROWING phases are run. During the former, node and edge features are sequentially updated, while during the latter, after a certain number of samples, &#x003BB;, has been received, new nodes and edges are added to the graph. These phases follow the original algorithm proposed by Fritzke (<xref ref-type="bibr" rid="B9">1995</xref>). In this work, the algorithm is executed until the quantization error (QE) reaches a certain limit. This gives more flexibility to control the representation: during the GROWING phase, nodes are dynamically created to best fit the samples available in the input data, without requiring a fixed number of nodes to be set in advance. The quantization error is evaluated over the distribution <italic>P</italic> and computes the average distance between each sample &#x003BE; and the position of its closest node (i.e., the one at the smallest Euclidean distance) <bold><italic>x</italic></bold><sub><italic>s</italic><sub>1</sub></sub>.</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">QE</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi>&#x003BE;</mml:mi><mml:mo>|</mml:mo><mml:mo>|</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
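<p>Equation 1 amounts to a nearest-node average, which can be sketched with a vectorized computation (function and variable names are our own):</p>

```python
import numpy as np

def quantization_error(P, X):
    """Eq. 1: average distance from each sample in P (N x 3)
    to the position of its closest node in X (N_O x 3)."""
    # Pairwise Euclidean distances between every sample and every node.
    d = np.linalg.norm(P[:, None, :] - X[None, :, :], axis=-1)
    return d.min(axis=1).mean()
```

<p>With nodes placed exactly on the samples the error is zero, and it grows as the nodes drift away from the point cloud.</p>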
<table-wrap position="float">
<label>Algorithm 1</label>
<caption><p>Steps of computation in GNG</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><bold>Input:</bold> &#x000A0;<italic>P</italic></td></tr>
<tr><td align="left" valign="top"><bold>Output:</bold> &#x000A0;<italic>G</italic></td></tr>
<tr><td align="left" valign="top">1: &#x000A0;<italic>G</italic> &#x02190; init_graph(<italic>P</italic>)</td></tr>
<tr><td align="left" valign="top">2: &#x000A0;<bold>while</bold> QE &#x0003E; QE<sub>max</sub> <bold>do</bold></td></tr>
<tr><td align="left" valign="top">3: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <bold>for all</bold> n &#x02208; <italic>N</italic> <bold>do</bold></td></tr>
<tr><td align="left" valign="top">4: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; &#x003BE; &#x0007E; <italic>P</italic></td></tr>
<tr><td align="left" valign="top">5: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <sc>adaptation</sc>(<italic>G</italic>, &#x003BE;)</td></tr>
<tr><td align="left" valign="top">6: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <bold>if</bold> (n <bold>mod</bold> &#x003BB;) &#x0003D; 0 <bold>then</bold></td></tr>
<tr><td align="left" valign="top">7: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <sc>growing</sc>(<italic>G</italic>)</td></tr>
<tr><td align="left" valign="top">8: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <bold>end if</bold></td></tr>
<tr><td align="left" valign="top">9: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <bold>end for</bold></td></tr>
<tr><td align="left" valign="top">10: &#x000A0;<bold>end while</bold></td></tr>
</tbody>
</table>
</table-wrap>
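<p>Algorithm 1 maps onto a short training loop; the sketch below assumes the graph initialization and the ADAPTATION and GROWING phases are supplied as callables (it mirrors the structure of Algorithm 1 only, not the authors&#x00027; code):</p>

```python
import numpy as np

def train_gng(P, qe_max, lam, init_graph, adaptation, growing,
              quantization_error, seed=0):
    """Skeleton of Algorithm 1: adapt on randomly drawn samples,
    grow the graph every lam samples, stop once QE <= qe_max."""
    rng = np.random.default_rng(seed)
    G = init_graph(P)
    n = 0
    while quantization_error(P, G) > qe_max:
        for _ in range(len(P)):
            xi = P[rng.integers(len(P))]  # draw xi ~ P
            adaptation(G, xi)             # update node/edge features
            n += 1
            if n % lam == 0:
                growing(G)                # insert new nodes and edges
    return G
```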
<p><italic>Outlier Regularization:</italic> One problem that limits the use of GNG in problems with time constraints, such as tracking the shape of a deformable object as it evolves, is the requirement to retrain the model for every new input distribution collected by the sensors. A continual formulation of the Growing Neural Gas (C-GNG) (Orts-Escolano et al., <xref ref-type="bibr" rid="B24">2015</xref>) implements a technique that leverages the knowledge already learned during previous executions. Specifically, the graph from the previous data frame <italic>G</italic><sub><italic>t</italic>&#x02212;1</sub> is used to initialize the graph for the current data frame <italic>G</italic>. This provides a significant practical improvement but makes the formulation more sensitive. For example, outliers can affect the graph by creating nodes that do not adapt to the input distribution. The presence of these dead nodes is a serious issue for the estimation of the object shape, especially when the shape is meant to vary over time. Therefore, we propose to regularize the influence of outliers during the procedure that updates the position feature of each node. During the ADAPTATION phase, the node positions are updated (Equation 2) for the node found closest to the sample &#x003BE;, <bold><italic>x</italic></bold><sub><italic>s</italic><sub>1</sub></sub>, and for its topological neighbors <bold><italic>x</italic></bold><sub><italic>n</italic></sub>. The parameters &#x003F5;<sub><italic>s</italic><sub>1</sub></sub> and &#x003F5;<sub><italic>n</italic></sub> are learning rates that control the magnitude of each contribution&#x00027;s adjustment to the position feature.</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mtable style="text-align:axis;" equalrows="false" columnalign="left" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle 
mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>We introduce a new term, <italic>w</italic><sub><italic>s</italic><sub>1</sub></sub>, that modifies the learning rate of the closest node position (Equation 3). In this way, node&#x02013;sample pairs separated by large distances are penalized, as they are likely to involve outliers, whereas those with small distances remain unchanged.</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M7"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>This regularization term (Equation 4) evaluates a 1D Gaussian kernel function with mean equal to the difference between the Euclidean distance ||<bold><italic>x</italic></bold><sub><italic>s</italic><sub>1</sub></sub>&#x02212;&#x003BE;|| and the maximum quantization error QE<sub>max</sub>, and with standard deviation proportional to the maximum quantization error QE<sub>max</sub>.</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M8"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable style="text-align:axis;" equalrows="false" columnlines="none" equalcolumns="false" class="array"><mml:mtr><mml:mtd><mml:mn>1</mml:mn><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mi>&#x003BC;</mml:mi><mml:mo>&#x0003C;</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>K</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003BC;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mi>e</mml:mi></mml:mrow><mml:mrow><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mi>&#x003BC;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:msup><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mtext class="textrm" mathvariant="normal">others</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
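<p>The regularized update of Equations 3 and 4 can be sketched in a few lines (names are ours; the proportionality factor sigma_scale relating the standard deviation to QE<sub>max</sub> is a hypothetical parameter):</p>

```python
import numpy as np

def outlier_weight(xi, x_s1, qe_max, sigma_scale=1.0):
    """Eq. 4: Gaussian down-weighting of the closest-node update
    when the sample lies farther than QE_max from its winner node."""
    mu = np.linalg.norm(x_s1 - xi) - qe_max
    if mu < 0:
        return 1.0                    # close samples: full update
    sigma = sigma_scale * qe_max      # std proportional to QE_max
    return float(np.exp(-mu**2 / (2.0 * sigma**2)))

def update_closest(xi, x_s1, eps_s1, qe_max):
    """Eq. 3: position update of the closest node, scaled by w_s1."""
    return x_s1 + outlier_weight(xi, x_s1, qe_max) * eps_s1 * (xi - x_s1)
```

<p>A sample within QE<sub>max</sub> of its winner node produces the standard GNG update, while a distant (likely outlier) sample barely moves the node.</p>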
<p><italic>Batch Training:</italic> We also introduce a new procedure that updates the features of the nodes and edges in batches, unifying the contributions of a node according to its role with respect to the samples. First, the node position is updated by combining the contributions from when the node is found as closest and as topological neighbor. Similarly, the age of the edges connecting the closest node with its neighbors is updated by accumulating the number of times the node is found as closest. More specifically, the Euclidean distances between all the samples and node positions are computed, finding the two closest nodes at once. With this information, the input distribution can be represented as a set <inline-formula><mml:math id="M9"><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula>, where <italic>P</italic><sub><italic>i</italic></sub> is the batch data associated with each node found as closest, with size <italic>N</italic><sub><italic>i</italic></sub>. In this way, the contribution of each node as closest is reformulated (Equation 5) as the average of the differences between the samples in <italic>P</italic><sub><italic>i</italic></sub> and that particular closest node.</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>Also, the age of the neighbor edges is reformulated as an increment by the batch data size <italic>N</italic><sub><italic>i</italic></sub>. Since nodes are likely to be connected by more than one edge in the graph, the contribution of each node as neighbor requires an additional consideration. First, all the differences between the node position and the samples associated with each of its neighboring nodes are collected; then the average of the collected differences is computed (Equation 6), similarly to the previous step.</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M11"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>&#x02208;</mml:mo><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x003BE;</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>P</italic><sub><italic>j</italic></sub> is the batch data of each neighbor of the closest node and <italic>D</italic><sub><italic>i</italic></sub> is the number of edges of the closest node. The contributions of each node as closest and as neighbor are then combined into a single expression to update the position feature (Equation 7), replacing the two-step update process of online training with a single ADAPTATION phase, as detailed in Algorithm 2. Because the Euclidean distances for all samples are computed at once, this procedure is also highly parallelizable, as nodes can be updated independently.</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M12"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003F5;</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
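<p>For illustration, the batch update of Equations 5&#x02013;7 can be sketched in a few lines of NumPy. This is a simplified sketch rather than the implementation used in this work: the array layout, the adjacency-list representation, the default learning rates, and the choice of <italic>D</italic><sub><italic>i</italic></sub> per updated node are all illustrative assumptions.</p>

```python
import numpy as np

def batch_adaptation(X, P, neighbors, eps_s1=0.1, eps_n=0.01):
    """One batch ADAPTATION step following Equations 5-7 (illustrative).

    X         : (N, d) array of node positions
    P         : (M, d) array of input samples
    neighbors : list where neighbors[i] holds the indices of node i's neighbors
    """
    # Euclidean distances between every sample and every node, all at once.
    d = np.linalg.norm(P[:, None, :] - X[None, :, :], axis=-1)  # (M, N)
    closest = np.argmin(d, axis=1)  # index of the closest node per sample

    x_s1 = np.zeros_like(X)  # contribution as closest node (Equation 5)
    x_n = np.zeros_like(X)   # contribution as topological neighbor (Equation 6)
    for i in range(len(X)):
        Pi = P[closest == i]                    # batch data of node i as closest
        if len(Pi):
            x_s1[i] = (Pi - X[i]).mean(axis=0)  # Equation 5
        Di = max(len(neighbors[i]), 1)          # number of edges of node i
        for j in neighbors[i]:                  # node i as neighbor of node j
            Pj = P[closest == j]
            if len(Pj):
                x_n[i] += (Pj - X[i]).mean(axis=0) / Di  # Equation 6
    # Equation 7: single-expression position update.
    return X + eps_s1 * x_s1 + eps_n * x_n
```

<p>Each node can be processed independently inside the loop, which is what makes the procedure parallelizable.</p>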
<table-wrap position="float">
<label>Algorithm 2</label>
<caption><p>Steps of computation in BC-GNG</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><bold>Input:</bold> &#x000A0;<italic>P</italic>, <italic>G</italic><sub><italic>t</italic>&#x02212;1</sub></td></tr>
<tr><td align="left" valign="top"><bold>Output:</bold> &#x000A0;<italic>G</italic><sub><italic>t</italic></sub></td></tr>
<tr><td align="left" valign="top">&#x000A0;1: &#x000A0;<bold>if</bold> <italic>t</italic> &#x0003D; 1 <bold>then</bold></td></tr>
<tr><td align="left" valign="top">&#x000A0;2: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>G</italic> &#x02190; GNG(<italic>P</italic>)</td></tr>
<tr><td align="left" valign="top">&#x000A0;3: &#x000A0;<bold>else</bold></td></tr>
<tr><td align="left" valign="top">&#x000A0;4: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <italic>G</italic> &#x02190; <italic>G</italic><sub><italic>t</italic>&#x02212;1</sub></td></tr>
<tr><td align="left" valign="top">&#x000A0;5: &#x000A0;<bold>end if</bold></td></tr>
<tr><td align="left" valign="top">&#x000A0;6: &#x000A0;<bold>while</bold> QE &#x0003E; QE<sub>max</sub> <bold>do</bold></td></tr>
<tr><td align="left" valign="top">&#x000A0;7: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <sc>adaptation</sc>(<italic>G, P</italic>)</td></tr>
<tr><td align="left" valign="top">&#x000A0;8: &#x000A0;<bold>end while</bold></td></tr> 
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>3.2. Shape Prediction</title>
<p>As described in section 1, shape estimation alone does not provide sufficient information to perform high-level manipulation tasks. Therefore, a prediction phase must be incorporated to characterize the future states of a deformable object. To support the requirements of path planning and of dynamic interaction between a robotic hand and a deformable object, Graph Neural Network (GNN)-based models are also adapted in our framework to predict the future object state from the current object state and the manipulation actions of the robotic hand. Specifically, we use the Interaction Network (IN) framework (Battaglia et al., <xref ref-type="bibr" rid="B2">2016</xref>) along with its extension, PropNet (Li et al., <xref ref-type="bibr" rid="B20">2019b</xref>), for supervised learning on graph structures. Unlike standard GNNs, the IN is specifically designed to learn the dynamics of physically interacting systems. The model can both predict future states of the system and extract latent physical properties.</p>
<sec>
<title>3.2.1. Object-Action Representation</title>
<p>A new representation is created to jointly capture the object shape and the manipulation actions. It is defined as a directed graph <italic>G</italic> &#x0003D; &#x02329;<italic>O, R</italic>&#x0232A;, where <inline-formula><mml:math id="M13"><mml:mi>O</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>o</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula> is the set of nodes, of cardinality <inline-formula><mml:math id="M14"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, each with an associated feature vector <bold>o</bold><sub><italic>i</italic></sub> &#x0003D; {<bold><italic>x</italic></bold><sub><italic>i</italic></sub>, <bold><italic>v</italic></bold><sub><italic>i</italic></sub>}, which contains the object-action state defined as position and velocity.
Similarly, <inline-formula><mml:math id="M15"><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>r</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula> is the set of edges, of cardinality <inline-formula><mml:math id="M16"><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>; owing to the graph directionality, each edge connects an ordered pair of nodes, defined as a sender node <italic>u</italic><sub><italic>k</italic></sub> and a receiver node <italic>v</italic><sub><italic>k</italic></sub>.</p>
<p>The object state is the shape estimation <italic>G</italic> produced by the BC-GNG model (section 3.1), and the manipulation actions are included as contact points captured from the fingertip poses of the robotic hand. New nodes are therefore added to the graph, with features corresponding to the position components of the fingertip poses. Edges are created when physical interactions are detected between the fingertips and the object, assigning the action nodes of the fingertips as senders and the object nodes as receivers in the directed graph. The edge direction adds a causality property, indicating that action nodes produce the displacement of object nodes and not the opposite. Finally, the velocity feature is computed by differentiating the position signal of the object-action nodes over time.</p>
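<p>The assembly of the object-action graph can be sketched as follows. The framework itself is implemented with the Deep Graph Library; here plain NumPy arrays and an edge list are used for clarity, and all array shapes and names are illustrative assumptions.</p>

```python
import numpy as np

def build_object_action_graph(obj_pos, obj_vel, tip_pos, tip_vel, contacts):
    """Assemble the directed object-action graph (illustrative sketch).

    obj_pos, obj_vel : (N_obj, 3) positions/velocities of shape nodes (BC-GNG output)
    tip_pos, tip_vel : (N_tip, 3) fingertip positions/velocities (action nodes)
    contacts         : list of (tip_index, object_index) detected contact pairs
    """
    # Node features o_i = {x_i, v_i}; action nodes are appended after object nodes.
    pos = np.vstack([obj_pos, tip_pos])
    vel = np.vstack([obj_vel, tip_vel])
    nodes = np.hstack([pos, vel])  # (N_obj + N_tip, 6)

    n_obj = len(obj_pos)
    senders, receivers = [], []
    for tip, obj in contacts:
        senders.append(n_obj + tip)  # the action node sends the effect ...
        receivers.append(obj)        # ... and the object node receives it
    return nodes, np.array(senders), np.array(receivers)
```

<p>The sender/receiver ordering encodes the causality property: displacements flow from fingertips to object nodes, never the opposite.</p>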
</sec>
<sec>
<title>3.2.2. Interaction Networks</title>
<p>An IN model is trained to learn the transition dynamics of the object state. It takes the object-action graph at a certain time step <italic>G</italic><sub><italic>t</italic></sub> and outputs a prediction of the node positions of the graph for the next time step <inline-formula><mml:math id="M17"><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>. The model is updated following the execution of Algorithm 3, which evaluates update and aggregation functions (Gilmer et al., <xref ref-type="bibr" rid="B11">2017</xref>) to perform computations on the graph features. The update function, &#x003D5;<sub><italic>R</italic></sub>, is responsible for per-edge updates: it evaluates the features of an edge together with those of its sender and receiver nodes, and computes the edge effect. Since &#x003D5;<sub><italic>R</italic></sub> produces a variable number of effects associated with each node, these are reduced with an aggregation function, &#x003C1;<sub><italic>R</italic> &#x02192; <italic>O</italic></sub>, to produce a single effect per node. The update function, &#x003D5;<sub><italic>O</italic></sub>, is then responsible for per-node updates: it evaluates the features of a node together with its aggregated effect, and computes the updated node.</p>
<table-wrap position="float">
<label>Algorithm 3</label>
<caption><p>Steps of computation in IN</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><bold>Input:</bold> &#x000A0;<italic>G</italic><sub><italic>t</italic></sub></td></tr>
<tr><td align="left" valign="top"><bold>Output:</bold> &#x000A0;<inline-formula><mml:math id="M20"><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td></tr>
<tr><td align="left" valign="top">&#x000A0;1: &#x000A0;<inline-formula><mml:math id="M21"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>r</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>o</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>o</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula></td></tr>
<tr><td align="left" valign="top">&#x000A0;2: &#x000A0;<inline-formula><mml:math id="M22"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>O</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula></td></tr>
<tr><td align="left" valign="top">&#x000A0;3: &#x000A0;<inline-formula><mml:math id="M23"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>o</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle class="text"><mml:mtext mathvariant="bold">o</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula></td></tr> 
</tbody>
</table>
</table-wrap>
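<p>Algorithm 3 can be sketched as follows. This is an illustrative sketch, not the trained model: the MLP update functions are replaced by arbitrary callables, and the edge feature <bold>r</bold><sub><italic>k</italic></sub> is omitted for brevity.</p>

```python
import numpy as np

def interaction_step(nodes, senders, receivers, phi_R, phi_O):
    """One IN step (Algorithm 3, sketch): per-edge update, sum-aggregation
    to receivers, then per-node update.

    nodes     : (N, F) node features o_i
    senders   : sender index u_k per edge
    receivers : receiver index v_k per edge
    phi_R     : edge update, maps (o_u, o_v) to an edge effect e_k
    phi_O     : node update, maps (aggregated effect, o_i) to the new node
    """
    # Line 1: e_k <- phi_R(o_{u_k}, o_{v_k}) for every edge.
    effects = np.array([phi_R(nodes[u], nodes[v])
                        for u, v in zip(senders, receivers)])
    # Line 2: rho_{R->O} sums the effects arriving at each node.
    agg = np.zeros_like(nodes)
    for e, v in zip(effects, receivers):
        agg[v] += e
    # Line 3: o_hat_i <- phi_O(e_i, o_i).
    return np.array([phi_O(agg[i], nodes[i]) for i in range(len(nodes))])
```

<p>In the actual model, &#x003D5;<sub><italic>R</italic></sub> and &#x003D5;<sub><italic>O</italic></sub> would be MLPs trained with the MSE loss of Equation 8.</p>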
<p>The update functions are implemented as Multi-layer Perceptron (MLP) modules, while the aggregation function is a summation. The mean squared error (MSE) between the predicted and observed node velocities (Equation 8) is used as the loss function to train the models. This metric computes the average of the squared errors between the predicted velocities, <inline-formula><mml:math id="M18"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>v</mml:mi></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, and the observed velocities, <bold><italic>v</italic></bold><sub><italic>i, t</italic>&#x0002B;1</sub>.</p>
<disp-formula id="E8"><label>(8)</label><mml:math id="M19"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">MSE</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>v</mml:mi></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>v</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
</sec>
<sec>
<title>3.2.3. Propagation Networks</title>
<p>A limitation of the IN arises in systems that require long-range and fast propagation effects, since its formulation only considers local pairwise interactions at each time step. Several iterations of the algorithm are therefore needed to propagate the information over the graph and reach remote nodes. As an extension of the IN, the PropNet formulation (Li et al., <xref ref-type="bibr" rid="B20">2019b</xref>) (Algorithm 4) introduces a multi-step propagation phase, which computes the edge and node effects through an additional iterative process, where <italic>l</italic> denotes the current propagation step and takes values in the range 1 &#x02264; <italic>l</italic> &#x02264; <italic>L</italic>. The update functions, <inline-formula><mml:math id="M24"><mml:msubsup><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">enc</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math id="M25"><mml:msubsup><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">enc</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula>, encode the input edge and node features, respectively, while the function, <inline-formula><mml:math id="M26"><mml:msubsup><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">dec</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula>, decodes the output node feature. In this way, these functions learn a latent representation of the graph features, and they are trained as part of the model.</p>
<table-wrap position="float">
<label>Algorithm 4</label>
<caption><p>Steps of computation in PropNet</p></caption>
<table frame="hsides" rules="groups">
<tbody>
<tr><td align="left" valign="top"><bold>Input:</bold> &#x000A0;<italic>G</italic><sub><italic>t</italic></sub></td></tr>
<tr><td align="left" valign="top"><bold>Output:</bold> &#x000A0;<inline-formula><mml:math id="M27"><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td></tr>
<tr><td align="left" valign="top">&#x000A0;1: &#x000A0; <inline-formula><mml:math id="M28"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>o</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">enc</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">enc</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup><mml:msub><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>o</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula></td></tr>
<tr><td align="left" valign="top">&#x000A0;2: &#x000A0; <inline-formula><mml:math id="M29"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>r</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">enc</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">enc</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup><mml:msub><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>r</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>o</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>o</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula></td></tr>
<tr><td align="left" valign="top">&#x000A0;3: &#x000A0; <inline-formula><mml:math id="M30"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:mstyle mathvariant="bold"><mml:mn>0</mml:mn></mml:mstyle></mml:math></inline-formula></td></tr>
<tr><td align="left" valign="top">&#x000A0;4: &#x000A0; <bold>for</bold> <italic>l</italic> &#x0003D; 1 <bold>to</bold> <italic>L</italic> <bold>do</bold></td></tr>
<tr><td align="left" valign="top">&#x000A0;5: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <inline-formula><mml:math id="M31"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>E</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>r</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">enc</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mstyle class="text"><mml:mtext mathvariant="bold">h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula></td></tr>
<tr><td align="left" valign="top">&#x000A0;6: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <inline-formula><mml:math id="M32"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003C1;</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mo>&#x02192;</mml:mo><mml:mi>O</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula></td></tr>
<tr><td align="left" valign="top">&#x000A0;7: &#x000A0;&#x000A0;&#x000A0;&#x000A0;&#x000A0; <inline-formula><mml:math id="M33"><mml:msubsup><mml:mrow><mml:mstyle class="text"><mml:mtext mathvariant="bold">h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x02190;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>o</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">enc</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>e</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula></td></tr>
<tr><td align="left" valign="top">&#x000A0;8: &#x000A0; <bold>end for</bold></td></tr>
<tr><td align="left" valign="top">&#x000A0;9: &#x000A0; <inline-formula><mml:math id="M34"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>o</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02190;</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">dec</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup><mml:msub><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">N</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:math></inline-formula></td></tr> 
</tbody>
</table>
</table-wrap>
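<p>The multi-step propagation of Algorithm 4 can be sketched in the same style. The encoders and decoder are omitted and the update functions are plain callables, so this sketch only illustrates how effects travel up to <italic>L</italic> hops through the graph within a single call; all names are illustrative.</p>

```python
import numpy as np

def propnet_step(nodes, senders, receivers, phi_E, phi_O, L=3):
    """Multi-step propagation phase (Algorithm 4, sketch).

    phi_E : per-edge update, maps (o_u, h_u, h_v) to an edge effect
    phi_O : per-node update, maps (o_i, aggregated effect, h_i) to h_i
    """
    h = np.zeros_like(nodes)  # line 3: propagation state h^0 <- 0
    for _ in range(L):        # lines 4-8: L propagation steps
        # Line 5: edge effects from node features and previous states.
        e = np.array([phi_E(nodes[u], h[u], h[v])
                      for u, v in zip(senders, receivers)])
        agg = np.zeros_like(nodes)  # line 6: sum effects per receiver
        for eff, v in zip(e, receivers):
            agg[v] += eff
        # Line 7: update the propagation state of every node.
        h = np.array([phi_O(nodes[i], agg[i], h[i]) for i in range(len(nodes))])
    return h                  # line 9 would decode h^L into the prediction
```

<p>On a chain graph 0 &#x02192; 1 &#x02192; 2, the effect of node 0 reaches node 2 only when <italic>L</italic> &#x02265; 2, which is exactly the long-range limitation of a single IN step that PropNet addresses.</p>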
</sec>
</sec>
</sec>
<sec id="s4">
<title>4. Experimental Evaluation</title>
<sec>
<title>4.1. Experimental Setup</title>
<p>The configuration of the real robotic environment is shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, which consists of a Barrett BH8-280 robotic hand<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> resting on a flat table, an Intel RealSense SR305 RGB-D sensor<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> mounted overhead on a tripod, and a deformable object placed on the palm of the robotic hand. The complete set of deformable objects used to construct the datasets is shown in <xref ref-type="fig" rid="F3">Figure 3</xref>.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Configuration of the real robotic environment. The location of the RGB-D sensor, deformable object and three-fingered robotic hand are marked in red.</p></caption>
<graphic xlink:href="frobt-07-600584-g0002.tif"/>
</fig>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Deformable objects used to construct the dataset. (Ball) plastic type found in toy stores. (Sponges) foam type used as cleaning utensils. (Towel) textile type found in warehouses. (Toy) stuffed type found in pillows or stuffed animals.</p></caption>
<graphic xlink:href="frobt-07-600584-g0003.tif"/>
</fig>
<p>All the sensors and hardware components used in the robotic manipulation setup are operated through ROS (Quigley et al., <xref ref-type="bibr" rid="B27">2009</xref>). The data preparation, signal, and image processing steps are implemented using the SciPy (Virtanen et al., <xref ref-type="bibr" rid="B32">2020</xref>) and OpenCV (Bradski, <xref ref-type="bibr" rid="B4">2000</xref>) libraries. The models are implemented in the Deep Graph Library (Wang et al., <xref ref-type="bibr" rid="B33">2019</xref>) using PyTorch (Paszke et al., <xref ref-type="bibr" rid="B25">2019</xref>) as the backend.</p>
</sec>
<sec>
<title>4.2. Sensor Measurements and Data Processing</title>
<p>The RGB-D sensor data are processed in a ROS node to detect the object and generate the point cloud data. A second ROS node estimates the pose of the robotic hand&#x00027;s fingertips to generate the manipulation action information.</p>
<sec>
<title>4.2.1. Object Detection</title>
<p>Classical image segmentation techniques are applied to the aligned color and depth images to detect the deformable objects. The color image is transformed to the HSV color space, and a histogram backprojection technique is applied to obtain a binary mask. The mask is then filtered by a convolution followed by a thresholding operation to obtain a cleaner result. In parallel, the depth image is cropped by volume, truncating the spatial values based on available information about the object position relative to the camera. The resulting color and depth masks are combined and applied to the depth image to isolate the object of interest. The segmented image is then deprojected to convert the 2D pixels into 3D point clouds. For the small sponge object, this process reduces the RGB-D sensor images from 640 &#x000D7; 480 to approximately 80 &#x000D7; 80 &#x000D7; 3.</p>
</sec>
<sec>
<title>4.2.2. Fingertips Pose Estimation</title>
<p>The data captured at the fingertips correspond to the pose (position and orientation) of each tip. To facilitate accurate pose estimation, a set of AR markers is placed on each tip. The design is based on the ARTag fiducial marker system and is generated with the following parameters: a size of 1.8 cm, a margin of 1 bit, and a 25-bit 5 &#x000D7; 5 pattern array. The latter controls the number of tags that can be created from the marker dictionary. Given the physical dimensions of the robotic hand, this design allows each marker to fit precisely on the tip, while the markers remain large enough to be detected in the images captured by the RGB-D sensor. The fingertip pose corresponds to the one estimated from the markers. The pose serves to define the contact points, each determined by a spherical contact region centered on the marker with a radius of 2.3 cm, the latter measured from the tip size relative to the marker location.</p>
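The contact-region test described above can be expressed as a short geometric check. This is a minimal sketch under the stated geometry (2.3 cm sphere centered on each marker); the function name and array shapes are our own, and the marker centers are assumed to come from the AR-marker pose estimation:

```python
import numpy as np

CONTACT_RADIUS = 0.023  # meters, from the marker-to-tip geometry in the text

def contact_nodes(node_positions, marker_centers, radius=CONTACT_RADIUS):
    """Boolean mask of nodes inside any fingertip's contact sphere.

    node_positions: (N, 3) array of graph node coordinates in meters.
    marker_centers: (M, 3) array of fingertip marker positions in meters.
    """
    # Pairwise distances between every node and every marker center.
    diff = node_positions[:, None, :] - marker_centers[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)   # (N, M)
    # A node is a contact point if it lies within at least one sphere.
    return (dists <= radius).any(axis=1)    # (N,)
```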
</sec>
</sec>
<sec>
<title>4.3. Shape and Motion Estimation With GNG-Based Models</title>
<p>These experiments are run on a computer with 1 &#x000D7; Intel Core i5-7300U &#x00040; 2.60 GHz, 16 GB RAM, and a GNU/Linux operating system. The parameters of the GNG models are shared as much as possible to allow a consistent comparison of the performance of the different variations. For GNG, an age of 35, learning rates of 0.1 and 0.005, and error decays of 0.5 and 0.9995 are used. For C-GNG, an age of 2,000 and learning rates of 0.1 and 0.005 are used, and for BC-GNG, an age of 2,000 and learning rates of 0.4 and 0.01. The sigma value of the regularization term used in C-GNG and BC-GNG is 0.6 for the towel and 4 for the other objects.</p>
<p>The finger trajectories are generated to perform a squeeze-like manipulation with each object. The range of the base joints is limited to (&#x02212;90&#x000B0;,90&#x000B0;), whereas the spread joint is limited to (&#x02212;45&#x000B0;,45&#x000B0;). Each trajectory takes as its final configuration a random joint position within the available moving range of each robotic finger, and uses a linear interpolation with 50 points starting from a predefined rest position of the hand. The trajectories are designed in this manner to produce brief rest periods at the end of each point, with the intention of preventing slippage or sliding of the object, and thus mainly capturing information associated with the deformation. A dataset is created that consists of a file with 800 samples, using a sampling rate of 30 Hz. Each file stores the data generated in synchronization with the execution of the finger trajectory, which takes approximately 27 s to complete. Results for a subset of data frames that progressively reflect various deformation levels, using the small sponge as an example of deformable object, are shown in <xref ref-type="fig" rid="F4">Figure 4</xref>. We refer the reader to the <xref ref-type="supplementary-material" rid="SM1">Supplementary Material</xref> for additional results with the other deformable objects considered, as per <xref ref-type="fig" rid="F3">Figure 3</xref>.</p>
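The trajectory generation above can be sketched as follows. Joint limits and the 50-point linear interpolation are taken from the text; the joint ordering (three base joints plus one spread joint) and the all-zero rest configuration are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

BASE_RANGE = np.deg2rad([-90.0, 90.0])    # base joint limits, radians
SPREAD_RANGE = np.deg2rad([-45.0, 45.0])  # spread joint limits, radians
REST = np.zeros(4)                        # hypothetical rest configuration

def squeeze_trajectory(n_points=50):
    """Linear interpolation from rest to a random reachable configuration."""
    goal = np.empty(4)
    goal[:3] = rng.uniform(*BASE_RANGE, size=3)  # three base joints
    goal[3] = rng.uniform(*SPREAD_RANGE)         # spread joint
    # n_points interpolation steps from the rest position to the goal.
    alphas = np.linspace(0.0, 1.0, n_points)[:, None]
    return REST + alphas * (goal - REST)
```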
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Qualitative results of the shape estimation method. <bold>(Top)</bold> color image sequence of the scene with the small sponge at different deformation levels. <bold>(Bottom)</bold> point cloud and graph representation of the object shape obtained by the BC-GNG model.</p></caption>
<graphic xlink:href="frobt-07-600584-g0004.tif"/>
</fig>
<p>As mentioned in section 2, the properties extracted from the object shape are the basis for any learned model. This means that motion changes should closely capture the dynamics of the deformation. A motion analysis can be used to determine whether the produced shape estimation is consistent with the deformation and reflects the current state of the deformable object. This analysis is formalized while also considering the requirements of real robotic environments.</p>
<sec>
<title>4.3.1. Real-Time Execution</title>
<p>We evaluate the performance of different variations of GNG for real-time shape estimation using point clouds as input data. The runtime of each model is recorded per data frame and the average over the entire manipulation trajectory is computed, as shown in the bar plot of <xref ref-type="fig" rid="F5">Figure 5</xref>. The models are evaluated at three levels of quantization error, selected to provide insight into the precision costs associated with the representation. For GNG, the runtime averages 80.7 s when a quantization error tolerance of QE = 0.005 is imposed, and grows linearly if more precision (lower quantization error) on the shape representation is required. On the other hand, the C-GNG runtime is several orders of magnitude faster, mainly due to the reuse of the previous graphs over iterations. Although this formulation is a great improvement, its runtime is not yet suited for real-time applications, at least on low-power CPUs and embedded systems. It takes an average of 7.4 s per data frame but proves less sensitive to the tolerance set on the model precision. Finally, the proposed BC-GNG variation that involves batch training considerably speeds up the execution. In this case, the algorithm needs an average of 0.4 s to construct the same graph, with only a slight variation in computing time when the desired model accuracy is varied. In certain cases, a sudden increase in time is observed when more accuracy is required, as shown for QE = 0.003. This occurs in data frames with high variations, since graphs with a fixed number of nodes cannot always adapt to such levels of accuracy. Therefore, early stopping mechanisms are required to avoid unnecessary iterations.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Average of runtime sequence. GNG models formation computing time with respect to varying tolerance on model accuracy during the manipulation of the small sponge object.</p></caption>
<graphic xlink:href="frobt-07-600584-g0005.tif"/>
</fig>
</sec>
<sec>
<title>4.3.2. Temporal Smoothing</title>
<p>We evaluate the performance of different variations of GNG in generating stable displacements of the nodes that encode the object shape. The path followed by each individual node is measured relative to the centroid coordinate system to mitigate the influence of rigid motions. These local displacements estimate the actual deformation motion of the object shape. The three-dimensional temporal evolution of local nodes for the small sponge object is shown in <xref ref-type="fig" rid="F6">Figure 6</xref> for a subset of nodes (the first 6 out of 34 nodes) extracted from the graph forming the shape model, over the 800 frames that correspond to a manipulation operation.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>Local displacements of nodes encoding the object shape. Three-dimensional coordinates of nodes obtained by the GNG models during the manipulation of the small sponge object.</p></caption>
<graphic xlink:href="frobt-07-600584-g0006.tif"/>
</fig>
<p>The continual models (C-GNG and BC-GNG) clearly produce more stable signals. An interesting property of BC-GNG is the low-pass filter effect observed in the signals. This behavior arises because the algorithm uses the average of the node positions during the update process. The node displacements obtained by BC-GNG are therefore much smoother, which is desirable for estimating with confidence the motion and deformation quantities of a non-rigid object, as its shape is not dominated by noise associated with the individual dynamics of the nodes forming the graph-based representation.</p>
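A toy illustration of why this averaging behaves like a low-pass filter (this is not the BC-GNG algorithm itself, only the averaging principle it relies on): when a node position is blended with the mean of its newly assigned samples, per-sample noise is attenuated by the averaging before it can move the node.

```python
import numpy as np

def batch_update(node_pos, samples, lr=0.4):
    """Move a node toward the MEAN of its assigned samples.

    Averaging the samples first (as in a batch update) means individual
    noisy samples contribute only 1/len(samples) of their deviation,
    which smooths the node trajectory over frames. lr = 0.4 mirrors the
    BC-GNG winner learning rate reported in the text.
    """
    return node_pos + lr * (samples.mean(axis=0) - node_pos)
```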
</sec>
<sec>
<title>4.3.3. Region Correspondence</title>
<p>Finally, we evaluate the performance of different variations of GNG in producing node displacements (<xref ref-type="fig" rid="F6">Figure 6</xref>) that can be used as features of the object motion. The region correspondence of node positions is non-existent in GNG due to the stochastic nature of the algorithm, which prevents new nodes from being created around the same location. For C-GNG, the displacements exhibit a localized motion of node positions that preserves certain regions of the shape. However, there still exists some interference between nodes, which causes unrecoverable positions and affects the region correspondence of the representation; this is more noticeable when large deformations occur. For BC-GNG, the displacements reflect an even more localized motion of node positions than that observed in C-GNG. Interference between nodes does not cause strong deviations in their displacements, hence better preserving their correspondence throughout the manipulation task.</p>
</sec>
</sec>
<sec>
<title>4.4. Deformation Dynamics Prediction With GNN-Based Models</title>
<p>These experiments are run on a cloud instance with 1 &#x000D7; Intel Xeon Processor &#x00040; 2.3 GHz, 1 &#x000D7; NVIDIA Tesla K80 GPU with 2,496 CUDA cores and 12 GB GDDR5 VRAM, and a GNU/Linux operating system. The training procedure of the GNN models consists of 20 iterations using a batch size of 1. The MLP modules are trained using the Adam optimizer (Kingma and Ba, <xref ref-type="bibr" rid="B15">2015</xref>) with a learning rate of 0.001 and momentum parameters of 0.9 and 0.9999. A learning rate scheduler with a factor of 0.8 and a patience of 3 is used.</p>
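In PyTorch, the configuration above maps onto standard optimizer objects. This is a minimal sketch, assuming a `ReduceLROnPlateau`-style scheduler (the text gives factor and patience but not the scheduler type) and a stand-in module in place of the actual GNN:

```python
import torch

# Stand-in for the GNN; only the parameter registration matters here.
model = torch.nn.Linear(3, 3)

# Adam with lr 0.001 and momentum parameters (betas) 0.9 and 0.9999.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.9999))

# Plateau scheduler with factor 0.8 and patience 3, as in the text.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.8, patience=3)

# Typical epoch loop (20 iterations, batch size 1 in the paper):
# for epoch in range(20):
#     ...training pass computes val_loss...
#     scheduler.step(val_loss)
```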
<p>The architecture design of the GNN models follows the configuration presented in Li et al. (<xref ref-type="bibr" rid="B19">2019a</xref>). This configuration is shared among the models to allow a consistent comparison of their performances; hence, the main difference is the propagation step parameter L, which is 1 for IN and 2 for PropNet. In this way, the encoder functions <inline-formula><mml:math id="M35"><mml:msubsup><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">enc</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula>, <inline-formula><mml:math id="M36"><mml:msubsup><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">enc</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula> are 2-layer MLPs with hidden sizes of 200 and 300 and an output size of 200. The update functions &#x003D5;<sub><italic>R</italic></sub>, &#x003D5;<sub><italic>O</italic></sub> are 1-layer MLPs with a hidden size of 200 and an output size of 200. The decoder function <inline-formula><mml:math id="M37"><mml:msubsup><mml:mrow><mml:mi>&#x003D5;</mml:mi></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow><mml:mrow><mml:mstyle class="text"><mml:mtext class="textrm" mathvariant="normal">dec</mml:mtext></mml:mstyle></mml:mrow></mml:msubsup></mml:math></inline-formula> is a 2-layer MLP with a hidden size of 200 and an output size of 3, the latter corresponding to the components of the predicted velocity. All MLP modules use the rectified linear unit (ReLU) as the activation function. A dataset of graphs per object is created, consisting of 20 files, each associated with a different finger trajectory. This produces 16,000 samples in total. The dataset is divided into 80% for training, 10% for validation, and 10% for testing, which is equivalent to 16 trajectories (12,800 randomly shuffled samples) and 2 &#x000D7; 2 trajectories (2 &#x000D7; 1,600 samples), respectively. In addition, the dataset is normalized between 0 and 1 due to the varied scales of the position and velocity features.</p>
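As a hedged sketch of the module sizes above (the layer grouping is our reading of "2-layer MLP with hidden sizes of 200 and 300", and the per-node input feature size `D_IN` is hypothetical), the MLPs could be assembled as:

```python
import torch
import torch.nn as nn

def mlp(sizes):
    """Build an MLP with ReLU activations between consecutive layers."""
    layers = []
    for i in range(len(sizes) - 2):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU()]
    layers.append(nn.Linear(sizes[-2], sizes[-1]))
    return nn.Sequential(*layers)

D_IN = 6  # hypothetical per-node input feature size

phi_O_enc = mlp([D_IN, 200, 300, 200])  # encoder: hidden 200, 300 -> 200
phi_O     = mlp([200, 200, 200])        # update: one hidden layer of 200
phi_O_dec = mlp([200, 200, 3])          # decoder: hidden 200 -> 3 (velocity)
```

The relation-side modules (`enc` and update for edges) would follow the same pattern with their own input sizes.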
<p>The GNN models are primarily analyzed in two situations: first evaluating the performance of the predictions for the object deformation in single-step time sequences, and then evaluating the ability to generalize over multi-step time sequences. To enable a more direct interpretation of the results, the Root Mean Square Error (RMSE) of the predicted and observed node positions is used as a metric. Thus, the node positions of the next frame <inline-formula><mml:math id="M38"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> are calculated via explicit integration of the equation of motion (Equation 9), which uses the predicted velocities of the next frame <inline-formula><mml:math id="M39"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mover accent="true"><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and the time per frame &#x00394;<italic>t</italic> to update the current position of each node.</p>
<disp-formula id="E9"><label>(9)</label><mml:math id="M40"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mover accent="true"><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mi>x</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x00394;</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x000B7;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mover accent="true"><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
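Equation 9 and the RMSE metric reduce to a few lines of code. This sketch assumes &#x00394;<italic>t</italic> matches the 30 Hz acquisition rate stated for the dataset:

```python
import numpy as np

def integrate_step(x_t, v_hat_next, dt=1.0 / 30.0):
    """Explicit Euler update of Equation 9: x_{t+1} = x_t + dt * v_{t+1}.

    x_t:        (N, 3) current node positions.
    v_hat_next: (N, 3) velocities predicted by the GNN for the next frame.
    """
    return x_t + dt * v_hat_next

def rmse(x_pred, x_obs):
    """Root mean square error over all nodes and coordinates (meters)."""
    return float(np.sqrt(np.mean((x_pred - x_obs) ** 2)))
```

For multi-step prediction, `integrate_step` is applied recursively on its own output, which is why the integration error accumulates over long horizons.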
<p><italic>Single-Step Predictions:</italic> The node positions are predicted from the most recent observed data at each frame (t &#x0002B; 1). The GNN models obtain a relatively low and consistent error (<xref ref-type="table" rid="T1">Table 1</xref>) on the node positions throughout the entire range of acquired data frames over the object manipulation duration. These results confirm a stable one-step-ahead prediction capability of the GNN models, as shown on the left of <xref ref-type="fig" rid="F7">Figure 7</xref>.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Prediction error of nodes position.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>GNN models</bold></th>
<th valign="top" align="center" colspan="3" style="border-bottom: thin solid #000000;"><bold>Frame steps</bold></th>
</tr>
<tr>
<th/>
<th valign="top" align="center"><bold>t &#x0002B; 1</bold></th>
<th valign="top" align="center"><bold>t &#x0002B; 5</bold></th>
<th valign="top" align="center"><bold>t &#x0002B; 50</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">IN</td>
<td valign="top" align="center">0.08 &#x000B1; 0.02</td>
<td valign="top" align="center">9.52 &#x000B1; 7.14</td>
<td valign="top" align="center">46.28 &#x000B1; 30.53</td>
</tr>
<tr>
<td valign="top" align="left">PropNet</td>
<td valign="top" align="center">0.08 &#x000B1; 0.02</td>
<td valign="top" align="center">9.53 &#x000B1; 7.14</td>
<td valign="top" align="center">53.66 &#x000B1; 37.08</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>RMSE (10<sup>-5</sup>) values in meters obtained by the GNN models at different time steps during the manipulation of the small sponge</italic>.</p>
</table-wrap-foot>
</table-wrap>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>Qualitative results of the shape prediction method. Graph sequences predicted by the GNN model (PropNet) over different time horizons for the small sponge with three contact points (nodes in red-green-blue). The predicted shape corresponds to the nodes position produced by the model. The observed nodes position are displayed in shaded color.</p></caption>
<graphic xlink:href="frobt-07-600584-g0007.tif"/>
</fig>
<p><italic>Multi-Step Predictions:</italic> The node positions are predicted for every frame, but updates from observed data are fed into the model at longer intervals (t &#x0003E; 1), which involves a longer-term prediction before new data are made available to the GNN models. The error produced by the models remains relatively low (<xref ref-type="table" rid="T1">Table 1</xref>) over a short range of frames (t &#x0002B; 5), but progressively degrades as the number of frames increases further (t &#x0002B; 50). As a consequence, at some point the models become unable to predict the node positions with confidence, as shown on the right of <xref ref-type="fig" rid="F7">Figure 7</xref>. The errors from previous iterations accumulate and the prediction diverges, causing the deformable object prediction to enter an unrecoverable state.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s5">
<title>5. Discussion</title>
<sec>
<title>5.1. Shape Estimation Quality</title>
<p>All variants of GNG studied in this research produce a graph sequence that estimates the object shape. However, even with consistent training conditions across models (i.e., shared parameters and stopping criterion), the proposed BC-GNG model performs better in terms of computing time and motion estimation, as demonstrated experimentally in section 4.3. Consider the data frames (<xref ref-type="fig" rid="F4">Figure 4</xref>) where the largest deformation occurs (around <italic>t</italic> &#x0003D; 400). The areas of the shape where the object is compressed the most (e.g., around the center and vertices) show a higher and more natural accumulation of nodes. Also, the estimation obtained when no interaction occurs between the fingers and the small sponge (around <italic>t</italic> &#x0003D; 800) produces a more symmetric node density that better resembles the object topology. These characteristics are also observed for the other deformable objects considered in these experiments.</p>
<p>We also observed that BC-GNG still exhibits some difficulty in recovering the initial node positions for elastic objects. Unlike with C-GNG, such variations do not manifest as abrupt changes in the signal, owing to the smoother characteristic of the displacements. This behavior is more desirable, since abrupt changes are directly associated with large deformations, which in this case do not correspond to what the object is actually experiencing. In particular, the local displacements of large volumetric objects are more affected. This might be related to occlusions causing correspondence problems by further reducing the number of points reported by the sensor when the object is manipulated.</p>
</sec>
<sec>
<title>5.2. Shape Prediction Reliability</title>
<p>The main advantage of combining a GNN predictive model (IN and PropNet) with a self-organizing model (BC-GNG) is the fact that the training data generated by the latter are dynamic graphs of efficient size. As noted in Li et al. (<xref ref-type="bibr" rid="B19">2019a</xref>), training GNNs with large static graphs may overload memory capacity and delay convergence. Furthermore, such models do not perform well in dynamical settings due to unnecessary interactions associated with a fully connected graph topology. The proposed combination of models contributes to overcoming these important constraints, which can be detrimental to the successful robotic manipulation of deformable objects. Thus, the GNN models trained in combination with the BC-GNG graphs effectively capture the immediate changes of the object shape when evaluated on single-step, or short-term, time sequences, and demonstrate potential to produce robust and visually plausible predictions of the deformation dynamics. On the other hand, while their performance tends to degrade over longer-term predictions, anticipating an object&#x00027;s shape deformation a few steps ahead is representative of what human beings can realistically achieve, and generally proves sufficient for robotic manipulation supported by modern RGB-D sensors that can now capture point clouds in real time. Given that the modeling and prediction framework is meant to be part of the robotic hand control loop, new RGB-D data are made available to update the deformable object representation, and provide an updated prediction, at the same frame rate as the robot controller. As a result, long-term prediction is not essential in this type of application. With the configuration used, we also notice that the performances of the GNN models are very similar, although this could be affected by the fact that PropNet shows faster convergence in training than IN due to its multi-step propagation phase.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="s6">
<title>6. Conclusion</title>
<p>This paper presents a first attempt at using graph models to learn the dynamics of deformable objects entirely from RGB-D sensor measurements. The proposed BC-GNG formulation improves on C-GNG by producing graphs with better node stability, region correspondence under shape variations, and lower computational cost. These properties make it possible to combine the representation with other graph models, such as GNNs, to predict the deformation dynamics of non-rigid objects.</p>
<p>By combining the relational structure of self-organizing and graph neural networks, the proposed approach successfully captures the object shape and predicts the deformation dynamics when evaluated over single-step or short-term time sequences. In comparison to analytical models, execution time is faster and information on the shape and physical properties of the object does not need to be known or approximated a priori. Therefore, the proposed combination of graph models and their adaptation demonstrate strong potential for characterizing deformable objects&#x00027; shape and dynamics, as required to support advanced dexterous robotic manipulation.</p>
</sec>
<sec sec-type="data-availability-statement" id="s7">
<title>Data Availability Statement</title>
<p>The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.</p>
</sec>
<sec id="s8">
<title>Author Contributions</title>
<p>Both authors contributed to the overall conception of the methodology, experimentation, analysis and manuscript writing.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<sec sec-type="supplementary-material" id="s9">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/frobt.2020.600584/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/frobt.2020.600584/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.PDF" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arriola-Rios</surname> <given-names>V. E.</given-names></name> <name><surname>Wyatt</surname> <given-names>J. L.</given-names></name></person-group> (<year>2017</year>). <article-title>A multimodal model of object deformation under robotic pushing</article-title>. <source>IEEE Trans. Cogn. Dev. Syst</source>. <volume>9</volume>, <fpage>153</fpage>&#x02013;<lpage>169</lpage>. <pub-id pub-id-type="doi">10.1109/TCDS.2017.2664058</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Battaglia</surname> <given-names>P.</given-names></name> <name><surname>Pascanu</surname> <given-names>R.</given-names></name> <name><surname>Lai</surname> <given-names>M.</given-names></name> <name><surname>Jimenez Rezende</surname> <given-names>D.</given-names></name> <name><surname>Kavukcuoglu</surname> <given-names>K.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Interaction networks for learning about objects, relations and physics,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems 29</source>, eds D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (<publisher-loc>Barcelona</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>), <fpage>4502</fpage>&#x02013;<lpage>4510</lpage>.</citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Billard</surname> <given-names>A.</given-names></name> <name><surname>Kragic</surname> <given-names>D.</given-names></name></person-group> (<year>2019</year>). <article-title>Trends and challenges in robot manipulation</article-title>. <source>Science</source> <volume>364</volume>:<fpage>eaat8414</fpage>. <pub-id pub-id-type="doi">10.1126/science.aat8414</pub-id><pub-id pub-id-type="pmid">31221831</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Bradski</surname> <given-names>G.</given-names></name></person-group> (<year>2000</year>). <article-title>The OpenCV Library</article-title>. <source>Dr Dobbs J. Softw.Tools</source> <volume>25</volume>, <fpage>120</fpage>&#x02013;<lpage>125</lpage>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/opencv/opencv/wiki/CiteOpenCV">https://github.com/opencv/opencv/wiki/CiteOpenCV</ext-link></citation></ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Caccamo</surname> <given-names>S.</given-names></name> <name><surname>G&#x000FC;ler</surname> <given-names>P.</given-names></name> <name><surname>Kjellstr&#x000F6;m</surname> <given-names>H.</given-names></name> <name><surname>Kragic</surname> <given-names>D.</given-names></name></person-group> (<year>2016</year>). <article-title>&#x0201C;Active perception and modeling of deformable surfaces using Gaussian processes and position-based dynamics,&#x0201D;</article-title> in <source>2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids)</source> (<publisher-loc>Cancun</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>530</fpage>&#x02013;<lpage>537</lpage>. <pub-id pub-id-type="doi">10.1109/HUMANOIDS.2016.7803326</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cretu</surname> <given-names>A.</given-names></name> <name><surname>Payeur</surname> <given-names>P.</given-names></name> <name><surname>Petriu</surname> <given-names>E. M.</given-names></name></person-group> (<year>2012</year>). <article-title>Soft object deformation monitoring and learning for model-based robotic hand manipulation</article-title>. <source>IEEE Trans. Syst. Man Cybernet. B</source> <volume>42</volume>, <fpage>740</fpage>&#x02013;<lpage>753</lpage>. <pub-id pub-id-type="doi">10.1109/TSMCB.2011.2176115</pub-id><pub-id pub-id-type="pmid">22207640</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Duenser</surname> <given-names>S.</given-names></name> <name><surname>Bern</surname> <given-names>J. M.</given-names></name> <name><surname>Poranne</surname> <given-names>R.</given-names></name> <name><surname>Coros</surname> <given-names>S.</given-names></name></person-group> (<year>2018</year>). <article-title>&#x0201C;Interactive robotic manipulation of elastic objects,&#x0201D;</article-title> in <source>2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</source> (<publisher-loc>Madrid</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>3476</fpage>&#x02013;<lpage>3481</lpage>. <pub-id pub-id-type="doi">10.1109/IROS.2018.8594291</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frank</surname> <given-names>B.</given-names></name> <name><surname>Stachniss</surname> <given-names>C.</given-names></name> <name><surname>Schmedding</surname> <given-names>R.</given-names></name> <name><surname>Teschner</surname> <given-names>M.</given-names></name> <name><surname>Burgard</surname> <given-names>W.</given-names></name></person-group> (<year>2014</year>). <article-title>Learning object deformation models for robot motion planning</article-title>. <source>Robot. Auton. Syst</source>. <volume>62</volume>, <fpage>1153</fpage>&#x02013;<lpage>1174</lpage>. <pub-id pub-id-type="doi">10.1016/j.robot.2014.04.005</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Fritzke</surname> <given-names>B.</given-names></name></person-group> (<year>1995</year>). <article-title>&#x0201C;A growing neural gas network learns topologies,&#x0201D;</article-title> in <source>Proceedings of the 7th International Conference on Neural Information Processing Systems (NIPS&#x00027;94)</source> (<publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>), <fpage>625</fpage>&#x02013;<lpage>632</lpage>.</citation></ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Fugl</surname> <given-names>A. R.</given-names></name> <name><surname>Jordt</surname> <given-names>A.</given-names></name> <name><surname>Petersen</surname> <given-names>H. G.</given-names></name> <name><surname>Willatzen</surname> <given-names>M.</given-names></name> <name><surname>Koch</surname> <given-names>R.</given-names></name></person-group> (<year>2012</year>). <article-title>&#x0201C;Simultaneous estimation of material properties and pose for deformable objects from depth and color images,&#x0201D;</article-title> in <source>Pattern Recognition</source>, eds A. Pinz, T. Pock, H. Bischof, and F. Leberl (<publisher-loc>Berlin; Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>165</fpage>&#x02013;<lpage>174</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-642-32717-9_17</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gilmer</surname> <given-names>J.</given-names></name> <name><surname>Schoenholz</surname> <given-names>S. S.</given-names></name> <name><surname>Riley</surname> <given-names>P. F.</given-names></name> <name><surname>Vinyals</surname> <given-names>O.</given-names></name> <name><surname>Dahl</surname> <given-names>G. E.</given-names></name></person-group> (<year>2017</year>). <article-title>&#x0201C;Neural message passing for quantum chemistry,&#x0201D;</article-title> in <source>International Conference on Machine Learning</source> (<publisher-loc>Sydney, NSW</publisher-loc>: <publisher-name>PMLR</publisher-name>), <fpage>1263</fpage>&#x02013;<lpage>1272</lpage>.</citation></ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>G&#x000FC;ler</surname> <given-names>P.</given-names></name> <name><surname>Pauwels</surname> <given-names>K.</given-names></name> <name><surname>Pieropan</surname> <given-names>A.</given-names></name> <name><surname>Kjellstr&#x000F6;m</surname> <given-names>H.</given-names></name> <name><surname>Kragic</surname> <given-names>D.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Estimating the deformability of elastic materials using optical flow and position-based dynamics,&#x0201D;</article-title> in <source>2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids)</source> (<publisher-loc>Seoul</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>965</fpage>&#x02013;<lpage>971</lpage>. <pub-id pub-id-type="doi">10.1109/HUMANOIDS.2015.7363486</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jia</surname> <given-names>Y.-B.</given-names></name> <name><surname>Guo</surname> <given-names>F.</given-names></name> <name><surname>Lin</surname> <given-names>H.</given-names></name></person-group> (<year>2014</year>). <article-title>Grasping deformable planar objects: squeeze, stick/slip analysis, and energy-based optimalities</article-title>. <source>Int. J. Robot. Res</source>. <volume>33</volume>, <fpage>866</fpage>&#x02013;<lpage>897</lpage>. <pub-id pub-id-type="doi">10.1177/0278364913512170</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Khalil</surname> <given-names>F. F.</given-names></name> <name><surname>Curtis</surname> <given-names>P.</given-names></name> <name><surname>Payeur</surname> <given-names>P.</given-names></name></person-group> (<year>2010</year>). <article-title>&#x0201C;Visual monitoring of surface deformations on objects manipulated with a robotic hand,&#x0201D;</article-title> in <source>2010 IEEE International Workshop on Robotic and Sensors Environments</source> (<publisher-loc>Phoenix, AZ</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>6</lpage>. <pub-id pub-id-type="doi">10.1109/ROSE.2010.5675327</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kingma</surname> <given-names>D. P.</given-names></name> <name><surname>Ba</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Adam: a method for stochastic optimization,&#x0201D;</article-title> in <source>3rd International Conference for Learning Representations</source> (<publisher-loc>San Diego, CA</publisher-loc>).</citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lang</surname> <given-names>J.</given-names></name> <name><surname>Pai</surname> <given-names>D. K.</given-names></name> <name><surname>Woodham</surname> <given-names>R. J.</given-names></name></person-group> (<year>2002</year>). <article-title>Acquisition of elastic models for interactive simulation</article-title>. <source>Int. J. Robot. Res</source>. <volume>21</volume>, <fpage>713</fpage>&#x02013;<lpage>733</lpage>. <pub-id pub-id-type="doi">10.1177/027836402761412458</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Leizea</surname> <given-names>I.</given-names></name> <name><surname>&#x000C1;lvarez</surname> <given-names>H.</given-names></name> <name><surname>Aguinaga</surname> <given-names>I.</given-names></name> <name><surname>Borro</surname> <given-names>D.</given-names></name></person-group> (<year>2014</year>). <article-title>&#x0201C;Real-time deformation, registration and tracking of solids based on physical simulation,&#x0201D;</article-title> in <source>2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)</source> (<publisher-loc>Munich</publisher-loc>), <fpage>165</fpage>&#x02013;<lpage>170</lpage>. <pub-id pub-id-type="doi">10.1109/ISMAR.2014.6948423</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Leizea</surname> <given-names>I.</given-names></name> <name><surname>Mendizabal</surname> <given-names>A.</given-names></name> <name><surname>Alvarez</surname> <given-names>H.</given-names></name> <name><surname>Aguinaga</surname> <given-names>I.</given-names></name> <name><surname>Borro</surname> <given-names>D.</given-names></name> <name><surname>Sanchez</surname> <given-names>E.</given-names></name></person-group> (<year>2017</year>). <article-title>Real-time visual tracking of deformable objects in robot-assisted surgery</article-title>. <source>IEEE Comput. Graph. Appl</source>. <volume>37</volume>, <fpage>56</fpage>&#x02013;<lpage>68</lpage>. <pub-id pub-id-type="doi">10.1109/MCG.2015.96</pub-id><pub-id pub-id-type="pmid">26441410</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name> <name><surname>Tedrake</surname> <given-names>R.</given-names></name> <name><surname>Tenenbaum</surname> <given-names>J. B.</given-names></name> <name><surname>Torralba</surname> <given-names>A.</given-names></name></person-group> (<year>2019a</year>). <article-title>&#x0201C;Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids,&#x0201D;</article-title> in <source>International Conference on Learning Representations</source> (<publisher-loc>New Orleans, LA</publisher-loc>).</citation></ref>
<ref id="B20">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Wu</surname> <given-names>J.</given-names></name> <name><surname>Zhu</surname> <given-names>J.-Y.</given-names></name> <name><surname>Tenenbaum</surname> <given-names>J. B.</given-names></name> <name><surname>Torralba</surname> <given-names>A.</given-names></name> <name><surname>Tedrake</surname> <given-names>R.</given-names></name></person-group> (<year>2019b</year>). <article-title>&#x0201C;Propagation networks for model-based control under partial observation,&#x0201D;</article-title> in <source>2019 International Conference on Robotics and Automation (ICRA)</source> (<publisher-loc>Montreal, QC</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1205</fpage>&#x02013;<lpage>1211</lpage>. <pub-id pub-id-type="doi">10.1109/ICRA.2019.8793509</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Mrowca</surname> <given-names>D.</given-names></name> <name><surname>Zhuang</surname> <given-names>C.</given-names></name> <name><surname>Wang</surname> <given-names>E.</given-names></name> <name><surname>Haber</surname> <given-names>N.</given-names></name> <name><surname>Fei-Fei</surname> <given-names>L. F.</given-names></name> <name><surname>Tenenbaum</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2018</year>). <article-title>&#x0201C;Flexible neural representation for physics prediction,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems 31</source> (<publisher-loc>Montreal, QC</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>), <fpage>8799</fpage>&#x02013;<lpage>8810</lpage>.</citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nadon</surname> <given-names>F.</given-names></name> <name><surname>Valencia</surname> <given-names>A. J.</given-names></name> <name><surname>Payeur</surname> <given-names>P.</given-names></name></person-group> (<year>2018</year>). <article-title>Multi-modal sensing and robotic manipulation of non-rigid objects: a survey</article-title>. <source>Robotics</source> <volume>7</volume>:<fpage>74</fpage>. <pub-id pub-id-type="doi">10.3390/robotics7040074</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nealen</surname> <given-names>A.</given-names></name> <name><surname>M&#x000FC;ller</surname> <given-names>M.</given-names></name> <name><surname>Keiser</surname> <given-names>R.</given-names></name> <name><surname>Boxerman</surname> <given-names>E.</given-names></name> <name><surname>Carlson</surname> <given-names>M.</given-names></name></person-group> (<year>2006</year>). <article-title>Physically based deformable models in computer graphics</article-title>. <source>Comput. Graph. Forum</source> <volume>25</volume>, <fpage>809</fpage>&#x02013;<lpage>836</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-8659.2006.01000.x</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Orts-Escolano</surname> <given-names>S.</given-names></name> <name><surname>Garcia-Rodriguez</surname> <given-names>J.</given-names></name> <name><surname>Morell</surname> <given-names>V.</given-names></name> <name><surname>Cazorla</surname> <given-names>M.</given-names></name> <name><surname>Saval</surname> <given-names>M.</given-names></name> <name><surname>Azorin</surname> <given-names>J.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Processing point cloud sequences with Growing Neural Gas,&#x0201D;</article-title> in <source>2015 International Joint Conference on Neural Networks (IJCNN)</source> (<publisher-loc>Killarney</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1109/IJCNN.2015.7280709</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Paszke</surname> <given-names>A.</given-names></name> <name><surname>Gross</surname> <given-names>S.</given-names></name> <name><surname>Massa</surname> <given-names>F.</given-names></name> <name><surname>Lerer</surname> <given-names>A.</given-names></name> <name><surname>Bradbury</surname> <given-names>J.</given-names></name> <name><surname>Chanan</surname> <given-names>G.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>&#x0201C;PyTorch: an imperative style, high-performance deep learning library,&#x0201D;</article-title> in <source>Advances in Neural Information Processing Systems</source>, eds H. Wallach, H. Larochelle, A. Beygelzimer, F. Alch&#x000E9;-Buc, E. Fox, and R. Garnett (<publisher-loc>Vancouver, BC</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>), <fpage>8024</fpage>&#x02013;<lpage>8035</lpage>.</citation></ref>
<ref id="B26">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Petit</surname> <given-names>A.</given-names></name> <name><surname>Lippiello</surname> <given-names>V.</given-names></name> <name><surname>Siciliano</surname> <given-names>B.</given-names></name></person-group> (<year>2015</year>). <article-title>&#x0201C;Real-time tracking of 3D elastic objects with an RGB-D sensor,&#x0201D;</article-title> in <source>2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</source> (<publisher-loc>Hamburg</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>3914</fpage>&#x02013;<lpage>3921</lpage>. <pub-id pub-id-type="doi">10.1109/IROS.2015.7353928</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Quigley</surname> <given-names>M.</given-names></name> <name><surname>Conley</surname> <given-names>K.</given-names></name> <name><surname>Gerkey</surname> <given-names>B. P.</given-names></name> <name><surname>Faust</surname> <given-names>J.</given-names></name> <name><surname>Foote</surname> <given-names>T.</given-names></name> <name><surname>Leibs</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>&#x0201C;ROS: an open-source robot operating system,&#x0201D;</article-title> in <source>ICRA Workshop on Open Source Software</source> (<publisher-loc>Kobe</publisher-loc>), <fpage>5</fpage>.</citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sanchez</surname> <given-names>J.</given-names></name> <name><surname>Corrales</surname> <given-names>J.-A.</given-names></name> <name><surname>Bouzgarrou</surname> <given-names>B.-C.</given-names></name> <name><surname>Mezouar</surname> <given-names>Y.</given-names></name></person-group> (<year>2018</year>). <article-title>Robotic manipulation and sensing of deformable objects in domestic and industrial applications: a survey</article-title>. <source>Int. J. Robot. Res</source>. <volume>37</volume>, <fpage>688</fpage>&#x02013;<lpage>716</lpage>. <pub-id pub-id-type="doi">10.1177/0278364918779698</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sengupta</surname> <given-names>A.</given-names></name> <name><surname>Lagneau</surname> <given-names>R.</given-names></name> <name><surname>Krupa</surname> <given-names>A.</given-names></name> <name><surname>Marchand</surname> <given-names>E.</given-names></name> <name><surname>Marchal</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>&#x0201C;Simultaneous tracking and elasticity parameter estimation of deformable objects,&#x0201D;</article-title> in <source>2020 IEEE International Conference on Robotics and Automation (ICRA)</source> (<publisher-loc>Paris</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>10038</fpage>&#x02013;<lpage>10044</lpage>. <pub-id pub-id-type="doi">10.1109/ICRA40945.2020.9196770</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tawbe</surname> <given-names>B.</given-names></name> <name><surname>Cretu</surname> <given-names>A.-M.</given-names></name></person-group> (<year>2017</year>). <article-title>Acquisition and neural network prediction of 3D deformable object shape using a kinect and a force-torque sensor</article-title>. <source>Sensors</source> <volume>17</volume>:<fpage>1083</fpage>. <pub-id pub-id-type="doi">10.3390/s17051083</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Valencia</surname> <given-names>A. J.</given-names></name> <name><surname>Nadon</surname> <given-names>F.</given-names></name> <name><surname>Payeur</surname> <given-names>P.</given-names></name></person-group> (<year>2019</year>). <article-title>&#x0201C;Toward real-time 3D shape tracking of deformable objects for robotic manipulation and shape control,&#x0201D;</article-title> in <source>2019 IEEE Sensors</source> (<publisher-loc>Montreal, QC</publisher-loc>: <publisher-name>IEEE</publisher-name>), <fpage>1</fpage>&#x02013;<lpage>4</lpage>. <pub-id pub-id-type="doi">10.1109/SENSORS43011.2019.8956623</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Virtanen</surname> <given-names>P.</given-names></name> <name><surname>Gommers</surname> <given-names>R.</given-names></name> <name><surname>Oliphant</surname> <given-names>T. E.</given-names></name> <name><surname>Haberland</surname> <given-names>M.</given-names></name> <name><surname>Reddy</surname> <given-names>T.</given-names></name> <name><surname>Cournapeau</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>SciPy 1.0: fundamental algorithms for scientific computing in Python</article-title>. <source>Nat. Methods</source> <volume>17</volume>, <fpage>261</fpage>&#x02013;<lpage>272</lpage>. <pub-id pub-id-type="doi">10.1038/s41592-019-0686-2</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>M.</given-names></name> <name><surname>Yu</surname> <given-names>L.</given-names></name> <name><surname>Zheng</surname> <given-names>D.</given-names></name> <name><surname>Gan</surname> <given-names>Q.</given-names></name> <name><surname>Gai</surname> <given-names>Y.</given-names></name> <name><surname>Ye</surname> <given-names>Z.</given-names></name> <etal/></person-group>. (<year>2019</year>). <article-title>&#x0201C;Deep graph library: towards efficient and scalable deep learning on graphs,&#x0201D;</article-title> in <source>ICLR Workshop on Representation Learning on Graphs and Manifolds</source> (<publisher-loc>New Orleans, LA</publisher-loc>).</citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zaidi</surname> <given-names>L.</given-names></name> <name><surname>Corrales</surname> <given-names>J. A.</given-names></name> <name><surname>Bouzgarrou</surname> <given-names>B. C.</given-names></name> <name><surname>Mezouar</surname> <given-names>Y.</given-names></name> <name><surname>Sabourin</surname> <given-names>L.</given-names></name></person-group> (<year>2017</year>). <article-title>Model-based strategy for grasping 3D deformable objects using a multi-fingered robotic hand</article-title>. <source>Robot. Auton. Syst</source>. <volume>95</volume>, <fpage>196</fpage>&#x02013;<lpage>206</lpage>. <pub-id pub-id-type="doi">10.1016/j.robot.2017.06.011</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup><ext-link ext-link-type="uri" xlink:href="https://advanced.barrett.com/barretthand">https://advanced.barrett.com/barretthand</ext-link></p></fn>
<fn id="fn0002"><p><sup>2</sup><ext-link ext-link-type="uri" xlink:href="https://www.intelrealsense.com/depth-camera-sr305">https://www.intelrealsense.com/depth-camera-sr305</ext-link></p></fn>
</fn-group>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> The authors wish to acknowledge financial support for this research from the Natural Sciences and Engineering Research Council of Canada (NSERC) under research grant &#x00023;RGPIN-2015-05328, the Canada Foundation for Innovation (CFI), and the CALDO-SENESCYT scholars program.</p></fn>
</fn-group>
</back>
</article>