A Recurrent Neural-Network-Based Real-Time Dynamic Model for Soft Continuum Manipulators

This paper introduces and validates a real-time dynamic predictive model based on a neural network approach for soft continuum manipulators. The presented model provides a real-time prediction framework using neural-network-based strategies and continuum mechanics principles. A time-space integration scheme is employed to discretize the continuous dynamics and decouple the dynamic equations for translation and rotation for each node of a soft continuum manipulator. Then the resulting architecture is used to develop distributed prediction algorithms using recurrent neural networks. The proposed RNN-based parallel predictive scheme does not rely on computationally intensive algorithms; therefore, it is useful in real-time applications. Furthermore, simulations are shown to illustrate the approach performance on soft continuum elastica, and the approach is also validated through an experiment on a magnetically-actuated soft continuum manipulator. The results demonstrate that the presented model can outperform classical modeling approaches such as the Cosserat rod model while also shows possibilities for being used in practice.


INTRODUCTION
Soft continuum manipulators are flexible and highly deformable robots composed of soft and mostly elastic materials, and can serve as possible substitutes for rigid robots. Advantages of soft manipulator robots such as their compliance, dexterity, and adaptability to complex workspaces are driving the emergent research in this field. By contrast, rigidity of traditional rigid robots limits their use in constrained and confined environments, and reduces the possibilities for safe interaction with humans. Soft continuum manipulators have found applications in many areas, such as dexterous grasping (McMahan et al., 2006;Katzschmann et al., 2015) and assistive devices (Ansari et al., 2017), and particularly in the field of minimally invasive surgeries, such as laryngeal surgery (Simaan et al., 2004), catheter-based endovascular intervention (Grady et al., 2000;Burgner et al., 2013), and cardiovascular surgery (Kesner and Howe, 2011).
Analytical modeling of soft manipulators helps evaluate their motion and determine their workspace, in order to be used for control, motion planning, and animation purposes. Soft manipulators distinguish themselves by having an infinite number of degrees of freedom in any workspace they occupy. This characterization makes modeling complicated for soft manipulators. Several approaches have been investigated thus far in the literature. Most of the approaches consider the kinematic (i.e. static or quasi-static) modeling of the manipulators such as static analysis using virtual-work model (Xu and Simaan, 2010), Cosserat rod theory (Pai, 2002;Jones et al., 2009;Mahvash and Dupont, 2011), and α Lie group formulation (Grazioso et al., 2019). These models do not describe full dynamics of the manipulators, and they may show performance degradation when it comes to high-frequency applications or large and complex deformations. On the other hand, dynamical modeling approaches [e.g. Wen et al. (2012), Jung et al. (2014), Hyatt et al. (2019), Sadati et al. (2019), Till et al. (2019), Tariverdi et al. (2020)], contain dynamics of the manipulators and also take into account time-varying responses of manipulators, including high-frequency modes. However, the dynamic models mostly rely on traditional methods, such as finite elements and finite differences (i.e., quantitative and numerical methods), making the algorithms computationally expensive for real-time applications. In other words, to obtain sufficiently accurate solutions, methods need to deal with fine meshes, which increase memory use and computation time. Another limitation is that their solutions are discrete or not sufficiently differentiable. It is worth noting that in model-based controllers or observers, having a differentiable solution (i.e., a solution that can be evaluated continuously on the workspace) is crucial in the design process. Furthermore, when softer materials are employed for manipulator construction with more complex geometries or large deformations, modeling their behavior analytically becomes challenging. Therefore, there is a need for appropriate datadriven approaches without compromising computational bandwidths and the prediction quality.
Dynamics of soft continuum manipulators have highly nonlinear behavior and are expressed as Partial Differential Equations (PDEs). An effective approach to represent and model PDEs solutions is to use Neural Networks (NN). NNbased solutions of PDEs are infinitely differentiable by eliminating the need for interpolation. Furthermore, compared to finite elements or difference methods, solutions are represented by fewer parameters, which reduces the memory use. There are studies that use machine learning algorithms to find a solution for special types of PDEs such as (Lagaris et al., 1998;Lee and Kang, 1990;Weinan et al., 2017;Raissi et al., 2019). However, to the authors' best knowledge, there is no study that investigates possible NN-based solutions for partial differential equations that describe the full dynamics of continuum manipulators. In this work, inspired by a time-space integration scheme and by using the Lie group variational integration method (Demoures et al., 2015), the dynamic equations for translation and rotation for each node of a soft continuum manipulator are decoupled, providing an appropriate structure aimed at developing a real-time modeling algorithm. Afterward, Recurrent Neural Networks (RNNs)-based models are employed to approximate the high-dimensional discretized equations. Additionally, external torques and forces (e.g., control inputs, friction, and gravity) are incorporated into the model in a real-time manner for control applications.
The ability of RNNs to learn and approximate large classes of nonlinear functions over sequences of inputs accurately makes them prime candidates for use in dynamic modeling of complex nonlinear systems. RNNs with Long Short-Term Memory (LSTM) layers process sequences by iterating through the sequence elements. Using an internal feedback, the network is capable of preserving long-term dependencies. Essentially, LSTM layers prevent older information from gradually vanishing. These networks also have been used for several applications in soft robotics. To name a few, Thuruthel in Thuruthel et al. (2019) proposes a model-free, real-time sensing method for soft robots perception. The authors in Thuruthel et al. (2017) uses RNNs to model and control soft robotic manipulators. Also, force and motion estimation using RNNs has been investigated in Marban et al. (2019) and Turan et al. (2018), respectively. This paper aims to develop a real-time dynamic model for analyzing the dynamics of soft manipulators. Investigation of previous work on the modeling of the continuum manipulators suggests that existing literature focuses primarily on static or quasi-static approaches, or does not provide a real-time model. The contribution of this article is to present a scalable, parallel and real-time modeling algorithm for soft manipulators dynamics. The contributions of this paper are as follows.
• Existing approaches primarily deal with kinematic modeling methods. Nevertheless, in this study, real-time prediction of soft manipulators full spatial dynamics is considered in the proposed RNN-based algorithm by proposing multiple light-weight RNN-based models. • In traditional modeling approaches, there are no systematic methods to obtain knowledge about dissipation forces, in particular friction, in the modeling procedure. The presented algorithm intrinsically takes the dissipation forces into account and incorporates their effects into the model. • Through an experiment, results of the proposed RNN-based model and Cosserat rod theory method are compared, revealing the practical effectiveness of the proposed methodology.
The remainder of this paper is organized as follows: the problem statement is given in Section 2. Section 3 describes the proposed RNN-based algorithm in details. In Section 4 and Section 5, different simulations and experimental validation are presented to demonstrate the efficacy of the proposed RNN-based method, in terms of the model performances and accurately predicting poses of manipulators. Finally, the main conclusions are stated in Section 7. Tariverdi et al. (2020) and Demoures et al. (2015)] in the PDEs form as where M ρ × A (ρ and A are the manipulator constant mass density and its cross-section area), ω ∈ R 3 is the manipulator's angular velocity, H ∈ R 3×3 is the manipulator's inertia matrix, ϕ ∈ R 3 is the position of the manipulator's line of centroids in its workspace, Λ ∈ SO(3) denotes the orientation of moving crosssections at point ϕ. Also, n ∈ R 3 and m ∈ R 3 are the stresses and momenta along the manipulator, f c ∈ R 3 represents conservative forces (e.g. gravity). Furthermore, (·) x , (·) t , and (·) tt denote partial derivatives with respect to position, time, and the second partial derivative with respect to time, respectively. Finally, f ∈ R 3 and τ ∈ R 3 are non-conservative forces and torques (e.g., frictions and control inputs) 1 . Although high fidelity models given in the references can describe continuum manipulators dynamics efficiently, they suffer from limitations that are discussed in Section 1. Inspired by the structure and formulation of the dynamics based on the Lie group variational integration scheme, the aim is to propose distributed deep recurrent neural networks to capture and simulate soft manipulators dynamics in real-time to be able control them more accurately than existing models.

PROPOSED RNN-BASED MODEL
This section is devoted to develop a model based on the time series prediction using RNNs. To solve PDEs numerically using NNs, one approach is to utilize discrete solutions of finite element or difference methods to train an NN. A Lie group variational time integration model is employed to discretize the continuous dynamics of a soft manipulator 2 . The whole manipulator is discretized into an arbitrary number of nodes where the position and orientation equations of each node are decoupled. In our study, we discretize the manipulator with equidistant nodes, but this can be changed depending on the application. Figure 1 demonstrates a soft continuum manipulator at time t where x * is the undeformed length of Node n − 1. The force F(x p , t), torque τ(x p , t) are applied to Node n − 1 at the position ϕ(x p , t). Also, Λ(x p , t) is the orientation matrix from the frame {O} to the frame {O n−1 t } attached to the cross-section of Node n − 1.
The discrete equations suggest an appropriate structure for the RNNs-based model. Given time-sequence inputs (as a first input layer), i.e., poses (positions and orientations) of nodes, and also forces and torques (as a second input layer) applied to each node, the RNN-based model of Node n is depicted in Figure 2A,B. For Node n, the first input layer is a time-sequence series of poses p n−1 , p n , and p n+1 (i.e., poses of Node n and its adjacent nodes n − 1 and n + 1) and the second input layer includes forces and torques of node n at time t, i.e., [F n t , τ n t ] T which are incorporated into the model through dense layers.
The network takes specific size vectors as inputs, which are called input layers. The inputs are transformed through a series of hidden layers (LSTM, dense, or fully connected layers) to produce an output. The output vectors are called an output layer. Dense or fully connected layers perform linear operations (i.e., multiplication and summation) on their inputs. Furthermore, LSTM layers consist of LSTM units, which can process sequences of data of any length, for example, poses of nodes. An LSTM unit controls contributions of each element of the input layer in the output and keeps track of the dependencies between the elements (Hochreiter and Schmidhuber, 1997).
For the training process, data-sets contain time-sequence inputs and forces and torques applied to each node. Also, for each node, the poses of the node and its neighbors are considered features, as shown in Figure 2A,B. The first and second input layers proceed through LSTM layers and dense layers as hidden layers, respectively. Finally, output layers have resulted from fully connected layers.
By augmenting the given models for all nodes (see Figure 2B) as a series, the proposed RNN-based models of the whole continuum manipulator with N nodes with non-conservative forces and torques are depicted in Figure 2C. Output of every node is updated at each time step by using a history (at least two previous time steps) of neighboring node outputs. Therefore, the proposed architecture suggests a suitable framework to construct a parallel modeling algorithm.
FIGURE 1 | A soft manipulator at time t with discretization nodes n and n − 1 are shown. ϕ(x * , t) and Λ(x p , t) denote the position and the orientation of cross-section of Node n − 1, respectively. In addition, the force F(x * , t), torque τ(x * , t), and the conservative force f c (e.g., gravity) are applied to Node n − 1 at the position ϕ(x * , t).
In this section, we consider different examples and evaluate the performance of the proposed RNN-based model in Figure 2C. It is worth mentioning that data-sets play a crucial role in efficiency and accuracy in machine learning-based algorithms. The data acquisition process from a robot in real-world environments is both time and cost-consuming (implementation of multiple sensors, data filtering, and fusion, etc.). As an alternative approach, the required data can be acquired through simulations of high fidelity models. The obtained data can thus be transferred to train the algorithms to be implemented in real-world scenarios. In this section and for the presented examples, required data for the training of the proposed RNNbased model are acquired through simulations of the algorithm presented in Tariverdi et al. (2020), Sec. 2). For clarity, this model Poses of Nodes n − 1, n, and n + 1 are the input layer and no forces or torques are applied to the node. (B): The first input layer is composed of poses p n−1 , p n , and p n+1 at time history horizon [t − η, t − η + 1, . . . , t] and the second input layer includes forces and torques [F n t , τ n t ] T which are incorporated into the model through the Hidden Layers II (dense and flatten layers). (C): Proposed RNN-based models of the continuum manipulator with N nodes including Input, Hidden, and Output Layers. A history of each node output is used as an input for adjacent nodes. Nodes poses (Input Layer I) and forces and torques (Input Layer II) through the Hidden layers I and II are proceed and concatenated together.
Frontiers in Robotics and AI | www.frontiersin.org March 2021 | Volume 8 | Article 631303 is henceforth referred to as the analytical dynamic model. In addition, since thin rods are considered in the examples, orientations of cross-sections are not of any concern. Also, it should be noted that orientations, except the twisting angle, can be reconstructed from manipulators' configuration. Therefore, to obtain a computationally light model, the focus of attention is only on the prediction of positions.

First Simulation: An Ellipse Without External Wrenches
As a first case, a cylindrical rod is bent into a circle and its ends are attached to one another. The rod is then deformed into an elliptical shape and released. Due to potential energies in the ellipse, it starts to move without any external disturbances. The goal is to model the behavior of the ellipse resultant from its internal elastic energy. The ellipse is formed in the xy-plane with the width 0.2 m and height 0.6 m. As boundary conditions, the first and last nodes are fixed to the origin and their orientations are set to R y (18.07°) and R y (341.92°), respectively, where R y (θ) denotes a rotation matrix describing a rotation around the y-axis by θ degrees. The rod properties, simulation parameters, and the structure of the proposed RNN-based model are given in Figure 3A,B. Furthermore, the initial and a few time-evolved configurations are shown in Figure 3C. As seen, the ellipse oscillates back and forth due to its internal elastic energy. Since orientations except the twisting angle can be reconstructed from the configuration of the manipulator, to have a light model and for brevity, positions of the node located at (−0.01, −0.59)-Node 30th-are predicted. The chosen node is the furthest from the origin and would, compared to other nodes, most likely have the largest errors. 50,001 position samples are generated from the analytical model for each node. We augment 1-by-3 position vectors of Node 30th and its adjacent nodes (Nodes 29th and 31st) at each time step. Therefore, the augmentation results in a 1-by-9 vector. Furthermore, the size of history horizons is chosen to be 2. In other words, η 1 in Figure 2A,B. Finally, augmented 2-by-9 tensors are obtained for each time step. The prepared data-set is called Data-set I and 60 percent of it is used for training process.
The architecture in Figure 3A shows the input layer consists of tensors of size 2 × 9. The first dimension of all layers are reserved for batch sizes and for the training, the batch size 1 was chosen. In the architecture of the model in Figure 3A, the Input, Hidden and Output Layers I together with the number of nodes and type of layers are demonstrated according to Figure 2A.
First, we evaluate the model by using unseen data samples in Dataset I and the results are shown in Figure 3D. The maximum and mean absolute error are (1.57 mm, 0.27 mm), (2.27 mm, 0.46 mm), and (0.23 mm, 0.06 mm), or in other words, the percentage of the maximum errors with respect to the length of the manipulator are 0.11, 0.17, and 0.02 in the x, y, and z-axes, respectively. It is worth mentioning that it is prior knowledge that the manipulator does not have any motion in the z-axis and therefore, components of the z-axis in position vectors can be ignored. Furthermore, the evaluation Root-Mean-Square (RMS) errors of the considered node in all axes at every time step is calculated by where predicted positions [x p (t), y p (t), z p (t)] T obtained from the proposed RNN-based model and measurement positions [x m (t), y m (t), z m (t)] T in Data-set I and the results are shown in Figure 3E.
To demonstrate that the model can be extended to different boundary and initial conditions, the cylindrical rod is employed to form a horizontal ellipse with the width 0.6 m and height 0.2 m. The rod properties and the simulation parameters given in Figure 3B are used. As boundary conditions, the first and last nodes are attached to the origin and their orientations are set to the identity. The manipulator with the new boundary and initial conditions is only used for the evaluation of the trained model by predicting positions of the node located at (−0.01, −0.19). Based on the prediction, the maximum and mean absolute errors are (26.33 mm, 3.37 mm), (21.71 mm, 3.70 mm), and (4.92 mm, 4.22 mm) in the x, y, and z-axes, respectively. Furthermore, the maximum/worst-case errors with respect to the length of the manipulator are 1.97%, 1.62%, and 0.37% in each axis, respectively.
Let us assume that the analytical dynamic model is implemented in a parallel scheme, i.e., each node of 59 nodes is handled with a CPU core or different hardware such that there is no latency in communications. Then, the dynamics of each node can be solved in 1.62 × 10 − 4 s on average. In addition, to preserve the convergence of the solver of the analytical dynamic model, the maximum constant time step for this simulation is 10 − 3 s. A minimum criterion to have a real-time performance is that the time required to solve each node dynamics must be less than the constant step simulation. To be more specific, to have a real-time model, the CPU time, i.e., the amount of time spent in a user code must be less than Wall-clock time that measures the time elapsed to run a user code. According to this minimal criterion, as long as the computation-time for simulation of a method/model is less than a user-defined time for the simulation, the model is called a real-time model. It can be shown that in this example and based on the given assumption, the maximum bandwidth for a real-time performance is 3.93 Hz on average (calculation is done on a 16 GB, 1.99 GHz Intel i7 machine running windows 10). It should be pointed out that we use the same machine for calculations in this paper. It will be discussed that even achieving this bandwidth limit is not feasible. On the other hand, for the proposed RNN-based model, the bandwidth of a real-time performance is 65.70 Hz, which can be further improved by optimizing the number of layers and trainable parameters.
It is worth mentioning that the considered assumption is very strict, which cannot be satisfied in reality. First of all, conventional algorithms need a relatively high number of nodes to have numerical stability and an acceptable convergence rate. Furthermore, due to limitations in computation resources, more than one node will be assigned to each core of CPU, and there is always latency in communications between threads in parallel programmings. Therefore, reaching the mentioned bandwidth through the analytical dynamic model is infeasible. However, the real-time performance of the proposed RNN-based model can be applicable in closed-loop control applications.

Second Simulation: A Cylindrical Rod With External Wrenches
In the second example, we simulate a rod with a circular cross section, which is actuated by external forces such that its tip tracks a square in space. In this example, the goal is to model the behavior of the rod which results from applied external forces on its endeffector. For boundary conditions, the first node is fixed to the origin and its orientation is set to the identity for all time steps. The rod properties and simulation parameters, and the structure of the proposed model are given in Figure 4. The trajectory of the endeffector and the applied forces onto it are shown in Figure 5A.
160,001 position and force samples are generated from the analytical model for each node. We augment 1-by-3 position vectors of the last node (end-effector) and its adjacent node at each time step. Therefore, the augmentation results in a 1-by-6 vector. Furthermore, the size of history horizons is chosen to be 2 (η 1). Finally, augmented 2-by-6 tensors are obtained for each time steps which are fed to the model as the Input Layer I. The same preparation process are applied for the force data samples which are used as the Input Layer II. The prepared data-set is called Data-set II and 60% of it is used for training process. The architecture in Figure 4A shows the Input layers I and II consist of tensors of size (Batch Size × 2 × 6) and (Batch Size × 2 × 3), respectively. The first dimension of all layers are reserved for batch sizes and for the training, the batch size 1 was chosen. In the architecture of the model in Figure 4A, the Input, Hidden and Output Layers I and II together with the number of nodes and type of layers are demonstrated according to Figure 2B.
First, unseen data samples in Data-set II are employed to evaluate the model, and tip positions are calculated and the results are shown in Figure 5B. The maximum and mean absolute error are (3.58 mm, 1.70 mm), (1.80 mm, 0.69 mm), and (2.73 mm, 1.41 mm), or in other words, the maximum errors with respect to the length of the manipulator are 0.71%, 0.36%, and 0.54% in x, y, and z-axes, respectively. The RMS errors of the end-effector through Eq. 2 are shown in Figure 5C.
To evaluate the generalizability of the trained model, different profiles of forces are applied to the model aiming at obtaining different position trajectories for the end-effector as depicted in Figure 6A,D. To fulfill the goal of the second example, the new forces are only used for the evaluation of the trained model by predicting the positions of the end-effector. Results of the prediction are plotted in Figure 6B and are as follows: The maximum and mean absolute errors are (10.49 mm, 2.61 mm), (5.54 mm, 1.05 mm), and (5.97 mm, 2.83 mm), furthermore, the percentage of the maximum/worst-case errors with respect to the length of the manipulator are 1.90, 1, and 1.08 in the x, y, and z-axes, respectively. The RMS errors of the end-effector through Eq. 2 are shown in Figure 6C.
In this example, the maximum constant time step for this simulation is 10 − 4 s, to have a convergent numerical solver for the analytical dynamic model. In addition, on average, the time 1.89 × 10 − 4 s is required for solving the dynamics of each node. In other words, the analytical dynamic model can not achieve any real-time performance for this example. However, the proposed model achieves a real-time performance of the bandwidth of 60.30 Hz on average.

Third Simulation: A Cylindrical Rod With and Without External Wrenches
In the last example, we form a semi-circular shape with a cylindrical rod. A force is applied to the middle node-Node 51th-in the -y-axis direction for 0.5 s and then the force is removed. Furthermore, the boundary conditions are as follows: the first and last nodes are fixed to the origin and their orientations are set to the identity and R z (181.81°), respectively, where R z (θ) describes rotation around the z-axis by θ degrees. In this example, the idea is to model the behavior of the rod resulted from applied external forces and internal elastic energy. The structure of the proposed model is given in Figure 7A and the rod properties and simulation parameters are given in Figure 7B. The initial and a few time-evolved configurations together with the applied forces are given in Figure 8A. 20,001 position and force samples are generated from the analytical model for each node. We augment 1-by-3 position vectors of Node 15th and its adjacent nodes at each time step. Therefore, the augmentation results in a 1-by-9 vector. Furthermore, the size of history horizons is chosen to be 2 (η 1). Finally, augmented 2-by-9 tensors are obtained for The same preparation process are applied for the force data samples which are used as the Input Layer II. The prepared dataset is called Data-set III and 60% of the data is used for training process. The architecture in Figure 7A shows the Input layers I and II consist of tensors of size (Batch Size × 2 × 9) and (Batch Size × 2 × 3), respectively. The first dimension of all layers are reserved for batch sizes and for the training, the batch size 1 was chosen. In the architecture of the model in Figure 7A, the Input, Hidden and Output Layers I and II together with the number of nodes and type of layers are demonstrated according to Figure 2B. The positions of Node 51th are predicted using seen and unseen data samples in Data-set III and the results are shown in  1 mm, 1.2 mm), furthermore, the maximum/worst-case errors with respect to the length of the manipulator are 0.62%, 0.33%, and 1.35% in the x, y, and z-axes, respectively.
In this example, the maximum constant time step for this simulation is 10 − 4 s using the analytical model. In other words, the analytical model does not show a real-time performance since, on average, the time 2.22 × 10 − 4 s is required for solving the dynamics of each node. On the other hand, the proposed model can achieve a real-time performance of the bandwidth 58.13 Hz on average.

EXPERIMENTAL RESULTS
This section is devoted to the experimental validation of the presented model. To that end, we fabricated a soft manipulator on which magnetic fields are used to produce necessary forces and torques. Compared to the simulations in which positions are predicted, time-sequence input is composed of orientations of nodes in the experiment. Furthermore, to show the performance of the algorithm, results from the presented method and a Cosserat rod-based theoretical model are compared to show the efficiency of the proposed RNN-based model. The Cosserat rod model of the soft manipulator is detailed in Appendix.

Soft Continuum Manipulator
A soft continuum manipulator is fabricated from a urethane rubber Polymer Matrix Composite 770 (PMC-770, Smooth-On Inc., Macungie, United States) and neodymium (NdFeB) block magnets whose dimensions are given in Figure 9A. When the manipulator is subjected to an external magnetic field, the The PMC-770 has a density ρ 1000 kg/m 3 , Young modulus E 2.5 MPa, and Poisson ratio ] 0.5. The distal and proximal NdFeb magnets have grades N45 and N42, respectively. In addition, they have density ρ 7000 kg/m 3 , Young modulus E 41.4 GPa, and Poisson ratio ] 0.3. It should be pointed out that Young's modulus and densities of the soft manipulator constituent materials were determined using a combination of supplier data and experiments until theoretical results (predicted by the Cosserat rod model) would resemble the experiment results. The magnitude of the magnetic dipoles carried by the manipulator was calculated from the magnets volume and manufacturer-supplied residual flux density.

Experimental Setup
The experimental setup consists of 6 stationary electromagnets surrounding a spherical workspace of 100 mm diameter Sikorski et al. (2017). Figure 9B shows the setup of the experiment. In addition, the final shape of manipulator has been segmented and is shown in the workspace. The continuum manipulator is suspended horizontally (along x) in the workspace and actuated to move in a plane, steering the magnets by manipulating the magnetic field generated by the electromagnets. Orientations are represented using the axis-angle notation. Let k m ∈ R 3 and ϕ m ∈ R denote the axis-and angle-of-rotation, respectively, where m 1, 2 denotes the magnet index counting from the manipulator base. In the 2D experiment, k m y, and ϕ m is defined relative to z. Figure 9C represents the shape reconstruction of the soft manipulator through images coming from two Dalsa Genie Nano C1940 Red-Green-Blue (RGB) cameras (TeledyneDalsa, Waterloo, ON, Canada). The flexible PMC-770 and rigid NdFeb magnets were colored blue and red, respectively. The RGB cameras (horizontal and vertical) that formed a stereo vision setup recorded the workspace during experiments. First, we discretize the actuation workspace into voxels. The silhouette of the continuum manipulator is segmented as binary masks and the manipulator body represented as a 3D spatial point cloud. The manipulator centerline is approximated by N ∈ N discrete segments. A simple iterative shape reconstruction algorithm Sikorski et al. (2019) moves through the voxels to represent the manipulator centerline with N discrete points ({p 0 , p 1 , . . . , p N }) as a function of centerline parameter s ∈ [0, L]. To be specific, with knowledge of the RGB-camera frames, the points are projected onto each camera image. If a point is projected onto both binary masks, the point falls within the manipulator. This process is repeated for all voxels. Subsequently a 3D polynomial fit (P(s)) is made through the points. We assume that magnetically exerted forces and torques are insufficient for the manipulator extension along the centerline, and therefore assume constant positions of the magnets along the centerline, s m ∈ (0, L). The measured magnet position is thereafter obtained from the polynomial fit (p m P(s m )), and its orientation from the local gradient of the polynomial fit (z s P(s m )) relative to a reference z axis, where [·] ∧ represents a normalization. Furthermore, in the experiments performed for this study, camera occlusions did not occur. The magnetic torques and forces were computed from the magnets position p m , magnets dipole moments μ m ∈ R 3 , and electromagnet currents I C ∈ R 6 . The magnets position and orientations were obtained from the stereo vision setup. Afterwards, the orientations are used to compute magnets dipoles. To compute the magnetic field, each electromagnet is associated with a unit-current field and field gradient map (β i (p) ∈ R 3 and β ∇,i (p) ∈ R 3×3 , i 1, . . . , 6), which computes the unit-current contribution of the electromagnet to the field at field point p. We define a map G(β ∇,i ) : R 3×3 → R 5 which takes the five independent gradient terms of the field (Petruska and Nelson, 2015). The field (gradient) at magnet position p m is then given by the superposition principle B m β 1 p m , . . . , β 6 p m I C G B ∇,m G β ∇,1 p m , . . . , G β ∇,6 p m I C .
The torques and forces exerted on the magnets due to the field is given by where M(μ m ) : R 3 × R 3×5 represents a map of the field independent spatial gradients to forces on the dipole μ m (Petruska and Nelson, 2015). The applied magnetic forces and torques together with the initial and a few time-evolved configurations are shown in Figure 10.
For modeling, we consider three nodes located at the locations of the proximal and distal magnets, and the clamped end of the rod. It should be pointed out that the performance of the proposed RNN-based model, unlike conventional algorithms, is independent of the number of nodes considered for the whole manipulator. Therefore, it is sufficient to model points of interest. The idea is to independently manipulate each magnet (actuation point). However, the setup provides us with 8 degrees of freedom, meaning that positions and orientations (12 degrees of freedom) cannot be manipulated at the same time. Therefore, we carried out the experiment to achieve only orientation control. 669 1-by-3 position samples and 1-by-6 augmented wrench samples (i.e., [τ, f ] T ) for the both magnets are obtained. By choosing the size of history horizon 2 (η 1), the augmented 2-by-6 position tensors are reshaped for each time step and fed to the model as the Input Layer I. The same preparation process is applied for the force data samples which are used as the Input Layer II. The prepared data-set is called Data-set IV and 60% of the data is used for training process. We suggest the same model for both moving nodes and the architecture of the model is  Figure 11 which is the same for the proximal and distal nodes. The architecture shows the Input layers I and II consist of tensors of size (Batch Size × 2 × 6). The first dimension of all layers are reserved for batch sizes and for the training, the batch size 1 was chosen. In the architecture of the model in Figure 11, the Input, Hidden and Output Layers I and II together with the number of nodes and type of layers are demonstrated according to Figure 2B.

RESULTS
The distal and proximal node rotations are predicted both by Cosserat rod model and the proposed model, and the results are shown in Figure 12. Also, the maximum and mean absolute errors are stated in an ordered pair in Table 1.
The computation time required to find a solution of the manipulator statics from a Boundary Value Problem (BVP) with Cosserat rod theory depends on the quality of the initial solution guess, i.e., n(s) and m(s) at s 0, the tolerable error (E ∈ R), and the number of nodes (N ∈ N) used to discretize the manipulator.
A tolerable error describes the error between the distal internal forces and moments obtained from forward integration which are called n d f and m d f , and distal boundary condition, i.e., n d b and m d b . The tolerable error can be written as n d Decreasing the tolerable error increases the solution accuracy, but potentially requires more time to solve convex optimizations for the BVP. Increasing the number of nodes is necessary to describe complex manipulator geometries, but should be chosen to minimize the required steps during forward integration.
To visualize how the required computation time changes with the number of nodes and the tolerable error, multiple simulations were performed by assigning known torques τ m and forces f m for m 1, 2, to the manipulator, and finding a valid solution from solving the BVP. Changes to tolerable errors (E) and number of nodes (N) were made manually. For example, an error of 2% in the initial solution guess was obtained by multiplying the valid solution with 0.98. After each change the BVP was solved again fifty times. The obtained mean and standard deviation of the computation times are shown in Figure 13. By taking into account all the aforementioned variables, i.e., number of nodes and the tolerable error, the Cosserat rod model is capable of FIGURE 11 | The model architecture used for the experiment. There are two Input layers, the first one is the poses of the node and the second input is the applied forces on the proximal node. The first dimension of inputs and outputs in each layer are unspecified, and can vary with the size of batches.  Figure 13A shows how the computation time required for solving a solution to the BVP changes with decreasing tolerable error (E) and increasing percentage errors from a valid solution (at 0%), for a constant number of nodes. Also, Figure 13B shows how the computation time required for solving a solution to the BVP changes with an increasing number of nodes (N) and increasing percentage errors from a valid solution, for a constant tolerable error. However, It should be mentioned that the proposed RNN-based model shows a real-time performance with a bandwidth of 60.75 Hz on average for the given architecture in Figure 7, number of epochs 25, and batch size 1. In addition, Figure 14A demonstrates the computation bandwidth required for the prediction of the next step using the trained model with a different number of LSTM units and a different size of time history horizons in Data-set IV. The figure suggests that computation bandwidths are fairly unchanged with the number of LSTM units; however, increasing the length of time history reduces bandwidth. The optimal region maximizing the bandwidth is approximately with time history size in (2, 20) and LSTM unit size in (5, 15). Figure 14B suggests that RMS error of the prediction decreases by increasing the number of LSTM units and the optimal area minimizing RMS errors is approximately with time history size in (2, 20) and LSTM unit size in (20, 25).
To sum up, this experiment demonstrates that not only can the presented RNN-based model outperform classical modeling approaches such as the Cosserat rod model, but also it shows possibilities to use the model in practice for closed-loop control applications.

DISCUSSION
This work suggests a distributed architecture for modeling complex dynamical systems by using multiple light-weight RNN-based models. As a result, the architecture would be easier to design and debug, and also benefits from faster convergence compared to one large network. Furthermore, large networks may take longer times to be trained, and they may not show an acceptable performance and readjusting (hyper-)parameters and restarting the training process might be necessary.
Increasing the size of history horizons in training stages may reduce the error to some extent, but on the other hand, it makes the model slower. Based on conventional dynamical models, the length of the history size should be at least 2. To reach a state-ofthe-art performance, i.e., having less error and faster model simultaneously, one may prefer varied batch sizes in the training and run-time phases. As a suggestion, we can use different batch sizes for training and run-time stages. A model can be trained with appropriate batch sizes such that the model performance suits the given criteria. Afterward, one can create a new network with the pre-trained weights compiled with a batch size of 1.
The performance, i.e., the convergence and stability, of the presented algorithm in this paper, unlike conventional algorithms, is independent of the number of nodes considered for the whole manipulator. To be specific, in the analytical model, there might be a need for several discretization nodes to achieve a convergent solution with a specific tolerable error; however, in the RNN-based model, only specific points/points of interests (e.g., two actuation points in the experiment) are considered. In other words, in the experiment, 13 nodes (4 for each flexible subsection TABLE 1 | The maximum and mean absolute errors around the x, y, and z-axes in ordered pairs for the distal and proximal nodes.
x-axis y-axis z-axis

APPENDIX: COSSERAT ROD THEORY
The Cosserat rod model of the manipulator assigns a position p(s) ∈ R 3 , orientation quaternion q(s) (q r , q i ) ∈ R 4 , internal force n(s) ∈ R 3 , and internal moment m(s) ∈ R 3 to a material cross-section at centerline position s ∈ [0, L], where L ∈ R + is the length of the manipulator, giving a material state vector y [p T , q T , n T , m T ] T . A set of thirteen ordinary differential equations describe how the state vector evolves between centerline positions (Edelmann et al., 2017) p' R q v, where p' ≡ z s p, [·] × : R 3 → R 3×3 a mapping to a skewsymmetric matrix, R(q) ∈ SO(3) the rotation matrix associated with orientation quaternion q, K s , K b ∈ R 3×3 diagonal shear and bending stiffness matrices, I 3 ∈ R 3×3 a unit matrix, v(s) ∈ R 3 and u(s) ∈ R 3 the material strain and bending, and v(s) [0, 0, 1] T and u(s) [0, 0, 0] T the intrinsic material strain and curvature. External forces f (s) ∈ R 3 and torques τ(s) ∈ R 3 determine the shape of the manipulator (Antman, 1995;Rucker and Webster, 2011). The manipulator is subject to a distributed gravity force (not shown in Figure 9) and magnetically exerted distributed forces f m and torques τ m due to interaction of the magnets with the magnetic field B m , where m 1, 2 denotes the magnet index from the base of manipulator. Given the exerted torques and forces, the shape of the manipulator is solved as a Boundary Value Problem (BVP). The base of the manipulator is fixed to a rigid base with constant position p 0 and orientation q 0 , and its distal tip is free with constant internal force n d L and moment m d L . The proximal and distal boundary conditions are then formulated as follows, assuming no tip wrench, we have P(y 0 ) [p T 0 , q T 0 ] T and D(y L ) [n dT L , m dT L ] T 0. The BVP is solved with a forward integration using an explicit Runge-Kutta fourth order method, and convex optimization using Levenberg-Marquardt (Till et al., 2019). The unknown proximal state parameters ξ [n T 0 , m 0 ] T are guessed and subject to the optimization where N ∈ N is the number of discrete steps along s. Then y N are the manipulator distal state parameters obtained from the forward integration using an explicit Runge-Kutta fourth order method. The error between desired distal boundary condition D(y L ) and D(y N ) determines if the solution (ξ) is accepted.