Monitoring weeder robots and anticipating their functioning by using advanced topological data analysis

The present paper analyzes the topological content of the complex trajectories that autonomous weeder robots follow in operation. We will show that the topological descriptors of these trajectories are affected by the robot environment as well as by the robot state with respect to maintenance operations. Topological Data Analysis, based on persistent homology, will be used for extracting the trajectory descriptors. Then, appropriate metrics will be applied in order to compare these topological representations of the trajectories, for classifying them or for performing efficient pattern recognition.


Introduction
Autonomous robots follow a number of rules introduced into their controllers [1,2,3]. However, when they interact with the environment, small variations may result in long-time unpredictable motion. This behavior is very common in mechanics, characterizing systems exhibiting deterministic chaos.
In the practical case addressed in the present paper, a weeder robot (usually one of a fleet) is expected to cover a patch of a vineyard in an optimal manner. Here, "optimal manner" refers to the path that allows covering the whole patch in minimum time. However, the ground orography has a significant variability, as does the location of the vines. The robots are designed to bump into the vine stocks in order to remove the grass around them, and numerous collisions from different directions are needed to ensure that all the grass around each vine stock is adequately removed. Figure 1 depicts one of the robots considered in the present study in operational conditions. All the practical variability (ground, vine location, grass distribution and size, obstacles, ...), as well as the intrinsic sensitivity of the dynamics to small variations in the physical and operational conditions, makes it impossible to define a deterministic robot trajectory. In these conditions, an almost random motion seems to be the most valuable alternative.
In practice, to avoid the under-performance characteristic of fully random motions, the random motion operating at the local scale is combined with a more global deterministic planning that better controls the vineyard coverage by sequencing the operation over the different local patches covering the whole domain.
The present work does not aim at addressing such optimized operation conditions, which will be addressed in a future publication under progress. Instead, it analyzes the data collected from a robot operating in different patches and under different conditions (with respect to the maintenance operations) in order to identify patterns able to reveal the particular patch in which the robot operates, or to distinguish the different robot states with respect to the maintenance operations.
Having a sort of QR-code or identity card of each robot, when it operates within each patch in a particular state (healthy or unhealthy), is of major relevance with respect to the predictive or operational maintenance of autonomous robots or fleets of them.
The present paper aims at analyzing the collected data in order to extract the maximum information that could serve to differentiate them, enabling unsupervised clustering and/or supervised classification, prior to any modeling action based on adapted regressions.

Methods
Using data clustering is almost straightforward as soon as data are homogeneous and quantitatively expressible using integer or real numbers, enabling boolean or algebraic operations (addition, multiplication, ...). The interest of organizing data into groups, in a supervised or unsupervised manner, is that data belonging to a given group are assumed to share some qualities with the other members of the group.
When proceeding in an unsupervised manner, the only information available to group the data consists of the distances among them. Data that remain close to each other are expected to share some properties or behavior. This is the rationale behind the very popular k-means technique [4,5]. However, the notion of proximity, leading to the derived concept of similarity, requires the definition of a metric for comparison purposes. When data are well defined in a vector space, distances can be defined and data can be compared accordingly. In the case of supervised classification, one looks for the linear (or nonlinear) frontier separating the different groups on the basis of the quality or property that drives the data clustering. In this last case, the best frontier separating two groups of data is the one maximizing the distance of the available data to the frontier, in order to maximize the separation robustness. This is how the support vector machine, SVM, works, for instance [6].
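To fix ideas, the distance-based grouping described above can be sketched in a few lines. The following is an illustrative NumPy implementation of the k-means iteration (not the code used in this study), with a simple deterministic initialization:

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Minimal k-means: alternate nearest-centroid assignment and
    centroid update, with a simple deterministic initialization."""
    # Initialize centroids with points spread across the dataset.
    idx = np.linspace(0, len(points) - 1, k).astype(int)
    centroids = points[idx].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (Euclidean metric).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids
```

The choice of the Euclidean metric here is exactly the assumption questioned in the following paragraphs: it only makes sense when the data live in a vector space.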
In both cases (supervised and unsupervised) the existence of a metric enabling data comparison is assumed. However, very often data are much more complex, for example when they involve heterogeneous information, possibly categorical or qualitative. This is the case when a manufactured part is described by an identity card consisting of the name of the employee involved in the operation, the designation of the employed materials (some of them given by their commercial names), the temperature of the oven in which the part was cured, and the processing time. In that case, comparing two parts becomes quite controversial if the employed metric is not properly defined. In these circumstances, metrics are usually learned from the existing training data, as is the case when using decision trees (or their random forest counterpart) [7,8], code-to-vector [9] or neural networks [10].
The situation becomes even more extreme when data have a large and deep topological content. This is the case, for example, of time series or images of rich microstructures. These are usually encountered in materials science when describing metamaterials (also called functional materials), or materials exhibiting gradients of properties or mesoscopic architectures. Even in nominal conditions, time series will differ if they are compared from their respective values at each time instant. That is, two time series, even when they describe the same system in similar conditions, never match perfectly; they differ even if they resemble each other in a certain metric that should be learned. For example, our electrocardiogram measured during two consecutive minutes will exhibit a resemblance, but certainly the two recordings are not identical, thus making a perfect match impossible. A small variation will create a misalignment, calling for metrics less sensitive to these effects. The same rationale applies when comparing two profiles of a rough surface, or two images of a foam taken at two close locations: they exhibit a resemblance even if they do not perfectly match.
Thus, techniques aiming at aligning data were proposed. In the case of time-series, Dynamic Time Warping, DTW [11,12] has been successfully applied in many domains. The theory of optimal transport arose as a response to similar issues [13].
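As an illustration of the alignment idea, a minimal dynamic-programming implementation of DTW for two one-dimensional series might look as follows (a textbook sketch, not tied to the implementations referenced above):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D series.
    D[i, j] holds the optimal alignment cost of a[:i] and b[:j]."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # A step may advance in either series or in both at once.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Note how the warping absorbs local time shifts: a series and a time-stretched copy of it are at DTW distance zero, whereas their pointwise Euclidean distance is not.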
Another route consists of giving up on aligning the data and focusing on extracting adequate, goal-oriented descriptors of these complex data, enabling comparison, clustering, classification and modeling (from nonlinear regressions).
A first possibility consists of extracting the main statistical descriptors of time series or images (moments, correlations, covariograms, ...) [14]. Sometimes, data expressed in the usual space and time domains are transformed into other spaces where their manipulation is expected to be simpler, such as Fourier, Laplace, DCT or wavelet descriptions. The most valuable descriptions (in the sense given later) seem to be those maximizing sparsity. These are widely considered when using compressed sensing [15], because they represent a compact, concise and complete way of representing data that seemed much more complex in the usual physical space (space and time).
The present work considers this last route, but uses a description based on the topology of data, described later, and successfully considered in our former works for addressing complex mesostructures [16], time-series [17], rough surfaces [18] and shapes [19], with the aim of classifying and also constructing robust regressions expressing properties or performance from the input data expressed from its topological description.
The present study, when compared with our former developments, addresses a new and complex purpose: how the topology contained in the trajectory that an autonomous robot follows in a cluttered environment (where interactions limit the predictability horizon) can inform on the robot location (the particular patch within the whole vineyard) or the robot state (with respect to maintenance operations).

Data description
In the study that follows, we consider a dataset consisting of the x and y coordinates, calculated from the GPS longitudes and latitudes, representing the recorded position of the robot at time t. These coordinates span six different disjoint geographical patches within the whole vineyard, as illustrated in Figure 2, and have been recorded over a period of time T, leading to the maps reported in Figure 3 that reflect the robot's trajectory.
Maintenance operations are also known and properly identified in the provided dataset. Thus, the dataset consists of a collection of n discrete, finite and compact two-dimensional trajectories S 1 , . . . , S n .

Geometrical Features
We are interested in extracting the geometrical and topological features of the trajectories in the dataset across different scales. For that purpose, we introduce the so-called Rips filtration. We construct a Rips complex from simplices of varying dimensions, which are generalizations of triangles. More specifically, a d-simplex is the smallest convex set containing d + 1 affinely independent points (see Fig. 4). The so-called abstract simplicial complex is a finite collection of sets that is closed under the subset relation, i.e., if a ∈ A and b ⊂ a, then b ∈ A.
Let S be a trajectory, defined from a finite compact set of points in R², and ε ≥ 0. The Rips complex of S at scale ε, R_ε(S), is the abstract simplicial complex consisting of all subsets of S of diameter at most ε:

R_ε(S) = {σ ⊆ S : diam(σ) ≤ ε},

where the diameter of a set of points is the maximum distance between any two points in the set.
Geometrically, we can construct the Rips complex by considering balls of radius ε/2 centered at each point in S. Whenever d balls have pairwise intersections, we add a (d − 1)-dimensional simplex. An example of Rips complex is given in Fig. 5. A filtration of a simplicial complex K is a nested sequence of subcomplexes starting at the empty set and ending with the full simplicial complex: ∅ ⊂ K_0 ⊂ · · · ⊂ K.
By varying the value of the scale parameter ε, from ε_min = 0 to ε_max = diam(S), we get a family of nested Rips complexes known as the Rips filtration.
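Under the definition above, a brute-force construction of the Rips complex at a given scale ε is straightforward; the following sketch is purely illustrative (TDA libraries use far more efficient constructions):

```python
import numpy as np
from itertools import combinations

def rips_complex(points, eps, max_dim=2):
    """All simplices (as index tuples) whose diameter is at most eps."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distance matrix.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    simplices = []
    for d in range(max_dim + 1):
        for sigma in combinations(range(n), d + 1):
            # diameter = max pairwise distance within the candidate simplex
            if all(dist[i, j] <= eps for i, j in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices
```

For instance, on the four corners of a unit square, ε = 1 yields only the vertices and the four sides, while ε above √2 also fills in the diagonals and the triangles, illustrating how the complex grows along the filtration.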

Persistent homology
In order to have a more exhaustive view of how the features change across different scales, the appearance and disappearance of each feature within the filtration is tracked and encoded into the homology groups H_k(S), where k is the homology dimension. The elements of a homology group H_k(S) are classes of chains of simplices ("packets") in the Rips complex. The use of homology groups allows us to perform algebraic operations over the simplicial elements. The homology group H_0(S) represents the connected components (at the vertex level), while the homology group H_1(S) represents the cycles (loops) formed in the simplicial complex. Since our data live in R², we are only interested in k = 0 and k = 1.
Given a homology group, we can now define how to track the appearance of the features across the different scales, by defining the homology group at scale ε, H_k^ε(S). It represents the classes of simplices described previously, but taken from R_ε(S), that is, the elements of the filtration with a filtration value lower than ε. This approach is known as persistent homology. It allows quantifying the appearance and disappearance of the features across the different scales (discretized by considering the values ε_j, j = 0, ..., m):
• For H_0(S), the birth scale of all vertices is set to zero, while the death scale is the filtration value at which the vertex is joined to another one by a segment.
• For H 1 (S), the birth scale of a cycle is the filtration value at which a loop has been formed, while the death scale is the filtration value at which the interior of the loop has been covered.
We can formalize this as follows:
• The birth scale b_γ of the feature γ is b_γ = min {ε_j : γ ∈ H_k^{ε_j}(S)}.
• The death scale d_γ of the feature γ is d_γ = max {ε_j : γ ∈ H_k^{ε_j}(S)}.
The persistence of the features throughout the scales can then be represented by the so-called persistence barcode of S. It is a histogram where the bar associated with each feature starts at the birth scale and ends at the death scale. An example of persistent homology computation is given with the Rips complex in Fig. 6, and the associated barcode in Fig. 7. A loop is formed at ε = 0.9 (birth) and then covered at ε = 1.8 (death); it is represented by the red bar. A more compact representation of the feature persistence is the persistence diagram of S, defined as

PD(S) = {(b_γ, d_γ) : γ ∈ H_k(S)},

where b_γ and d_γ are the birth and death scales associated with the feature γ. In what follows, in the trajectory analysis, we only consider one-dimensional features, i.e., k = 1.
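For the H_0 case described above, the barcode can be computed directly: every vertex is born at scale 0, and a component dies exactly when an edge of the Euclidean minimum spanning tree first joins it to another component. A union-find sweep over the edges sorted by length (an illustrative sketch, not the pipeline used in the study) reads:

```python
import numpy as np
from itertools import combinations

def h0_barcode(points):
    """H0 persistence bars of the Rips filtration: all bars are born at
    scale 0; each merge of two connected components (a Kruskal/MST edge)
    kills one bar at the corresponding edge length."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    edges = sorted(
        (np.linalg.norm(pts[i] - pts[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    for w, i, j in edges:            # sweep over increasing scales
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, w))    # one component dies at scale w
    bars.append((0.0, np.inf))       # the last component never dies
    return bars
```

For three collinear points at x = 0, 1, 3, the bars die at scales 1 and 2, plus one infinite bar, matching the intuition that components merge at the gaps between points.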
The persistence diagram associated with the Rips complex shown in Fig. 6 is given in Fig. 8. An equivalent representation of the persistence diagram is the so-called life-time diagram of S, constructed by mapping each feature (b_γ, d_γ) to (b_γ, d_γ − b_γ). In order to use the persistence features in a machine learning approach, we construct the so-called persistence image of S. First, observe that LT(S) is a finite set of p points, {(a_1, b_1 − a_1), . . . , (a_p, b_p − a_p)}. Then, consider a non-negative weighting function w(u) defined on these points. Finally, we fix a natural number M and take a bivariate normal distribution g_u(x, y) centered at each point u ∈ LT(S) with covariance matrix σ I_2 = ((b_p − a_p)/M) I_2, I_2 being the 2 × 2 identity matrix. A persistence kernel is then defined according to:

ρ_S(x, y) = Σ_{u ∈ LT(S)} w(u) g_u(x, y).
We associate to a robot trajectory S ⊂ R² a matrix in R^{M×M} as follows. Let δ > 0 be a small enough real number, and consider a squared region Ω_{S,δ} covering the support of ρ_S(x, y) up to the precision δ. Then, we consider two uniform partitions, of M intervals each, of the two sides of Ω_{S,δ}, whose products define the cells P_ij partitioning Ω_{S,δ}. The persistence image of S associated with the partition P = {P_ij} is then the R^{M×M} matrix with elements

PI_ij(S) = ∫∫_{P_ij} ρ_S(x, y) dx dy.

An example of persistence computation for a given trajectory is given in Fig. 9.
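A simple rasterization of the life-time diagram into a persistence image can be sketched as follows. The weighting function and the Gaussian bandwidth here are illustrative choices (the text above fixes the variance differently), and the kernel is evaluated on the grid rather than integrated over each cell:

```python
import numpy as np

def persistence_image(lifetime_pts, M=20, sigma=0.1, extent=(0.0, 1.0)):
    """Rasterize a life-time diagram {(birth, life-time)} into an M x M
    image: one Gaussian bump per point, weighted by its life-time."""
    pts = np.asarray(lifetime_pts, dtype=float).reshape(-1, 2)
    grid = np.linspace(extent[0], extent[1], M)
    X, Y = np.meshgrid(grid, grid)   # X: birth axis, Y: life-time axis
    img = np.zeros((M, M))
    for bx, ly in pts:
        # Illustrative weighting: importance grows with the life-time,
        # so short-lived (noisy) features contribute little.
        img += ly * np.exp(-((X - bx) ** 2 + (Y - ly) ** 2) / (2 * sigma ** 2))
    return img
```

The resulting fixed-size matrix is what makes the topological summary usable as input to standard classifiers, as done in the classification section below.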

Measuring persistence similarity
Consider two data sets S_u and S_v representing two trajectories. A matching between two persistence diagrams, PD(S_u) and PD(S_v), is a map ψ : PD(S_u) −→ PD(S_v) that associates each feature from PD(S_u) with a feature from PD(S_v). The optimal matching between PD(S_u) and PD(S_v) is a matching ψ̄ : PD(S_u) −→ PD(S_v) minimizing the transport cost C of moving the features from PD(S_u) to PD(S_v):

C(ψ) = Σ_{γ ∈ PD(S_u)} ‖γ − ψ(γ)‖.

Then, to measure the degree of similarity between two trajectories S_u and S_v, we consider the Wasserstein distance [13,20] between PD(S_u) and PD(S_v),

d_W(PD(S_u), PD(S_v)) = C(ψ̄),

where ψ̄ is the optimal matching between PD(S_u) and PD(S_v).
An example of matching between the persistence diagrams of two trajectories is given in Fig. 10.
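One way to compute such a distance, handling diagrams of different sizes by allowing points to be matched to their projections on the diagonal, is to augment the cost matrix and solve a linear assignment problem. An illustrative sketch with SciPy's Hungarian solver, using an order-1 cost (the exact cost used in the study is not specified in this excerpt):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_pd(pd1, pd2):
    """Order-1 Wasserstein distance between two persistence diagrams.
    Unmatched points are transported to the diagonal, which yields a
    partial matching when the diagrams have unequal sizes."""
    p = np.asarray(pd1, dtype=float).reshape(-1, 2)
    q = np.asarray(pd2, dtype=float).reshape(-1, 2)
    n, m = len(p), len(q)
    # Distance from a point (b, d) to the diagonal {b = d}.
    dp = (p[:, 1] - p[:, 0]) / np.sqrt(2)
    dq = (q[:, 1] - q[:, 0]) / np.sqrt(2)
    # Augmented cost: real-vs-real, real-vs-diagonal, diagonal-vs-diagonal (0).
    C = np.zeros((n + m, n + m))
    if n and m:
        C[:n, :m] = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)
    C[:n, m:] = dp[:, None]
    C[n:, :m] = dq[None, :]
    rows, cols = linear_sum_assignment(C)
    return float(C[rows, cols].sum())
```

For identical diagrams the distance is zero, and a diagram compared with an empty one pays exactly the cost of pushing all its points onto the diagonal.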

Barycenters of persistence diagrams
Consider now a collection S 1 . . . S n of trajectories with their associated diagrams PD 1 . . . PD n .
Since the space of persistence diagrams equipped with the Wasserstein distance (the Wasserstein space) is not a linear space, the notion of barycenter [21] is extended to persistence diagrams using the so-called Fréchet mean [22], which always exists when averaging finitely many diagrams.
The Fréchet mean of PD_1, . . . , PD_n is any diagram µ minimizing the map

E(µ) = Σ_{i=1}^{n} d_W(µ, PD_i)².

The computation of the barycenter µ has proven to be challenging, and multiple approaches can be used, such as the Sinkhorn algorithm [23]. We will use the one based on the Hungarian algorithm presented in [22] and consider Partial Optimal Matchings [24], as the diagrams may not be of the same size; in this case, points from the diagonal are matched with the remaining (exceeding) points. The resulting iteration reads:
1. Initialize µ with one of the diagrams PD_i.
2. Compute the optimal matching between µ and each diagram PD_i.
3. Form the candidate µ̄ by averaging the points matched together.
4. If µ̄ minimizes E, return µ̄. Otherwise, update µ = µ̄ and go back to 2.
An example of a barycenter of three persistence diagrams is given in Fig. 11.
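The barycenter iteration can be sketched for the simplified case of equal-size diagrams (so no diagonal matching is needed); the following illustrative NumPy/SciPy version runs a fixed number of iterations instead of testing the minimality of E:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def frechet_mean(diagrams, n_iter=20):
    """Simplified barycenter of equal-size persistence diagrams:
    alternate optimal matching (Hungarian algorithm) against the
    current estimate and arithmetic averaging of the matched points."""
    diagrams = [np.asarray(d, dtype=float) for d in diagrams]
    mu = diagrams[0].copy()          # 1. initialize with one diagram
    for _ in range(n_iter):
        matched = []
        for d in diagrams:
            # 2. optimal matching between the estimate and each diagram
            C = np.linalg.norm(mu[:, None, :] - d[None, :, :], axis=2)
            _, cols = linear_sum_assignment(C)
            matched.append(d[cols])
        # 3. update: average the matched points
        mu = np.mean(matched, axis=0)
    return mu
```

For two one-point diagrams {(0, 1)} and {(0, 3)}, the iteration converges to the expected midpoint diagram {(0, 2)}.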

Classification
Image classification is a procedure used to automatically categorize images into classes by assigning to each image a label representative of its class. A supervised classification algorithm requires a training sample for each class, that is, a collection of data points whose class of interest is known. Labels are assigned to each class of interest. The classification of a new observation is thus based on how close the new point is to each training sample. The Euclidean distance is the most common metric used in low-dimensional datasets. The training samples are representative of the known classes of interest to the analyst. In order to classify the persistence images, we considered the logistic regression algorithm. The training of the L2-penalized logistic regression binary classifier then consists in minimizing the cost function

min_{ω,c} (1/2) ωᵀω + C Σ_{i=1}^{n} log(1 + exp(−y_i (x_iᵀω + c))).

Here ω are the weights we optimize over, c the intercept, and C an inverse regularization parameter. Once trained, the model is evaluated on an unseen set of flattened persistence images. The metric used for model evaluation is the accuracy score, defined as the number of correct predictions over the number of samples.
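The penalized logistic cost just described can be minimized directly by gradient descent. A self-contained NumPy sketch with labels in {−1, +1} follows (the study presumably relies on a library implementation, so this is only illustrative):

```python
import numpy as np

def train_logreg(X, y, C=1.0, lr=0.01, n_iter=2000):
    """Fit w, b by gradient descent on the L2-penalized logistic cost
    0.5 * w.w + C * sum_i log(1 + exp(-y_i * (x_i.w + b))), y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        margins = np.clip(y * (X @ w + b), -500, 500)  # avoid exp overflow
        s = -y / (1.0 + np.exp(margins))   # per-sample loss derivative
        w -= lr * (w + C * (X.T @ s))      # penalty gradient + data gradient
        b -= lr * C * s.sum()
    return w, b

def predict(X, w, b):
    """Hard prediction in {-1, +1} from the sign of the linear score."""
    return np.where(X @ w + b >= 0, 1, -1)
```

In the study the inputs X would be the flattened persistence images, and the accuracy score is simply the fraction of predictions matching the true labels.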

Results

Determination of the patch in which the robot is located
We first want to predict whether a robot is in a certain patch. For that purpose we choose one patch as the target and train a classification model as described in Section 2.6. The complete dataset consists of daily trajectories over 240 days. For each day a persistence image is computed, which is used as input for the model (a sample is depicted in Figure 9).
The samples are labelled according to the considered patch: 1 if the robot is in the target patch, and 0 otherwise. The dataset is split into 65% for training and 35% for testing. The proposed classifier achieves an 80% accuracy score in predicting the patch in which the robot is, based on the persistence images.

Maintenance prediction
Then, we consider daily trajectories in the same patch, consisting of 50 samples. For each day, a persistence image is computed, which is used as input to the classifier. The periods considered here are the ones in between two consecutive maintenance operations of the robot. The samples are labelled 0 if they are associated with a day before the maintenance date, and 1 otherwise. The dataset is split into 65% for training and 35% for testing. The model achieves a 90% accuracy score in predicting the period associated with the sampled trajectories, proving that robot trajectories exhibit a topological pattern when maintenance applies, a fact that could be used for predictive maintenance purposes. Figure 12 depicts the Wasserstein distance between the persistence diagrams of consecutive daily trajectories, with the maintenance operations emphasized in red, whereas Fig. 13 shows the barycenters of each period between consecutive maintenance operations. As can be noticed from the persistence images in Fig. 13, maintenance operations affect the topology of the trajectory, as expected from the successful classification just reported.
To better support our hypothesis about the effect of maintenance on the trajectory topology, we consider the first operation interval, the one before the first maintenance, which corresponds to the first persistence image in Fig. 13 (left), and divide it into two parts of equal length. Then, the associated barycenters of both half-intervals are obtained; both are represented in Fig. 14. As can be noticed, both of them closely resemble the one associated with the whole interval (the first picture in Fig. 13), and all of them (both in Fig. 14 and the first in Fig. 13) are significantly different from the second image in Fig. 13, which represents the trajectory topology after the first maintenance operation. These results again support our assumption on the effect of maintenance on the trajectory topology.

Conclusions
The characterization of the trajectories followed by the robot, based on the geographical location, proves to be a reliable method to differentiate between the different environments affecting the robot motion. Then, over a single patch, the classification proved efficient in detecting the changes in the robot signature related to maintenance events.
The proposed topology-based framework for sampled trajectories seems a very pertinent, powerful and intrinsic way of quantifying, characterizing and analyzing the topological and geometrical nature of the robot's pathways. The strength of the framework lies both in the multi-scale topological description of the trajectory and in the use of metric features that can be combined with machine learning.