Information Distribution in Multi-Robot Systems: Generic, Utility-Aware Optimization Middleware

This work addresses the problem of deciding what information is worth sending in a multi-robot system under generic constraints, e.g., limited throughput or energy. Our decision method is based on Monte Carlo Tree Search (MCTS). It is designed as a transparent middleware that can be integrated into existing systems to optimize communication among robots. Furthermore, we introduce techniques to reduce the decision space of this problem to further improve performance. We evaluate our approach in a simulation study and demonstrate its feasibility in a real-world environment by realizing a proof of concept in ROS 2 on mobile robots.

MCTS is based on classic Monte Carlo methods, where the object of study (e.g., a continuous field) is randomly sampled in order to estimate some of its properties. In the case of MCTS, the object of study is a decision process, and it is sampled randomly by traversing a tree in which each node represents a decision.

Figure 1. Example of the selection step. Node D was eventually selected, which represents a state in which message m_1 is sent, m_2 is dropped, and the decisions about the other messages are not yet made.

A.1 Base of operation
The following description explains the generic mode of operation of the basic version of the MCTS algorithm. The description is intertwined with an example that shows how this algorithm can be used to solve the problem of information distribution in an MRS.
The MCTS algorithm starts with a tree consisting of a single root node. This node represents the state where none of the decisions are made.
EXAMPLE. Let us assume we have a set of 4 messages, {m_1, m_2, m_3, m_4}, and we need to decide which of them are worth sending. We need to make 4 decisions: for each message, we should decide whether it should be sent or dropped. This state is represented by the root node.
The method involves running a number of simulations. Each simulation tests a different set of options for the decisions and examines the outcome. The more simulations are run, the better the estimation.
For each simulation, the algorithm performs four steps: Selection, Expansion, Simulation, and Backpropagation. The steps are described in the following subsections and summarized in pseudocode in algorithm 1. Additionally, in section A.2 we provide references to the Python source code used in our experiments.
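As a minimal sketch (not the exact implementation referenced in section A.2), the four steps can be written as a compact loop. The `Node` structure, the binary send/drop decisions, the direct evaluation in place of a random rollout, and the standard UCT constant c = √2 are illustrative assumptions:

```python
import math
import random

class Node:
    """Tree node; `decisions` maps message index -> True (send) / False (drop)."""
    def __init__(self, decisions, parent=None):
        self.decisions = decisions
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_value = 0.0

def uct(child, c=math.sqrt(2)):
    """Standard UCT score; unvisited children are explored first."""
    if child.visits == 0:
        return float("inf")
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(child.parent.visits) / child.visits)
    return exploit + explore

def mcts(n_messages, evaluate, n_iterations=200):
    """Run MCTS; `evaluate` scores a (possibly partial) set of decisions."""
    root = Node({})
    for _ in range(n_iterations):
        # 1. Selection: descend to a leaf, picking children by UCT.
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # 2. Expansion: branch on the next undecided message, if any.
        if len(node.decisions) < n_messages:
            i = len(node.decisions)  # messages are ordered by generation time
            for choice in (True, False):
                node.children.append(Node({**node.decisions, i: choice}, parent=node))
            node = random.choice(node.children)
        # 3. Simulation: assess the resulting state directly (see sec. A.1.3).
        score = evaluate(node.decisions)
        # 4. Backpropagation: update the node and all its ancestors.
        while node is not None:
            node.visits += 1
            node.total_value += score
            node = node.parent
    return root
```

Here `evaluate` is any scoring function over partial decision sets; the tree grows one decision level per expansion, exactly as in the running example.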

A.1.1 Selection
We start from the root node and traverse the tree down until a leaf node is reached. A leaf node is a node without any children. For each node, we choose which child to descend to based on the UCT formula (given in Eq. 3). The selection step is described with pseudocode in line 8 of algorithm 1 and an example of this step is presented in fig. 1.

A.1.2 Expansion
If the leaf node still contains unresolved decisions (e.g., messages for which we have not yet decided whether they should be sent), we choose one of them and create as many children as there are options for the considered decision. In principle, the decision could be chosen at random, but usually some heuristic is used to first consider the more significant decisions. In our implementation, the decisions (i.e., messages) are ordered by generation time to ensure deterministic outcomes. Then, one of the newly created children is chosen randomly and considered in the next step. We will call this chosen child the expanded child.
The pseudocode for this step is provided in line 13 of algorithm 1.
EXAMPLE. This example is visualized in fig. 2. Let us assume we are in leaf node D. There are still two messages for which no decision has been made: m_3 and m_4, generated at times 3 s and 4 s, respectively. Message m_3 was generated earlier, so we consider the options related to it: we can either send it or not. Hence, we create two child nodes, one representing the situation in which m_3 is sent (G), the other in which it is not (F). In both of these children, the decision regarding message m_4 is still open.
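The expansion of this example can be sketched directly; the dict-based node representation and the generation times are illustrative:

```python
def expand(decisions, gen_times):
    """Create the children of a leaf node by branching on the earliest
    undecided message (send / drop); `decisions` maps message index ->
    True (send) / False (drop)."""
    undecided = [i for i in range(len(gen_times)) if i not in decisions]
    i = min(undecided, key=lambda j: gen_times[j])  # earliest generation time
    return [{**decisions, i: choice} for choice in (True, False)]

# Node D: m_1 sent, m_2 dropped; m_3 (3 s) and m_4 (4 s) still undecided.
node_d = {0: True, 1: False}
children = expand(node_d, gen_times=[1.0, 2.0, 3.0, 4.0])
# -> node G: {0: True, 1: False, 2: True}
#    node F: {0: True, 1: False, 2: False}
```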

A.1.3 Simulation
The goal of this step is to estimate the value of the expanded child from the previous step. In classic MCTS, at this point all of the unresolved decisions would be made randomly. The resulting state would then be evaluated, which would result in a numerical value being assigned to it (for instance, 1 for a won game and 0 for a lost game).
However, the set of decisions is often large, and hence even making them randomly might be computationally intensive. An alternative approach to this step is to assess the value of a given state without considering future decisions. This assessment could be done, for instance, by utilizing domain-specific expert knowledge, or it could be based on machine learning. This step is presented in line 21 of algorithm 1.
EXAMPLE. Let us assume node F was chosen as the expanded child. We can use some evaluation method to assess the situation in which message m_1 is sent and messages m_2 and m_3 are dropped. The method could, for instance, assume that bigger messages are more useful. As a result, a numerical score is assigned to node F.
The evaluation model utilized in our implementation is briefly summarized in Section 2.2 of the paper.
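As an illustration of such an evaluation method, the size-based heuristic from the example can be sketched as follows; the message sizes and the linear scoring rule are hypothetical:

```python
def evaluate_by_size(decisions, sizes):
    """Score a (possibly partial) set of send/drop decisions.

    `decisions` maps message index -> True (send) / False (drop);
    `sizes` gives each message's payload size in bytes. Following the
    example heuristic, bigger sent messages contribute more utility.
    """
    return sum(sizes[i] for i, sent in decisions.items() if sent)

# Node F: m_1 sent, m_2 and m_3 dropped, m_4 still undecided.
sizes = [100, 40, 250, 80]
score = evaluate_by_size({0: True, 1: False, 2: False}, sizes)
# score == 100: only the sent message m_1 contributes
```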

A.1.4 Backpropagation
Executing the simulation step provided new information about the expanded node. This information needs to be propagated to all of its ancestors in order to improve their estimated values. The ancestors of node F are marked in fig. 3 and the pseudocode of this procedure starts in line 23 of algorithm 1.

Figure 3. Example of the backpropagation step. All nodes that should be updated are marked in red.
In order to make the final decision, we start from the root node and then always go to the child that was visited the most times. When we reach the child that represents the decision we are interested in, we have the result. This procedure is described with pseudocode in algorithm 2.
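A minimal sketch of this final-decision walk; the `Node` fields and the hand-built tree are illustrative, not the pseudocode of algorithm 2 itself:

```python
class Node:
    def __init__(self, decisions, visits=0, children=None):
        self.decisions = decisions  # message index -> True (send) / False (drop)
        self.visits = visits
        self.children = children or []

def final_decisions(root):
    """Walk from the root, always taking the most-visited child, and
    return the send/drop decisions of the deepest node reached."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.visits)
    return node.decisions

# A tiny hand-built tree: sending m_1 was explored far more often.
root = Node({}, visits=100, children=[
    Node({0: True}, visits=70, children=[
        Node({0: True, 1: False}, visits=60),
        Node({0: True, 1: True}, visits=10),
    ]),
    Node({0: False}, visits=30),
])
# final_decisions(root) == {0: True, 1: False}
```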

A.2 Implementation
To use MCTS in our experiments, we utilized a free and open-source Python implementation available on GitHub^1. We forked it and ported it to Cython^2 in order to improve performance. Additionally, we changed the selection step to be deterministic in order to make our experiments reproducible.

B DERIVATION OF EQUATIONS FROM SECTION 3.2.1
F(i) is a function computing the number of nodes at the i-th level of the tree, and k is the number of messages generated in a window. Only r − 1 messages can be sent in one window. We number the tree levels starting from level 0 (i.e., in Figure 2 message m_1 is at level 0, m_2 at level 1, etc.). This means that level k represents the decision about k messages. This yields Eq. (1).

The case i > k of Eq. (1) is a recursive function, so by itself it is not too useful; let us expand it. The expanded result was given in the main text of this paper.

Next, we stated that when r ≈ k/2 the number of nodes is approximately 2^(i−1). Because this is only an approximation, in order to simplify the calculations we consider only even k; the order of magnitude of the obtained result is not affected by this assumption. We then expand the result above for r ≈ k/2.

Finally, we examine what happens if the number of messages that can be sent is small compared to k: by applying the binomial theorem, we can bound the expression from above. Hence, if the number of messages that can be sent tends to 0 (only a few messages can be sent), the number of nodes tends to being lower than 2^(i−k).
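As a numerical sketch of these counts: assuming that the tree stops creating "send" branches once r − 1 messages have already been sent, F(i) equals the number of length-i send/drop sequences containing at most r − 1 sends. This reading is an assumption for illustration, not the exact derivation; under it, the r ≈ k/2 approximation can be checked numerically:

```python
from math import comb

def F(i, r):
    """Assumed node count at tree level i when at most r - 1 messages
    may be sent: length-i send/drop sequences with at most r - 1 sends."""
    return sum(comb(i, j) for j in range(min(i, r - 1) + 1))

# With no effective limit (r > k), every level is a full binary level: 2^i nodes.
assert F(10, 12) == 2 ** 10

# For r ≈ k/2, the deepest level holds roughly half of a full binary
# level, i.e. about 2^(i-1) nodes; the ratio approaches 1 as k grows.
k = 100
ratio = F(k, k // 2) / 2 ** (k - 1)
```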